Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



Checkpointing, CHPOX, and Linux


An Example

This example demonstrates all the current CHPOX features. To compile the program, you can use the provided Makefile in Listing Two. The program takes three arguments:

  • The first parameter defines the sleep interval (in seconds) when reading the source file line by line.

  • The second parameter defines the name of the input file (plain text). The input file contains nothing but numbers separated by newlines. Use something like $ seq 1 100 > input-file.txt to generate the sequence.

  • The third parameter defines the name of the output file. The test program prints the square of each number that it reads from the input file.
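To check the program's output later, it can help to have a reference file of squares. The sketch below is not part of CHPOX or of chpoxtest: the function name make_squares and the file name expected-output.txt are hypothetical, and it assumes the test program writes one square per line, which the text above does not state.

```shell
#!/bin/sh
# Hypothetical helper (not part of CHPOX): print the square of each number
# read from standard input, one result per line.
make_squares() {
    awk '{ print $1 * $1 }'
}

# Example: build the input file and a reference file to compare against.
# The file names are only illustrative.
# seq 1 100 > input-file.txt
# make_squares < input-file.txt > expected-output.txt
```

If the real output format differs (extra text, different separators), adjust the awk program accordingly before comparing.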

Here are the steps to do checkpointing with CHPOX:

  1. Make sure you have loaded the CHPOX kernel module. If you haven't, type modprobe chpox_mod. Confirm it with lsmod.

  2. Create the appropriate special character file in the /dev directory. Skip this if you use devfs. To automate the work, the CHPOX package provides a script: simply execute the mkchpoxdev script from the tools directory. Remember to do step 1 first, because the script checks for the existence of /proc/chpox and bails out if it finds none.

  3. Run the target application. In this case, you can execute the above "chpoxtest" program for a quick start.

  4. Register the target program with chpoxctl. The syntax is:

    
    #chpoxctl add <target-pid> <assigned signal number> <child flag> <checkpoint filename>
    
    

    For example, if you know that the process's PID is 4500, you want to use "31" as the signal number assigned to the CHPOX handler, and you want to checkpoint the process's children, you type:

    #chpoxctl add 4500 31 9 /tmp/test.dmp
    

    (See the chpoxctl(1) man page for a complete explanation of each parameter.)

    If you only need to checkpoint a single process, replace the fourth parameter (the child flag) with "1". For the checkpoint filename, you can give any valid pathname.

    For the signal number, it is best to pick an unused one. The safe choices are SIGUSR1, SIGUSR2, or SIGUNUSED, but if you know the target process already installs a handler for one of these signals, you must select another. Check /usr/include/bits/signum.h or type kill -l to see the actual signal number for a given signal name.

    Chpoxtest already installs a signal handler for SIGUSR1, so you are forced to pick another one. Here we assume you pick SIGSYS (31). Note that you need the PID of the parent process, not a child. You can use "ps" or "pgrep" to look for "chpoxtest" instances and pick the process with the lowest PID (the parent). Another way is to use "ps auf" or "ps auxf" to get a tree showing the parent-child relationships.

  5. Now you are ready to checkpoint. During the execution of chpoxtest, send it a SIGSYS:

    # kill -31 4500
    

    If you succeed, a new file "test.dmp" is created in the /tmp directory. Repeat as needed if you want to checkpoint chpoxtest at a different time. Notice that the latest checkpoint is always saved to the specified dump file, so if you want to keep a previous checkpoint, you need to copy or move it before executing kill again. CHPOX can't do this automatically for you.

  6. Wait until chpoxtest finishes or just stop it (using Ctrl-C). Now you can restore the task using "ld-chpox". The syntax is:

    # ld-chpox <path and filename of the dump file>
    

    In this case, to restore chpoxtest, you can type:

    # ld-chpox /tmp/test.dmp
    

    As you can see from the console output, it continues right from the spot where you checkpointed it. Make sure the paths and names of the input and output files are still the same, or you will get an error that the files can't be found.

    At this stage, you no longer need CHPOX to remember the PID (process ID) you registered earlier, so you can safely delete it from the CHPOX list using chpoxctl. This time, the syntax is:

    # chpoxctl del <pid of process>
    

    Because we registered PID 4500 (remember, this value is just an assumption), you type:

    # chpoxctl del 4500
    

    You can also type:

    #chpoxctl clear
    

    This cleans up all the PIDs in the CHPOX checkpoint list.
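Two of the steps above lend themselves to small helpers: picking the parent's PID, and preserving an earlier dump file before the next checkpoint overwrites it. The following is a minimal sketch, not part of the CHPOX package. The function names lowest_pid and keep_dump and the numbered backup naming scheme are assumptions made for illustration, and lowest_pid relies on the same assumption as the text, namely that the parent has the lowest PID.

```shell
#!/bin/sh
# Hypothetical helpers around the steps above; neither is part of CHPOX.

# Read PIDs (one per line) on standard input and print the lowest one,
# which the article treats as the parent process.
lowest_pid() {
    sort -n | head -n 1
}

# Move an existing dump file aside so the next checkpoint does not
# overwrite it. The backup name appends the first free counter: .1, .2, ...
keep_dump() {
    dump=$1
    [ -f "$dump" ] || return 0      # nothing to preserve yet
    n=1
    while [ -e "$dump.$n" ]; do
        n=$((n + 1))
    done
    mv "$dump" "$dump.$n"
}

# Example usage, with the article's signal number and dump path:
# pid=$(pgrep chpoxtest | lowest_pid)
# keep_dump /tmp/test.dmp
# kill -31 "$pid"
```

Because PIDs can wrap around on a long-running system, it is worth confirming the parent with "ps auxf" before relying on the lowest PID.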

You can also try another possibility here. As you can see in the code, sleep() is used inside the SIGUSR1 handler to suspend the parent process's execution for several seconds. This gives you the chance to checkpoint while the code is inside the signal handler:

  1. Send the parent process the USR1 signal:

    # kill -USR1 4500
    

  2. The application prints the status that it is now inside the signal handler. Quickly checkpoint it:

    # kill -31 4500
    

    Later, when you restore it from the dump file, the parent process starts in the sleeping state while both children are still running. Remember that the SIGUSR1 signal handler is installed ONLY in the parent process, so only the parent reacts to SIGUSR1.
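If you want to confirm which process actually has a handler installed for a signal, Linux exposes this in the SigCgt field of /proc/<pid>/status: a hexadecimal bitmask in which bit n-1 is set when signal n is caught. The sketch below is not part of CHPOX. The function names are hypothetical, and the value 10 for SIGUSR1 is an assumption that holds on x86 Linux; check it with kill -l on your system.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CHPOX).

# mask_has_signal MASK SIGNUM
# MASK is a hexadecimal bitmask such as the SigCgt value from
# /proc/<pid>/status. Succeeds if the bit for SIGNUM (bit SIGNUM-1) is set.
mask_has_signal() {
    [ $(( (0x$1 >> ($2 - 1)) & 1 )) -eq 1 ]
}

# pid_catches_signal PID SIGNUM
# Succeeds if process PID has a handler installed for SIGNUM (Linux only).
# Returns 2 if the status file cannot be read.
pid_catches_signal() {
    mask=$(awk '/^SigCgt:/ { print $2 }' "/proc/$1/status") || return 2
    mask_has_signal "$mask" "$2"
}

# Example: does the parent (4500 in the article) catch SIGUSR1 (10 on x86)?
# pid_catches_signal 4500 10 && echo "SIGUSR1 is handled; pick another signal"
```

Running the same check against a child PID should show whether the handler is present there as well, and it is also a quick way to vet a signal number before registering it with chpoxctl.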

As another test case, you can pick grep. To make grep run long enough to checkpoint it, you can grep over something large; for example, the Linux kernel source. Here we assume you have extracted the Linux kernel source into the /usr/src/linux directory. You can use any version of the kernel source. Try:

$ grep -I -r -i -n schedule /usr/src/linux/* > /tmp/grep-result.txt

The term "schedule" is selected because it returns a large number of search hits. You are free to pick any term you like. Try to pick one that is likely to be found many times, so that grep is kept busy and you have plenty of time to checkpoint.

As before, while grep is in action, register it and send the checkpoint signal to the grep PID. At this point, you might wonder how to prove that the restoration is actually successful. Here is the trick:

  1. Do grep and let it finish. Here we assume that you save the output as grep-complete.txt.

  2. Do grep again. This time redirect the output to grep-partial.txt.

  3. Press Ctrl-Z to stop the grep--remember, stop it, not terminate it.

  4. While grep is in the stopped state, register and checkpoint it. Notice that the dump file is not created yet.

  5. Copy grep-partial.txt as grep-partial-a.txt.

  6. Resume grep work by typing:

    $ fg %1
    

    You might need to adjust the parameter by looking at "jobs" output.

  7. Terminate it by pressing Ctrl-C as soon as possible. Check that the dump file now exists.

  8. Delete and recreate grep-partial.txt. Simply use touch:

    # rm -f grep-partial.txt
    # touch grep-partial.txt
    

  9. Now restore the process by using the dump file. Let it finish. Now you have grep-partial.txt and grep-partial-a.txt. Concatenate them using cat:

    $ cat grep-partial-a.txt grep-partial.txt > ./grep-cat.txt
    

  10. Compare the content of grep-cat.txt with grep-complete.txt. By observing the result, you can draw your own conclusion. Hint: to get a thorough view of a file's content, use a file viewer that can dump the raw content of the file; for example, the hexdump utility.
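The last two steps can also be done mechanically rather than by eye. The sketch below is not part of CHPOX; the function name compare_restored is hypothetical. It joins the two partial outputs in order and lets cmp report whether the result matches the complete output byte for byte, without presuming what the outcome will be.

```shell
#!/bin/sh
# Hypothetical helper (not part of CHPOX): join the two partial outputs in
# order and compare the result byte for byte with the complete output.
# Exit status 0 means identical; otherwise cmp prints the first difference.
compare_restored() {
    complete=$1 part_a=$2 part_b=$3
    cat "$part_a" "$part_b" | cmp - "$complete"
}

# Example with the article's file names:
# compare_restored grep-complete.txt grep-partial-a.txt grep-partial.txt
```

If cmp reports a difference, the byte offset it prints tells you where to look with hexdump.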

Conclusion

CHPOX still needs a lot of improvement. As noted on its web site, support for Internet sockets, shared memory, System V IPC, and processes with multiple threads is still on the to-do list. So don't expect CHPOX to work flawlessly in every situation. Always experiment before using CHPOX in a real production environment.

Reference

Understanding the Linux Kernel, Second Edition. Daniel P. Bovet and Marco Cesati, O'Reilly & Associates, ISBN 0-596-00213-0.

Linux Kernel Development. Robert Love, Sams Publishing, ISBN 0-672-32512-8.

"Comparison of the Existing Checkpoint Systems," Byoung-Jip Kim, IBM Watson.

"Process Checkpointing and Restarting System for Linux," Sudakov O.O., Boyko Yu.V., Tretyak O.V., Korotkova T.P., Meshcheryakov E.S., Mathematical Machines and Systems, 2003. N.2, p.146-153.

Mulyadi Santosa is a software developer in Indonesia. He can be contacted at [email protected]. Eugeniy Meshcheryakov is a developer in the Ukraine. He can be contacted at [email protected].

