* Testing Linux-CR -- Some Documentation
@ 2011-05-13  5:09 Raghu D K
From: Raghu D K @ 2011-05-13  5:09 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello All,

After checking out the sources (linux-cr, user-cr and tests-cr), I
was able to make some progress by reading
"linux/Documentation/checkpoint".
I just wanted to summarize the steps I took to get to where I am. I
have a 64-bit machine running Ubuntu 10.10, although I installed the
32-bit version of Ubuntu.

Checking out Source Code:

1. $ mkdir ~/test-linux-cr && cd ~/test-linux-cr
2. $ git clone git://www.linux-cr.org/pub/git/linux-cr
3. $ git clone git://www.linux-cr.org/pub/git/user-cr
4. $ git clone git://www.linux-cr.org/pub/git/tests-cr

I am behind a firewall, so I had to configure a proxy because the git
protocol port (9418) is blocked. This is well documented elsewhere,
but I include it here for completeness.

Setting Proxy:

$ sudo apt-get install socat

Create a file "~/gitproxy.sh" and make it executable (chmod +x ~/gitproxy.sh):

#!/bin/bash
# git runs this as: gitproxy.sh <host> <port>
_proxy=<your proxy ip>
_proxyport=<your port>

exec socat STDIO PROXY:$_proxy:$1:$2,proxyport=$_proxyport

Then add the following line to "~/.bashrc" or "~/.profile":

export GIT_PROXY_COMMAND=$HOME/gitproxy.sh

Configuring and Building the Linux Kernel:

1. Since I am running Ubuntu 10.10, I started from the stock kernel
configuration file "config-2.6.35-28-generic" in the "/boot" folder:

$ cp /boot/config-2.6.35-28-generic ~/test-linux-cr/linux-cr/.config

Ensure the following options are enabled in the ".config" file:
CONFIG_CHECKPOINT_SUPPORT=y
CONFIG_SYSVIPC_CHECKPOINT=y
CONFIG_CHECKPOINT=y
CONFIG_NETNS_CHECKPOINT=y
CONFIG_CHECKPOINT_DEBUG=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_NAMESPACES=y
CONFIG_CGROUP_NS=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_FREEZER=y
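One way to double-check the list above before starting a long build is to scan ".config" for each option. This is only a minimal POSIX sh sketch: check_cr_config is a hypothetical helper, not something shipped with linux-cr or user-cr, and it looks only for the exact "OPTION=y" lines listed above (an option built as a module, "=m", is reported as missing).

```shell
#!/bin/sh
# check_cr_config CONFIG_FILE
# Hypothetical helper (not part of linux-cr/user-cr): print one
# "missing: OPTION" line for every option from the list above that is
# not set to "=y", and return non-zero if anything is missing.
check_cr_config() {
    config_file=$1
    status=0
    for opt in CONFIG_CHECKPOINT_SUPPORT CONFIG_SYSVIPC_CHECKPOINT \
               CONFIG_CHECKPOINT CONFIG_NETNS_CHECKPOINT \
               CONFIG_CHECKPOINT_DEBUG CONFIG_CGROUPS CONFIG_CGROUP_FREEZER \
               CONFIG_NAMESPACES CONFIG_CGROUP_NS CONFIG_UTS_NS \
               CONFIG_IPC_NS CONFIG_USER_NS CONFIG_PID_NS CONFIG_NET_NS \
               CONFIG_FREEZER; do
        # -x: whole-line match, so "CONFIG_CHECKPOINT=y" is not satisfied
        # by "CONFIG_CHECKPOINT_DEBUG=y"; -F: no regex interpretation
        if ! grep -qxF "${opt}=y" "$config_file"; then
            echo "missing: $opt"
            status=1
        fi
    done
    return $status
}
```

After sourcing the file, "check_cr_config ~/test-linux-cr/linux-cr/.config" prints nothing and returns 0 when every option is set.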

$ cd ~/test-linux-cr/linux-cr
$ sudo fakeroot make-kpkg --initrd --append-to-version=-cr \
      kernel_image kernel_headers

A successful build produces two ".deb" packages, one for the kernel
image and one for the headers, which can be installed with:

$ sudo dpkg -i linux-image-2.6.37-cr.Custom_i386.deb
$ sudo dpkg -i linux-headers-2.6.37-cr.Custom_i386.deb

On Ubuntu 10.10, GRUB2 is updated automatically when the package is
installed. Reboot and you are running a Checkpoint/Restart capable
kernel :-)

Building "user-cr" and "tests-cr":
$ cd ~/test-linux-cr/user-cr
$ cd scripts
$ bash ./extract-headers.sh --kernel-src=$HOME/test-linux-cr/linux-cr
$ cd ..
$ make all

$ cd ~/test-linux-cr/tests-cr
$ bash ./rewrite-cr-header.sh
  (this creates a "cr.h" header file with the appropriate macro
  definitions for "__NR_checkpoint" and "__NR_restart")
$ make all
$ cd simple
$ ./ckpt
$ cat /tmp/cr-test.out
Invoking checkpoint syscall... PASSED.
ret = 1

$ restart < /tmp/out
Invoking checkpoint syscall... PASSED.
ret = 0

However, I was not able to run the test scripts (runall.sh); they fail
with errors. Also, analyzing the "out" file with "ckptinfo" results
in an EOF error.
Any help with this would be appreciated.

Warm Regards,
Raghu

* Re: Testing Linux-CR -- Some Documentation
@ 2011-05-13 11:42 Serge E. Hallyn
From: Serge E. Hallyn @ 2011-05-13 11:42 UTC (permalink / raw)
  To: Raghu D K; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Raghu D K (dk.raghu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> On Ubuntu 10.10, GRUB2 is updated automatically when the

Heh, right, because it's on 2.6.35.  If you were on natty, you'd have
to force GRUB to boot it, since natty comes with 2.6.38, which is newer.

Got lucky :)

> package is installed. Reboot and you are running a Checkpoint/Restart
> capable kernel :-)
> [...]
> However, I was not able to run the test scripts (runall.sh); they fail
> with errors. Also, analyzing the "out" file with "ckptinfo" results
> in an EOF error.
> Any help with this would be appreciated.

Since you're on Ubuntu, sh points to dash, so many cr_tests will break
because they expect sh to point to bash.  You can fix that by going
through all the scripts and changing "#!/bin/sh" to "#!/bin/bash".  I
haven't had a chance to do so yet, and definitely won't today.  If you
go through the effort, please feel free to send a patch to Oren or me.
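A minimal sketch of that mechanical change, assuming the scripts start with exactly "#!/bin/sh". fix_shebangs is a hypothetical helper, not part of cr_tests. It rewrites only that first line, in place, so the files keep their executable bits.

```shell
#!/bin/sh
# fix_shebangs DIR
# Hypothetical helper: in every regular file under DIR whose first line
# is exactly "#!/bin/sh", replace that line with "#!/bin/bash" and print
# the file name. All other lines and files are left untouched.
fix_shebangs() {
    scratch=$(mktemp) || return 1
    find "$1" -type f | while IFS= read -r f; do
        [ "$(head -n 1 "$f")" = '#!/bin/sh' ] || continue
        # build the new contents, then copy them back over the original
        # file (cat > keeps the original mode bits, unlike mv)
        { echo '#!/bin/bash'; tail -n +2 "$f"; } > "$scratch" &&
            cat "$scratch" > "$f" &&
            echo "$f"
    done
    rm -f "$scratch"
}
```

For example, "fix_shebangs ~/test-linux-cr/tests-cr" lists each file it changed. Scripts whose first line carries options (such as "#!/bin/sh -e") or that are run explicitly as "sh script.sh" are not affected, which is one reason changing what /bin/sh itself points to can be simpler.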

Once you fix that, you'll likely still have trouble, but it'll look much
better than what you see right now.  There definitely are some bugs in
the 2.6.37 tree.  If you can help us debug those, that would be terrific.

thanks,
-serge

* Re: Testing Linux-CR -- Some Documentation
@ 2011-05-13 14:07 Brian Haley
From: Brian Haley @ 2011-05-13 14:07 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 05/13/2011 07:42 AM, Serge E. Hallyn wrote:
> Quoting Raghu D K (dk.raghu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
>> However I was not able to run the test scripts ( runall.sh ) as it
>> gave some errors. Also analyzing the "out" file using "ckptinfo"
>> results in EOF error.
>> Any additional help on this is appreciated.
> 
> Since you're on Ubuntu, sh points to dash, so many cr_tests will break
> because they expect sh to point to bash.  You can fix that by going
> through all the scripts and changing that.  I haven't had a chance to do
> so yet, and definitely won't today.  If you go through the effort,
> please feel free to send a patch to Oren or myself.

Or you can run:

$ sudo dpkg-reconfigure dash

and choose "No", which will make /bin/sh point to bash.
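To confirm which shell /bin/sh actually resolves to before and after the reconfiguration, a small check like the following can help. It is a sketch that assumes GNU "readlink -f" is available (it is on Ubuntu); sh_target is a hypothetical helper name.

```shell
#!/bin/sh
# sh_target [PATH]
# Hypothetical helper: follow every symlink in PATH (default /bin/sh)
# and print the name of the file it finally points to, e.g. "dash".
sh_target() {
    target=$(readlink -f "${1:-/bin/sh}") || return 1
    basename "$target"
}
```

On a default Ubuntu 10.10 install "sh_target" should print "dash"; after answering "No" in dpkg-reconfigure it should print "bash".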

-Brian

* Re: Testing Linux-CR -- Some Documentation
@ 2011-05-13 14:51 Serge Hallyn
From: Serge Hallyn @ 2011-05-13 14:51 UTC (permalink / raw)
  To: Brian Haley; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Brian Haley (brian.haley-VXdhtT5mjnY@public.gmane.org):
> Or you can run:
> 
> $ sudo dpkg-reconfigure dash
> 
> and choose "No", which will make /bin/sh point to bash.

Good point :)

I should still fix the testcases when I have time, but this'll get
you running more easily :)

* Re: Testing Linux-CR -- Some Documentation
@ 2011-05-16 10:34 Raghu D K
From: Raghu D K @ 2011-05-16 10:34 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello All,

I changed /bin/sh to point to bash, but I still see issues when using
the "git://www.linux-cr.org/pub/git/tests-cr" scripts. I am probably
misunderstanding something: I am a little confused about how the
user-space "checkpoint" and "restart" programs relate to the programs
in the "tests-cr" folder.

I wrote a sample shell script "my-test.sh" and tried the following
without much success.

#!/bin/sh
#
#
#***********************************************************************************

echo "Incrementing variable ..."
COUNT=$1
X=0
while [ $X -le $COUNT ];
do
        X=$(( $X + 1 ))
        echo "Value of X =" $X
        sleep 1
done


$ cd ~/user-cr
$ mount -tcgroup -o freezer cgroup /cgroup
$ mkdir -p /cgroup/1
$ nsexec -z5000 my-test.sh 100 &
$ echo 5000 > /cgroup/1/tasks
$ echo FROZEN > /cgroup/1/freezer.state

$ checkpoint 5000 > ckpt.image

This generated a "ckpt.image" file of size 2594550 bytes

$ ckptinfo -epv ckpt.image
info: [@8] object   1 HDR_HEADER len 72
info: [@80] object   4 HDR_BUFFER len 73
info: [@153] object   4 HDR_BUFFER len 73
info: [@226] object   4 HDR_BUFFER len 73
...
unexpected end of file (read 0 of 8)

$ kill -9 5000
$ echo THAWED > /cgroup/1/freezer.state
$ ./restart < ckpt.image

This one fails with "Bad file descriptor". What am I missing?

Warm Regards,
Raghu



* Re: Testing Linux-CR -- Some Documentation
@ 2011-05-16 13:27 Serge E. Hallyn
From: Serge E. Hallyn @ 2011-05-16 13:27 UTC (permalink / raw)
  To: Raghu D K; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Raghu D K (dk.raghu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> Hello All,
> 
> I changed /bin/sh to point to bash, but I still see issues when using
> the "git://www.linux-cr.org/pub/git/tests-cr" scripts. I am probably
> misunderstanding something: I am a little confused about how the
> user-space "checkpoint" and "restart" programs relate to the programs
> in the "tests-cr" folder.
> 
> I wrote a sample shell script "my-test.sh" and tried the following
> without much success.
> 
> #!/bin/sh
> #
> #
> #***********************************************************************************
> 
> echo "Incrementing variable ..."
> COUNT=$1
> X=0
> while [ $X -le $COUNT ];
> do
>         X=$(( $X + 1 ))
>         echo "Value of X =" $X
>         sleep 1
> done
> 
> 
> $ cd ~/user-cr
> $ mount -tcgroup -o freezer cgroup /cgroup
> $ mkdir -p /cgroup/1
> $ nsexec -z5000 my-test.sh 100 &
> $ echo 5000 > /cgroup/1/tasks
> $ echo FROZEN > /cgroup/1/freezer.state
> 
> $ checkpoint 5000 > ckpt.image
> 
> This generated a "ckpt.image" file of size 2594550 bytes
> 
> $ ckptinfo -epv ckpt.image
> info: [@8] object   1 HDR_HEADER len 72
> info: [@80] object   4 HDR_BUFFER len 73
> info: [@153] object   4 HDR_BUFFER len 73
> info: [@226] object   4 HDR_BUFFER len 73
> ...
> unexpected end of file (read 0 of 8)
> 
> $ kill -9 5000
> $ echo THAWED > /cgroup/1/freezer.state
> $ ./restart < ckpt.image
> 
> This one fails with "Bad file descriptor". What am I missing?

First, you can find more information about what went wrong in a
few ways:

  1. add '-l logfile' arguments to checkpoint and restart commands,
     to put more debug messages into 'logfile'  (which must not yet
     exist)
  2. add '-v' argument to checkpoint and restart for debugging
  3. look at /var/log/syslog for lots of error messages, assuming
     you have CONFIG_CHECKPOINT_DEBUG (or whatever that is called)
     set in your kernel
  4. after doing checkpoint, use 'ckptinfo', which came with the
     user-cr programs, to analyze the checkpoint image
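Items 1, 2 and 4 can be wrapped together. This is a hedged sketch: next_free_log and debug_checkpoint are hypothetical helpers, and passing "-v -l LOG" before the PID is an assumption about how the user-cr "checkpoint" tool parses its options (check "checkpoint --help"). next_free_log exists only because the log file must not already exist.

```shell
#!/bin/sh
# next_free_log PREFIX
# Hypothetical helper: print the first of PREFIX.1, PREFIX.2, ... that
# does not exist yet, since the '-l logfile' must not already exist.
next_free_log() {
    n=1
    while [ -e "$1.$n" ]; do
        n=$((n + 1))
    done
    echo "$1.$n"
}

# debug_checkpoint PID IMAGE
# Hypothetical wrapper: checkpoint PID into IMAGE with '-v' and a fresh
# '-l' log file in the current directory (option order is an
# assumption), then dump the resulting image with ckptinfo.
debug_checkpoint() {
    log=$(next_free_log ckpt.log)
    if ! checkpoint -v -l "$log" "$1" > "$2"; then
        echo "checkpoint failed; see $log and /var/log/syslog" >&2
        return 1
    fi
    ckptinfo -ve "$2"
}
```

The same "-v -l LOG" idea should apply to restart (for example "restart -v -l LOG < IMAGE"), again assuming that option order.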

I suspect what happened to you, though, is that you left file
descriptors open.  If you look at counterloop/crcounter.c in
the tests, it does 'for i in (1..100) close(i)'.  The problem
with not doing this is that the program you are checkpointing has
inherited file descriptors from its parent task, and, at restart,
it has no way to recreate those.
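For a shell script such as my-test.sh, the same idea can be approximated at the top of the script. This is a sketch under assumptions, not a port of crcounter.c: detach_fds is a hypothetical helper, it only closes descriptors 3 to 9 (POSIX only guarantees single-digit descriptors in shell redirections), and it assumes that pointing stdin at /dev/null and stdout/stderr at a regular file is enough for restart to recreate them.

```shell
#!/bin/sh
# detach_fds OUTFILE
# Hypothetical helper: drop descriptors inherited from the launching
# shell/terminal so the job only holds files it opened itself.
detach_fds() {
    fd=3
    while [ "$fd" -le 9 ]; do
        eval "exec $fd>&-"     # close fd; harmless if it is not open
        fd=$((fd + 1))
    done
    # stdin from /dev/null, stdout and stderr appended to a regular file
    exec </dev/null >>"$1" 2>&1
}
```

Calling, say, "detach_fds /tmp/my-test.out" right after the "#!/bin/sh" line of my-test.sh would send the "Value of X" lines to that file instead of the terminal the job was started from.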

-serge

* Re: Testing Linux-CR -- Some Documentation
@ 2011-05-18  7:09 Raghu D K
From: Raghu D K @ 2011-05-18  7:09 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi,

>  1. add '-l logfile' arguments to checkpoint and restart commands,
>     to put more debug messages into 'logfile'  (which must not yet
>     exist)
>  2. add '-v' argument to checkpoint and restart for debugging
>  3. look at /var/log/syslog for lots of error messages, assuming
>     you have CONFIG_CHECKPOINT_DEBUG (or whatever that is called)
>     set in your kernel
>  4. after doing checkpoint, use 'ckptinfo', which came with the
>     user-cr programs, to analyze the checkpoint image

I have tried all of these, including "ckptinfo", which always reports
"unexpected end of file".


> I suspect what happened to you, though, is that you left file
> descriptors open.  If you look at counterloop/crcounter.c in
> the tests, it does 'for i in (1..100) close(i)'.  The problem
> with not doing this is that the program you are checkpointing has
> inherited file descriptors from its parent task, and, at restart,
> it has no way to recreate those.

I am not using the test scripts; I wrote my own simple one because I
do not understand how Linux-CR is supposed to work.
1. Is it mandatory to have the freezer cgroup mounted
("mount -t cgroup -o freezer cgroup /cgroup")?
2. Do we have to launch programs using "nsexec" to be able to
checkpoint and restart them?

I have tried all the "--help" options, but failed to get the
described results. Even the "self_checkpoint" and "self_restart" code
provided in the "linux/Documentation/checkpoint" folder does not
behave as described in "usage.txt".

The only program that gives a positive result is
"tests-cr/simple/ckpt.c".

Warm Regards,
Raghu


* Re: Testing Linux-CR -- Some Documentation
@ 2011-05-18 13:42 Serge Hallyn
From: Serge Hallyn @ 2011-05-18 13:42 UTC (permalink / raw)
  To: Raghu D K; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Raghu D K (dk.raghu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> Hi,
> 
> >  1. add '-l logfile' arguments to checkpoint and restart commands,
> >     to put more debug messages into 'logfile'  (which must not yet
> >     exist)
> >  2. add '-v' argument to checkpoint and restart for debugging
> >  3. look at /var/log/syslog for lots of error messages, assuming
> >     you have CONFIG_CHECKPOINT_DEBUG (or whatever that is called)
> >     set in your kernel
> >  4. after doing checkpoint, use 'ckptinfo', which came with the
> >     user-cr programs, to analyze the checkpoint image
> 
> I have tried all of these, including "ckptinfo", which always reports
> "unexpected end of file".

If 'ckptinfo -ve' is not showing any info, then presumably checkpoint
failed early.  Check the syslog right after checkpoint to start
investigating where it stopped.
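A minimal sketch of "check the syslog right after checkpoint": remember how many lines the log has, run checkpoint, then print only the lines added meanwhile. log_lines_after and checkpoint_with_syslog are hypothetical helpers; /var/log/syslog is the default path used earlier in this thread, and reading it may need extra privileges.

```shell
#!/bin/sh
# log_lines_after FILE N
# Hypothetical helper: print the lines of FILE that come after line N.
log_lines_after() {
    tail -n +"$(( $2 + 1 ))" "$1"
}

# checkpoint_with_syslog PID IMAGE [SYSLOG]
# Hypothetical wrapper: note the current length of SYSLOG, run
# checkpoint, then show only the log lines written while it ran.
checkpoint_with_syslog() {
    syslog=${3:-/var/log/syslog}
    mark=$(wc -l < "$syslog")
    rc=0
    checkpoint "$1" > "$2" || rc=$?
    log_lines_after "$syslog" "$mark"
    return $rc
}
```

If the log is rotated while checkpoint runs, the line count no longer matches and the output will be wrong, so this is only a convenience for quick experiments.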

> > I suspect what happened to you, though, is that you left file
> > descriptors open.  If you look at counterloop/crcounter.c in
> > the tests, it does 'for i in (1..100) close(i)'.  The problem
> > with not doing this is that the program you are checkpointing has
> > inherited file descriptors from its parent task, and, at restart,
> > it has no way to recreate those.
> 
> I am not testing the sample scripts, I just wrote a sample one as I am
> not able to understand
> how the Linux CR is supposed to work.
> 1. Is it mandatory to have the "mount -tcgroup -o freezer cgroup
> /cgroup" mounted ?

Yes.  And you must freeze the task before checkpointing.
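Since freezing is a required step, here is a hedged sketch of doing it a little more carefully with the freezer mounted at /cgroup as in the commands above. It relies on the cgroup-v1 freezer behaviour that "freezer.state" can read FREEZING for a while after FROZEN is written, so it polls until FROZEN is reported before returning; freeze_cgroup is a hypothetical helper name.

```shell
#!/bin/sh
# freeze_cgroup DIR [TRIES]
# Hypothetical helper: ask the v1 freezer to freeze every task in DIR,
# then poll freezer.state (it may read FREEZING for a while) once per
# second until it reports FROZEN. Returns 1 if it never gets there.
freeze_cgroup() {
    dir=$1
    tries=${2:-10}
    echo FROZEN > "$dir/freezer.state" || return 1
    while [ "$tries" -gt 0 ]; do
        if [ "$(cat "$dir/freezer.state")" = FROZEN ]; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    echo "$dir did not reach FROZEN" >&2
    return 1
}
```

Used with the earlier steps, that would look like "echo $PID > /cgroup/1/tasks" followed by "freeze_cgroup /cgroup/1 && checkpoint $PID > ckpt.image", so checkpoint only runs once the whole group is actually frozen.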

> 2. Do we have to launch programs using "nsexec" to be able to
> checkpoint and restart them ?

They should be in their own namespaces, nsexec is an easy way to
accomplish that.

Take a look at
https://code.launchpad.net/~serge-hallyn/+junk/crdemo
It has some scripts, including 'start_job.sh', which starts an
isolated job so that the 'container' (not an lxc container)
is checkpointable.

-serge
