* C/R: File substitution at restart
@ 2010-09-08 10:03 Matthieu Fertré
From: Matthieu Fertré @ 2010-09-08 10:03 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Serge E. Hallyn, Nathan Lynch, Louis Rilling, Dan Smith,
	Sukadev Bhattiprolu

Hi,

Here is a proposal for a C/R-related feature already developed in
Kerrighed: file substitution at restart.

The goal of this mail is to start a discussion about adding such a
feature to Linux cr. Comments are welcome!



What is file substitution?
==========================

It is the ability, when restarting a checkpointed application, to
replace some of its open files with other files.

Only files accessed through an FD can be substituted, not mapped files
(unless they are also reachable through an FD).

The feature preserves 'struct file' sharing as it was before the
checkpoint. Thus, if a process shares the same struct file for stdin,
stdout, and stderr, for instance, it is not possible to give a different
file for each at restart.
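
To illustrate what sharing a struct file means from user space, here is a
minimal shell sketch. It relies only on standard POSIX redirection, not on
Kerrighed: two descriptors duplicated from a single open() share one file
offset, which is why they can only be substituted together. The descriptor
numbers 3 and 4 and the temporary file name are arbitrary choices for the
example.

```shell
#!/bin/sh
# Descriptors duplicated from one open() refer to a single open file
# (one 'struct file' in the kernel), so they share the file offset.
tmpdir=$(mktemp -d)

exec 3>"$tmpdir/shared.txt"   # one open(): one struct file
exec 4>&3                     # dup: fd 4 refers to the same struct file
echo first  >&3
echo second >&4               # written at the shared offset, after "first"
exec 3>&- 4>&-                # close both descriptors

shared=$(cat "$tmpdir/shared.txt")
printf '%s\n' "$shared"       # both lines are present, in write order
rm -rf "$tmpdir"
```

Had fd 4 been opened separately on the same path, it would have had its own
offset and the second write would have overwritten the start of the file.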

Use cases
=========

1) Circumvent checkpointer limitations:

* Allow restarting an application that has some open files not
supported by the checkpoint/restart implementation.

2) Conflicts between existing files and files that should be restored:

* Allow restarting an application one of whose input data files is not
writable by the user and thus cannot be restored/replaced.

* Allow restarting an application one of whose output data files is
already open in another instance of the program.

3) Checkpoint/restart optimization/flexibility:

* Let the application checkpoint and restore files by itself to get
better performance or flexibility.

Example: OpenMPI sockets. The checkpointer then does not have to handle
communication buffers or ensure a consistent distributed state itself.

How can users use it?
=====================

(Kerrighed) restart manual page quote:
"-s file_identifier,fd, --substitute-file=file_identifier,fd

This option allows to replace one of the open files of the checkpointed
application by one of the file opened by the process calling the restart
command.

fd is the file descriptor (as given by open (2)) of the calling process
that will be used as a replacement after the restart.

file_identifier is an identifier of one of the open files of the
checkpointed application. This identifier is generated at checkpoint
time. It can be retrieved from the file(s) user_info_*.txt that live(s)
in the checkpoint directory. Each line of this file refers to one of the
open files of the checkpointed application. For each open file, we get
the following information:
type|file_identifier|symbolic name|list of pid:fd

This option can be used several times to substitute several files."
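
As a rough sketch of how the user_info format quoted above could be consumed
from a script, the following looks up a file_identifier by symbolic name and
lists the pid:fd pairs that use it. The helper names lookup_file_id and
list_users are hypothetical (they are not part of the Kerrighed tools), and
the two sample lines are the ones from the ping example in this mail.

```shell
#!/bin/sh
# Sample user_info content, copied from the ping example in this mail.
info=$(mktemp)
cat > "$info" <<'EOF'
socket |0001FFFF880066D68EA8|socket:[219057]|6315097:3
tty    |0001FFFF88007D040EA8|/dev/pts/1|6315097:0,6315097:1,6315097:2
EOF

# lookup_file_id FILE NAME (hypothetical helper):
# print the file_identifier of the entry whose symbolic name is NAME.
lookup_file_id() {
    awk -F'|' -v name="$2" '$3 == name { print $2; exit }' "$1"
}

# list_users FILE ID (hypothetical helper):
# print one pid:fd pair per line for the entry with identifier ID.
list_users() {
    awk -F'|' -v id="$2" '
        $2 == id { n = split($4, u, ","); for (i = 1; i <= n; i++) print u[i] }
    ' "$1"
}

tty_id=$(lookup_file_id "$info" /dev/pts/1)
tty_users=$(list_users "$info" "$tty_id")
echo "$tty_id"
echo "$tty_users"
rm -f "$info"
```

The fields are split on '|' only, so the padding after the type column does
not affect the identifier or the symbolic name.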

Here is a simple (and stupid?) example extracted from Louis's talk at OLS:

$ krgcr-run ping localhost
Running application 6315097
(ping output omitted)
$ checkpoint -i 6315097
$
$ cat /var/chkpt/6315097/v1/user_info_1.txt
socket |0001FFFF880066D68EA8|socket:[219057]|6315097:3
tty    |0001FFFF88007D040EA8|/dev/pts/1|6315097:0,6315097:1,6315097:2

$ # Use current terminal for standard I/O at restart:
$ # ('-t' stands for tty and acts as a wrapper for option '-s')
$ restart -t 6315097 1

$ # Use stdin instead of socket at restart:
$ restart -s 0001FFFF880066D68EA8,0 6315097 1
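
Since the replacement must be an fd of the process calling restart, a script
would typically open the file itself before running the command. Below is a
hedged sketch of that pattern: it opens a log file on fd 3 and only prints the
restart command it would run, so it can be tried without Kerrighed. The
build_restart_cmd helper and the choice of fd 3 are assumptions made for
illustration; the identifier and the trailing "6315097 1" arguments are taken
as-is from the example above.

```shell
#!/bin/sh
# build_restart_cmd ID FD ARGS... (hypothetical helper):
# format a 'restart -s' command line as shown in the example above.
build_restart_cmd() {
    id=$1; fd=$2; shift 2
    printf 'restart -s %s,%s %s\n' "$id" "$fd" "$*"
}

log=$(mktemp)
exec 3<>"$log"    # open the replacement file read/write on fd 3

# Substitute the tty entry, shared by fds 0, 1 and 2 of the application.
cmd=$(build_restart_cmd 0001FFFF88007D040EA8 3 6315097 1)
echo "$cmd"       # dry run: on a Kerrighed node, run the command instead

exec 3>&-         # close fd 3 again
rm -f "$log"
```

Because the tty entry is a single struct file, stdin, stdout and stderr of the
restarted application would all end up on the same replacement file.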



Thanks,

Matthieu
