[RFC 00/10] container-based checkpoint/restart prototype
From: ntl @ 2011-02-28 23:40 UTC
  To: linux-kernel; +Cc: containers, Oren Laadan, Nathan Lynch

From: Nathan Lynch <ntl@pobox.com>

Checkpoint/restart is a facility by which one can save the state of a
job to a file and restart it later under the right conditions.  This
is a C/R prototype intended to illustrate how well (or poorly) it
would fit into the Linux kernel.  It is basically a fork of the
"linux-cr" patch set by Oren Laadan and others, but it is more limited
in scope and has a different system call interface.  I believe what I
have here is a decent starting point for a C/R implementation that can
go upstream, but I'm releasing early with the hope of receiving some
feedback/review on the overall approach before pursuing it too much
further.

The intended users are HPC sites, large homogeneous clusters, and
environments with long-running jobs that cannot easily be interrupted
without losing
work, for whatever reason (perhaps you've misplaced the source code
for your program and can't modify it to checkpoint and restore its own
state).  In these situations checkpoint/restart provides a rollback
mechanism to mitigate the effects of hardware/system failures as well
as a means of migrating jobs between nodes.


How it works:

Only a process that is PID 1 ("init") in its PID namespace can call
checkpoint or restart.
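To make the rule concrete: getpid(2) reports a process's PID as seen from its own PID namespace, so the init of a container sees 1 even though the host knows it by a different number. The sketch below models the rule as a user-space pre-check only; the toy_* names are hypothetical, and this is not how the patch set enforces the rule in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical pre-check modelling "only init may checkpoint/restart".
 * pid_in_own_ns is the PID as the caller sees it in its own namespace.
 * Returns 0 if allowed, -EPERM otherwise.
 */
int toy_check_caller_pid(pid_t pid_in_own_ns)
{
	return pid_in_own_ns == 1 ? 0 : -EPERM;
}

/* Hypothetical wrapper for the calling process itself. */
int toy_check_self(void)
{
	/* getpid() is namespace-relative, so a container's init sees 1. */
	return toy_check_caller_pid(getpid());
}
```

A tool could call toy_check_self() early and bail out with a clear message instead of letting the system call fail later.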

Checkpoint freezes the rest of the PID namespace and dumps the state
of all the other tasks in it to the specified file descriptor.  The
state of the caller is not recorded.
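A rough user-space model of the "everyone but the caller" part follows. It assumes a made-up record layout (struct toy_task_record and TOY_MAGIC are hypothetical, not the patch set's image format) and does not model freezing at all; it only shows records going to a caller-supplied file descriptor with the caller's own entry skipped.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_MAGIC 0x43525431u /* arbitrary marker chosen for this sketch */

/* Hypothetical per-task record; NOT the patch set's on-disk format. */
struct toy_task_record {
	uint32_t magic;
	int32_t  pid;   /* PID as seen inside the namespace */
};

/*
 * Write one record for every task in pids[] except the caller.
 * Returns the number of records written, or a negative errno.
 */
int toy_dump_tasks(int fd, pid_t caller, const pid_t *pids, size_t n)
{
	int written = 0;

	assert(pids != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct toy_task_record rec;
		ssize_t r;

		if (pids[i] == caller)
			continue;   /* the caller's state is not recorded */
		memset(&rec, 0, sizeof(rec));
		rec.magic = TOY_MAGIC;
		rec.pid = (int32_t)pids[i];
		r = write(fd, &rec, sizeof(rec));
		if (r < 0)
			return -errno;
		if ((size_t)r != sizeof(rec))
			return -EIO;  /* short write treated as failure for brevity */
		written++;
	}
	return written;
}
```

Because the sketch only needs a writable descriptor, the same call works with a pipe, a socket or a regular file.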

Before calling restart, init is expected to set up the environment
(mounts, net devices and such) in accord with the checkpointed job's
"expectations".  The restart system call recreates the task tree
(except for init itself) and the tasks resume execution; init can
then wait(2) for tasks to exit in the normal fashion.
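The "wait(2) in the normal fashion" part is ordinary parent/child reaping. In the sketch below, freshly forked children stand in for the restarted tasks (the restart call itself is not shown, and toy_init_wait_loop is a hypothetical name); init then loops on waitpid() until the kernel reports that it has no children left.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork n stand-in "restarted tasks" that exit immediately, then reap
 * children until none remain. Returns the number of children reaped.
 */
int toy_init_wait_loop(int n)
{
	int reaped = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;            /* still reap whatever was started */
		if (pid == 0)
			_exit(i & 0x7f);  /* stand-in task exits with a small status */
	}

	/* init's job from here on: wait until no children remain. */
	for (;;) {
		int status;
		pid_t pid = waitpid(-1, &status, 0);

		if (pid < 0) {
			if (errno == EINTR)
				continue;
			break;            /* ECHILD: nothing left to wait for */
		}
		reaped++;                 /* without flags, only terminated children are reported */
	}
	return reaped;
}
```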


Limitations:

This implementation is limited to containers by design (and this
prototype is limited to checkpoint/restore of a single simple task).
A Linux "container" doesn't have a universally agreed upon definition,
but in this context we are referring to a group of processes for which
the PID namespace (and possibly other namespaces) is isolated from the
rest of the system (see clone(2)).  This is the tradeoff we ask users
to make: the ability to C/R and migrate is provided in exchange for
accepting some isolation and slightly reduced ease of use.  A tool
such as lxc (http://lxc.sourceforge.net) can be used to isolate jobs.
A patch against lxc is available which adds C/R capability.
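For readers unfamiliar with PID namespaces, here is a minimal sketch of starting a process as PID 1 of a new one, the kind of isolation a tool such as lxc sets up before a job runs. It is an illustration under stated assumptions, not part of the patch set: it uses unshare(2) with CLONE_NEWPID, which needs Linux 3.8 or later (newer than this patch set, where clone(2) with CLONE_NEWPID was the way in), and it adds CLONE_NEWUSER so it can work without CAP_SYS_ADMIN where the kernel permits unprivileged user namespaces. toy_start_container_init is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a child ran as PID 1 of a fresh
 * PID namespace, 0 if this environment refuses to create the
 * namespaces, and -1 on any other failure.
 */
int toy_start_container_init(void)
{
	int status;
	pid_t helper = fork();

	if (helper < 0)
		return -1;
	if (helper == 0) {
		/* Unshare in a throwaway helper so the caller is untouched.
		 * The user namespace is created first and supplies the
		 * capability that CLONE_NEWPID requires. */
		if (unshare(CLONE_NEWUSER | CLONE_NEWPID) != 0)
			_exit(2);

		/* The first child forked after unshare() becomes PID 1. */
		pid_t init = fork();
		if (init < 0)
			_exit(3);
		if (init == 0)
			_exit(getpid() == 1 ? 0 : 1);

		int st;
		if (waitpid(init, &st, 0) < 0 || !WIFEXITED(st))
			_exit(3);
		_exit(WEXITSTATUS(st));
	}

	if (waitpid(helper, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	switch (WEXITSTATUS(status)) {
	case 0:  return 1;  /* child saw itself as PID 1 */
	case 2:  return 0;  /* unshare() refused here */
	default: return -1;
	}
}
```

In a sandbox that filters namespace creation, the function simply reports 0.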

The user must ensure that a restarted job's view of the filesystem is
effectively the same as it was at the time of checkpoint.
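The patch set leaves this to the user. One cheap way a user-space tool might sanity-check it is to record a few stat(2) fields for files the job depends on at checkpoint time, store them alongside the image, and compare them before restart. The toy_* helpers are hypothetical, and size/mode/mtime is only a heuristic: it will miss a same-size rewrite that kept the modification time, and it says nothing about file contents.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fingerprint of one file the job depends on. */
struct toy_file_fp {
	off_t  size;
	mode_t mode;
	time_t mtime;
};

/* Fill *fp from stat(2). Returns 0 on success, -1 on error. */
int toy_fingerprint(const char *path, struct toy_file_fp *fp)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	fp->size = st.st_size;
	fp->mode = st.st_mode;
	fp->mtime = st.st_mtime;
	return 0;
}

/* Nonzero if the two fingerprints match field for field. */
int toy_fp_equal(const struct toy_file_fp *a, const struct toy_file_fp *b)
{
	return a->size == b->size && a->mode == b->mode &&
	       a->mtime == b->mtime;
}
```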

Processes that map device memory and other such hardware-dependent
things will probably not be supported.
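A user-space tool could at least detect such mappings before attempting a checkpoint. On kernels that provide the VmFlags field in /proc/<pid>/smaps (newer than this patch set), the "io" and "pf" flags mark I/O areas and pure PFN ranges, which is how device mappings typically show up. The helper below is a hypothetical sketch that classifies one such line; it is not how the patch set itself decides what to refuse.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if a "VmFlags:" line from /proc/<pid>/smaps carries the
 * "io" (I/O area) or "pf" (pure PFN range) flag. Lines longer than the
 * local buffer are truncated, which is fine for real VmFlags lines.
 */
bool toy_vmflags_hw_mapping(const char *line)
{
	static const char prefix[] = "VmFlags:";
	char buf[256];
	char *save = NULL;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return false;  /* not a VmFlags line */
	strncpy(buf, line + sizeof(prefix) - 1, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	/* Flags are two-letter tokens separated by spaces. */
	for (char *tok = strtok_r(buf, " \t\n", &save); tok != NULL;
	     tok = strtok_r(NULL, " \t\n", &save)) {
		if (strcmp(tok, "io") == 0 || strcmp(tok, "pf") == 0)
			return true;
	}
	return false;
}
```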


To do:

Multiple tasks
Signal state
System call restart blocks
More code cleanup/simplification
Other architecture support
System V IPC
Network/sockets
And much more


 Documentation/filesystems/vfs.txt  |   13 +-
 arch/x86/Kconfig                   |    4 +
 arch/x86/include/asm/checkpoint.h  |   17 +
 arch/x86/include/asm/elf.h         |    5 +
 arch/x86/include/asm/ldt.h         |    7 +
 arch/x86/include/asm/unistd_32.h   |    4 +-
 arch/x86/kernel/Makefile           |    2 +
 arch/x86/kernel/checkpoint.c       |  677 +++++++++++++++++++++++++++
 arch/x86/kernel/syscall_table_32.S |    2 +
 arch/x86/vdso/vdso32-setup.c       |   25 +-
 drivers/char/mem.c                 |    6 +
 drivers/char/random.c              |    6 +
 fs/Makefile                        |    1 +
 fs/aio.c                           |   27 ++
 fs/checkpoint.c                    |  695 +++++++++++++++++++++++++++
 fs/exec.c                          |    2 +-
 fs/ext2/dir.c                      |    3 +
 fs/ext2/file.c                     |    6 +
 fs/ext3/dir.c                      |    3 +
 fs/ext3/file.c                     |    3 +
 fs/ext4/dir.c                      |    3 +
 fs/ext4/file.c                     |    6 +
 fs/fcntl.c                         |   21 +-
 fs/locks.c                         |   35 ++
 include/linux/aio.h                |    2 +
 include/linux/checkpoint.h         |  347 ++++++++++++++
 include/linux/fs.h                 |   15 +
 include/linux/magic.h              |    3 +
 include/linux/mm.h                 |   15 +
 init/Kconfig                       |    2 +
 kernel/Makefile                    |    1 +
 kernel/checkpoint/Kconfig          |   15 +
 kernel/checkpoint/Makefile         |    9 +
 kernel/checkpoint/checkpoint.c     |  437 +++++++++++++++++
 kernel/checkpoint/objhash.c        |  368 +++++++++++++++
 kernel/checkpoint/restart.c        |  651 ++++++++++++++++++++++++++
 kernel/checkpoint/sys.c            |  208 +++++++++
 kernel/sys_ni.c                    |    4 +
 mm/Makefile                        |    1 +
 mm/checkpoint.c                    |  906 ++++++++++++++++++++++++++++++++++++
 mm/filemap.c                       |    4 +
 mm/mmap.c                          |    3 +
 42 files changed, 4549 insertions(+), 15 deletions(-)
 create mode 100644 arch/x86/include/asm/checkpoint.h
 create mode 100644 arch/x86/kernel/checkpoint.c
 create mode 100644 fs/checkpoint.c
 create mode 100644 include/linux/checkpoint.h
 create mode 100644 kernel/checkpoint/Kconfig
 create mode 100644 kernel/checkpoint/Makefile
 create mode 100644 kernel/checkpoint/checkpoint.c
 create mode 100644 kernel/checkpoint/objhash.c
 create mode 100644 kernel/checkpoint/restart.c
 create mode 100644 kernel/checkpoint/sys.c
 create mode 100644 mm/checkpoint.c

-- 
1.7.4




Thread overview: 41+ messages
2011-02-28 23:40 [RFC 00/10] container-based checkpoint/restart prototype ntl
2011-02-28 23:40 ` [PATCH 01/10] Make exec_mmap extern ntl
2011-04-03 16:56   ` Serge E. Hallyn
2011-02-28 23:40 ` [PATCH 02/10] Introduce mm_has_pending_aio() helper ntl
2011-03-01 15:40   ` Jeff Moyer
2011-03-01 16:04     ` Nathan Lynch
2011-02-28 23:40 ` [PATCH 03/10] Introduce has_locks_with_owner() helper ntl
2011-04-03 18:55   ` Serge E. Hallyn
2011-02-28 23:40 ` [PATCH 04/10] Introduce vfs_fcntl() helper ntl
2011-04-03 18:57   ` Serge E. Hallyn
2011-02-28 23:40 ` [PATCH 05/10] Core checkpoint/restart support code ntl
2011-04-03 19:03   ` Serge E. Hallyn
2011-04-04 15:00     ` Nathan Lynch
2011-04-04 15:10       ` Serge E. Hallyn
2011-04-04 15:40         ` Nathan Lynch
2011-04-04 16:27           ` Serge E. Hallyn
2011-04-04 17:32             ` Oren Laadan
2011-04-04 21:43               ` Nathan Lynch
2011-04-04 22:03                 ` Serge E. Hallyn
2011-04-04 23:42                   ` Dan Smith
2011-04-05  2:17                     ` Serge E. Hallyn
2011-04-05 19:18                       ` Nathan Lynch
2011-04-04 22:29                 ` Matt Helsley
2011-04-04 17:41             ` Andrew Morton
2011-04-04 18:51               ` Serge E. Hallyn
2011-04-04 19:42                 ` Andrew Morton
2011-04-04 20:29                   ` Serge E. Hallyn
2011-04-04 21:55                   ` Matt Helsley
2011-04-04 23:15                     ` Andrew Morton
2011-04-04 23:16                     ` Valdis.Kletnieks
2011-04-04 23:43                       ` Matt Helsley
2011-04-04 22:11                   ` Serge E. Hallyn
2011-04-04 22:53                   ` Serge E. Hallyn
2011-04-04 21:20             ` Nathan Lynch
2011-04-04 21:53               ` Serge E. Hallyn
2011-02-28 23:40 ` [PATCH 06/10] Checkpoint/restart mm support ntl
2011-02-28 23:40 ` [PATCH 07/10] Checkpoint/restart vfs support ntl
2011-02-28 23:40 ` [PATCH 08/10] Add generic '->checkpoint' f_op to ext filesystems ntl
2011-02-28 23:40 ` [PATCH 09/10] Add generic '->checkpoint()' f_op to simple char devices ntl
2011-02-28 23:40 ` [PATCH 10/10] x86_32 support for checkpoint/restart ntl
2011-03-01  1:08 ` [RFC 00/10] container-based checkpoint/restart prototype Nathan Lynch
