linux-kernel.vger.kernel.org archive mirror
From: Dave Hansen <dave@linux.vnet.ibm.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Theodore Tso <tytso@mit.edu>, Arnd Bergmann <arnd@arndb.de>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org,
	Peter Chubb <peterc@gelato.unsw.edu.au>,
	Daniel Lezcano <daniel.lezcano@fr.ibm.com>
Subject: Re: checkpoint/restart ABI
Date: Tue, 12 Aug 2008 07:58:11 -0700
Message-ID: <1218553091.5598.76.camel@nimitz>
In-Reply-To: <48A0CD86.6030704@goop.org>

On Mon, 2008-08-11 at 16:38 -0700, Jeremy Fitzhardinge wrote:
> This feature, as it currently stands, is essentially useless for any 
> practical purpose.  Self-checkpointing a single process with no handling 
> of non-file file descriptors and no proper handling of file 
> file-descriptors is not very useful.
> 
> My understanding is that this is basically a prototype for a more useful
> multi-process or container-wide checkpoint facility.

Yes, that's exactly it.  We're diverging from discussing the important
bits as it is, and I think we'd do that more and more with extra
code. :)

> While you could try to come up with an extensible file format that would 
> be able to handle any future extensions, the chances are you'd get it 
> wrong and need to break file format compatibility anyway.

Amen to that.  I won't speak for the rest of the whackos interested in
this stuff, but I *KNOW* I'm not clever enough to pull it off.

> I'm more interested in seeing a description of how you're going to
> handle things like:
> 
>     * multiple processes
>     * pipes
>     * UNIX domain sockets
>     * INET sockets (both inter and intra machine)
>     * unlinked open files
>     * checkpointing file content
>     * closed files (ie, files which aren't currently open, but will be
>       soon, esp tmp files)
>     * shared memory
>     * (Peter, what have I forgotten?)
> 
> Having gone through this before, I don't think an all-kernel solution 
> can work except for the most simple cases.

So, there's a lot of stuff there.  The networking stuff is way out of my
league, so I'll cc Daniel and make him answer. :)

All of the other stuff has been done in various in-kernel
implementations: OpenVZ, IBM's Metacluster, and Zap (Oren's work at
Columbia).  Most of it *can* be done from userspace, but some of it is
very painful.  There are some good OLS papers describing most of these
things.  Zap might have had one or two academic papers written about it.
Maybe.  ;)

Unlinked files, for instance, are actually available in /proc.  You can
freeze the app, write a helper that opens the entry under /proc/1234/fd,
then copies its contents to a linked file (ooooh, with splice!).  Anyway,
if we can do it in userspace, we can surely do it in the kernel.
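
To make the /proc trick concrete, here is a rough userspace sketch in
Python.  It is only an illustration, not anything from the patch set:
recover_unlinked() and demo() are made-up names, it uses a plain copy
instead of splice, and it assumes a Linux /proc plus permission to open
the target's fd entries.  A real helper would also stop the target first
(for example with SIGSTOP) so the contents don't change mid-copy.

```python
import os
import shutil
import tempfile


def recover_unlinked(pid: int, fd: int, dest: str) -> int:
    """Copy the file behind /proc/<pid>/fd/<fd> to dest; return bytes written."""
    src_path = f"/proc/{pid}/fd/{fd}"
    # Opening the /proc entry reaches the same inode even though it no
    # longer has a name, and the new descriptor starts at offset 0.
    with open(src_path, "rb") as src, open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
        return dst.tell()


def demo() -> bytes:
    """Unlink a file we still hold open, then recover it through /proc."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"checkpoint me")
    os.unlink(path)  # no name left; only our open fd keeps the data alive
    dest = path + ".recovered"
    try:
        recover_unlinked(os.getpid(), fd, dest)
        with open(dest, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        if os.path.exists(dest):
            os.unlink(dest)


if __name__ == "__main__":
    print(demo())
```

The demo checkpoints its own process to stay self-contained; pointing
recover_unlinked() at another pid works the same way as long as the
caller is allowed to read that process's /proc/<pid>/fd entries.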

I'm not sure what you mean by "closed files".  Either the app has a fd,
it doesn't, or it is in sys_open() somewhere.  We have to get the app
into a quiescent state before we can checkpoint, so we basically just
say that we won't checkpoint things that are *in* the kernel.

Is there anything specific you are thinking of that particularly worries
you?  I could write pages on the list you have there.

> Which, come to think of it, is an important point.  What are the 
> expected use-cases for this feature?  Do you really mean 
> checkpoint/restart?  Do you expect to be able to checkpoint a process, 
> leave it running, then "rewind" by restoring the image?  Or does 
> checkpoint always atomically kill the source process(es)?  Are you 
> expecting to be able to resume on another machine?

Yes.

We all want different things, and there are a lot of people interested
in this stuff.  So, I think all of what you've mentioned above are
goals, at least long term.  Some, *really* long term.

I don't want to get into a full virtualization vs. containers debate,
but we also want it for all the same reasons that you migrate Xen
partitions.

> Lightweight filesystem checkpointing, such as btrfs provides, would seem 
> like a powerful mechanism for handling a lot of the filesystem state 
> problems.  It would have been useful when we did this...

Yup.  We were just chatting about that with some filesystem folks last
week.  But, as the OpenVZ dudes like to mention, the poor man's way of
moving filesystem snapshots around is always rsync.

-- Dave

