linux-kernel.vger.kernel.org archive mirror
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Theodore Tso <tytso@mit.edu>, Arnd Bergmann <arnd@arndb.de>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org,
	Peter Chubb <peterc@gelato.unsw.edu.au>,
	Daniel Lezcano <daniel.lezcano@fr.ibm.com>
Subject: Re: checkpoint/restart ABI
Date: Tue, 12 Aug 2008 09:32:57 -0700
Message-ID: <48A1BB39.3090108@goop.org>
In-Reply-To: <1218553091.5598.76.camel@nimitz>

Dave Hansen wrote:
>> I'm more interested in seeing a description of how you're going to 
>> handle things like:
>>
>>     * multiple processes
>>     * pipes
>>     * UNIX domain sockets
>>     * INET sockets (both inter and intra machine)
>>     * unlinked open files
>>     * checkpointing file content
>>     * closed files (ie, files which aren't currently open, but will be
>>       soon, esp tmp files)
>>     * shared memory
>>     * (Peter, what have I forgotten?)
>>
>> Having gone through this before, I don't think an all-kernel solution 
>> can work except for the most simple cases.
>>     
>
> So, there's a lot of stuff there.  The networking stuff is way out of my
> league, so I'll cc Daniel and make him answer. :)
>   

Inter-machine networking stuff is hard because it's outside the 
checkpointed set, so the checkpoint is observable.  Migration is easier, 
in principle, because you might be able to shift the connection endpoint 
without bringing it down.  Dealing with networking within your 
checkpointed set is just fiddly, particularly remembering and restoring 
all the details of things like urgent messages, on-the-fly file 
descriptors, packet boundaries, etc.
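
To illustrate the packet-boundary point, here is a minimal Python sketch, 
assuming Linux and a UNIX-domain datagram socketpair. The function name 
is hypothetical. It only shows that queued data arrives as discrete 
messages, so a checkpoint of such a socket would presumably have to 
record a list of messages rather than one flat byte string.

```python
import socket


def queued_messages_demo():
    """Queue two datagrams, then read them back one at a time."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # Both messages sit in b's receive queue until they are read.
        a.send(b"first")
        a.send(b"second message")
        # Each recv() returns exactly one queued datagram, so the
        # boundary between the two messages is part of the socket state.
        return [b.recv(4096), b.recv(4096)]
    finally:
        a.close()
        b.close()
```

A stream socket would hand back the same bytes without the boundary, 
which is why the two cases need different handling at restore time.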

> Unlinked files, for instance, are actually available in /proc.  You can
> freeze the app, write a helper that opens /proc/1234/fd, then copies its
> contents to a linked file (ooooh, with splice!)  Anyway, if we can do it
> in userspace, we can surely do it in the kernel.
>   

Sure, there's no inherent problem.  But do you imagine including the 
file contents within your checkpoint image, or would they be saved 
separately?
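
For reference, the userspace approach Dave describes can be sketched in 
a few lines of Python, assuming Linux with /proc mounted. The function 
names are hypothetical, and the sketch leaves out freezing the target 
process first. It uses a plain copy rather than splice, and it saves the 
contents to a separate file rather than into a checkpoint image.

```python
import os
import shutil
import tempfile


def rescue_unlinked_fd(pid, fd, dest_path):
    """Copy the file behind /proc/<pid>/fd/<fd> to dest_path.

    Opening the /proc entry works even after the file's last link has
    been removed, as long as the process still holds the descriptor.
    """
    proc_path = f"/proc/{pid}/fd/{fd}"
    with open(proc_path, "rb") as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return os.path.getsize(dest_path)


def demo():
    """Unlink a file we still hold open, then recover its contents."""
    workdir = tempfile.mkdtemp()
    victim = os.path.join(workdir, "scratch")
    f = open(victim, "w+b")
    f.write(b"state worth saving")
    f.flush()
    os.unlink(victim)  # no name left, but the descriptor is still open
    saved = os.path.join(workdir, "scratch.saved")
    size = rescue_unlinked_fd(os.getpid(), f.fileno(), saved)
    f.close()
    with open(saved, "rb") as check:
        return size, check.read()
```

This copies only the file contents. The descriptor's offset and open 
flags (visible in /proc/<pid>/fdinfo/<fd>) would need to be recorded 
separately for a faithful restore.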

> I'm not sure what you mean by "closed files".  Either the app has a fd,
> it doesn't, or it is in sys_open() somewhere.  We have to get the app
> into a quiescent state before we can checkpoint, so we basically just
> say that we won't checkpoint things that are *in* the kernel.
>   

It's common for an app to write a tmp file, close it, and then open it a 
bit later expecting to find the content it just wrote.  If you 
checkpoint-kill it in the interim, reboot (clearing out /tmp) and then 
resume, then it will lose its tmp file.  There's no explicit connection 
between the process and its potential working set of files.  We had to 
deal with it by setting a bunch of policy files to tell the 
checkpoint/restart system what filename patterns it had to look out 
for.  But if you just checkpoint the whole filesystem state along with 
the process(es), then perhaps it isn't an issue.
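
As a rough illustration of the policy-file idea (not the format 
Hibernator used, which isn't described here), a minimal Python sketch 
that matches candidate paths against glob patterns. The patterns, paths 
and function name are all hypothetical.

```python
import fnmatch

# Hypothetical policy: patterns for files the application may reopen
# after closing them. Note that fnmatch's "*" also matches "/".
POLICY_PATTERNS = ["/tmp/myapp-*", "/var/tmp/*.scratch"]


def files_to_save(candidate_paths, patterns=POLICY_PATTERNS):
    """Return the candidate paths that match at least one policy pattern."""
    return sorted(
        path
        for path in candidate_paths
        if any(fnmatch.fnmatchcase(path, pat) for pat in patterns)
    )
```

A checkpoint tool could run something like this over the directories the 
policy names and save the matching files alongside the process image.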

> Is there anything specific you are thinking of that particularly worries
> you?  I could write pages on the list you have there.
>   

No, that's the problem; it all worries me.  It's a big problem space.

>> Which, come to think of it, is an important point.  What are the 
>> expected use-cases for this feature?  Do you really mean 
>> checkpoint/restart?  Do you expect to be able to checkpoint a process, 
>> leave it running, then "rewind" by restoring the image?  Or does 
>> checkpoint always atomically kill the source process(es)?  Are you 
>> expecting to be able to resume on another machine?
>>     
>
> Yes.
>
> We all want different things, and there are a lot of people interested
> in this stuff.  So, I think all of what you've mentioned above are
> goals, at least long term.  Some, *really* long term.
>   

So, in other words: whoever wants to work on it gets to define (their) 
goals.  Fair enough.

> I don't want to get into a full virtualization vs. containers debate,
> but we also want it for all the same reasons that you migrate Xen
> partitions.
>   

No, I don't have any real opinion about containers vs virtualization.  I 
think they're quite distinct solutions for distinct problems.

But I was involved in the design and implementation of a 
checkpoint-restart system (along with Peter Chubb), and have the scars 
to prove it.  We implemented it for IRIX; we called it Hibernator, and 
licensed it to SGI for a while (I don't remember what name they marketed 
it under).  The problems that Peter and I listed are ones we 
had to solve (or, in some cases, failed to solve) to get a workable system.

    J
