linux-kernel.vger.kernel.org archive mirror
From: Matt Helsley <matthltc@us.ibm.com>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
	containers@lists.linux-foundation.org, akpm@linux-foundation.org,
	xemul@parallels.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 18/38] C/R: core stuff
Date: Thu, 28 May 2009 15:33:34 -0700
Message-ID: <20090528223334.GB17402@us.ibm.com>
In-Reply-To: <4A1F0E29.2040506@cs.columbia.edu>

On Thu, May 28, 2009 at 06:20:25PM -0400, Oren Laadan wrote:
> 
> 
> Alexey Dobriyan wrote:
> > On Wed, May 27, 2009 at 06:45:04PM -0400, Oren Laadan wrote:
> >> Alexey Dobriyan wrote:
> >>> On Wed, May 27, 2009 at 04:56:27PM -0400, Oren Laadan wrote:
> >>>> Alexey Dobriyan wrote:
> >>>>> On Tue, May 26, 2009 at 08:16:44AM -0500, Serge E. Hallyn wrote:
> >>>>>> Quoting Alexey Dobriyan (adobriyan@gmail.com):
> >>>>>>> Introduction
> >>>>>>> ------------
> >>>>>>> Checkpoint/restart (C/R from now on) allows dumping a group of processes
> >>>>>>> to disk for various reasons, like saving process state in case of box
> >>>>>>> failure, or restoring a group of processes later on the same or another machine.
> >>>>>>>
> >>>>>>> Unlike, let's say, hypervisor-style C/R, which only needs to freeze the guest
> >>>>>>> kernel and dump more or less raw pages, the proposed C/R doesn't require a
> >>>>>>> hypervisor. For that, the C/R code needs to know all the intimate kernel details, big and small.
> >>>>>>>
> >>>>>>> The good thing is that not all details need to be serialized and saved,
> >>>>>>> like, say, readahead state. The bad thing is that quite a few things still
> >>>>>>> need to be.
> >>>>>> Hi Alexey,
> >>>>>>
> >>>>>> the last time you posted this, I went through and tried to discern the
> >>>>>> meaningful differences between yours and Oren's patchsets.  Then I sent some
> >>>>>> patches to Oren to make his set configurable to act more like yours.  And Oren
> >>>>>> took them!  But now you resend this patchset with no real changelog, no
> >>>>>> acknowledgment that Oren's set even exists
> >>>>> Is this a requirement? Everybody following topic already knows about
> >>>>> Oren's patchset.
> >>>> Some people do ack other people's work. See for example patches #1
> >>>> and #24 in my recent post. You're welcome.
> >>>>
> >>>>>> - or is much farther along and pretty widely reviewed and tested (which is
> >>>>>> only because he started earlier and, when we asked for your counterpatches
> >>>>>> at an earlier stage, you would never reply) - or, most importantly, what
> >>>>>> it is that you think your patchset does that his does not and cannot.
> >>>>> There are differences. And they're not small, as you're trying to describe them,
> >>>>> but pretty big compared to the scale of the problem.
> >>>> I've asked before, and I repeat now: can you enumerate these "big"
> >>>> scary differences that make it such a "big" problem?
> >>>>
> >>>> So far, we identified two main "design" issues -
> >>> Why in quotes? Yes, they are high-level design issues.
> >>>
> >> In quotes, because I argued further on that, although my patchset
> >> takes a stand on both issues, it can be easily reverted _within_
> >> that patchset. Moreover, I argue that they can co-exist.
> >>
> >>>> 1) Whether or not to allow c/r of a sub-container (partial hierarchy)
> >>>>
> >>>> 2) Creation of the restarting process hierarchy in kernel or in userspace
> >>>>
> >>>> As for #1, you are the _only_ one who advocates restricting c/r to
> >>>> a full container only. I guess you have your reasons, but I'm unsure
> >>>> what they may be.
> >>> The reason is that checkpointing a half-frozen, half-live container is
> >>> essentially equivalent to checkpointing a live container, which adds much
> >>> complexity to the code and fundamentally prevents the kernel from taking a coherent snapshot.
> >>>
> >>> In such situations kernel will do its job badly.
> >> In such a situation the kernel will do a bad job if the user is asking
> >> for a bad job.
> > 
> > User doesn't even understand why we're discussing this issue so hard.
> > 
> >> Just like checkpointing without snapshotting the file system and expecting
> >> it to always work.
> > 
> > This is different.
> > 
> > The kernel can't do anything about an unsynced fs, because nobody is
> > advocating that the kernel should sync the fs. Consequently, a screwup in fs
> > sync is clearly a user failure. Any in-kernel C/R (yours or mine) has this
> > failure mode, so we skip it and discuss what's left.
> > 
> > Now, the kernel CAN do something about tasks and other data structures
> > because it easily controls them.
> > 
> > Your procedure for checkpointing starts with "kill -STOP".
> 
> Wrong. It requires the processes to be frozen.
> 
> > To make anything reliable, you have to ban "kill -CONT" for the duration of
> > checkpointing. Is this done BTW? I don't remember any new flags being added
> > to task_struct. Or is this going to be skipped on the grounds that it's a
> > user screwup (potentially oopsable)?
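
To illustrate why "kill -STOP" alone is not enough: a task stopped with SIGSTOP can be resumed by any process that is allowed to signal it. The minimal userspace C sketch below is not taken from either patchset; the helper name sigcont_resumes_stopped_child() is hypothetical. It forks an idle child, stops it with SIGSTOP, resumes it with SIGCONT, and uses waitpid() with WUNTRACED and WCONTINUED to observe both state changes.

```c
#define _XOPEN_SOURCE 700  /* for WCONTINUED / WIFCONTINUED */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Illustration only (hypothetical helper, not from either patchset):
 * stop a child with SIGSTOP, then resume it with SIGCONT, and report
 * whether the kernel says the child was continued.
 *
 * Returns 1 if the child was resumed, 0 if it was not, -1 on error.
 */
int sigcont_resumes_stopped_child(void)
{
    int status;
    int result = -1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();  /* child: idle until it is killed */
    }

    /* Stop the child and wait until it is reported as stopped. */
    if (kill(pid, SIGSTOP) == 0 &&
        waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status)) {
        /* Any process allowed to signal the child can undo the stop. */
        if (kill(pid, SIGCONT) == 0 &&
            waitpid(pid, &status, WCONTINUED) == pid)
            result = WIFCONTINUED(status) ? 1 : 0;
    }

    /* Clean up: kill and reap the child. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return result;
}
```

On Linux, calling sigcont_resumes_stopped_child() should return 1: nothing about SIGSTOP prevents a later SIGCONT from letting the task run again, which is the gap the freezer-based approaches discussed in this thread are meant to close.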
> > 
> > That's why OpenVZ relies solely on the suspend-to-RAM freezer: userspace
> > can't arbitrarily send suspend and freeze notifications. We only need to
> > protect against an untimely STR unfreeze, which adds code only in the C/R
> > code, not in task_struct.
> 
> Same principle for both patchsets:  tasks may *not* be permitted to
> execute while being checkpointed.
> 
> For this I suggested a CHECKPOINTING freezer state: transition to/from
> this state is done _only_ by sys_checkpoint(), so that checkpointed
> processes cannot be unfrozen. Matt Helsley already posted a patch to
> implement this.
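
The CHECKPOINTING freezer state proposed above can be thought of as a small state machine in which only the checkpoint path may enter or leave the new state. The C sketch below is a hypothetical userspace model, not Matt Helsley's patch: the enum names, freezer_transition(), and the rule that CHECKPOINTING is entered from FROZEN and left back to FROZEN are assumptions, and the transitional FREEZING state of the real cgroup freezer is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposed freezer states; names are illustrative. */
enum freezer_state {
    FS_THAWED,
    FS_FROZEN,
    FS_CHECKPOINTING,
};

/* Who is requesting the transition. */
enum requester {
    REQ_USER,       /* e.g. a write to the freezer control file */
    REQ_CHECKPOINT, /* the checkpoint syscall itself */
};

/*
 * Try to move *state to 'to' on behalf of 'who'.
 * Returns true and updates *state if the transition is allowed;
 * otherwise returns false and leaves *state unchanged.
 */
bool freezer_transition(enum freezer_state *state, enum freezer_state to,
                        enum requester who)
{
    enum freezer_state from = *state;
    bool ok;

    if (from == FS_CHECKPOINTING || to == FS_CHECKPOINTING) {
        /* Assumed rule: only the checkpoint path may touch CHECKPOINTING,
         * entering it from FROZEN and returning to FROZEN. */
        ok = who == REQ_CHECKPOINT &&
             ((from == FS_FROZEN && to == FS_CHECKPOINTING) ||
              (from == FS_CHECKPOINTING && to == FS_FROZEN));
    } else {
        /* THAWED <-> FROZEN remains available to any requester. */
        ok = true;
    }

    if (ok)
        *state = to;
    return ok;
}
```

For example, once the checkpoint path has moved a group from FS_FROZEN to FS_CHECKPOINTING, a user request for FS_THAWED returns false and leaves the state unchanged until the checkpoint path moves it back to FS_FROZEN.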

In case it helps, here's the patch and some feedback Oren gave me:

https://lists.linux-foundation.org/pipermail/containers/2009-May/017586.html

Cheers,
	-Matt Helsley

Thread overview: 76+ messages
2009-05-22  4:54 [PATCH 01/38] cred: #include init.h in cred.h Alexey Dobriyan
2009-05-22  4:54 ` [PATCH 02/38] utsns: extract create_uts_ns() Alexey Dobriyan
2009-05-24 22:37   ` Serge E. Hallyn
2009-05-22  4:54 ` [PATCH 03/38] ipcns 1/4: remove useless get/put while CLONE_NEWIPC Alexey Dobriyan
2009-05-22  9:00   ` Amerigo Wang
2009-05-22  4:54 ` [PATCH 04/38] ipcns 2/4: extract create_ipc_ns() Alexey Dobriyan
2009-05-22  8:59   ` Amerigo Wang
2009-05-22  4:54 ` [PATCH 05/38] ipcns 3/4: make free_ipc_ns() static Alexey Dobriyan
2009-05-24 22:40   ` Serge E. Hallyn
2009-05-22  4:55 ` [PATCH 06/38] ipcns 4/2: move free_ipcs() proto Alexey Dobriyan
2009-05-24 22:49   ` Serge E. Hallyn
2009-05-22  4:55 ` [PATCH 07/38] pidns 1/2: make create_pid_namespace() accept parent pidns Alexey Dobriyan
2009-05-22  9:20   ` Amerigo Wang
2009-05-24 22:44   ` Serge E. Hallyn
2009-06-04  0:20   ` Sukadev Bhattiprolu
2009-05-22  4:55 ` [PATCH 08/38] pidns 2/2: rewrite copy_pid_ns() Alexey Dobriyan
2009-05-22  9:14   ` Amerigo Wang
2009-05-24 22:45   ` Serge E. Hallyn
2009-06-04  0:17   ` Sukadev Bhattiprolu
2009-05-22  4:55 ` [PATCH 09/38] netns 1/2: don't get/put old netns on CLONE_NEWNET Alexey Dobriyan
2009-05-22  6:30   ` David Miller
2009-05-22  4:55 ` [PATCH 10/38] netns 2/2: extract net_create() Alexey Dobriyan
2009-05-22  6:30   ` David Miller
2009-05-22  4:55 ` [PATCH 11/38] nsproxy: extract create_nsproxy() Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 12/38] i386: ifdef out struct thread_struct::fs Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 13/38] x86_64: ifdef out struct thread_struct::ip Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 14/38] Remove struct mm_struct::exe_file et al Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 15/38] dcache: extract and use d_unlinked() Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 16/38] x86: ptrace debugreg checks rewrite Alexey Dobriyan
2009-05-26 23:25   ` Andrew Morton
2009-05-22  4:55 ` [PATCH 17/38] groups: move code to kernel/groups.c Alexey Dobriyan
2009-05-25  0:53   ` Serge E. Hallyn
2009-05-26 14:48   ` Serge E. Hallyn
2009-05-26 18:34     ` Alexey Dobriyan
2009-05-26 23:25       ` Serge E. Hallyn
2009-05-22  4:55 ` [PATCH 18/38] C/R: core stuff Alexey Dobriyan
2009-05-26 13:16   ` Serge E. Hallyn
2009-05-26 19:35     ` Alexey Dobriyan
2009-05-26 23:14       ` Serge E. Hallyn
2009-05-26 23:44       ` Serge E. Hallyn
2009-05-28 15:38         ` Alexey Dobriyan
2009-05-28 18:17           ` Serge E. Hallyn
2009-05-28 22:42           ` Oren Laadan
2009-05-27 18:52       ` Dave Hansen
2009-05-27 20:56       ` Oren Laadan
2009-05-27 22:17         ` Alexey Dobriyan
2009-05-27 22:40           ` Andrew Morton
2009-05-27 22:45           ` Oren Laadan
2009-05-28 15:33             ` Alexey Dobriyan
2009-05-28 22:20               ` Oren Laadan
2009-05-28 22:33                 ` Matt Helsley [this message]
2009-05-29  6:01                 ` Alexey Dobriyan
2009-05-29 17:26                   ` Dave Hansen
2009-05-27 22:25         ` Alexey Dobriyan
2009-05-27 16:28   ` Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 19/38] C/R: multiple tasks Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 20/38] C/R: i386 support Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 21/38] C/R: i386 debug registers Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 22/38] C/R: i386 xstate Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 23/38] C/R: x86_64 support Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 24/38] C/R: x86_64 debug registers Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 25/38] C/R: x86_64 xstate Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 26/38] C/R: nsproxy Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 27/38] C/R: checkpoint/restore struct uts_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 28/38] C/R: formally checkpoint/restore struct ipc_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 29/38] C/R: formally checkpoint/restore struct mnt_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 30/38] C/R: checkpoint/restore struct pid_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 31/38] C/R: formally checkpoint/restore struct net_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 32/38] C/R: checkpoint/restore struct cred Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 33/38] C/R: checkpoint/restore aux groups (structy group_info) Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 34/38] C/R: checkpoint/restore struct user Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 35/38] C/R: checkpoint/restore struct user_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 36/38] C/R: checkpoint/restore struct pid Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 37/38] C/R: checkpoint/restore opened files Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 38/38] C/R: checkpoint/restart struct sighand_struct Alexey Dobriyan
2009-05-22  5:02 ` [PATCH 01/38] cred: #include init.h in cred.h Alexey Dobriyan
