linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oren Laadan <orenl@cs.columbia.edu>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	containers@lists.linux-foundation.org, xemul@parallels.com
Subject: Re: [PATCH 18/38] C/R: core stuff
Date: Wed, 27 May 2009 18:45:04 -0400	[thread overview]
Message-ID: <4A1DC270.8080807@cs.columbia.edu> (raw)
In-Reply-To: <20090527221753.GB8321@x200.localdomain>



Alexey Dobriyan wrote:
> On Wed, May 27, 2009 at 04:56:27PM -0400, Oren Laadan wrote:
>> Alexey Dobriyan wrote:
>>> On Tue, May 26, 2009 at 08:16:44AM -0500, Serge E. Hallyn wrote:
>>>> Quoting Alexey Dobriyan (adobriyan@gmail.com):
>>>>> Introduction
>>>>> ------------
>>>>> Checkpoint/restart (C/R from now) allows to dump group of processes to disk
>>>>> for various reasons like saving process state in case of box failure or
>>>>> restoration of group of processes on another or same machine later.
>>>>>
>>>>> Unlike, let's say, hypervisor C/R style which only needs to freeze guest kernel
>>>>> and dump more or less raw pages, proposed C/R doesn't require hypervisor.
>>>>> For that C/R code needs to know about all little and big intimate kernel details.
>>>>>
>>>>> The good thing is that not all details needs to be serialized and saved
>>>>> like, say, readahead state. The bad things is still quite a few things
>>>>> need to be.
>>>> Hi Alexey,
>>>>
>>>> the last time you posted this, I went through and tried to discern the
>>>> meaningful differences between yours and Oren's patchsets.  Then I sent some
>>>> patches to Oren to make his set configurable to act more like yours.  And Oren
>>>> took them!  But now you resend this patchset with no real changelog, no
>>>> acknowledgment that Oren's set even exists
>>> Is this a requirement? Everybody following topic already knows about
>>> Oren's patchset.
>> Some people do ack other people's work. See for example patches #1
>> and #24 in my recent post. You're welcome.
>>
>>>> - or is much farther along and pretty widely reviewed and tested (which is
>>>> only because he started earlier and, when we asked for your counterpatches
>>>> at an earlier stage, you would never reply) - or, most importantly, what
>>>> it is that you think your patchset does that his does not and cannot.
>>> There are differences. And they're not small like you're trying to describe
>>> but pretty big compared the scale of the problem.
>> I've asked before, and I repeat now: can you enumerate these "big"
>> scary differences that make it such a "big" problem ?
>>
>> So far, we identified two main "design" issues -
> 
> Why in "? Yes, they are high-level design issues.
> 

In quotes, because I argued further on that, although my patchset
takes a stand on both issues, it can be easily reverted _within_
that patchset. Moreover, I argue that they can co-exist.

>> 1) Whether or not allow c/r of sub-container (partial hierarchy)
>>
>> 2) Creation of restarting process hierarchy in kernel or in userspace
>>
>> As for #1, you are the _only_ one who advocates restricting c/r to
>> a full container only. I guess you have your reasons, but I'm unsure
>> what they may be.
> 
> The reason is that checkpointing half-frozen, half-live container is
> essentially equivalent to live container which adds much complexity
> to code fundamentally preventing kernel from taking coherent snapshot.
> 
> In such situations kernel will do its job badly.

In such situation the kernel will do a bad job if the user is asking
for a bad job. Just like checkpointing without snapshotting the
file system and expecting it to always work.

But if the user is a bit more careful (and even then, not that much),
she can enjoy the wonderful benefits of c/r without the wonderful
benefits of containers.

If useful, it's easy to pass a flag to checkpoint() that will ask
to enforce, say, shared memory "leaks" but not nsproxy or file "leaks".

In fact, even shared memory "leaks" may be useful for some users (e.g.
what the guys from kerlabs pointed out).

> 
> Manpage will be filled with strings like "if $FOO is shared then $BAR is
> not guaranteed".
> 
> What to do if user simply doesn't know if container is bounded?
> Checkpoint and to hell with consequences?
> 
> If two tasks share mm_struct you can't even detect that pages you dump
> aren't filled with garbage meanwhile from second task.
> 
> If two tasks share mm_struct, other task can issue AIO indefinitely
> preventing from taking even coherent filesystem snapshot.
> 
> That's why I raise this issue again to hear from people what they think
> and these people shouldn't be containers and C/R people, because the
> latter already made up their minds.

Lol .. and disagreement persists among us :)

And indeed, I have heard and seen already a few opinions in favor
of permitting non-container checkpoint. From potential users (not
c/r people).

> 
> This is super-important issue to get right from the beginning.
> 
>> On the other hand, there has been a handful of use-cases and opinions
>> in favor of allowing both capabilities to co-exist. Not the mention
>> that nearly no additional code is necessary, on the contrary.
>>
>> As for #2, you didn't even bother to reply to the discussion that I
>> had started about it. This decision is important to allow future
>> flexibility of the mechanism, and to address the needs of several
>> potential users, as seen in that discussion and others. Here, too,
>> you are the _only_ one that advocates that direction.
> 
> Are you going to fork to-become-zombies, make them call restart(2) and
> zombify?

Yes.

> 
>> And the funniest thing -- *both* decisions can be *easily* overturned
>> in my patchset. In fact, regarding #2 - either way can be easily done
>> in it.
>>
>> So I wonder, what are the "big" issues that bother you so much ?
>> "if there is a will, there is a way".
> 
> Oren, don't you really understand?
> 
> Users want millions of things, but every thing has price.

I beg to differ: there is marginal price to support both -- in fact,
enforcing the container requirement (e.g. leaks detection - which,
btw, is imperfect and cannot be made race-free) *adds* code over
the non-container case. So in a sense, we get the no-container case
for free.

Oren.


  parent reply	other threads:[~2009-05-27 22:45 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-22  4:54 [PATCH 01/38] cred: #include init.h in cred.h Alexey Dobriyan
2009-05-22  4:54 ` [PATCH 02/38] utsns: extract create_uts_ns() Alexey Dobriyan
2009-05-24 22:37   ` Serge E. Hallyn
2009-05-22  4:54 ` [PATCH 03/38] ipcns 1/4: remove useless get/put while CLONE_NEWIPC Alexey Dobriyan
2009-05-22  9:00   ` Amerigo Wang
2009-05-22  4:54 ` [PATCH 04/38] ipcns 2/4: extract create_ipc_ns() Alexey Dobriyan
2009-05-22  8:59   ` Amerigo Wang
2009-05-22  4:54 ` [PATCH 05/38] ipcns 3/4: make free_ipc_ns() static Alexey Dobriyan
2009-05-24 22:40   ` Serge E. Hallyn
2009-05-22  4:55 ` [PATCH 06/38] ipcns 4/2: move free_ipcs() proto Alexey Dobriyan
2009-05-24 22:49   ` Serge E. Hallyn
2009-05-22  4:55 ` [PATCH 07/38] pidns 1/2: make create_pid_namespace() accept parent pidns Alexey Dobriyan
2009-05-22  9:20   ` Amerigo Wang
2009-05-24 22:44   ` Serge E. Hallyn
2009-06-04  0:20   ` Sukadev Bhattiprolu
2009-05-22  4:55 ` [PATCH 08/38] pidns 2/2: rewrite copy_pid_ns() Alexey Dobriyan
2009-05-22  9:14   ` Amerigo Wang
2009-05-24 22:45   ` Serge E. Hallyn
2009-06-04  0:17   ` Sukadev Bhattiprolu
2009-05-22  4:55 ` [PATCH 09/38] netns 1/2: don't get/put old netns on CLONE_NEWNET Alexey Dobriyan
2009-05-22  6:30   ` David Miller
2009-05-22  4:55 ` [PATCH 10/38] netns 2/2: extract net_create() Alexey Dobriyan
2009-05-22  6:30   ` David Miller
2009-05-22  4:55 ` [PATCH 11/38] nsproxy: extract create_nsproxy() Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 12/38] i386: ifdef out struct thread_struct::fs Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 13/38] x86_64: ifdef out struct thread_struct::ip Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 14/38] Remove struct mm_struct::exe_file et al Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 15/38] dcache: extract and use d_unlinked() Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 16/38] x86: ptrace debugreg checks rewrite Alexey Dobriyan
2009-05-26 23:25   ` Andrew Morton
2009-05-22  4:55 ` [PATCH 17/38] groups: move code to kernel/groups.c Alexey Dobriyan
2009-05-25  0:53   ` Serge E. Hallyn
2009-05-26 14:48   ` Serge E. Hallyn
2009-05-26 18:34     ` Alexey Dobriyan
2009-05-26 23:25       ` Serge E. Hallyn
2009-05-22  4:55 ` [PATCH 18/38] C/R: core stuff Alexey Dobriyan
2009-05-26 13:16   ` Serge E. Hallyn
2009-05-26 19:35     ` Alexey Dobriyan
2009-05-26 23:14       ` Serge E. Hallyn
2009-05-26 23:44       ` Serge E. Hallyn
2009-05-28 15:38         ` Alexey Dobriyan
2009-05-28 18:17           ` Serge E. Hallyn
2009-05-28 22:42           ` Oren Laadan
2009-05-27 18:52       ` Dave Hansen
2009-05-27 20:56       ` Oren Laadan
2009-05-27 22:17         ` Alexey Dobriyan
2009-05-27 22:40           ` Andrew Morton
2009-05-27 22:45           ` Oren Laadan [this message]
2009-05-28 15:33             ` Alexey Dobriyan
2009-05-28 22:20               ` Oren Laadan
2009-05-28 22:33                 ` Matt Helsley
2009-05-29  6:01                 ` Alexey Dobriyan
2009-05-29 17:26                   ` Dave Hansen
2009-05-27 22:25         ` Alexey Dobriyan
2009-05-27 16:28   ` Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 19/38] C/R: multiple tasks Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 20/38] C/R: i386 support Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 21/38] C/R: i386 debug registers Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 22/38] C/R: i386 xstate Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 23/38] C/R: x86_64 support Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 24/38] C/R: x86_64 debug registers Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 25/38] C/R: x86_64 xstate Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 26/38] C/R: nsproxy Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 27/38] C/R: checkpoint/restore struct uts_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 28/38] C/R: formally checkpoint/restore struct ipc_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 29/38] C/R: formally checkpoint/restore struct mnt_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 30/38] C/R: checkpoint/restore struct pid_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 31/38] C/R: formally checkpoint/restore struct net_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 32/38] C/R: checkpoint/restore struct cred Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 33/38] C/R: checkpoint/restore aux groups (structy group_info) Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 34/38] C/R: checkpoint/restore struct user Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 35/38] C/R: checkpoint/restore struct user_namespace Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 36/38] C/R: checkpoint/restore struct pid Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 37/38] C/R: checkpoint/restore opened files Alexey Dobriyan
2009-05-22  4:55 ` [PATCH 38/38] C/R: checkpoint/restart struct sighand_struct Alexey Dobriyan
2009-05-22  5:02 ` [PATCH 01/38] cred: #include init.h in cred.h Alexey Dobriyan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A1DC270.8080807@cs.columbia.edu \
    --to=orenl@cs.columbia.edu \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=serge@hallyn.com \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).