All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Sukadev Bhattiprolu
	<sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: [PATCH][usercr]: Ghost tasks must be detached
Date: Tue, 08 Feb 2011 22:35:20 -0500	[thread overview]
Message-ID: <4D520B78.9020300@cs.columbia.edu> (raw)
In-Reply-To: <20110209020942.GA5339-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>



On 02/08/2011 09:09 PM, Sukadev Bhattiprolu wrote:
> Oren Laadan [orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org] wrote:
> | 
> | 
> | On 02/05/2011 04:40 PM, Sukadev Bhattiprolu wrote:
> | > Oren Laadan [orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org] wrote:
> | > | Suka,
> | > | 
> | > | This patch - and the corresponding kernel patch - are wrong
> | > 
> | > Ah, I see that now.
> | > 
> | > But am not sure about the kernel part though. We were getting a crash
> | > reliably (with older kernels) because of the ->exit_signal = -1 in
> | > do_ghost_task().
> | 
> | Are we still getting it with 2.6.37 ?
> 
> I am not currently getting the crash on 2.6.37 - I thought it was due to
> the following commit which removed the check for task_detached() in
> do_wait_thread().
> 
> 	commit 9cd80bbb07fcd6d4d037fad4297496d3b132ac6b
> 	Author: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 	Date:   Thu Dec 17 15:27:15 2009 -0800
> 
> But if that is true, I need to investigate why Louis Rilling was getting
> the crash in Jun 2010 - which he tried to fix here:
> 
> 	http://lkml.org/lkml/2010/6/16/295

I see. So basically there is a kerenl bug that can be potentially
exposed by the c/r code. Therefore, we need to fix the kernel bug...
(and until such a fix makes it to mainline, we'll add it as part of
the linux-cr patchset).

> 
> Even if we are not currently not getting the crash, I think user-space
> actions can result in the container-init being unable to forcibly kill
> all its children and exit.
> 
> Eg: if ghost tasks are pushed into a child pid namespace (by intentionally
> setting ->piddepth in usercr/restart.c), we can have a situation where the
> ghost task exits silently, the parent (i.e container-init can be left hanging).

I don't quite understand what you mean here. Basically, the ghost tasks
are only alive _during_ restart and are gone when the restart completes.
Therefore they cannot affect whether the init task of the new pidns will
hang or terminate -- that init task has no knowledge of ghost tasks.

Let's consider the two possible scenarios:
(1) container restart (that includes the container init)
(2) subtree restart in a new pidns (that does not include the init task,
and instead user-cr provides an init process to hold the new pidns alive).

Case 1: the (restarted) init task was part of the checkpoint. Typically
it would "wait()" in a loop for children until it gets ECHILD and then
exits (the container). Ghost tasks are not a factor here.

Case 2: the (injected) init task was not part of the checkpoint. It does
the same as the typical init in case 1: loop until wait() says no more
children, then exits. In this case, there will be at least one child of
that init task, because at least one task was restarted ... Typically,
when that child exits, our injected init task will exit. Again, ghost
tasks do no participate.

> 
> It can be argued that the incorrect changes in usercr code result in the
> application hang.
> 
> But pid namespace is supposed to guarantee that if a container-init is
> terminated, it will take the pid namespace down. But some userspace 
> actions can result in kill -9 of container-init leaving the container-init
> hung forever.

So I guess I don't quite understand the concern. Can you describe a
concrete example ?

> 
> | > 
> | > One fix I was watching for was Eric Biederman's 
> | > 
> | > 	http://lkml.org/lkml/2010/7/12/213
> | > 
> | > which AFAICT has not been merged yet.
> | 
> | If we need it and it isn't in mainline (any reason why ?) then
> | we can just add it to our linux-cr tree, as a preparatory patch.
> | 
> | > 
> | > Was there another change to 2.6.37 that would prevent the crash ?
> | 
> | I don't know whether *that* crash still happens in 2.6.37 - 
> | because I still didn't test it with that kernel line back.
> | (Actually, I never experienced that crash here even with
> | earlier kernels).
> 
> Yes, it needed some "accidental" usercr change to expose the crash :-)
> 
> (I will try to send a patch to existing usercr and a test case to repro
> this problem)
> 

Thanks,

Oren.

  parent reply	other threads:[~2011-02-09  3:35 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-11  3:35 [PATCH][usercr]: Ghost tasks must be detached Sukadev Bhattiprolu
     [not found] ` <20101211033548.GA12584-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2011-01-11  1:51   ` Oren Laadan
     [not found]     ` <4D2BB78A.9090701-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2011-02-05 18:55       ` Oren Laadan
     [not found]         ` <4D4D9D1B.3000209-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2011-02-05 21:40           ` Sukadev Bhattiprolu
     [not found]             ` <20110205214032.GA12944-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2011-02-05 22:02               ` Oren Laadan
     [not found]                 ` <4D4DC90B.3010103-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2011-02-05 22:33                   ` Oren Laadan
2011-02-09  2:09                   ` Sukadev Bhattiprolu
     [not found]                     ` <20110209020942.GA5339-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2011-02-09  3:35                       ` Oren Laadan [this message]
     [not found]                         ` <4D520B78.9020300-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2011-02-10  2:44                           ` Sukadev Bhattiprolu
     [not found]                             ` <20110210024430.GA23167-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2011-02-10  3:53                               ` Oren Laadan
     [not found]                                 ` <4D536154.8000900-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2011-02-10  6:17                                   ` Sukadev Bhattiprolu
     [not found]                                     ` <20110210061730.GA25432-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2011-02-10 14:56                                       ` Oren Laadan
     [not found]                                         ` <4D53FC9C.1050405-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2011-02-10 17:42                                           ` Sukadev Bhattiprolu
2011-02-16 20:10                                           ` Sukadev Bhattiprolu
     [not found]                                             ` <20110216201019.GA27698-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2011-02-17 15:21                                               ` Louis Rilling
     [not found]                                                 ` <20110217152116.GM518-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2011-02-21 20:40                                                   ` Sukadev Bhattiprolu
     [not found]                                                     ` <20110221204058.GC14377-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2011-02-22 10:28                                                       ` Louis Rilling
2011-02-09 12:01                       ` Louis Rilling
     [not found]                         ` <20110209120100.GD13323-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2011-02-09 12:18                           ` Oren Laadan
     [not found]                             ` <4D528629.7030905-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2011-02-09 12:35                               ` Louis Rilling
     [not found]                                 ` <20110209123550.GG13323-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2011-02-09 12:37                                   ` Louis Rilling
2011-02-09 19:02                           ` Sukadev Bhattiprolu
     [not found]                             ` <20110209190216.GA17051-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2011-02-10 10:23                               ` Louis Rilling
     [not found]                                 ` <20110210102312.GC6360-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2011-02-10 17:54                                   ` Sukadev Bhattiprolu
     [not found]                                     ` <20110210175409.GB1025-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2011-02-10 18:04                                       ` Louis Rilling
     [not found]                                         ` <20110210180433.GI6360-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2011-02-10 22:31                                           ` Sukadev Bhattiprolu
2011-02-25  7:58   ` Sukadev Bhattiprolu
     [not found]     ` <20110225075808.GC24361-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2011-02-25 15:46       ` Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D520B78.9020300@cs.columbia.edu \
    --to=orenl-eqauephvms7envbuuze7ea@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.