All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH][cr]: Mark ghost tasks as detached earlier
@ 2010-10-30  7:01 Sukadev Bhattiprolu
       [not found] ` <20101030070151.GA4850-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Sukadev Bhattiprolu @ 2010-10-30  7:01 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Dan Smith, Containers

From ce9dd2fc7332597d46872f3f8c52ac0806f381d1 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Date: Fri, 29 Oct 2010 23:16:10 -0700
Subject: [PATCH 1/1] Mark ghost task as detached earlier

During restart() of an application, ghost tasks are be marked as "detached"
so they don't send a SIGCHLD to their parent when they exit. But this is
currently being done a little too late in the "life" of the ghost and
ends up confusing the container-init.

Suppose a ghost child of the container-init is waiting in do_ghost_task().
It is not yet detached. If the container-init is terminated for some
reason, the container-init sends SIGKILL to its children (including this
ghost). The container-init then waits for the un-detached children to
exit, expecting to be notified via SIGCHLD.

When the ghost-child receives the SIGKILL, it wakes up and marks itself
detached and proceeds to exit. Since it is now detached, it will not
notify the parent, thus leaving the container-init blocked indefintely.

Some background:

When running some tests on the C/R code we ran into the problem of the
container-init not waiting for detached processes. This problem was
extensively discssued here:

	http://lkml.org/lkml/2010/6/16/295

Eric Biederman had a fix for the problem:

	http://lkml.org/lkml/2010/7/12/213

When I applied this fix to the C/R tree and repeated the tests, I ran
into the above issue of the container-init hanging. Marking the ghost
as detached earlier seems to fix the confusion in the container-init.

Oren, is there a reason not to mark the ghost task detached earlier
than is currently being done ?

Signed-off-by: Sukadev Bhattiprolu (sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org)
---
 kernel/checkpoint/restart.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c
index 17270b8..95789c0 100644
--- a/kernel/checkpoint/restart.c
+++ b/kernel/checkpoint/restart.c
@@ -953,6 +953,7 @@ static int do_ghost_task(void)
 	struct ckpt_ctx *ctx;
 	int ret;
 
+	current->exit_signal = -1;
 	ctx = wait_checkpoint_ctx();
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
@@ -972,7 +973,6 @@ static int do_ghost_task(void)
 	if (ret < 0)
 		ckpt_err(ctx, ret, "ghost restart failed\n");
 
-	current->exit_signal = -1;
 	restore_debug_exit(ctx);
 	ckpt_ctx_put(ctx);
 	do_exit(0);
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC][PATCH][cr]: Mark ghost tasks as detached earlier
       [not found] ` <20101030070151.GA4850-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-11-01 18:02   ` Oren Laadan
       [not found]     ` <4CCF00BA.2030500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Oren Laadan @ 2010-11-01 18:02 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: Dan Smith, Containers



On 10/30/2010 03:01 AM, Sukadev Bhattiprolu wrote:
> From ce9dd2fc7332597d46872f3f8c52ac0806f381d1 Mon Sep 17 00:00:00 2001
> From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> Date: Fri, 29 Oct 2010 23:16:10 -0700
> Subject: [PATCH 1/1] Mark ghost task as detached earlier
> 
> During restart() of an application, ghost tasks are be marked as "detached"
> so they don't send a SIGCHLD to their parent when they exit. But this is
> currently being done a little too late in the "life" of the ghost and
> ends up confusing the container-init.
> 
> Suppose a ghost child of the container-init is waiting in do_ghost_task().
> It is not yet detached. If the container-init is terminated for some
> reason, the container-init sends SIGKILL to its children (including this
> ghost). The container-init then waits for the un-detached children to
> exit, expecting to be notified via SIGCHLD.
> 
> When the ghost-child receives the SIGKILL, it wakes up and marks itself
> detached and proceeds to exit. Since it is now detached, it will not
> notify the parent, thus leaving the container-init blocked indefintely.
> 
> Some background:
> 
> When running some tests on the C/R code we ran into the problem of the
> container-init not waiting for detached processes. This problem was
> extensively discssued here:
> 
> 	http://lkml.org/lkml/2010/6/16/295
> 
> Eric Biederman had a fix for the problem:
> 
> 	http://lkml.org/lkml/2010/7/12/213
> 
> When I applied this fix to the C/R tree and repeated the tests, I ran
> into the above issue of the container-init hanging. Marking the ghost
> as detached earlier seems to fix the confusion in the container-init.

Let me see if I get this correctly: first, the container-init calls
wait(), then goes to sleep because children aren't ready, then the
ghost changes state to detached and exits, but the container-init is
not notified so not woken up. Is that correct ?

If so, then the problem is that changing from non-detached to detached
is racy. Therefore, just doing it earlier may still not be correct,
because the race (theoretically) still exists. It just makes it less
likely to occur.

If this analysis is correct, then I can think of two options:

1) when creating the ghost tasks, create them as detached from
user-space, and they won't need to be waited-for not become
detached.

- or -

2) in the kernel, when changing from non-detached to detached, we
should also try to wake up the parent if it is sleeping in wait,
to force the notify - the parent will wake up, check again and will
see no remaining children.

Thoughts ?

> 
> Oren, is there a reason not to mark the ghost task detached earlier
> than is currently being done ?

I don't think so, but let me sleep on it :)

Oren.


> 
> Signed-off-by: Sukadev Bhattiprolu (sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org)
> ---
>  kernel/checkpoint/restart.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c
> index 17270b8..95789c0 100644
> --- a/kernel/checkpoint/restart.c
> +++ b/kernel/checkpoint/restart.c
> @@ -953,6 +953,7 @@ static int do_ghost_task(void)
>  	struct ckpt_ctx *ctx;
>  	int ret;
>  
> +	current->exit_signal = -1;
>  	ctx = wait_checkpoint_ctx();
>  	if (IS_ERR(ctx))
>  		return PTR_ERR(ctx);
> @@ -972,7 +973,6 @@ static int do_ghost_task(void)
>  	if (ret < 0)
>  		ckpt_err(ctx, ret, "ghost restart failed\n");
>  
> -	current->exit_signal = -1;
>  	restore_debug_exit(ctx);
>  	ckpt_ctx_put(ctx);
>  	do_exit(0);

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC][PATCH][cr]: Mark ghost tasks as detached earlier
       [not found]     ` <4CCF00BA.2030500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-11-02 15:57       ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 3+ messages in thread
From: Sukadev Bhattiprolu @ 2010-11-02 15:57 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Dan Smith, Containers

Oren Laadan [orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org] wrote:
| 
| 
| On 10/30/2010 03:01 AM, Sukadev Bhattiprolu wrote:
| Let me see if I get this correctly: first, the container-init calls
| wait(), then goes to sleep because children aren't ready, then the
| ghost changes state to detached and exits, but the container-init is
| not notified so not woken up. Is that correct ?
| 
| If so, then the problem is that changing from non-detached to detached
| is racy. Therefore, just doing it earlier may still not be correct,
| because the race (theoretically) still exists. It just makes it less
| likely to occur.

True. I was trying to understand if there was a need to mark it detached
that late.

| 
| If this analysis is correct, then I can think of two options:
| 
| 1) when creating the ghost tasks, create them as detached from
| user-space, and they won't need to be waited-for not become
| detached.

Yes and if we remove the assignment to ->exit_signal in do_ghost_task(),
we should be fine. This would be a better option than sending a signal
to the parent.

Sukadev

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2010-11-02 15:57 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-30  7:01 [RFC][PATCH][cr]: Mark ghost tasks as detached earlier Sukadev Bhattiprolu
     [not found] ` <20101030070151.GA4850-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-11-01 18:02   ` Oren Laadan
     [not found]     ` <4CCF00BA.2030500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-11-02 15:57       ` Sukadev Bhattiprolu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.