All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fix handling of SIGCHILD from reaped child
@ 2007-02-20  9:32 KAMEZAWA Hiroyuki
  2007-02-20 14:22 ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-02-20  9:32 UTC (permalink / raw)
  To: LKML; +Cc: oleg, mingo, akpm

SUSv3 says
==
if SIGCHLD is blocked, if wait() or waitpid() return because the status of a
child process is available, any pending SIGCHLD signal shall be cleared unless
the status of another child process is available.
==

This patch tries to implement above functionality (clear SIGCHLD from reaped 
process.)
please review...

works well on 2.6.20/ia64/NUMA environment and passed my easy test.

Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Index: linux-2.6.20-devel/kernel/signal.c
===================================================================
--- linux-2.6.20-devel.orig/kernel/signal.c
+++ linux-2.6.20-devel/kernel/signal.c
@@ -382,7 +382,7 @@ unblock_all_signals(void)
 	spin_unlock_irqrestore(&current->sighand->siglock, flags);
 }
 
-static int collect_signal(int sig, struct sigpending *list, siginfo_t *info)
+static int collect_signal(int sig, struct sigpending *list, siginfo_t *info, pid_t checkpid)
 {
 	struct sigqueue *q, *first = NULL;
 	int still_pending = 0;
@@ -394,13 +394,24 @@ static int collect_signal(int sig, struc
 	 * Collect the siginfo appropriate to this signal.  Check if
 	 * there is another siginfo for the same signal.
 	*/
-	list_for_each_entry(q, &list->list, list) {
-		if (q->info.si_signo == sig) {
-			if (first) {
-				still_pending = 1;
-				break;
+	if (unlikely(checkpid)) {
+		list_for_each_entry(q, &list->list, list) {
+			if (q->info.si_signo == sig) {
+			    if (q->info.si_pid == checkpid)
+					first = q;
+				else
+					still_pending = 1;
+			}
+		}
+	} else {
+		list_for_each_entry(q, &list->list, list) {
+			if (q->info.si_signo == sig) {
+				if (first) {
+					still_pending = 1;
+					break;
+				}
+				first = q;
 			}
-			first = q;
 		}
 	}
 	if (first) {
@@ -415,7 +426,8 @@ static int collect_signal(int sig, struc
 		   a fast-pathed signal or we must have been
 		   out of queue space.  So zero out the info.
 		 */
-		sigdelset(&list->signal, sig);
+		if (!still_pending)
+			sigdelset(&list->signal, sig);
 		info->si_signo = sig;
 		info->si_errno = 0;
 		info->si_code = 0;
@@ -440,7 +452,7 @@ static int __dequeue_signal(struct sigpe
 			}
 		}
 
-		if (!collect_signal(sig, pending, info))
+		if (!collect_signal(sig, pending, info, 0))
 			sig = 0;
 	}
 
@@ -493,6 +505,24 @@ int dequeue_signal(struct task_struct *t
 }
 
 /*
+ * only called by do_wait() when it reaps its child.
+ * This function clears pending SIGCHLD from the reaped child.
+ * rerely does real job...
+ * Becasue we only check SIGCHLD, just checks shared_pending.
+ */
+void clear_stale_sigchild(struct task_struct *tsk, pid_t child_id)
+{
+	siginfo_t info;
+	unsigned long flags;
+	spin_lock_irqsave(&tsk->sighand->siglock, flags);
+	collect_signal(SIGCHLD, &tsk->signal->shared_pending,  &info, child_id);
+	spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
+	return;
+}
+
+
+
+/*
  * Tell a process that it has a new active signal..
  *
  * NOTE! we rely on the previous spin_lock to
Index: linux-2.6.20-devel/kernel/exit.c
===================================================================
--- linux-2.6.20-devel.orig/kernel/exit.c
+++ linux-2.6.20-devel/kernel/exit.c
@@ -1252,8 +1252,12 @@ static int wait_task_zombie(struct task_
 		}
 		write_unlock_irq(&tasklist_lock);
 	}
-	if (p != NULL)
+	if (p != NULL) {
 		release_task(p);
+		/* if we received sigchild from "p" and p is released,
+		   we remove sigchild from it. */
+		clear_stale_sigchild(current, retval);
+	}
 	BUG_ON(!retval);
 	return retval;
 }


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fix handling of SIGCHILD from reaped child
  2007-02-20  9:32 [PATCH] fix handling of SIGCHILD from reaped child KAMEZAWA Hiroyuki
@ 2007-02-20 14:22 ` Oleg Nesterov
  2007-02-20 15:25   ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2007-02-20 14:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: LKML, mingo, akpm, Roland McGrath

On 02/20, KAMEZAWA Hiroyuki wrote:
>
> SUSv3 says
> ==
> if SIGCHLD is blocked, if wait() or waitpid() return because the status of a
> child process is available, any pending SIGCHLD signal shall be cleared unless
> the status of another child process is available.
> ==

Ingo, Roland, should we implement this?

I must admit, I don't understand the text above, "blocked" is per-thread, but
wait() is process wide (any sub-thread can reap a dead child).

> -static int collect_signal(int sig, struct sigpending *list, siginfo_t *info)
> +static int collect_signal(int sig, struct sigpending *list, siginfo_t *info, pid_t checkpid)
>  {
> -	list_for_each_entry(q, &list->list, list) {
> -		if (q->info.si_signo == sig) {
> -			if (first) {
> -				still_pending = 1;
> -				break;
> +	if (unlikely(checkpid)) {
> +		list_for_each_entry(q, &list->list, list) {
> +			if (q->info.si_signo == sig) {
> +			    if (q->info.si_pid == checkpid)
> +					first = q;
> +				else
> +					still_pending = 1;
> +			}
> +		}
> +	} else {
> +		list_for_each_entry(q, &list->list, list) {
> +			if (q->info.si_signo == sig) {
> +				if (first) {
> +					still_pending = 1;
> +					break;
> +				}
> +				first = q;
>  			}
> -			first = q;

I'd suggest to make a separate function, but not complicate collect_signal().

> --- linux-2.6.20-devel.orig/kernel/exit.c
> +++ linux-2.6.20-devel/kernel/exit.c
> @@ -1252,8 +1252,12 @@ static int wait_task_zombie(struct task_
>  		}
>  		write_unlock_irq(&tasklist_lock);
>  	}
> -	if (p != NULL)
> +	if (p != NULL) {
>  		release_task(p);
> +		/* if we received sigchild from "p" and p is released,
> +		   we remove sigchild from it. */

current may be ptracer, not a parent. Should be ok, clear_stale_sigchild(pid)
can't have a false positive (until we have namespace for pid_t), but the comment
is misleading a bit.

> +		clear_stale_sigchild(current, retval);

But we are not checking that SIGCHLD is blocked?

Oleg.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fix handling of SIGCHILD from reaped child
  2007-02-20 14:22 ` Oleg Nesterov
@ 2007-02-20 15:25   ` KAMEZAWA Hiroyuki
  2007-02-20 17:20     ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-02-20 15:25 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, mingo, akpm, roland

On Tue, 20 Feb 2007 17:22:57 +0300
Oleg Nesterov <oleg@tv-sign.ru> wrote:

>
> I'd suggest to make a separate function, but not complicate collect_signal().
>
okay. I'll try again if people admit me to go ahead. 

> > --- linux-2.6.20-devel.orig/kernel/exit.c
> > +++ linux-2.6.20-devel/kernel/exit.c
> > @@ -1252,8 +1252,12 @@ static int wait_task_zombie(struct task_
> >  		}
> >  		write_unlock_irq(&tasklist_lock);
> >  	}
> > -	if (p != NULL)
> > +	if (p != NULL) {
> >  		release_task(p);
> > +		/* if we received sigchild from "p" and p is released,
> > +		   we remove sigchild from it. */
> 
> current may be ptracer, not a parent. Should be ok, clear_stale_sigchild(pid)
> can't have a false positive (until we have namespace for pid_t), but the comment
> is misleading a bit.
> 
I'll rewrite and make this clearer.

> > +		clear_stale_sigchild(current, retval);
> 
> But we are not checking that SIGCHLD is blocked?
> 
I'm sorry if I don't read SUSv3 correctly. SUSv3 doesn't define how we should
do if SIGCHLD is not blocked.(so I don't check not-blocked case.)

IMHO, user's sig-child-handler is tend to call wait()/waitpid() and expects
successful return. So removing stale signal here may be good.

If this breaks assumptions of applications on Linux, I'll not go eagerly.

Thanks,
-Kame






^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fix handling of SIGCHILD from reaped child
  2007-02-20 15:25   ` KAMEZAWA Hiroyuki
@ 2007-02-20 17:20     ` Oleg Nesterov
  2007-02-20 23:10       ` Roland McGrath
  2007-02-21  0:42       ` KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 7+ messages in thread
From: Oleg Nesterov @ 2007-02-20 17:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, mingo, akpm, roland, Michael Kerrisk

On 02/21, KAMEZAWA Hiroyuki wrote:
>
> On Tue, 20 Feb 2007 17:22:57 +0300
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
> 
> >
> > I'd suggest to make a separate function, but not complicate collect_signal().
> >
> okay. I'll try again if people admit me to go ahead.

Yes, it would be nice to know what maintainers think. This is a user visible
change, even if good.

> > > +		clear_stale_sigchild(current, retval);
> > 
> > But we are not checking that SIGCHLD is blocked?
> > 
> I'm sorry if I don't read SUSv3 correctly. SUSv3 doesn't define how we should
> do if SIGCHLD is not blocked.(so I don't check not-blocked case.)

Probably it is me who misunderstands SUSv3. Could you point me the reference
to authoritative document? My understanding: if blocked AND wait() succeeds.

> IMHO, user's sig-child-handler is tend to call wait()/waitpid() and expects
> successful return. So removing stale signal here may be good.

Yes. But sig-child-handler should do

	while (wait() >= 0)
		....

anyway, because SIGCHLD is not a realtime signal.

> If this breaks assumptions of applications on Linux, I'll not go eagerly.

I just don't know... (Michael Kerrisk cc'ed).

Oleg.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fix handling of SIGCHILD from reaped child
  2007-02-20 17:20     ` Oleg Nesterov
@ 2007-02-20 23:10       ` Roland McGrath
  2007-02-21  0:51         ` KAMEZAWA Hiroyuki
  2007-02-21  0:42       ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 7+ messages in thread
From: Roland McGrath @ 2007-02-20 23:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: KAMEZAWA Hiroyuki, linux-kernel, mingo, akpm, Michael Kerrisk

I'm usually the stickler for anal POSIX compliance, but this is one thing
that I did notice a while ago, realized Linux had never done it, and
decided I didn't care.

This is one of those parts of the standard that was originally written in a
single-threaded process frame of mind, and was never amended or clarified
later when multi-threaded semantics got well-specified in the standard.

It's clear what the requirement is trying to achieve.  It lets you have a
SIGCHLD signal handler that calls wait, and be sure its call never blocks,
as long as you block SIGCHLD while making any other wait calls.  But Linux
has never done this even for single-threaded processes, so existing
application code already has to cope with the race.  (Anyway, this
guarantee is not all that helpful if you have more than one child and so
might be running the handler once after SIGCHLD was generated more than
once.  You can't just use WNOHANG in your handler because you aren't
actually guaranteed that the zombie is ready already when you get the SIGCHLD.)

This guarantee is not of any use when there might be other threads with
SIGCHLD unblocked or other threads that call wait* functions (calls that
draw from the same pool of PIDs anyway).  There can always be another
thread that just dequeued the SIGCHLD but hasn't gotten into its handler
yet, so clearing the pending SIGCHLD doesn't really cover it.

Unhelpful as it is the multithreaded context, I think it's clear that the
standard's wording means "when SIGCHLD is blocked by the thread calling
wait", but in fact as to being a guarantee it's only meaningful when
SIGCHLD is blocked by all threads.  The mention of blocking the signal is
only there to remind you that well-defined semantics about a "pending"
signal only ever apply when the signal is blocked.  If any thread has it
unblocked, then "pending" is an ephemeral condition not necessarily
observable at all--as soon as you could say it's pending, some such thread
might be handling it.

The "if there is another child available" test is rather ugly to do
correctly now.  It would be less so if the children list moved into
signal_struct and was just shared directly.  The most "correct" it can get
is still not all that useful in a multithreaded context.  So I'm pretty
ambivalent about bothering with this.


Thanks,
Roland


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fix handling of SIGCHILD from reaped child
  2007-02-20 17:20     ` Oleg Nesterov
  2007-02-20 23:10       ` Roland McGrath
@ 2007-02-21  0:42       ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-02-21  0:42 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, mingo, akpm, roland, mtk-manpages

On Tue, 20 Feb 2007 20:20:49 +0300
Oleg Nesterov <oleg@tv-sign.ru> wrote:
> > > > +		clear_stale_sigchild(current, retval);
> > > 
> > > But we are not checking that SIGCHLD is blocked?
> > > 
> > I'm sorry if I don't read SUSv3 correctly. SUSv3 doesn't define how we should
> > do if SIGCHLD is not blocked.(so I don't check not-blocked case.)
> 
> Probably it is me who misunderstands SUSv3. Could you point me the reference
> to authoritative document? My understanding: if blocked AND wait() succeeds.
> 
I read this:
http://www.opengroup.org/onlinepubs/009695399/


> > IMHO, user's sig-child-handler is tend to call wait()/waitpid() and expects
> > successful return. So removing stale signal here may be good.
> 
> Yes. But sig-child-handler should do
> 
> 	while (wait() >= 0)
> 		....
> 
> anyway, because SIGCHLD is not a realtime signal.
> 

Ah...it looks I should explation why I found this more.

A user (who is migrated from Solaris) met following situation.
==(single threaded program.)

pid1 = fork();
if (!pid1) {
	do_something_forever();
}
....set SIGCHLD handler here to catch pid1's error-exit....

if (!pid2) {
	do_something_in_shortterm();
}
Block SIGCHLD
ret = waitpid(pid2, hoge, hoge); // wait for pid2
UNBlock SIGCHLD
==

And SIGCHLD handler didn't use WNOHANG. 
I asked him to fix his program. He agreed.(So, no problem now.)

While our problem was fixed, it seems Linux doesn't meet spec.(SUSv3)
So I posted. 
But this is rare situation and this fix makes codes ugly....


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fix handling of SIGCHILD from reaped child
  2007-02-20 23:10       ` Roland McGrath
@ 2007-02-21  0:51         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-02-21  0:51 UTC (permalink / raw)
  To: Roland McGrath; +Cc: oleg, linux-kernel, mingo, akpm, mtk-manpages

On Tue, 20 Feb 2007 15:10:07 -0800 (PST)
Roland McGrath <roland@redhat.com> wrote:

> I'm usually the stickler for anal POSIX compliance, but this is one thing
> that I did notice a while ago, realized Linux had never done it, and
> decided I didn't care.
> 
Okay, I don't think this is a big trouble.

> This is one of those parts of the standard that was originally written in a
> single-threaded process frame of mind, and was never amended or clarified
> later when multi-threaded semantics got well-specified in the standard.
> 
> It's clear what the requirement is trying to achieve.  It lets you have a
> SIGCHLD signal handler that calls wait, and be sure its call never blocks,
> as long as you block SIGCHLD while making any other wait calls.  But Linux
> has never done this even for single-threaded processes, so existing
> application code already has to cope with the race.  (Anyway, this
> guarantee is not all that helpful if you have more than one child and so
> might be running the handler once after SIGCHLD was generated more than
> once.  You can't just use WNOHANG in your handler because you aren't
> actually guaranteed that the zombie is ready already when you get the SIGCHLD.)
> 
> This guarantee is not of any use when there might be other threads with
> SIGCHLD unblocked or other threads that call wait* functions (calls that
> draw from the same pool of PIDs anyway).  There can always be another
> thread that just dequeued the SIGCHLD but hasn't gotten into its handler
> yet, so clearing the pending SIGCHLD doesn't really cover it.
> 
> Unhelpful as it is the multithreaded context, I think it's clear that the
> standard's wording means "when SIGCHLD is blocked by the thread calling
> wait", but in fact as to being a guarantee it's only meaningful when
> SIGCHLD is blocked by all threads.  The mention of blocking the signal is
> only there to remind you that well-defined semantics about a "pending"
> signal only ever apply when the signal is blocked.  If any thread has it
> unblocked, then "pending" is an ephemeral condition not necessarily
> observable at all--as soon as you could say it's pending, some such thread
> might be handling it.
> 
> The "if there is another child available" test is rather ugly to do
> correctly now.  It would be less so if the children list moved into
> signal_struct and was just shared directly.  The most "correct" it can get
> is still not all that useful in a multithreaded context.  So I'm pretty
> ambivalent about bothering with this.
> 
Hmm, okay. It seems a good workaround to say "please use WNOHANG always in
your SIGCHLD handler's wait*() call" 

My only concerns is that LSB people say wait/waitpid meets SUSv3. 

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-02-21  0:52 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-20  9:32 [PATCH] fix handling of SIGCHILD from reaped child KAMEZAWA Hiroyuki
2007-02-20 14:22 ` Oleg Nesterov
2007-02-20 15:25   ` KAMEZAWA Hiroyuki
2007-02-20 17:20     ` Oleg Nesterov
2007-02-20 23:10       ` Roland McGrath
2007-02-21  0:51         ` KAMEZAWA Hiroyuki
2007-02-21  0:42       ` KAMEZAWA Hiroyuki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.