linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] signals: do_tkill: don't use tasklist_lock
@ 2008-03-07  9:52 Oleg Nesterov
  0 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2008-03-07  9:52 UTC (permalink / raw)
  To: apkm; +Cc: Eric W. Biederman, Ingo Molnar, Roland McGrath, linux-kernel

Convert do_tkill() to use rcu_read_lock() + lock_task_sighand() to avoid taking
tasklist lock.

Note that we don't return an error if lock_task_sighand() fails, we pretend the
task dies after receiving the signal. Otherwise, we should fight with the nasty
races with mt-exec without having any advantage.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 25/kernel/signal.c~2_TKILL_NO_TASKLIST	2008-03-07 10:47:02.000000000 +0300
+++ 25/kernel/signal.c	2008-03-07 12:20:34.000000000 +0300
@@ -2194,6 +2194,7 @@ static int do_tkill(int tgid, int pid, i
 	int error;
 	struct siginfo info;
 	struct task_struct *p;
+	unsigned long flags;
 
 	error = -ESRCH;
 	info.si_signo = sig;
@@ -2202,7 +2203,7 @@ static int do_tkill(int tgid, int pid, i
 	info.si_pid = task_tgid_vnr(current);
 	info.si_uid = current->uid;
 
-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	p = find_task_by_vpid(pid);
 	if (p && (tgid <= 0 || task_tgid_vnr(p) == tgid)) {
 		error = check_kill_permission(sig, &info, p);
@@ -2210,13 +2211,12 @@ static int do_tkill(int tgid, int pid, i
 		 * The null signal is a permissions and process existence
 		 * probe.  No signal is actually delivered.
 		 */
-		if (!error && sig && p->sighand) {
-			spin_lock_irq(&p->sighand->siglock);
+		if (!error && sig && lock_task_sighand(p, &flags)) {
 			error = specific_send_sig_info(sig, &info, p);
-			spin_unlock_irq(&p->sighand->siglock);
+			unlock_task_sighand(p, &flags);
 		}
 	}
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 
 	return error;
 }


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] signals: do_tkill: don't use tasklist_lock
  2008-03-07 21:32       ` Oleg Nesterov
@ 2008-03-07 21:51         ` Roland McGrath
  0 siblings, 0 replies; 9+ messages in thread
From: Roland McGrath @ 2008-03-07 21:51 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrew Morton, Eric W. Biederman, Ingo Molnar, linux-kernel

> > This is the big problem with exec that I've cited before.  It can even
> > happen with group-wide signals that should be fatal, but avoided the
> > __group_complete_signal special fatal case.  (e.g. the thread racing with
> > the exec thread just now unblocked the signal and dequeued it.)  IIRC it
> > was the biggest reason we wanted to revisit the whole MT exec plan.
> 
> Oh. Could you clarify? Afaics, currently exec() can't miss the fatal group
> signal?

It's a while since I thought hard about the exec stuff.  Just now I was
thinking of one particular scenario.  Perhaps it can't really happen.
Here it is:

Threads A and B both block SIGTERM.  An outside process sends SIGTERM,
so it is queued in shared_pending but noone wakes.

A starts an exec.		

B unblocks SIGTERM.  
B enters get_signal_to_deliver, locks, dequeues SIGTERM, unlocks.
Now B is e.g. just before "current->flags |= PF_SIGNALED;".

A locks.  No group-exit is in yet progress.  
A does zap_other_threads, and unlocks.

B enters do_group_exit.  A group-exit is in progress, so it just exits.
SIGTERM is lost.

So I think it really can happen.  Anyway, this is just one example.
It's not so hard to think of ways to address this one (though it gets
nontrivial quick with the coredump case).  But for me it is just an
example of why we still need to step back and think over the whole
exec picture.  

As I said, let's not conflate all that with this thread about a
relatively conservative cleanup.  We can discuss that separately.
(But I think it might need to wait for a breather and time for
other dust to settle.)


Thanks,
Roland

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] signals: do_tkill: don't use tasklist_lock
  2008-03-07 21:19     ` Roland McGrath
@ 2008-03-07 21:32       ` Oleg Nesterov
  2008-03-07 21:51         ` Roland McGrath
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2008-03-07 21:32 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Eric W. Biederman, Ingo Molnar, linux-kernel

On 03/07, Roland McGrath wrote:
>
> > Btw, a question: we are buggy or just "not perfect" ? After all, the
> > main thread actually exits although this is just linux's implementation
> > detail.
> 
> I think it's buggy.  The SIGKILL should kill the whole process.

OK.

> > Suppose that the main thread is already dead (dequeued SIGKILL), but
> > not yet released. This window is not that small. In that window (before
> > de_thread() switches pids) any private signal (even SIGKILL) sent to the
> > main thread will be silently lost.
> 
> This is the big problem with exec that I've cited before.  It can even
> happen with group-wide signals that should be fatal, but avoided the
> __group_complete_signal special fatal case.  (e.g. the thread racing with
> the exec thread just now unblocked the signal and dequeued it.)  IIRC it
> was the biggest reason we wanted to revisit the whole MT exec plan.

Oh. Could you clarify? Afaics, currently exec() can't miss the fatal group
signal?

> > We can change __group_complete_signal/zap_other_threads so that they won't
> > do sigaddset(), just signal_wake_up(). But in that case dequeue_signal()
> > and recalc_signal() should take signal_group_exit into account...
> 
> I'd like to revisit the use of "fake" SIGKILL for group exits.  That goes
> well with a rethink of MT exec.  But let's not get into all of that right now.

Yes.

Oleg.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] signals: do_tkill: don't use tasklist_lock
  2008-03-07 20:13   ` Oleg Nesterov
  2008-03-07 20:23     ` Oleg Nesterov
@ 2008-03-07 21:19     ` Roland McGrath
  2008-03-07 21:32       ` Oleg Nesterov
  1 sibling, 1 reply; 9+ messages in thread
From: Roland McGrath @ 2008-03-07 21:19 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrew Morton, Eric W. Biederman, Ingo Molnar, linux-kernel

> Btw, a question: we are buggy or just "not perfect" ? After all, the
> main thread actually exits although this is just linux's implementation
> detail.

I think it's buggy.  The SIGKILL should kill the whole process.

> Yep. kill_pid_info() does exactly this. Initially I was going to repeat
> this logic,

I think that precedent is enough reason to follow its example as the first
change.  We can consider more cleanups from there.

> Suppose that the main thread is already dead (dequeued SIGKILL), but
> not yet released. This window is not that small. In that window (before
> de_thread() switches pids) any private signal (even SIGKILL) sent to the
> main thread will be silently lost.

This is the big problem with exec that I've cited before.  It can even
happen with group-wide signals that should be fatal, but avoided the
__group_complete_signal special fatal case.  (e.g. the thread racing with
the exec thread just now unblocked the signal and dequeued it.)  IIRC it
was the biggest reason we wanted to revisit the whole MT exec plan.

> We can change __group_complete_signal/zap_other_threads so that they won't
> do sigaddset(), just signal_wake_up(). But in that case dequeue_signal()
> and recalc_signal() should take signal_group_exit into account...

I'd like to revisit the use of "fake" SIGKILL for group exits.  That goes
well with a rethink of MT exec.  But let's not get into all of that right now.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] signals: do_tkill: don't use tasklist_lock
  2008-03-07 20:13   ` Oleg Nesterov
@ 2008-03-07 20:23     ` Oleg Nesterov
  2008-03-07 21:19     ` Roland McGrath
  1 sibling, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2008-03-07 20:23 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Eric W. Biederman, Ingo Molnar, linux-kernel

On 03/07, Oleg Nesterov wrote:
>
> Suppose that the main thread is already dead (dequeued SIGKILL), but
> not yet released. This window is not that small. In that window (before
> de_thread() switches pids) any private signal (even SIGKILL) sent to the
> main thread will be silently lost.
> 
> To do a proper fix, we should change de_thread() so that the new leader
> also "inherits" old_leader->pending signals.

Ah, even this is not enough if we want a "perfect" fix. Because the private
SIGKILL will be "ignored" until the main thread dequeues the SIGKILL which
was sent by de_thread().

We can change __group_complete_signal/zap_other_threads so that they won't
do sigaddset(), just signal_wake_up(). But in that case dequeue_signal()
and recalc_signal() should take signal_group_exit into account...

Oleg.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] signals: do_tkill: don't use tasklist_lock
  2008-03-07 19:31 ` Roland McGrath
@ 2008-03-07 20:13   ` Oleg Nesterov
  2008-03-07 20:23     ` Oleg Nesterov
  2008-03-07 21:19     ` Roland McGrath
  0 siblings, 2 replies; 9+ messages in thread
From: Oleg Nesterov @ 2008-03-07 20:13 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Eric W. Biederman, Ingo Molnar, linux-kernel

On 03/07, Roland McGrath wrote:
> 
> > Note that we don't return an error if lock_task_sighand() fails, we
> > pretend the task dies after receiving the signal. Otherwise, we should
> > fight with the nasty races with mt-exec without having any advantage.
> 
> To clarify, this is not a change from the existing behavior.
> So your change is fine regardless of this issue.
> 
> The case you have in mind is that p was the old group_leader
> being replaced by another thread that exec'd, right?

Yes.

> It is the most obscure of nits, but I think it can be wrong to drop a
> signal in this case.  If it's a fatal signal (especially SIGKILL),
> then either the thread group should be killed or the call should
> return an error.

Yes, we are not consistent here with or without this patch.

Btw, a question: we are buggy or just "not perfect" ? After all, the
main thread actually exits although this is just linux's implementation
detail.

> For the exec case, if p->sighand is cleared that means the
> release_task(leader) call at the end of de_thread started.  So by
> now, the pid has been transferred to the exec'ing thread.  If we just
> restart the lookup, it will find the new thread (or not, and we can
> return -ESRCH).

Yep. kill_pid_info() does exactly this. Initially I was going to repeat
this logic,

> I'm inclined to do that, but it certainly should be a second patch
> after this one.

but actually it doesn't buy much here.

Suppose that the main thread is already dead (dequeued SIGKILL), but
not yet released. This window is not that small. In that window (before
de_thread() switches pids) any private signal (even SIGKILL) sent to the
main thread will be silently lost.

To do a proper fix, we should change de_thread() so that the new leader
also "inherits" old_leader->pending signals.

This is certainly possible as a separate change.

Oleg.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] signals: do_tkill: don't use tasklist_lock
  2008-03-07  9:58 Oleg Nesterov
  2008-03-07 10:50 ` Christoph Hellwig
@ 2008-03-07 19:31 ` Roland McGrath
  2008-03-07 20:13   ` Oleg Nesterov
  1 sibling, 1 reply; 9+ messages in thread
From: Roland McGrath @ 2008-03-07 19:31 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrew Morton, Eric W. Biederman, Ingo Molnar, linux-kernel

> Convert do_tkill() to use rcu_read_lock() + lock_task_sighand() to avoid
> taking tasklist lock.

That part looks good.

> Note that we don't return an error if lock_task_sighand() fails, we
> pretend the task dies after receiving the signal. Otherwise, we should
> fight with the nasty races with mt-exec without having any advantage.

To clarify, this is not a change from the existing behavior.
So your change is fine regardless of this issue.

The case you have in mind is that p was the old group_leader
being replaced by another thread that exec'd, right?

It is the most obscure of nits, but I think it can be wrong to drop a
signal in this case.  If it's a fatal signal (especially SIGKILL),
then either the thread group should be killed or the call should
return an error.  

For the exec case, if p->sighand is cleared that means the
release_task(leader) call at the end of de_thread started.  So by
now, the pid has been transferred to the exec'ing thread.  If we just
restart the lookup, it will find the new thread (or not, and we can
return -ESRCH).

I'm inclined to do that, but it certainly should be a second patch
after this one.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] signals: do_tkill: don't use tasklist_lock
  2008-03-07  9:58 Oleg Nesterov
@ 2008-03-07 10:50 ` Christoph Hellwig
  2008-03-07 19:31 ` Roland McGrath
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2008-03-07 10:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Ingo Molnar, Roland McGrath,
	linux-kernel

On Fri, Mar 07, 2008 at 12:58:13PM +0300, Oleg Nesterov wrote:
> Note that we don't return an error if lock_task_sighand() fails, we pretend the
> task dies after receiving the signal. Otherwise, we should fight with the nasty
> races with mt-exec without having any advantage.

This should be mentioned in a comment in the code.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] signals: do_tkill: don't use tasklist_lock
@ 2008-03-07  9:58 Oleg Nesterov
  2008-03-07 10:50 ` Christoph Hellwig
  2008-03-07 19:31 ` Roland McGrath
  0 siblings, 2 replies; 9+ messages in thread
From: Oleg Nesterov @ 2008-03-07  9:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, Ingo Molnar, Roland McGrath, linux-kernel

(sorry to all, re-sending with the correct Andrew's email)

Convert do_tkill() to use rcu_read_lock() + lock_task_sighand() to avoid taking
tasklist lock.

Note that we don't return an error if lock_task_sighand() fails, we pretend the
task dies after receiving the signal. Otherwise, we should fight with the nasty
races with mt-exec without having any advantage.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 25/kernel/signal.c~2_TKILL_NO_TASKLIST	2008-03-07 10:47:02.000000000 +0300
+++ 25/kernel/signal.c	2008-03-07 12:20:34.000000000 +0300
@@ -2194,6 +2194,7 @@ static int do_tkill(int tgid, int pid, i
 	int error;
 	struct siginfo info;
 	struct task_struct *p;
+	unsigned long flags;
 
 	error = -ESRCH;
 	info.si_signo = sig;
@@ -2202,7 +2203,7 @@ static int do_tkill(int tgid, int pid, i
 	info.si_pid = task_tgid_vnr(current);
 	info.si_uid = current->uid;
 
-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	p = find_task_by_vpid(pid);
 	if (p && (tgid <= 0 || task_tgid_vnr(p) == tgid)) {
 		error = check_kill_permission(sig, &info, p);
@@ -2210,13 +2211,12 @@ static int do_tkill(int tgid, int pid, i
 		 * The null signal is a permissions and process existence
 		 * probe.  No signal is actually delivered.
 		 */
-		if (!error && sig && p->sighand) {
-			spin_lock_irq(&p->sighand->siglock);
+		if (!error && sig && lock_task_sighand(p, &flags)) {
 			error = specific_send_sig_info(sig, &info, p);
-			spin_unlock_irq(&p->sighand->siglock);
+			unlock_task_sighand(p, &flags);
 		}
 	}
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 
 	return error;
 }


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-03-07 21:52 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-07  9:52 [PATCH] signals: do_tkill: don't use tasklist_lock Oleg Nesterov
2008-03-07  9:58 Oleg Nesterov
2008-03-07 10:50 ` Christoph Hellwig
2008-03-07 19:31 ` Roland McGrath
2008-03-07 20:13   ` Oleg Nesterov
2008-03-07 20:23     ` Oleg Nesterov
2008-03-07 21:19     ` Roland McGrath
2008-03-07 21:32       ` Oleg Nesterov
2008-03-07 21:51         ` Roland McGrath

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).