linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Additional/catchup RCU signal fixes for -mm
@ 2005-11-05  1:36 Paul E. McKenney
  2005-11-05 16:32 ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2005-11-05  1:36 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, dipankar, mingo, suzannew, oleg

Hello!

Additional RCU signal fixes to cover a race with exec() and to add
some rcu_dereference()s from earlier patches.

Signed-off-by: <paulmck@us.ibm.com>

---

 signal.c |   28 +++++++++++++++-------------
 1 files changed, 15 insertions(+), 13 deletions(-)

diff -urpNa -X dontdiff linux-2.6.14-mm0/kernel/signal.c linux-2.6.14-mm0-fix/kernel/signal.c
--- linux-2.6.14-mm0/kernel/signal.c	2005-11-04 14:37:18.000000000 -0800
+++ linux-2.6.14-mm0-fix/kernel/signal.c	2005-11-04 17:23:40.000000000 -0800
@@ -337,7 +337,7 @@ void exit_sighand(struct task_struct *ts
 	write_lock_irq(&tasklist_lock);
 	rcu_read_lock();
 	if (tsk->sighand != NULL) {
-		struct sighand_struct *sighand = tsk->sighand;
+		struct sighand_struct *sighand = rcu_dereference(tsk->sighand);
 		spin_lock(&sighand->siglock);
 		__exit_sighand(tsk);
 		spin_unlock(&sighand->siglock);
@@ -352,13 +352,14 @@ void exit_sighand(struct task_struct *ts
 void __exit_signal(struct task_struct *tsk)
 {
 	struct signal_struct * sig = tsk->signal;
-	struct sighand_struct * sighand = tsk->sighand;
+	struct sighand_struct * sighand;
 
 	if (!sig)
 		BUG();
 	if (!atomic_read(&sig->count))
 		BUG();
 	rcu_read_lock();
+	sighand = rcu_dereference(tsk->sighand);
 	spin_lock(&sighand->siglock);
 	posix_cpu_timers_exit(tsk);
 	if (atomic_dec_and_test(&sig->count)) {
@@ -1100,7 +1101,7 @@ void zap_other_threads(struct task_struc
 }
 
 /*
- * Must be called with the tasklist_lock held for reading!
+ * Must be called under rcu_read_lock() or with tasklist_lock read-held.
  */
 int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
 {
@@ -1386,7 +1387,7 @@ send_sigqueue(int sig, struct sigqueue *
 {
 	unsigned long flags;
 	int ret = 0;
-	struct sighand_struct *sh = p->sighand;
+	struct sighand_struct *sh;
 
 	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
 
@@ -1405,7 +1406,15 @@ send_sigqueue(int sig, struct sigqueue *
 		goto out_err;
 	}
 
+retry:
+	sh = rcu_dereference(p->sighand);
+
 	spin_lock_irqsave(&sh->siglock, flags);
+	if (p->sighand != sh) {
+		/* We raced with exec() in a multithreaded process... */
+		spin_unlock_irqrestore(&sh->siglock, flags);
+		goto retry;
+	}
 
 	/*
 	 * We do the check here again to handle the following scenario:
@@ -1464,15 +1473,8 @@ send_group_sigqueue(int sig, struct sigq
 
 	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
 
-	while (!read_trylock(&tasklist_lock)) {
-		if (!p->sighand)
-			return -1;
-		cpu_relax();
-	}
-	if (unlikely(!p->sighand)) {
-		ret = -1;
-		goto out_err;
-	}
+	read_lock(&tasklist_lock);
+	/* Since it_lock is held, p->sighand cannot be NULL. */
 	spin_lock_irqsave(&p->sighand->siglock, flags);
 	handle_stop_signal(sig, p);
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Additional/catchup RCU signal fixes for -mm
  2005-11-05  1:36 [PATCH] Additional/catchup RCU signal fixes for -mm Paul E. McKenney
@ 2005-11-05 16:32 ` Oleg Nesterov
  2005-11-06  1:00   ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2005-11-05 16:32 UTC (permalink / raw)
  To: paulmck; +Cc: akpm, linux-kernel, dipankar, mingo, suzannew

"Paul E. McKenney" wrote:
>
> @@ -1386,7 +1387,7 @@ send_sigqueue(int sig, struct sigqueue *
>  {
>  	unsigned long flags;
>  	int ret = 0;
> -	struct sighand_struct *sh = p->sighand;
> +	struct sighand_struct *sh;
>
>  	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
>
> @@ -1405,7 +1406,15 @@ send_sigqueue(int sig, struct sigqueue *
>  		goto out_err;
>  	}
>
> +retry:
> +	sh = rcu_dereference(p->sighand);
> +
>  	spin_lock_irqsave(&sh->siglock, flags);
> +	if (p->sighand != sh) {
> +		/* We raced with exec() in a multithreaded process... */
> +		spin_unlock_irqrestore(&sh->siglock, flags);
> +		goto retry;

p->sighand can't be changed, de_thread calls exit_itimers() before
changing ->sighand. But I still think it can be NULL, and send_sigqueue()
should return -1 in that case.

> @@ -1464,15 +1473,8 @@ send_group_sigqueue(int sig, struct sigq
>
>  	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
>
> -	while (!read_trylock(&tasklist_lock)) {
> -		if (!p->sighand)
> -			return -1;
> -		cpu_relax();
> -	}
> -	if (unlikely(!p->sighand)) {
> -		ret = -1;
> -		goto out_err;
> -	}
> +	read_lock(&tasklist_lock);
> +	/* Since it_lock is held, p->sighand cannot be NULL. */
>  	spin_lock_irqsave(&p->sighand->siglock, flags);

Again, I think the comment is wrong.

However, now I think we really have a race with exec, and this race was not
introduced by your patches!

If !thread_group_leader() does exec de_thread() calls release_task(->group_leader)
before calling exit_itimers(). This means that send_group_sigqueue() which
always has p == ->group_leader parameter can oops here.

Oleg.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Additional/catchup RCU signal fixes for -mm
  2005-11-05 16:32 ` Oleg Nesterov
@ 2005-11-06  1:00   ` Paul E. McKenney
  2005-11-06 14:17     ` Oleg Nesterov
  2005-11-06 14:32     ` Posix timers vs exec problems Oleg Nesterov
  0 siblings, 2 replies; 16+ messages in thread
From: Paul E. McKenney @ 2005-11-06  1:00 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, linux-kernel, dipankar, mingo, suzannew

On Sat, Nov 05, 2005 at 07:32:47PM +0300, Oleg Nesterov wrote:
> "Paul E. McKenney" wrote:
> >
> > @@ -1386,7 +1387,7 @@ send_sigqueue(int sig, struct sigqueue *
> >  {
> >  	unsigned long flags;
> >  	int ret = 0;
> > -	struct sighand_struct *sh = p->sighand;
> > +	struct sighand_struct *sh;
> >
> >  	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> >
> > @@ -1405,7 +1406,15 @@ send_sigqueue(int sig, struct sigqueue *
> >  		goto out_err;
> >  	}
> >
> > +retry:
> > +	sh = rcu_dereference(p->sighand);
> > +
> >  	spin_lock_irqsave(&sh->siglock, flags);
> > +	if (p->sighand != sh) {
> > +		/* We raced with exec() in a multithreaded process... */
> > +		spin_unlock_irqrestore(&sh->siglock, flags);
> > +		goto retry;
> 
> p->sighand can't be changed, de_thread calls exit_itimers() before
> changing ->sighand. But I still think it can be NULL, and send_sigqueue()
> should return -1 in that case.

OK, I put the NULL check in with my previous patch.

And you are absolutely right in the de_thread() case.  I need to add 
more cases to steamroller...

> > @@ -1464,15 +1473,8 @@ send_group_sigqueue(int sig, struct sigq
> >
> >  	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> >
> > -	while (!read_trylock(&tasklist_lock)) {
> > -		if (!p->sighand)
> > -			return -1;
> > -		cpu_relax();
> > -	}
> > -	if (unlikely(!p->sighand)) {
> > -		ret = -1;
> > -		goto out_err;
> > -	}
> > +	read_lock(&tasklist_lock);
> > +	/* Since it_lock is held, p->sighand cannot be NULL. */
> >  	spin_lock_irqsave(&p->sighand->siglock, flags);
> 
> Again, I think the comment is wrong.
> 
> However, now I think we really have a race with exec, and this race was not
> introduced by your patches!

This patch was not mine, though I guess that it is by now.  ;-)

> If !thread_group_leader() does exec de_thread() calls release_task(->group_leader)
> before calling exit_itimers(). This means that send_group_sigqueue() which
> always has p == ->group_leader parameter can oops here.

But in that case, __exit_sighand(->group_leader) would have been called,
so ->sighand would be NULL.  And none of this can change while we are holding
tasklist_lock.

If we don't want to be hitting the exec()ed task with a signal, the
thing to do would be to drop the signal, as in the attached patch.
I believe that this is an acceptable approach, since had the timer
fired slightly later, it would have been disabled, right?

Thoughts?

						Thanx, Paul

Signed-off-by: <paulmck@us.ibm.com>

diff -urpNa -X dontdiff linux-2.6.14-mm0-fix-2/kernel/signal.c linux-2.6.14-mm0-fix-3/kernel/signal.c
--- linux-2.6.14-mm0-fix-2/kernel/signal.c	2005-11-05 15:05:38.000000000 -0800
+++ linux-2.6.14-mm0-fix-3/kernel/signal.c	2005-11-05 16:27:52.000000000 -0800
@@ -1481,6 +1481,10 @@ send_group_sigqueue(int sig, struct sigq
 	read_lock(&tasklist_lock);
 	while (p->group_leader != p)
 		p = p->group_leader;
+	if (p->sighand == NULL) {
+		ret = 1;
+		goto out_err;
+	}
 	spin_lock_irqsave(&p->sighand->siglock, flags);
 	handle_stop_signal(sig, p);
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Additional/catchup RCU signal fixes for -mm
  2005-11-06  1:00   ` Paul E. McKenney
@ 2005-11-06 14:17     ` Oleg Nesterov
  2005-11-06 14:46       ` Oleg Nesterov
  2005-11-06 14:32     ` Posix timers vs exec problems Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2005-11-06 14:17 UTC (permalink / raw)
  To: paulmck; +Cc: akpm, linux-kernel, dipankar, mingo, suzannew

"Paul E. McKenney" wrote:
> 
> > If !thread_group_leader() does exec de_thread() calls release_task(->group_leader)
> > before calling exit_itimers(). This means that send_group_sigqueue() which
> > always has p == ->group_leader parameter can oops here.
> 
> But in that case, __exit_sighand(->group_leader) would have been called,
> so ->sighand would be NULL.

Yes, that is why (I think) oops can happen.

> And none of this can change while we are holding
> tasklist_lock.

Yes, but de_thread()->release_task(->group_leader) can take tasklist_lock
before us.

> If we don't want to be hitting the exec()ed task with a signal, the
> thing to do would be to drop the signal, as in the attached patch.
> I believe that this is an acceptable approach, since had the timer
> fired slightly later, it would have been disabled, right?
> 
> Thoughts?
> 
>                                                 Thanx, Paul
> 
> Signed-off-by: <paulmck@us.ibm.com>
> 
> diff -urpNa -X dontdiff linux-2.6.14-mm0-fix-2/kernel/signal.c linux-2.6.14-mm0-fix-3/kernel/signal.c
> --- linux-2.6.14-mm0-fix-2/kernel/signal.c      2005-11-05 15:05:38.000000000 -0800
> +++ linux-2.6.14-mm0-fix-3/kernel/signal.c      2005-11-05 16:27:52.000000000 -0800
> @@ -1481,6 +1481,10 @@ send_group_sigqueue(int sig, struct sigq
>         read_lock(&tasklist_lock);
>         while (p->group_leader != p)
>                 p = p->group_leader;
> +       if (p->sighand == NULL) {
> +               ret = 1;

Oh, I think there is another problem here. I'll post a separate
message.

Oleg.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Posix timers vs exec problems
  2005-11-06  1:00   ` Paul E. McKenney
  2005-11-06 14:17     ` Oleg Nesterov
@ 2005-11-06 14:32     ` Oleg Nesterov
  2005-11-07 18:12       ` [PATCH] fix de_thread() vs send_group_sigqueue() race Oleg Nesterov
  2005-11-16 23:26       ` [PATCH] sigaction should clear all signals on SIG_IGN, not just < 32 George Anzinger
  1 sibling, 2 replies; 16+ messages in thread
From: Oleg Nesterov @ 2005-11-06 14:32 UTC (permalink / raw)
  To: paulmck, Roland McGrath, George Anzinger
  Cc: akpm, linux-kernel, dipankar, mingo, suzannew

Roland, George, I think posix timers have problems with de_thread().

First, when non-leader thread does exec it calls release_task(->group_leader)
before calling exit_itimers(). This means that send_group_sigqueue() can oops
after taking tasklist_lock while doing spin_lock_irqsave(->sighand->siglock).
This is easy to fix.

cpu-timers have another problem. In !CPUCLOCK_PERTHREAD() case we attach the
timer to ->signal->cpu_timers, so these timers (when created by another process)
will survive after exec. However they will continue to profile execed process
only if it was group_leader who did exec, otherwise posix_cpu_timer_schedule()
will notice that timer->it.cpu.task has ->signal == NULL and stop this timer.

So, should de_thread() call posix_cpu_timers_exit_group() after exit_itimers()
thus stoping cpu-timers after exec? Another option is to change ->it.cpu.task for
each timer in ->signal->cpu_timers, this way cpu-timers will continue to work.
But this is not trivial.

Oleg.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Additional/catchup RCU signal fixes for -mm
  2005-11-06 14:17     ` Oleg Nesterov
@ 2005-11-06 14:46       ` Oleg Nesterov
  2005-11-06 23:02         ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2005-11-06 14:46 UTC (permalink / raw)
  To: paulmck, akpm, linux-kernel, dipankar, mingo, suzannew

Oleg Nesterov wrote:
> 
> "Paul E. McKenney" wrote:
> >
> > +       if (p->sighand == NULL) {
> > +               ret = 1;
> 
> Oh, I think there is another problem here. I'll post a separate
> message.

Sorry, I was not clear. This problem is unrelated. Yes, I think we
should drop the signal. But please note that ret = 1 (sig_ignored)
means (surprise!) "reschedule and re-arm this timer right now" in
cpu-timer case. It is not critical, but may be ret = 0 is better.

Oleg.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Additional/catchup RCU signal fixes for -mm
  2005-11-06 14:46       ` Oleg Nesterov
@ 2005-11-06 23:02         ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2005-11-06 23:02 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, linux-kernel, dipankar, mingo, suzannew

On Sun, Nov 06, 2005 at 05:46:09PM +0300, Oleg Nesterov wrote:
> Oleg Nesterov wrote:
> > 
> > "Paul E. McKenney" wrote:
> > >
> > > +       if (p->sighand == NULL) {
> > > +               ret = 1;
> > 
> > Oh, I think there is another problem here. I'll post a separate
> > message.
> 
> Sorry, I was not clear. This problem is unrelated. Yes, I think we
> should drop the signal. But please note that ret = 1 (sig_ignored)
> means (surprise!) "reschedule and re-arm this timer right now" in
> cpu-timer case. It is not critical, but may be ret = 0 is better.

OK.  Seems like the next firing of the timer would then see the
changed situation, so the current code should at least be safe.

Thoughts?

						Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] fix de_thread() vs send_group_sigqueue() race
  2005-11-06 14:32     ` Posix timers vs exec problems Oleg Nesterov
@ 2005-11-07 18:12       ` Oleg Nesterov
  2005-11-08 20:36         ` Chris Wright
  2005-11-16 23:26       ` [PATCH] sigaction should clear all signals on SIG_IGN, not just < 32 George Anzinger
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2005-11-07 18:12 UTC (permalink / raw)
  To: paulmck, Roland McGrath, George Anzinger, akpm, linux-kernel,
	dipankar, mingo, suzannew, Chris Wright

When non-leader thread does exec, de_thread calls release_task(leader) before
calling exit_itimers(). If local timer interrupt happens in between, it can
oops in send_group_sigqueue() while taking ->sighand->siglock == NULL.

However, we can't change send_group_sigqueue() to check p->signal != NULL,
because sys_timer_create() does get_task_struct() only in SIGEV_THREAD_ID
case. So it is possible that this task_struct was already freed and we can't
trust p->signal.

This patch changes de_thread() so that leader released after exit_itimers()
call.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 2.6.14/fs/exec.c~	2005-09-21 21:08:33.000000000 +0400
+++ 2.6.14/fs/exec.c	2005-11-07 23:54:42.000000000 +0300
@@ -593,6 +593,7 @@ static inline int de_thread(struct task_
 	struct signal_struct *sig = tsk->signal;
 	struct sighand_struct *newsighand, *oldsighand = tsk->sighand;
 	spinlock_t *lock = &oldsighand->siglock;
+	struct task_struct *leader = NULL;
 	int count;
 
 	/*
@@ -668,7 +669,7 @@ static inline int de_thread(struct task_
 	 * and to assume its PID:
 	 */
 	if (!thread_group_leader(current)) {
-		struct task_struct *leader = current->group_leader, *parent;
+		struct task_struct *parent;
 		struct dentry *proc_dentry1, *proc_dentry2;
 		unsigned long exit_state, ptrace;
 
@@ -677,6 +678,7 @@ static inline int de_thread(struct task_
 		 * It should already be zombie at this point, most
 		 * of the time.
 		 */
+		leader = current->group_leader;
 		while (leader->exit_state != EXIT_ZOMBIE)
 			yield();
 
@@ -736,7 +738,6 @@ static inline int de_thread(struct task_
 		proc_pid_flush(proc_dentry2);
 
 		BUG_ON(exit_state != EXIT_ZOMBIE);
-		release_task(leader);
         }
 
 	/*
@@ -746,8 +747,11 @@ static inline int de_thread(struct task_
 	sig->flags = 0;
 
 no_thread_group:
-	BUG_ON(atomic_read(&sig->count) != 1);
 	exit_itimers(sig);
+	if (leader)
+		release_task(leader);
+
+	BUG_ON(atomic_read(&sig->count) != 1);
 
 	if (atomic_read(&oldsighand->count) == 1) {
 		/*

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix de_thread() vs send_group_sigqueue() race
  2005-11-07 18:12       ` [PATCH] fix de_thread() vs send_group_sigqueue() race Oleg Nesterov
@ 2005-11-08 20:36         ` Chris Wright
  2005-11-08 20:55           ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Wright @ 2005-11-08 20:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: paulmck, Roland McGrath, George Anzinger, akpm, linux-kernel,
	dipankar, mingo, suzannew, Chris Wright, torvalds

* Oleg Nesterov (oleg@tv-sign.ru) wrote:
> When non-leader thread does exec, de_thread calls release_task(leader) before
> calling exit_itimers(). If local timer interrupt happens in between, it can
> oops in send_group_sigqueue() while taking ->sighand->siglock == NULL.
> 
> However, we can't change send_group_sigqueue() to check p->signal != NULL,
> because sys_timer_create() does get_task_struct() only in SIGEV_THREAD_ID
> case. So it is possible that this task_struct was already freed and we can't
> trust p->signal.
> 
> This patch changes de_thread() so that leader released after exit_itimers()
> call.

Nice catch.  As soon as Linus picks it up we'll put it in -stable as
well.

> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

Acked-by: Chris Wright <chrisw@osdl.org>

> --- 2.6.14/fs/exec.c~	2005-09-21 21:08:33.000000000 +0400
> +++ 2.6.14/fs/exec.c	2005-11-07 23:54:42.000000000 +0300
> @@ -593,6 +593,7 @@ static inline int de_thread(struct task_
>  	struct signal_struct *sig = tsk->signal;
>  	struct sighand_struct *newsighand, *oldsighand = tsk->sighand;
>  	spinlock_t *lock = &oldsighand->siglock;
> +	struct task_struct *leader = NULL;
>  	int count;
>  
>  	/*
> @@ -668,7 +669,7 @@ static inline int de_thread(struct task_
>  	 * and to assume its PID:
>  	 */
>  	if (!thread_group_leader(current)) {
> -		struct task_struct *leader = current->group_leader, *parent;
> +		struct task_struct *parent;
>  		struct dentry *proc_dentry1, *proc_dentry2;
>  		unsigned long exit_state, ptrace;
>  
> @@ -677,6 +678,7 @@ static inline int de_thread(struct task_
>  		 * It should already be zombie at this point, most
>  		 * of the time.
>  		 */
> +		leader = current->group_leader;
>  		while (leader->exit_state != EXIT_ZOMBIE)
>  			yield();
>  
> @@ -736,7 +738,6 @@ static inline int de_thread(struct task_
>  		proc_pid_flush(proc_dentry2);
>  
>  		BUG_ON(exit_state != EXIT_ZOMBIE);
> -		release_task(leader);
>          }
>  
>  	/*
> @@ -746,8 +747,11 @@ static inline int de_thread(struct task_
>  	sig->flags = 0;
>  
>  no_thread_group:
> -	BUG_ON(atomic_read(&sig->count) != 1);
>  	exit_itimers(sig);
> +	if (leader)
> +		release_task(leader);
> +
> +	BUG_ON(atomic_read(&sig->count) != 1);
>  
>  	if (atomic_read(&oldsighand->count) == 1) {
>  		/*

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix de_thread() vs send_group_sigqueue() race
  2005-11-08 20:36         ` Chris Wright
@ 2005-11-08 20:55           ` Linus Torvalds
  0 siblings, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2005-11-08 20:55 UTC (permalink / raw)
  To: Chris Wright
  Cc: Oleg Nesterov, paulmck, Roland McGrath, George Anzinger, akpm,
	linux-kernel, dipankar, mingo, suzannew



On Tue, 8 Nov 2005, Chris Wright wrote:

> * Oleg Nesterov (oleg@tv-sign.ru) wrote:
> > When non-leader thread does exec, de_thread calls release_task(leader) before
> > calling exit_itimers(). If local timer interrupt happens in between, it can
> > oops in send_group_sigqueue() while taking ->sighand->siglock == NULL.
> > 
> > However, we can't change send_group_sigqueue() to check p->signal != NULL,
> > because sys_timer_create() does get_task_struct() only in SIGEV_THREAD_ID
> > case. So it is possible that this task_struct was already freed and we can't
> > trust p->signal.
> > 
> > This patch changes de_thread() so that leader released after exit_itimers()
> > call.
> 
> Nice catch.  As soon as Linus picks it up we'll put it in -stable as
> well.

Gaah. For some reason I was pretty much the only one not cc'd on the 
original patch ;)

Found it on linux-kernel.

		Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] sigaction should clear all signals on SIG_IGN, not just < 32
  2005-11-06 14:32     ` Posix timers vs exec problems Oleg Nesterov
  2005-11-07 18:12       ` [PATCH] fix de_thread() vs send_group_sigqueue() race Oleg Nesterov
@ 2005-11-16 23:26       ` George Anzinger
  2005-11-22  1:09         ` Thread group exec race -> null pointer... HELP George Anzinger
  1 sibling, 1 reply; 16+ messages in thread
From: George Anzinger @ 2005-11-16 23:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: paulmck, Roland McGrath, akpm, linux-kernel, dipankar, mingo, suzannew

[-- Attachment #1: Type: text/plain, Size: 488 bytes --]

While rooting aroung in the signal code trying to understand how to
fix the SIG_IGN	ploy (set sig handler to SIG_IGN and flood system with
high speed repeating timers) I came across what, I think, is a problem
in sigaction() in that when processing a SIG_IGN request it flushes
signals from 1 to SIGRTMIN and leaves the rest.  The attached patch is
an attempt to fix this.

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


[-- Attachment #2: sigaction-fix.patch --]
[-- Type: text/plain, Size: 3026 bytes --]

Source: MontaVista Software, Inc.
Type: Defect Fix 
Description:
It appeares that the sigaction system call, when processing a SIG_IGN or
a SIG_DFL, is removing only signals that appear in the first mask word.

Signed-off-by: George Anzinger <george@mvista.com>

 include/linux/signal.h |   16 ++++++++++++++++
 kernel/signal.c        |   34 ++++++++++++++++++++++++++++++++--
 2 files changed, 48 insertions(+), 2 deletions(-)

Index: linux-2.6.15-rc/kernel/signal.c
===================================================================
--- linux-2.6.15-rc.orig/kernel/signal.c
+++ linux-2.6.15-rc/kernel/signal.c
@@ -633,6 +633,33 @@ void signal_wake_up(struct task_struct *
  * Returns 1 if any signals were found.
  *
  * All callers must be holding the siglock.
+ *
+ * This version takes a sigset mask and looks at all signals,
+ * not just those in the first mask word.
+ */
+static int rm_from_queue_full(sigset_t *mask, struct sigpending *s)
+{
+	struct sigqueue *q, *n;
+	sigset_t m;
+
+	sigandsets(&m, mask, &s->signal);
+	if (sigisemptyset(&m))
+		return 0;
+
+	signandsets(&s->signal, &s->signal, mask);
+	list_for_each_entry_safe(q, n, &s->list, list) {
+		if (sigismember(mask, q->info.si_signo)) {
+			list_del_init(&q->list);
+			__sigqueue_free(q);
+		}
+	}
+	return 1;
+}
+/*
+ * Remove signals in mask from the pending set and queue.
+ * Returns 1 if any signals were found.
+ *
+ * All callers must be holding the siglock.
  */
 static int rm_from_queue(unsigned long mask, struct sigpending *s)
 {
@@ -2471,6 +2498,7 @@ int
 do_sigaction(int sig, const struct k_sigaction *act, struct k_sigaction *oact)
 {
 	struct k_sigaction *k;
+	sigset_t mask;
 
 	if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig)))
 		return -EINVAL;
@@ -2518,9 +2546,11 @@ do_sigaction(int sig, const struct k_sig
 			*k = *act;
 			sigdelsetmask(&k->sa.sa_mask,
 				      sigmask(SIGKILL) | sigmask(SIGSTOP));
-			rm_from_queue(sigmask(sig), &t->signal->shared_pending);
+			sigemptyset(&mask);
+			sigaddset(&mask, sig);
+			rm_from_queue_full(&mask, &t->signal->shared_pending);
 			do {
-				rm_from_queue(sigmask(sig), &t->pending);
+				rm_from_queue_full(&mask, &t->pending);
 				recalc_sigpending_tsk(t);
 				t = next_thread(t);
 			} while (t != current);
Index: linux-2.6.15-rc/include/linux/signal.h
===================================================================
--- linux-2.6.15-rc.orig/include/linux/signal.h
+++ linux-2.6.15-rc/include/linux/signal.h
@@ -82,6 +82,22 @@ static inline int sigfindinword(unsigned
 
 #endif /* __HAVE_ARCH_SIG_BITOPS */
 
+static inline int sigisemptyset(sigset_t *set)
+{
+	extern void _NSIG_WORDS_is_unsupported_size(void);
+	switch (_NSIG_WORDS) {
+	case 4:
+		return (set->sig[3] | set->sig[2] |
+			set->sig[1] | set->sig[0]) == 0;
+	case 2:
+		return (set->sig[1] | set->sig[0]) == 0;
+	case 1:
+		return set->sig[0] == 0;
+	default:
+		_NSIG_WORDS_is_unsupported_size();
+	}
+}
+
 #define sigmask(sig)	(1UL << ((sig) - 1))
 
 #ifndef __HAVE_ARCH_SIG_SETOPS

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Thread group exec race -> null pointer... HELP
  2005-11-16 23:26       ` [PATCH] sigaction should clear all signals on SIG_IGN, not just < 32 George Anzinger
@ 2005-11-22  1:09         ` George Anzinger
  2005-11-22 14:45           ` Oleg Nesterov
  2005-11-22 19:20           ` [PATCH] fix do_wait() vs exec() race Oleg Nesterov
  0 siblings, 2 replies; 16+ messages in thread
From: George Anzinger @ 2005-11-22  1:09 UTC (permalink / raw)
  To: george
  Cc: Oleg Nesterov, paulmck, Roland McGrath, akpm, linux-kernel,
	dipankar, mingo, suzannew

[-- Attachment #1: Type: text/plain, Size: 2163 bytes --]

George Anzinger wrote:
> While rooting aroung in the signal code trying to understand how to
> fix the SIG_IGN    ploy (set sig handler to SIG_IGN and flood system with
> high speed repeating timers) I came across what, I think, is a problem
> in sigaction() in that when processing a SIG_IGN request it flushes
> signals from 1 to SIGRTMIN and leaves the rest.  

Still rooting around in the above.  The test program is attached.  It 
creates and arms a repeating timer and then clones a thread which does 
an exec() call.

If I run the test with top (only two programs running) I quickly get 
an OOPS on trying to derefence a NULL pointer.  It is comming from a 
call the posix timer code is making to deliver a timer.  Call is to 
send_group_sigqueue() at ~445 in posix-timers.c.  The process being 
passed in is DEAD with current->signal ==NULL, thus the OOPS.  In the 
first instance of this, we see that the thread-group leader is dead 
and the exec code at line ~718 is setting the old leaders group-leader 
to him self.  The failure then happens when the IRQ release is done on 
the write_unlock_irq() at ~732 thus allowing the timer interrupt.

Thinking that it makes no real sense to set the group leader to a dead 
process, I did the following:

--- linux-2.6.15-rc.orig/fs/exec.c
+++ linux-2.6.15-rc/fs/exec.c
@@ -715,7 +715,7 @@ static inline int de_thread(struct task_
  		current->parent = current->real_parent = leader->real_parent;
  		leader->parent = leader->real_parent = child_reaper;
  		current->group_leader = current;
-		leader->group_leader = leader;
+		leader->group_leader = current;

  		add_parent(current, current->parent);
  		add_parent(leader, leader->parent);

This also fails as there is still a window where the group leader is 
dead with a null signal pointer, i.e. the interrupt happens (this time 
on another cpu) before the above changed code is executed.

It seems to me that the group leader needs to change prior to setting 
the signal pointer to NULL, but I don't really know this code very well.

Help !
-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

[-- Attachment #2: timer_kill.c --]
[-- Type: text/x-csrc, Size: 1193 bytes --]

#include <errno.h>
#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/wait.h>
#include <time.h>

void die(const char* msg)
{
	fprintf(stderr, "ERR!! %s: %s\n", msg, strerror(errno));
	exit(-1);
}

char thread_stack[4096];

int thread_func(void *arg)
{
	execl("/bin/true", NULL);
	die("exec");
	return 0;
}

void proc_func(void)
{
	int pid;

	for (;;)
		if ((pid = fork())) {
			if (pid != waitpid(pid, NULL, 0))
				die("wait4");
		} else {
			struct sigevent sigev = {};
			struct itimerspec itsp = {};
			timer_t tid;

			sigev.sigev_signo = SIGRTMIN;
			sigev.sigev_notify = SIGEV_SIGNAL;
			if (timer_create(CLOCK_MONOTONIC, &sigev, &tid) == -1)
				die("timer_create");

			itsp.it_value.    tv_nsec = 1;
			itsp.it_interval. tv_nsec = 1;
			if (timer_settime(tid, 0, &itsp, NULL))
				die("timer_settime");

			if (clone(thread_func, thread_stack + 2048,
					CLONE_THREAD|CLONE_SIGHAND|CLONE_VM|CLONE_FILES,
					NULL) < 0)
				die("clone");

			pause();
		}
}

int main(void)
{
	int pn;

	signal(SIGRTMIN, SIG_IGN);

	for (pn = 0; pn < 16; ++pn)
		if (!fork())
			proc_func();

	pause();

	return 0;
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Thread group exec race -> null pointer... HELP
  2005-11-22  1:09         ` Thread group exec race -> null pointer... HELP George Anzinger
@ 2005-11-22 14:45           ` Oleg Nesterov
  2005-11-23 20:30             ` George Anzinger
  2005-11-22 19:20           ` [PATCH] fix do_wait() vs exec() race Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2005-11-22 14:45 UTC (permalink / raw)
  To: george
  Cc: paulmck, Roland McGrath, akpm, linux-kernel, dipankar, mingo, suzannew

George Anzinger wrote:
>
> Still rooting around in the above.  The test program is attached.  It
> creates and arms a repeating timer and then clones a thread which does
> an exec() call.

This patch:

	http://marc.theaimsgroup.com/?l=linux-kernel&m=113138286512847

was intended to fix exactly this problem (and the same test program was
used to exploit the race and test the fix).

So, it does not help? I can't reproduce the problem.

Note: I think you also need this patch:

	http://marc.theaimsgroup.com/?l=linux-kernel&m=113059955626598

otherwise I beleive OOPS can happen while killing this program if you are
running the kernel with this change applied:

	[PATCH] Call exit_itimers from do_exit, not __exit_signal
	http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=25f407f0b668f5e4ebd5d13e1fb4306ba6427ead


> first instance of this, we see that the thread-group leader is dead
> and the exec code at line ~718 is setting the old leaders group-leader
> to him self.

I think this code at line ~718

	leader->group_leader = leader;

is noop, because leader->group_leader == leader here.

> -               leader->group_leader = leader;
> +               leader->group_leader = current;

This can't help, without SIGEV_THREAD_ID we don't check ->group_leader,
the signal goes to the thread group via timer->it_process, which is equal
to the old leader.

Oleg.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] fix do_wait() vs exec() race
  2005-11-22  1:09         ` Thread group exec race -> null pointer... HELP George Anzinger
  2005-11-22 14:45           ` Oleg Nesterov
@ 2005-11-22 19:20           ` Oleg Nesterov
  1 sibling, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2005-11-22 19:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: george, paulmck, Roland McGrath, akpm, dipankar, mingo,
	Linus Torvalds, Chris Wright

When non-leader thread does exec, de_thread adds old leader to the
init's ->children list in EXIT_ZOMBIE state and drops tasklist_lock.

This means that release_task(leader) in de_thread() is racy vs do_wait()
from init task.

I think de_thread() should set old leader's state to EXIT_DEAD instead.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 2.6.15-rc2/fs/exec.c~	2005-11-22 19:35:31.000000000 +0300
+++ 2.6.15-rc2/fs/exec.c	2005-11-23 00:49:23.000000000 +0300
@@ -668,7 +668,7 @@ static inline int de_thread(struct task_
 	if (!thread_group_leader(current)) {
 		struct task_struct *parent;
 		struct dentry *proc_dentry1, *proc_dentry2;
-		unsigned long exit_state, ptrace;
+		unsigned long ptrace;
 
 		/*
 		 * Wait for the thread group leader to be a zombie.
@@ -726,15 +726,15 @@ static inline int de_thread(struct task_
 		list_del(&current->tasks);
 		list_add_tail(&current->tasks, &init_task.tasks);
 		current->exit_signal = SIGCHLD;
-		exit_state = leader->exit_state;
+
+		BUG_ON(leader->exit_state != EXIT_ZOMBIE);
+		leader->exit_state = EXIT_DEAD;
 
 		write_unlock_irq(&tasklist_lock);
 		spin_unlock(&leader->proc_lock);
 		spin_unlock(&current->proc_lock);
 		proc_pid_flush(proc_dentry1);
 		proc_pid_flush(proc_dentry2);
-
-		BUG_ON(exit_state != EXIT_ZOMBIE);
         }
 
 	/*

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Thread group exec race -> null pointer... HELP
  2005-11-22 14:45           ` Oleg Nesterov
@ 2005-11-23 20:30             ` George Anzinger
  2005-11-25 15:03               ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: George Anzinger @ 2005-11-23 20:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: paulmck, Roland McGrath, akpm, linux-kernel, dipankar, mingo, suzannew

Oleg Nesterov wrote:
> George Anzinger wrote:
> 
>>Still rooting around in the above.  The test program is attached.  It
>>creates and arms a repeating timer and then clones a thread which does
>>an exec() call.
> 
> 
> This patch:
> 
> 	http://marc.theaimsgroup.com/?l=linux-kernel&m=113138286512847
> 
> was intended to fix exactly this problem (and the same test program was
> used to exploit the race and test the fix).
> 
> So, it does not help? I can't reproduce the problem.

Yes, it does fix it.  Somehow I missed the posting of that patch.
> 
> Note: I think you also need this patch:
> 
> 	http://marc.theaimsgroup.com/?l=linux-kernel&m=113059955626598
> 
> otherwise I beleive OOPS can happen while killing this program if you are
> running the kernel with this change applied:
> 
> 	[PATCH] Call exit_itimers from do_exit, not __exit_signal
> 	http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=25f407f0b668f5e4ebd5d13e1fb4306ba6427ead
> 
> 
> 
>>first instance of this, we see that the thread-group leader is dead
>>and the exec code at line ~718 is setting the old leaders group-leader
>>to him self.
> 
> 
> I think this code at line ~718
> 
> 	leader->group_leader = leader;
> 
> is noop, because leader->group_leader == leader here.
> 
> 
>>-               leader->group_leader = leader;
>>+               leader->group_leader = current;
> 
> 
> This can't help, without SIGEV_THREAD_ID we don't check ->group_leader,
> the signal goes to the thread group via timer->it_process, which is equal
> to the old leader.

The signal code returns <0 so posix-timers digs into up the 
group_leader and trys again.  Still, the patch fixes it all.

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Thread group exec race -> null pointer... HELP
  2005-11-23 20:30             ` George Anzinger
@ 2005-11-25 15:03               ` Ingo Molnar
  0 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2005-11-25 15:03 UTC (permalink / raw)
  To: George Anzinger
  Cc: Oleg Nesterov, paulmck, Roland McGrath, akpm, linux-kernel,
	dipankar, suzannew


* George Anzinger <george@mvista.com> wrote:

> >This patch:
> >
> >	http://marc.theaimsgroup.com/?l=linux-kernel&m=113138286512847
> >
> >was intended to fix exactly this problem (and the same test program was
> >used to exploit the race and test the fix).
> >
> >So, it does not help? I can't reproduce the problem.
> 
> Yes, it does fix it.  Somehow I missed the posting of that patch.

fyi, i've included the patch in -rt15.

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2005-11-25 15:04 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-11-05  1:36 [PATCH] Additional/catchup RCU signal fixes for -mm Paul E. McKenney
2005-11-05 16:32 ` Oleg Nesterov
2005-11-06  1:00   ` Paul E. McKenney
2005-11-06 14:17     ` Oleg Nesterov
2005-11-06 14:46       ` Oleg Nesterov
2005-11-06 23:02         ` Paul E. McKenney
2005-11-06 14:32     ` Posix timers vs exec problems Oleg Nesterov
2005-11-07 18:12       ` [PATCH] fix de_thread() vs send_group_sigqueue() race Oleg Nesterov
2005-11-08 20:36         ` Chris Wright
2005-11-08 20:55           ` Linus Torvalds
2005-11-16 23:26       ` [PATCH] sigaction should clear all signals on SIG_IGN, not just < 32 George Anzinger
2005-11-22  1:09         ` Thread group exec race -> null pointer... HELP George Anzinger
2005-11-22 14:45           ` Oleg Nesterov
2005-11-23 20:30             ` George Anzinger
2005-11-25 15:03               ` Ingo Molnar
2005-11-22 19:20           ` [PATCH] fix do_wait() vs exec() race Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).