* [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()
@ 2007-06-06  2:17 john stultz
  2007-06-06  9:45 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: john stultz @ 2007-06-06  2:17 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, Steven Rostedt, Paul E. McKenney, lkml

Hey Ingo,
	So we've been seeing the following trace fairly frequently on our SMP
boxes when running kernbench:

BUG: at kernel/softirq.c:639 __tasklet_action()

Call Trace:
 [<ffffffff8106d5da>] dump_trace+0xaa/0x32a
 [<ffffffff8106d89b>] show_trace+0x41/0x5c
 [<ffffffff8106d8cb>] dump_stack+0x15/0x17
 [<ffffffff81094a97>] __tasklet_action+0xdf/0x12e
 [<ffffffff81094f76>] tasklet_action+0x27/0x29
 [<ffffffff8109530a>] ksoftirqd+0x16c/0x271
 [<ffffffff81033d4d>] kthread+0xf5/0x128
 [<ffffffff8105ff68>] child_rip+0xa/0x12


Paul also pointed this out a while back: http://lkml.org/lkml/2007/2/25/1


Anyway, I think I finally found the issue. It's a bit hard to explain,
but the idea is: while __tasklet_action is running the tasklet function
on CPU1, CPU2 may call tasklet_schedule(). If CPU2 is preempted right
after it sets the TASKLET_STATE_SCHED bit, __tasklet_action on CPU1 may
re-run the function, clear the bit, and unlock the tasklet before CPU2
enters __tasklet_common_schedule. Once __tasklet_common_schedule locks
the tasklet, we then add the tasklet to the list with
TASKLET_STATE_SCHED *unset*.
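
To make the interleaving concrete, a rough timeline (a sketch; it
assumes tasklet_trylock()/tasklet_unlock() set and clear the
TASKLET_STATE_RUN bit, as in the -rt code this patches):

    CPU1 (tasklet softirq)                CPU2
    ----------------------                ----
    __tasklet_action():
      tasklet_trylock(t)   [RUN set]
      clears SCHED, runs t->func()
                                          tasklet_schedule():
                                            test_and_set_bit(SCHED)
                                            ...preempted...
      unlock loop sees SCHED set,
      clears it, re-runs t->func()
      tasklet_unlock(t)    [RUN clear]
                                          __tasklet_common_schedule():
                                            tasklet_trylock(t) succeeds,
                                            t is added to the list with
                                            SCHED *unset*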

I've verified this race occurs w/ a WARN_ON in
__tasklet_common_schedule().


This fix avoids this race by making sure *after* we've locked the
tasklet that the STATE_SCHED bit is set before adding it to the list.

Does it look ok to you?

thanks
-john

Signed-off-by: John Stultz <johnstul@us.ibm.com>

Index: 2.6-rt/kernel/softirq.c
===================================================================
--- 2.6-rt.orig/kernel/softirq.c	2007-06-05 18:30:54.000000000 -0700
+++ 2.6-rt/kernel/softirq.c	2007-06-05 18:36:44.000000000 -0700
@@ -544,10 +544,17 @@ static void inline
 __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
 {
 	if (tasklet_trylock(t)) {
-		WARN_ON(t->next != NULL);
-		t->next = head->list;
-		head->list = t;
-		raise_softirq_irqoff(nr);
+		/* We may have been preempted before tasklet_trylock
+		 * and __tasklet_action may have already run.
+		 * So double check the sched bit while the takslet
+		 * is locked before adding it to the list.
+		 */
+		if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
+			WARN_ON(t->next != NULL);
+			t->next = head->list;
+			head->list = t;
+			raise_softirq_irqoff(nr);
+		}
 		tasklet_unlock(t);
 	}
 }
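
For reference, the locking helpers this patch leans on look roughly
like this (a sketch: trylock/unlock follow the mainline bit ops, and
tasklet_tryunlock() follows the "STATE_RUN -> 0" cmpxchg comment quoted
later in this thread; TASKLET_STATEF_RUN is assumed here to be
(1 << TASKLET_STATE_RUN), like the TASKLET_STATEF_PENDING used below):

static inline int tasklet_trylock(struct tasklet_struct *t)
{
	/* returns true if we now own the RUN bit */
	return !test_and_set_bit(TASKLET_STATE_RUN, &t->state);
}

static inline void tasklet_unlock(struct tasklet_struct *t)
{
	smp_mb__before_clear_bit();
	clear_bit(TASKLET_STATE_RUN, &t->state);
}

static inline int tasklet_tryunlock(struct tasklet_struct *t)
{
	/* only the pure RUN -> 0 transition may succeed */
	return cmpxchg(&t->state, TASKLET_STATEF_RUN, 0) ==
		TASKLET_STATEF_RUN;
}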




* Re: [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()
  2007-06-06  2:17 [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON() john stultz
@ 2007-06-06  9:45 ` Ingo Molnar
  2007-06-06 17:39   ` john stultz
  2007-06-06 10:31 ` Jesper Juhl
  2007-06-14 21:20 ` john stultz
  2 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2007-06-06  9:45 UTC (permalink / raw)
  To: john stultz; +Cc: Thomas Gleixner, Steven Rostedt, Paul E. McKenney, lkml


* john stultz <johnstul@us.ibm.com> wrote:

> This fix avoids this race by making sure *after* we've locked the 
> tasklet that the STATE_SCHED bit is set before adding it to the list.
> 
> Does it look ok to you?

ah - nice!! What would be the worst-case effect of this bug? (besides 
the WARN_ON()?) In any case, this fix will be in -rt10. Great work!

	Ingo


* Re: [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()
  2007-06-06  2:17 [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON() john stultz
  2007-06-06  9:45 ` Ingo Molnar
@ 2007-06-06 10:31 ` Jesper Juhl
  2007-06-14 21:20 ` john stultz
  2 siblings, 0 replies; 8+ messages in thread
From: Jesper Juhl @ 2007-06-06 10:31 UTC (permalink / raw)
  To: john stultz
  Cc: Ingo Molnar, Thomas Gleixner, Steven Rostedt, Paul E. McKenney, lkml

On 06/06/07, john stultz <johnstul@us.ibm.com> wrote:
> --- 2.6-rt.orig/kernel/softirq.c        2007-06-05 18:30:54.000000000 -0700
> +++ 2.6-rt/kernel/softirq.c     2007-06-05 18:36:44.000000000 -0700
> @@ -544,10 +544,17 @@ static void inline
>  __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
>  {
>         if (tasklet_trylock(t)) {
> -               WARN_ON(t->next != NULL);
> -               t->next = head->list;
> -               head->list = t;
> -               raise_softirq_irqoff(nr);
> +               /* We may have been preempted before tasklet_trylock
> +                * and __tasklet_action may have already run.
> +                * So double check the sched bit while the takslet

s/takslet/tasklet/

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html


* Re: [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()
  2007-06-06  9:45 ` Ingo Molnar
@ 2007-06-06 17:39   ` john stultz
  0 siblings, 0 replies; 8+ messages in thread
From: john stultz @ 2007-06-06 17:39 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, Steven Rostedt, Paul E. McKenney, lkml

On Wed, 2007-06-06 at 11:45 +0200, Ingo Molnar wrote:
> * john stultz <johnstul@us.ibm.com> wrote:
> 
> > This fix avoids this race by making sure *after* we've locked the 
> > tasklet that the STATE_SCHED bit is set before adding it to the list.
> > 
> > Does it look ok to you?
> 
> ah - nice!! What would be the worst-case effect of this bug? (besides 
> the WARN_ON()?) In any case, this fix will be in -rt10. Great work!

The only side effect I can come up w/ is you might run a tasklet 3 times
if it was only scheduled twice.
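
In case it helps, the interleaving I have in mind (a sketch, with the
unpatched __tasklet_common_schedule()):

    tasklet_schedule() #1: sets SCHED, t queued
    softirq: trylock, clears SCHED, runs t->func()     [run #1]
        tasklet_schedule() #2 on another CPU: sets SCHED,
        then is preempted before taking the trylock
    softirq unlock loop: sees SCHED, clears it,
        runs t->func() again                           [run #2]
    softirq: tasklet_unlock(t)
    other CPU resumes: trylock succeeds, t queued with
        SCHED unset (the bug)
    softirq: finds t on the list, runs t->func()       [run #3]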

thanks
-john



* Re: [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()
  2007-06-06  2:17 [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON() john stultz
  2007-06-06  9:45 ` Ingo Molnar
  2007-06-06 10:31 ` Jesper Juhl
@ 2007-06-14 21:20 ` john stultz
  2 siblings, 0 replies; 8+ messages in thread
From: john stultz @ 2007-06-14 21:20 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, Steven Rostedt, Paul E. McKenney, lkml

On Tue, 2007-06-05 at 19:17 -0700, john stultz wrote:
> Hey Ingo,
> 	So we've been seeing the following trace fairly frequently on our SMP
> boxes when running kernbench:
> 
> BUG: at kernel/softirq.c:639 __tasklet_action()
> 
> Call Trace:
>  [<ffffffff8106d5da>] dump_trace+0xaa/0x32a
>  [<ffffffff8106d89b>] show_trace+0x41/0x5c
>  [<ffffffff8106d8cb>] dump_stack+0x15/0x17
>  [<ffffffff81094a97>] __tasklet_action+0xdf/0x12e
>  [<ffffffff81094f76>] tasklet_action+0x27/0x29
>  [<ffffffff8109530a>] ksoftirqd+0x16c/0x271
>  [<ffffffff81033d4d>] kthread+0xf5/0x128
>  [<ffffffff8105ff68>] child_rip+0xa/0x12
> 
> 
> Paul also pointed this out a while back: http://lkml.org/lkml/2007/2/25/1
> 
> 
> Anyway, I think I finally found the issue. It's a bit hard to explain,
> but the idea is: while __tasklet_action is running the tasklet function
> on CPU1, CPU2 may call tasklet_schedule(). If CPU2 is preempted right
> after it sets the TASKLET_STATE_SCHED bit, __tasklet_action on CPU1 may
> re-run the function, clear the bit, and unlock the tasklet before CPU2
> enters __tasklet_common_schedule. Once __tasklet_common_schedule locks
> the tasklet, we then add the tasklet to the list with
> TASKLET_STATE_SCHED *unset*.
> 
> I've verified this race occurs w/ a WARN_ON in
> __tasklet_common_schedule().
> 
> 
> This fix avoids this race by making sure *after* we've locked the
> tasklet that the STATE_SCHED bit is set before adding it to the list.
> 
> Does it look ok to you?
> 
> thanks
> -john
> 
> Signed-off-by: John Stultz <johnstul@us.ibm.com>
> 
> Index: 2.6-rt/kernel/softirq.c
> ===================================================================
> --- 2.6-rt.orig/kernel/softirq.c	2007-06-05 18:30:54.000000000 -0700
> +++ 2.6-rt/kernel/softirq.c	2007-06-05 18:36:44.000000000 -0700
> @@ -544,10 +544,17 @@ static void inline
>  __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
>  {
>  	if (tasklet_trylock(t)) {
> -		WARN_ON(t->next != NULL);
> -		t->next = head->list;
> -		head->list = t;
> -		raise_softirq_irqoff(nr);
> +		/* We may have been preempted before tasklet_trylock
> +		 * and __tasklet_action may have already run.
> +		 * So double check the sched bit while the takslet
> +		 * is locked before adding it to the list.
> +		 */
> +		if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
> +			WARN_ON(t->next != NULL);
> +			t->next = head->list;
> +			head->list = t;
> +			raise_softirq_irqoff(nr);
> +		}
>  		tasklet_unlock(t);
>  	}
>  }

So while digging into a strange OOM issue we were seeing (which actually
ended up being fixed by Steven's softirq patch), I noticed that the fix
above is incomplete. With only the patch above, we no longer add
unscheduled tasklets to the list, but we may end up with scheduled
tasklets that are not on the list (and will stay that way!).
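
Concretely, the interleaving I have in mind looks something like this
(hypothetical; I haven't caught it in the wild):

    CPU1 (__tasklet_common_schedule)      CPU2 (tasklet_schedule)
    --------------------------------      -----------------------
    tasklet_trylock(t) succeeds
    test_bit(SCHED) clear -> skip add
    ...preempted before unlock...
                                          test_and_set_bit(SCHED) was
                                          clear, so it goes on to
                                          __tasklet_common_schedule(),
                                          where tasklet_trylock(t) fails
                                          (CPU1 holds RUN) -> returns
    tasklet_unlock(t)

    -> SCHED is set, t is on no list, and nothing will ever queue it.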

The following additional patch should correct this issue, although since
we weren't actually hitting it, the issue is a bit theoretical and I've
not been able to prove it really fixes anything.

thanks
-john

Index: 2.6-rt/kernel/softirq.c
===================================================================
--- 2.6-rt.orig/kernel/softirq.c	2007-06-12 20:30:11.000000000 -0700
+++ 2.6-rt/kernel/softirq.c	2007-06-12 20:37:22.000000000 -0700
@@ -544,6 +544,7 @@
 __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
 {
 	if (tasklet_trylock(t)) {
+again:
 		/* We may have been preempted before tasklet_trylock
 		 * and __tasklet_action may have already run.
 		 * So double check the sched bit while the takslet
@@ -554,8 +555,21 @@
 			t->next = head->list;
 			head->list = t;
 			raise_softirq_irqoff(nr);
+			tasklet_unlock(t);
+		} else {
+		/* This is subtle. If we hit the corner case above,
+		 * it is possible that we get preempted right here,
+			 * and another task has successfully called
+			 * tasklet_schedule(), then this function, and
+			 * failed on the trylock. Thus we must be sure
+			 * before releasing the tasklet lock, that the
+			 * SCHED_BIT is clear. Otherwise the tasklet
+			 * may get its SCHED_BIT set, but not added to the
+			 * list
+			 */
+			if (!tasklet_tryunlock(t))
+				goto again;
 		}
-		tasklet_unlock(t);
 	}
 }
 




* Re: [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()
  2007-06-15 21:51 ` john stultz
@ 2007-06-15 23:59   ` Oleg Nesterov
  0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2007-06-15 23:59 UTC (permalink / raw)
  To: john stultz
  Cc: Ingo Molnar, Paul E. McKenney, Steven Rostedt, Thomas Gleixner,
	linux-kernel

On 06/15, john stultz wrote:
>
> On Fri, 2007-06-15 at 19:52 +0400, Oleg Nesterov wrote:
> > 
> > Could you please look at the message below? I sent it privately nearly a month
> > ago, but I think these problems are not fixed yet.
> 
> Hmm. Maybe you sent it to others on the cc list, as I can't find it in
> my box. But apologies anyway.

checking my mbox... Oops, you are right, sorry!

> > > +		if (unlikely(atomic_read(&t->count))) {
> > > +out_disabled:
> > > +			/* implicit unlock: */
> > > +			wmb();
> > > +			t->state = TASKLET_STATEF_PENDING;
> > 
> > What if tasklet_enable() happens just before this line ?
> > 
> > After the next schedule_tasklet() we have all bits set: SCHED, RUN, PENDING.
> > The next invocation of __tasklet_action() clears SCHED, but tasklet_tryunlock()
> > below can never succeed because of PENDING.
> 
> Yep. I've only been focusing on races in schedule/action, as I've been
> hunting issues w/ rcu. But I'll agree that the other state changes look
> problematic. I know Paul McKenney was looking at some of the other state
> changes and was seeing some potential problems as well.

OK, thanks. But doesn't this mean your 2nd patch is questionable?

> +               } else {
> +                       /* This is subtle. If we hit the corner case above,
> +                        * it is possible that we get preempted right here,
> +                        * and another task has successfully called
> +                        * tasklet_schedule(), then this function, and
> +                        * failed on the trylock. Thus we must be sure
> +                        * before releasing the tasklet lock, that the
> +                        * SCHED_BIT is clear. Otherwise the tasklet
> +                        * may get its SCHED_BIT set, but not added to the
> +                        * list
> +                        */
> +                       if (!tasklet_tryunlock(t))
> +                               goto again;

Again, tasklet_tryunlock() can fail because _PENDING was set by __tasklet_action().
In that case __tasklet_common_schedule() goes into an endless loop, no?
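
Roughly (a sketch, assuming tasklet_tryunlock() is the "STATE_RUN -> 0"
cmpxchg from the -rt patch):

    state on entering the else branch:  RUN | PENDING
      (SCHED already consumed; PENDING left behind by
       __tasklet_action()'s disabled path)

    again:
      test_bit(SCHED)      -> clear, so skip the add
      tasklet_tryunlock(t) -> cmpxchg expects state == RUN,
                              fails because PENDING is set
      goto again           -> nothing in this loop ever clears PENDING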

Oleg.



* Re: [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()
  2007-06-15 15:52 Oleg Nesterov
@ 2007-06-15 21:51 ` john stultz
  2007-06-15 23:59   ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: john stultz @ 2007-06-15 21:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Paul E. McKenney, Steven Rostedt, Thomas Gleixner,
	linux-kernel

On Fri, 2007-06-15 at 19:52 +0400, Oleg Nesterov wrote:
> john stultz wrote:
> >
> > The following additional patch should correct this issue, although since
> > we weren't actually hitting it, the issue is a bit theoretical and I've
> > not been able to prove it really fixes anything.
> 
> Could you please look at the message below? I sent it privately nearly a month
> ago, but I think these problems are not fixed yet.

Hmm. Maybe you sent it to others on the cc list, as I can't find it in
my box. But apologies anyway.


> ---------------------------------------------------------------------------
> 
> > +__tasklet_action(struct softirq_action *a, struct tasklet_struct *list)
> > +{
> > +	int loops = 1000000;
> >  
> >  	while (list) {
> >  		struct tasklet_struct *t = list;
> >  
> >  		list = list->next;
> > +		/*
> > +		 * Should always succeed - after a tasklet got on the
> > +		 * list (after getting the SCHED bit set from 0 to 1),
> > +		 * nothing but the tasklet softirq it got queued to can
> > +		 * lock it:
> > +		 */
> > +		if (!tasklet_trylock(t)) {
> > +			WARN_ON(1);
> > +			continue;
> > +		}
> > +
> > +		t->next = NULL;
> > +
> > +		/*
> > +		 * If we cannot handle the tasklet because it's disabled,
> > +		 * mark it as pending. tasklet_enable() will later
> > +		 * re-schedule the tasklet.
> > +		 */
> > +		if (unlikely(atomic_read(&t->count))) {
> > +out_disabled:
> > +			/* implicit unlock: */
> > +			wmb();
> > +			t->state = TASKLET_STATEF_PENDING;
> 
> What if tasklet_enable() happens just before this line ?
> 
> After the next schedule_tasklet() we have all bits set: SCHED, RUN, PENDING.
> The next invocation of __tasklet_action() clears SCHED, but tasklet_tryunlock()
> below can never succeed because of PENDING.

Yep. I've only been focusing on races in schedule/action, as I've been
hunting issues w/ rcu. But I'll agree that the other state changes look
problematic. I know Paul McKenney was looking at some of the other state
changes and was seeing some potential problems as well.

thanks
-john



* Re: [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()
@ 2007-06-15 15:52 Oleg Nesterov
  2007-06-15 21:51 ` john stultz
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2007-06-15 15:52 UTC (permalink / raw)
  To: john stultz
  Cc: Ingo Molnar, Paul E. McKenney, Steven Rostedt, Thomas Gleixner,
	linux-kernel

john stultz wrote:
>
> The following additional patch should correct this issue, although since
> we weren't actually hitting it, the issue is a bit theoretical and I've
> not been able to prove it really fixes anything.

Could you please look at the message below? I sent it privately nearly a month
ago, but I think these problems are not fixed yet.

Oleg.

---------------------------------------------------------------------------

> +__tasklet_action(struct softirq_action *a, struct tasklet_struct *list)
> +{
> +	int loops = 1000000;
>  
>  	while (list) {
>  		struct tasklet_struct *t = list;
>  
>  		list = list->next;
> +		/*
> +		 * Should always succeed - after a tasklet got on the
> +		 * list (after getting the SCHED bit set from 0 to 1),
> +		 * nothing but the tasklet softirq it got queued to can
> +		 * lock it:
> +		 */
> +		if (!tasklet_trylock(t)) {
> +			WARN_ON(1);
> +			continue;
> +		}
> +
> +		t->next = NULL;
> +
> +		/*
> +		 * If we cannot handle the tasklet because it's disabled,
> +		 * mark it as pending. tasklet_enable() will later
> +		 * re-schedule the tasklet.
> +		 */
> +		if (unlikely(atomic_read(&t->count))) {
> +out_disabled:
> +			/* implicit unlock: */
> +			wmb();
> +			t->state = TASKLET_STATEF_PENDING;

What if tasklet_enable() happens just before this line ?

After the next schedule_tasklet() we have all bits set: SCHED, RUN, PENDING.
The next invocation of __tasklet_action() clears SCHED, but tasklet_tryunlock()
below can never succeed because of PENDING.
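
Spelled out (a hypothetical interleaving; it assumes tasklet_enable()
only re-schedules the tasklet when it sees PENDING already set):

    tasklet is disabled, __tasklet_action() reaches the line above:
      tasklet_enable()                   count -> 0; PENDING not yet
                                         set, so no re-schedule
      t->state = TASKLET_STATEF_PENDING  stale PENDING, never cleared
      tasklet_schedule()                 SCHED | PENDING
      next __tasklet_action():
        tasklet_trylock(t)               SCHED | RUN | PENDING
        clears SCHED, runs t->func()     RUN | PENDING
        while (!tasklet_tryunlock(t)):   count is 0, SCHED is clear,
                                         cmpxchg wants state == RUN
                                         -> spins forever on PENDING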

> +again:
> +		t->func(t->data);
> +
> +		/*
> +		 * Try to unlock the tasklet. We must use cmpxchg, because
> +		 * another CPU might have scheduled or disabled the tasklet.
> +		 * We only allow the STATE_RUN -> 0 transition here.
> +		 */
> +		while (!tasklet_tryunlock(t)) {
> +			/*
> +			 * If it got disabled meanwhile, bail out:
> +			 */
> +			if (atomic_read(&t->count))
> +				goto out_disabled;
> +			/*
> +			 * If it got scheduled meanwhile, re-execute
> +			 * the tasklet function:
> +			 */
> +			if (test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
> +				goto again;

TASKLET_STATE_SCHED could be set by tasklet_kill(), in this case it is not nice
to call t->func() again (but probably harmless).
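
(For reference, mainline's tasklet_kill() of this era looks roughly
like the below; it reuses the SCHED bit as a busy marker while waiting
for the tasklet to finish:)

void tasklet_kill(struct tasklet_struct *t)
{
	if (in_interrupt())
		printk("Attempt to kill tasklet from interrupt\n");

	while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
		do
			yield();
		while (test_bit(TASKLET_STATE_SCHED, &t->state));
	}
	tasklet_unlock_wait(t);
	clear_bit(TASKLET_STATE_SCHED, &t->state);
}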


