linux-kernel.vger.kernel.org archive mirror
* current_thread_info() vs task_thread_info(current)
@ 2011-07-18 11:23 Peter Zijlstra
  2011-07-18 11:36 ` Gleb Natapov
  2011-07-18 11:54 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2011-07-18 11:23 UTC (permalink / raw)
  To: linux-arch; +Cc: linux-kernel, Thomas Gleixner, Paul E. McKenney

Thomas just spent a lovely morning trying to make sense of a trace where
TIF_NEED_RESCHED wasn't set after resched_task() and magically appeared
after local_bh_enable().

What happened is that on that particular platform softirqs ran on a
separate stack, and current_thread_info() is relative to the stack
pointer.

The result is that current_thread_info() isn't the same as
task_thread_info(current), *surprise*!!

The immediate problem is of course that we can lose TIF flags when set
through current_thread_info() from IRQ/SoftIRQ context.
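
To make the difference concrete, a simplified sketch (the exact mask and
field names vary per arch; this mirrors the i386-style variant, so treat
it as illustration rather than any one arch's exact code):

/* Stack-based: thread_info lives at the base of whatever stack
 * we are currently on -- including a separate soft/hard irq stack. */
static inline struct thread_info *current_thread_info(void)
{
	register unsigned long sp asm("esp");

	return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

/* Task-based: always the thread_info of the task itself. */
#define task_thread_info(task)	((struct thread_info *)(task)->stack)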

Now I was going to add a WARN() in x86_64's current_thread_info() to
catch all these; sadly, x86_64's implementation isn't prone to this
particular issue, which means most people (kernel devs) will not be
affected (i386 is affected, but nobody sane uses that anymore).


Just to give an example, RCU uses set_need_resched(), and set_need_resched()
uses current_thread_info(). The use in force_quiescent_state() is from
softirq afaict; the one in __rcu_pending() is from hardirq.

On platforms such as the one Thomas was playing on, the TIF bit will be lost,
since it will be set on the thread_info associated with some interrupt
stack, not the current process.
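
For reference, the chain involved, simplified from the 3.0-era
include/linux/sched.h and include/linux/thread_info.h:

static inline void set_need_resched(void)
{
	set_thread_flag(TIF_NEED_RESCHED);
}

/* ... which expands to ... */
#define set_thread_flag(flag) \
	set_ti_thread_flag(current_thread_info(), flag)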


So how are we going to solve this? Naively I'd think that
current_thread_info() is short for task_thread_info(current), and thus
the platforms where this isn't true are broken.

I mean, what use is the thread_info not of a thread?

Comments?


* Re: current_thread_info() vs task_thread_info(current)
  2011-07-18 11:23 current_thread_info() vs task_thread_info(current) Peter Zijlstra
@ 2011-07-18 11:36 ` Gleb Natapov
  2011-07-18 11:44   ` Peter Zijlstra
  2011-07-18 11:54 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 11+ messages in thread
From: Gleb Natapov @ 2011-07-18 11:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, Thomas Gleixner, Paul E. McKenney

On Mon, Jul 18, 2011 at 01:23:03PM +0200, Peter Zijlstra wrote:
> Thomas just spent a lovely morning trying to make sense of a trace where
> TIF_NEED_RESCHED wasn't set after resched_task() and magically appeared
> after local_bh_enable().
> 
> What happened is that on that particular platform softirqs ran on a
> separate stack, and current_thread_info() is relative to the stack
> pointer.
> 
> The result is that current_thread_info() isn't the same as
> task_thread_info(current), *surprise*!!
> 
> The immediate problem is of course that we can lose TIF flags when set
> through current_thread_info() from IRQ/SoftIRQ context.
> 
> Now I was going to add a WARN() in x86_64's current_thread_info() to
> catch all these; sadly, x86_64's implementation isn't prone to this
> particular issue, which means most people (kernel devs) will not be
> affected (i386 is affected, but nobody sane uses that anymore).
> 
> 
> Just to give an example, RCU uses set_need_resched(), and set_need_resched()
> uses current_thread_info(). The use in force_quiescent_state() is from
> softirq afaict; the one in __rcu_pending() is from hardirq.
> 
> On platforms such as the one Thomas was playing on, the TIF bit will be lost,
> since it will be set on the thread_info associated with some interrupt
> stack, not the current process.
> 
> 
> So how are we going to solve this? Naively I'd think that
> current_thread_info() is short for task_thread_info(current), and thus
> the platforms where this isn't true are broken.
> 
> I mean, what use is the thread_info not of a thread?
> 
> Comments?

Why not use a per-cpu kernel_stack variable on all arches, as x86_64 does?
How big is the advantage of using the stack pointer to find the current
thread info?
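
For reference, x86_64's variant at the time looks roughly like this
(from arch/x86/include/asm/thread_info.h; note the per-cpu memory load
where a stack-pointer variant needs only register arithmetic):

static inline struct thread_info *current_thread_info(void)
{
	struct thread_info *ti;

	/* one load from the per-cpu area, then pure arithmetic */
	ti = (void *)(percpu_read_stable(kernel_stack) +
		      KERNEL_STACK_OFFSET - THREAD_SIZE);
	return ti;
}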

--
			Gleb.


* Re: current_thread_info() vs task_thread_info(current)
  2011-07-18 11:36 ` Gleb Natapov
@ 2011-07-18 11:44   ` Peter Zijlstra
  2011-07-18 11:48     ` Gleb Natapov
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2011-07-18 11:44 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: linux-arch, linux-kernel, Thomas Gleixner, Paul E. McKenney

On Mon, 2011-07-18 at 14:36 +0300, Gleb Natapov wrote:

> Why not use a per-cpu kernel_stack variable on all arches, as x86_64 does?
> How big is the advantage of using the stack pointer to find the current
> thread info?

One less load I imagine.


* Re: current_thread_info() vs task_thread_info(current)
  2011-07-18 11:44   ` Peter Zijlstra
@ 2011-07-18 11:48     ` Gleb Natapov
  0 siblings, 0 replies; 11+ messages in thread
From: Gleb Natapov @ 2011-07-18 11:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, Thomas Gleixner, Paul E. McKenney

On Mon, Jul 18, 2011 at 01:44:50PM +0200, Peter Zijlstra wrote:
> On Mon, 2011-07-18 at 14:36 +0300, Gleb Natapov wrote:
> 
> > Why not use a per-cpu kernel_stack variable on all arches, as x86_64 does?
> > How big is the advantage of using the stack pointer to find the current
> > thread info?
> 
> One less load I imagine.
Oh, yes, of course. But what I mean is that if using kernel_stack is good
enough for the most popular architecture (x86_64), maybe it is good enough
for other architectures too?

--
			Gleb.


* Re: current_thread_info() vs task_thread_info(current)
  2011-07-18 11:23 current_thread_info() vs task_thread_info(current) Peter Zijlstra
  2011-07-18 11:36 ` Gleb Natapov
@ 2011-07-18 11:54 ` Benjamin Herrenschmidt
  2011-07-18 14:37   ` Paul E. McKenney
  1 sibling, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2011-07-18 11:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, Thomas Gleixner, Paul E. McKenney

On Mon, 2011-07-18 at 13:23 +0200, Peter Zijlstra wrote:

> So how are we going to solve this? Naively I'd think that
> current_thread_info() is short for task_thread_info(current), and thus
> the platforms where this isn't true are broken.
> 
> I mean, what use is the thread_info not of a thread?
> 
> Comments?

Thomas just hit a bug in the platform code of said platform (powerpc,
heh?) :-)

We do it right for hard IRQs and for some reason never did it right for
softirqs.

The code is like this for the former:

static inline void handle_one_irq(unsigned int irq)
{

        .../...

	call_handle_irq(irq, desc, irqtp, desc->handle_irq);
	current->thread.ksp_limit = saved_sp_limit;
	irqtp->task = NULL;

	/* Set any flag that may have been set on the
	 * alternate stack
	 */
	if (irqtp->flags)
		set_bits(irqtp->flags, &curtp->flags);
}

So what we need, I suppose, is to add those last two lines to
do_softirq_onstack() as well.
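
Something along these lines, I suppose (untested sketch, mirroring
handle_one_irq; the ksp_limit juggling is omitted here):

static inline void do_softirq_onstack(void)
{
	struct thread_info *curtp, *irqtp;

	curtp = current_thread_info();
	irqtp = softirq_ctx[smp_processor_id()];
	irqtp->task = curtp->task;
	irqtp->flags = 0;

	call_do_softirq(irqtp);

	irqtp->task = NULL;

	/* Set any flag that may have been set on the
	 * alternate stack
	 */
	if (irqtp->flags)
		set_bits(irqtp->flags, &curtp->flags);
}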

Now indeed i386 needs a similar treatment on both hard and soft
irqs (along with getting rid of that stupid duplication of
call_on_stack in there; I don't think it's worth making the code
horrible like that to save one clobber, and PeterZ reckons we can
probably avoid it using always_inline anyway).

I'll let you guys sort i386 out tho, I'll look at fixing ppc tomorrow :-)

Cheers,
Ben.
 



* Re: current_thread_info() vs task_thread_info(current)
  2011-07-18 11:54 ` Benjamin Herrenschmidt
@ 2011-07-18 14:37   ` Paul E. McKenney
  2011-07-18 21:36     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2011-07-18 14:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Zijlstra, linux-arch, linux-kernel, Thomas Gleixner

On Mon, Jul 18, 2011 at 09:54:57PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2011-07-18 at 13:23 +0200, Peter Zijlstra wrote:
> 
> > So how are we going to solve this? Naively I'd think that
> > current_thread_info() is short for task_thread_info(current), and thus
> > the platforms where this isn't true are broken.
> > 
> > I mean, what use is the thread_info not of a thread?
> > 
> > Comments?
> 
> Thomas just hit a bug in the platform code of said platform (powerpc,
> heh?) :-)
> 
> We do it right for hard IRQs and for some reason never did it right for
> softirqs.
> 
> The code is like this for the former:
> 
> static inline void handle_one_irq(unsigned int irq)
> {
> 
>         .../...
> 
> 	call_handle_irq(irq, desc, irqtp, desc->handle_irq);
> 	current->thread.ksp_limit = saved_sp_limit;
> 	irqtp->task = NULL;
> 
> 	/* Set any flag that may have been set on the
> 	 * alternate stack
> 	 */
> 	if (irqtp->flags)
> 		set_bits(irqtp->flags, &curtp->flags);
> }
> 
> So what we need, I suppose, is to add those last two lines to
> do_softirq_onstack() as well.

Hmmm...  Would this explain preempt_count() inexplicably increasing by
three across a spin_unlock_irqrestore()?  I ran into this situation when
testing on Power over the weekend.

							Thanx, Paul

> Now indeed i386 needs a similar treatment on both hard and soft
> irqs (along with getting rid of that stupid duplication of
> call_on_stack in there; I don't think it's worth making the code
> horrible like that to save one clobber, and PeterZ reckons we can
> probably avoid it using always_inline anyway).
> 
> I'll let you guys sort i386 out tho, I'll look at fixing ppc tomorrow :-)
> 
> Cheers,
> Ben.
> 
> 


* Re: current_thread_info() vs task_thread_info(current)
  2011-07-18 14:37   ` Paul E. McKenney
@ 2011-07-18 21:36     ` Benjamin Herrenschmidt
  2011-07-19  0:39       ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2011-07-18 21:36 UTC (permalink / raw)
  To: paulmck; +Cc: Peter Zijlstra, linux-arch, linux-kernel, Thomas Gleixner

On Mon, 2011-07-18 at 07:37 -0700, Paul E. McKenney wrote:
> On Mon, Jul 18, 2011 at 09:54:57PM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2011-07-18 at 13:23 +0200, Peter Zijlstra wrote:
> > 
> > > So how are we going to solve this? Naively I'd think that
> > > current_thread_info() is short for task_thread_info(current), and thus
> > > the platforms where this isn't true are broken.
> > > 
> > > I mean, what use is the thread_info not of a thread?
> > > 
> > > Comments?
> > 
> > Thomas just hit a bug in the platform code of said platform (powerpc,
> > heh?) :-)
> > 
> > We do it right for hard IRQs and for some reason never did it right for
> > softirqs.
> > 
> > The code is like this for the former:
> > 
> > static inline void handle_one_irq(unsigned int irq)
> > {
> > 
> >         .../...
> > 
> > 	call_handle_irq(irq, desc, irqtp, desc->handle_irq);
> > 	current->thread.ksp_limit = saved_sp_limit;
> > 	irqtp->task = NULL;
> > 
> > 	/* Set any flag that may have been set on the
> > 	 * alternate stack
> > 	 */
> > 	if (irqtp->flags)
> > 		set_bits(irqtp->flags, &curtp->flags);
> > }
> > 
> > So what we need, I suppose, is to add those last two lines to
> > do_softirq_onstack() as well.
> 
> Hmmm...  Would this explain preempt_count() inexplicably increasing by
> three across a spin_unlock_irqrestore()?  I ran into this situation when
> testing on Power over the weekend.

Hrm, no, I don't see that happening. The preempt count when exiting an
irq or softirq stack should be the exact same as when entering it, which
is why we don't bother copying it over. Do you see any case where that
wouldn't hold?

Cheers,
Ben.

> 							Thanx, Paul
> 
> > Now indeed i386 needs a similar treatment on both hard and soft
> > irqs (along with getting rid of that stupid duplication of
> > call_on_stack in there; I don't think it's worth making the code
> > horrible like that to save one clobber, and PeterZ reckons we can
> > probably avoid it using always_inline anyway).
> > 
> > I'll let you guys sort i386 out tho, I'll look at fixing ppc tomorrow :-)
> > 
> > Cheers,
> > Ben.
> > 
> > 




* Re: current_thread_info() vs task_thread_info(current)
  2011-07-18 21:36     ` Benjamin Herrenschmidt
@ 2011-07-19  0:39       ` Paul E. McKenney
  2011-07-19  0:57         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2011-07-19  0:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Zijlstra, linux-arch, linux-kernel, Thomas Gleixner

On Tue, Jul 19, 2011 at 07:36:23AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2011-07-18 at 07:37 -0700, Paul E. McKenney wrote:
> > On Mon, Jul 18, 2011 at 09:54:57PM +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2011-07-18 at 13:23 +0200, Peter Zijlstra wrote:
> > > 
> > > > So how are we going to solve this? Naively I'd think that
> > > > current_thread_info() is short for task_thread_info(current), and thus
> > > > the platforms where this isn't true are broken.
> > > > 
> > > > I mean, what use is the thread_info not of a thread?
> > > > 
> > > > Comments?
> > > 
> > > Thomas just hit a bug in the platform code of said platform (powerpc,
> > > heh?) :-)
> > > 
> > > We do it right for hard IRQs and for some reason never did it right for
> > > softirqs.
> > > 
> > > The code is like this for the former:
> > > 
> > > static inline void handle_one_irq(unsigned int irq)
> > > {
> > > 
> > >         .../...
> > > 
> > > 	call_handle_irq(irq, desc, irqtp, desc->handle_irq);
> > > 	current->thread.ksp_limit = saved_sp_limit;
> > > 	irqtp->task = NULL;
> > > 
> > > 	/* Set any flag that may have been set on the
> > > 	 * alternate stack
> > > 	 */
> > > 	if (irqtp->flags)
> > > 		set_bits(irqtp->flags, &curtp->flags);
> > > }
> > > 
> > > So what we need, I suppose, is to add those last two lines to
> > > do_softirq_onstack() as well.
> > 
> > Hmmm...  Would this explain preempt_count() inexplicably increasing by
> > three across a spin_unlock_irqrestore()?  I ran into this situation when
> > testing on Power over the weekend.
> 
> Hrm, no, I don't see that happening. The preempt count when exiting an
> irq or softirq stack should be the exact same as when entering it, which
> is why we don't bother copying it over. Do you see any case where that
> wouldn't hold?

Nope, other than seeing preempt_count() transition from zero to three
across a spin_unlock_irqrestore() for no good reason that I could see.

							Thanx, Paul


* Re: current_thread_info() vs task_thread_info(current)
  2011-07-19  0:39       ` Paul E. McKenney
@ 2011-07-19  0:57         ` Benjamin Herrenschmidt
  2011-07-19  3:04           ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2011-07-19  0:57 UTC (permalink / raw)
  To: paulmck; +Cc: Peter Zijlstra, linux-arch, linux-kernel, Thomas Gleixner

On Mon, 2011-07-18 at 17:39 -0700, Paul E. McKenney wrote:
> > Hrm, no, I don't see that happening. The preempt count when exiting an
> > irq or softirq stack should be the exact same as when entering it, which
> > is why we don't bother copying it over. Do you see any case where that
> > wouldn't hold?
> 
> Nope, other than seeing preempt_count() transition from zero to three
> across a spin_unlock_irqrestore() for no good reason that I could see.

Do you have a nice repro-case? :-)

That sounds really nasty ... smells really like something bad's
happening from an interrupt, but we don't copy back the preempt-count
from the interrupt stacks at all, so that's really really odd.

Cheers,
Ben.




* Re: current_thread_info() vs task_thread_info(current)
  2011-07-19  0:57         ` Benjamin Herrenschmidt
@ 2011-07-19  3:04           ` Paul E. McKenney
  2011-07-19  3:46             ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2011-07-19  3:04 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Zijlstra, linux-arch, linux-kernel, Thomas Gleixner

On Tue, Jul 19, 2011 at 10:57:37AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2011-07-18 at 17:39 -0700, Paul E. McKenney wrote:
> > > Hrm, no, I don't see that happening. The preempt count when exiting an
> > > irq or softirq stack should be the exact same as when entering it, which
> > > is why we don't bother copying it over. Do you see any case where that
> > > wouldn't hold?
> > 
> > Nope, other than seeing preempt_count() transition from zero to three
> > across a spin_unlock_irqrestore() for no good reason that I could see.
> 
> Do you have a nice repro-case? :-)
> 
> That sounds really nasty ... smells really like something bad's
> happening from an interrupt, but we don't copy back the preempt-count
> from the interrupt stacks at all, so that's really really odd.

Good question...  The system I reproduced on four times over the
weekend is out of commission.  Trying the same test on another system
with a minimal patch -- will let you know how it goes.

							Thanx, Paul


* Re: current_thread_info() vs task_thread_info(current)
  2011-07-19  3:04           ` Paul E. McKenney
@ 2011-07-19  3:46             ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2011-07-19  3:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Zijlstra, linux-arch, linux-kernel, Thomas Gleixner

On Mon, Jul 18, 2011 at 08:04:04PM -0700, Paul E. McKenney wrote:
> On Tue, Jul 19, 2011 at 10:57:37AM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2011-07-18 at 17:39 -0700, Paul E. McKenney wrote:
> > > > Hrm, no, I don't see that happening. The preempt count when exiting an
> > > > irq or softirq stack should be the exact same as when entering it, which
> > > > is why we don't bother copying it over. Do you see any case where that
> > > > wouldn't hold?
> > > 
> > > Nope, other than seeing preempt_count() transition from zero to three
> > > across a spin_unlock_irqrestore() for no good reason that I could see.
> > 
> > Do you have a nice repro-case? :-)
> > 
> > That sounds really nasty ... smells really like something bad's
> > happening from an interrupt, but we don't copy back the preempt-count
> > from the interrupt stacks at all, so that's really really odd.
> 
> Good question...  The system I reproduced on four times over the
> weekend is out of commission.  Trying the same test on another system
> with a minimal patch -- will let you know how it goes.

OK, the ABAT system trillian reproduces this.  Here is the repeat-by:

1.	Apply the patch shown below to 3.0-rc7.

2.	Build the system with the following config:

	CONFIG_RCU_TRACE=n
	CONFIG_NO_HZ=n
	CONFIG_SMP=y
	CONFIG_RCU_FANOUT=4
	CONFIG_NR_CPUS=8
	CONFIG_RCU_FANOUT_EXACT=n
	CONFIG_HOTPLUG_CPU=n
	CONFIG_SUSPEND=n
	CONFIG_HIBERNATION=n
	CONFIG_PREEMPT_NONE=n
	CONFIG_PREEMPT_VOLUNTARY=n
	CONFIG_PREEMPT=y
	#CHECK#CONFIG_TREE_PREEMPT_RCU=y
	CONFIG_RCU_TORTURE_TEST=m
	CONFIG_MODULE_UNLOAD=y
	CONFIG_SYSFS_DEPRECATED_V2=y
	CONFIG_IKCONFIG=y
	CONFIG_IKCONFIG_PROC=y
	CONFIG_PRINTK_TIME=y

3.	Boot the system and do the following:

	modprobe rcutorture torture_type=rcu_expedited stat_interval=30 test_no_idle_hz

4.	Within a few seconds (12.7 milliseconds on trillian), you
	should get something like the following in dmesg:

	rcu_expedited-torture:--- Start of test: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
	BUG: kernel/rcutree_plugin.h:802: preemption disabled
	BUG: scheduling while atomic: rcu_torture_fak/9422/0x00000003//825
	Modules linked in: rcutorture [last unloaded: scsi_wait_scan]
	Call Trace:
	[c0000000bdc93940] [c000000000015104] .show_stack+0x74/0x1c0 (unreliable)
	[c0000000bdc939f0] [c00000000007b1d8] .__schedule_bug+0x88/0x90
	[c0000000bdc93a70] [c000000000687f2c] .schedule+0x94c/0xa80
	[c0000000bdc93d30] [c000000000105e04] .synchronize_rcu_expedited+0x3a4/0x550
	[c0000000bdc93e20] [d000000001511a08] .rcu_torture_fakewriter+0xc8/0x190 [rcutorture]
	[c0000000bdc93ed0] [c0000000000b49dc] .kthread+0xbc/0xd0
	[c0000000bdc93f90] [c000000000021fe4] .kernel_thread+0x54/0x70

Line 802 of kernel/rcutree_plugin.h is the one marked below with
/*-----*/:

	current->rcu_debug_loc = __LINE__;
	rcu_check_preempt(__LINE__);
	synchronize_sched_expedited();
	rcu_check_preempt(__LINE__);

	raw_spin_lock_irqsave(&rsp->onofflock, flags);

	/* Initialize ->expmask for all non-leaf rcu_node structures. */
	current->rcu_debug_loc = __LINE__;
	rcu_check_preempt(__LINE__);  /*-----*/
	rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) {
		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
		rnp->expmask = rnp->qsmaskinit;
		raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
	}

So preempt_count() was zero just before the raw_spin_lock_irqsave()
and 3 just after.  The scheduling-while-atomic bug is from the
wait_event() a few lines further down.
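
(For context: raw_spin_lock_irqsave() disables preemption exactly once,
so the expected count just after the lock is 1, not 3. A simplified
sketch of the generic path in include/linux/spinlock_api_smp.h, lockdep
variant omitted:)

static inline unsigned long __raw_spin_lock_irqsave(raw_spinlock_t *lock)
{
	unsigned long flags;

	local_irq_save(flags);			/* no preempt_count change */
	preempt_disable();			/* preempt_count += 1 */
	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
	do_raw_spin_lock_flags(lock, &flags);
	return flags;
}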

							Thanx, Paul

------------------------------------------------------------------------

rcu: decrease rcu_report_exp_rnp coupling with scheduler

PREEMPT_RCU read-side critical sections blocking an expedited grace
period invoke rcu_report_exp_rnp().  When the last such critical section
has completed, rcu_report_exp_rnp() invokes the scheduler to wake up the
task that invoked synchronize_rcu_expedited() -- needlessly holding the
root rcu_node structure's lock while doing so, thus needlessly providing
a way for RCU and the scheduler to deadlock.

This commit therefore releases the root rcu_node structure's lock before
calling wake_up(). It also adds a bunch of diagnostics.

Reported-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 496770a..3501f3f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1258,6 +1258,7 @@ struct task_struct {
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
 #ifdef CONFIG_TREE_PREEMPT_RCU
 	struct rcu_node *rcu_blocked_node;
+	int rcu_debug_loc;
 #endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
 #ifdef CONFIG_RCU_BOOST
 	struct rt_mutex *rcu_boost_mutex;
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 14dc7dd..e182224 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -696,8 +696,11 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	for (;;) {
-		if (!sync_rcu_preempt_exp_done(rnp))
+		if (!sync_rcu_preempt_exp_done(rnp)) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			break;
+		}
 		if (rnp->parent == NULL) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			wake_up(&sync_rcu_preempt_exp_wq);
 			break;
 		}
@@ -707,7 +710,6 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 		raw_spin_lock(&rnp->lock); /* irqs already disabled */
 		rnp->expmask &= ~mask;
 	}
-	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
 
 /*
@@ -735,6 +737,19 @@ sync_rcu_preempt_exp_init(struct rcu_state *rsp, struct rcu_node *rnp)
 		rcu_report_exp_rnp(rsp, rnp);
 }
 
+static inline void rcu_check_preempt(int lineno)
+{
+	static int warned;
+
+	if (warned)
+		return;
+	if (preempt_count() != 0) {
+		warned = 1;
+		printk(KERN_ERR "BUG: %s:%d: preemption disabled\n",
+		       __FILE__, lineno);
+	}
+}
+
 /*
  * Wait for an rcu-preempt grace period, but expedite it.  The basic idea
  * is to invoke synchronize_sched_expedited() to push all the tasks to
@@ -748,10 +763,13 @@ void synchronize_rcu_expedited(void)
 	long snap;
 	int trycount = 0;
 
+	rcu_check_preempt(__LINE__);
 	smp_mb(); /* Caller's modifications seen first by other CPUs. */
 	snap = ACCESS_ONCE(sync_rcu_preempt_exp_count) + 1;
 	smp_mb(); /* Above access cannot bleed into critical section. */
 
+	current->rcu_debug_loc = __LINE__;
+
 	/*
 	 * Acquire lock, falling back to synchronize_rcu() if too many
 	 * lock-acquisition failures.  Of course, if someone does the
@@ -761,6 +779,8 @@ void synchronize_rcu_expedited(void)
 		if (trycount++ < 10)
 			udelay(trycount * num_online_cpus());
 		else {
+			rcu_check_preempt(__LINE__);
+			current->rcu_debug_loc = __LINE__;
 			synchronize_rcu();
 			return;
 		}
@@ -771,27 +791,39 @@ void synchronize_rcu_expedited(void)
 		goto unlock_mb_ret; /* Others did our work for us. */
 
 	/* force all RCU readers onto ->blkd_tasks lists. */
+	current->rcu_debug_loc = __LINE__;
+	rcu_check_preempt(__LINE__);
 	synchronize_sched_expedited();
+	rcu_check_preempt(__LINE__);
 
 	raw_spin_lock_irqsave(&rsp->onofflock, flags);
 
 	/* Initialize ->expmask for all non-leaf rcu_node structures. */
+	current->rcu_debug_loc = __LINE__;
+	rcu_check_preempt(__LINE__);
 	rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) {
 		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
 		rnp->expmask = rnp->qsmaskinit;
 		raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
 	}
+	rcu_check_preempt(__LINE__);
 
 	/* Snapshot current state of ->blkd_tasks lists. */
+	current->rcu_debug_loc = __LINE__;
+	rcu_check_preempt(__LINE__);
 	rcu_for_each_leaf_node(rsp, rnp)
 		sync_rcu_preempt_exp_init(rsp, rnp);
+	rcu_check_preempt(__LINE__);
 	if (NUM_RCU_NODES > 1)
 		sync_rcu_preempt_exp_init(rsp, rcu_get_root(rsp));
+	rcu_check_preempt(__LINE__);
 
 	raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
 
 	/* Wait for snapshotted ->blkd_tasks lists to drain. */
+	rcu_check_preempt(__LINE__);
 	rnp = rcu_get_root(rsp);
+	current->rcu_debug_loc = __LINE__;
 	wait_event(sync_rcu_preempt_exp_wq,
 		   sync_rcu_preempt_exp_done(rnp));
 
@@ -802,6 +834,7 @@ unlock_mb_ret:
 	mutex_unlock(&sync_rcu_preempt_exp_mutex);
 mb_ret:
 	smp_mb(); /* ensure subsequent action seen after grace period. */
+	current->rcu_debug_loc = 0;
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
diff --git a/kernel/sched.c b/kernel/sched.c
index 9769c75..a37c9bf 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4135,6 +4135,8 @@ EXPORT_SYMBOL(sub_preempt_count);
 
 #endif
 
+DECLARE_PER_CPU(int, rcu_debug_loc);
+
 /*
  * Print scheduling while atomic bug:
  */
@@ -4142,8 +4144,9 @@ static noinline void __schedule_bug(struct task_struct *prev)
 {
 	struct pt_regs *regs = get_irq_regs();
 
-	printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
-		prev->comm, prev->pid, preempt_count());
+	printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x//%d\n",
+		prev->comm, prev->pid, preempt_count(),
+		current->rcu_debug_loc);
 
 	debug_show_held_locks(prev);
 	print_modules();


Thread overview: 11 messages
2011-07-18 11:23 current_thread_info() vs task_thread_info(current) Peter Zijlstra
2011-07-18 11:36 ` Gleb Natapov
2011-07-18 11:44   ` Peter Zijlstra
2011-07-18 11:48     ` Gleb Natapov
2011-07-18 11:54 ` Benjamin Herrenschmidt
2011-07-18 14:37   ` Paul E. McKenney
2011-07-18 21:36     ` Benjamin Herrenschmidt
2011-07-19  0:39       ` Paul E. McKenney
2011-07-19  0:57         ` Benjamin Herrenschmidt
2011-07-19  3:04           ` Paul E. McKenney
2011-07-19  3:46             ` Paul E. McKenney
