* 3.0-git15 Atomic scheduling in pidmap_init
@ 2011-08-01 15:46 Josh Boyer
  2011-08-04 11:46 ` Josh Boyer
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-01 15:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton

We're seeing a scheduling while atomic backtrace in rawhide from pidmap_init
(https://bugzilla.redhat.com/show_bug.cgi?id=726877).  While this seems
mostly harmless given that there isn't anything else to schedule to at
this point, I do wonder why things are marked as needing to be rescheduled so
early.

We get to might_sleep through the might_sleep_if call in
slab_pre_alloc_hook because both kzalloc and KMEM_CACHE are called with
GFP_KERNEL.  That eventually has a call chain like:

might_resched->_cond_resched->should_resched

which apparently returns true.  Why the initial thread says it should
reschedule at this point, I'm not sure.

I tried cheating by making the kzalloc call in pidmap_init use GFP_IOFS
instead of GFP_KERNEL to avoid the might_sleep_if call, and that worked
but I can't do the same for the kmalloc calls in kmem_cache_create, so
getting to the bottom of why should_resched is returning true seems to
be a better approach.
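
To make the chain concrete, here is a rough paraphrase of the code involved
in this tree (simplified from mm/slub.c, include/linux/kernel.h and
kernel/sched.c, so not the exact source):

	/* mm/slub.c (paraphrased): run before every slab allocation.
	 * GFP_KERNEL contains __GFP_WAIT; GFP_IOFS does not, which is why
	 * switching the kzalloc() to GFP_IOFS dodges the check. */
	static inline int slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
	{
		flags &= gfp_allowed_mask;
		lockdep_trace_alloc(flags);
		might_sleep_if(flags & __GFP_WAIT);
		return 0;	/* failslab fault-injection check elided */
	}

	/* include/linux/kernel.h, roughly */
	#define might_sleep_if(cond)	do { if (cond) might_sleep(); } while (0)
	#define might_sleep() \
		do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)

	/*
	 * With CONFIG_PREEMPT_VOLUNTARY, might_resched() is _cond_resched(),
	 * which calls schedule() whenever should_resched() sees TIF_NEED_RESCHED:
	 */
	static inline int should_resched(void)
	{
		return need_resched() && !(preempt_count() & PREEMPT_ACTIVE);
	}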

josh


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-01 15:46 3.0-git15 Atomic scheduling in pidmap_init Josh Boyer
@ 2011-08-04 11:46 ` Josh Boyer
  2011-08-04 14:04   ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-04 11:46 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linux-kernel, Andrew Morton, Paul E. McKenney

On Mon, Aug 1, 2011 at 11:46 AM, Josh Boyer <jwboyer@redhat.com> wrote:
> We're seeing a scheduling while atomic backtrace in rawhide from pidmap_init
> (https://bugzilla.redhat.com/show_bug.cgi?id=726877).  While this seems
> mostly harmless given that there isn't anything else to schedule to at
> this point, I do wonder why things are marked as needing rescheduled so
> early.
>
> We get to might_sleep through the might_sleep_if call in
> slab_pre_alloc_hook because both kzalloc and KMEM_CACHE are called with
> GFP_KERNEL.  That eventually has a call chain like:
>
> might_resched->_cond_resched->should_resched
>
> which apparently returns true.  Why the initial thread says it should
> reschedule at this point, I'm not sure.
>
> I tried cheating by making the kzalloc call in pidmap_init use GFP_IOFS
> instead of GFP_KERNEL to avoid the might_sleep_if call, and that worked
> but I can't do the same for the kmalloc calls in kmem_cache_create, so
> getting to the bottom of why should_resched is returning true seems to
> be a better approach.

A bit more info on this.

What seems to be happening is that late_time_init is called, which
gets around to calling hpet_time_init, which enables the HPET, and
then calls setup_default_timer_irq.  setup_default_timer_irq in
arch/x86/kernel/time.c calls setup_irq with the timer_interrupt
handler.

At this point the timer interrupt hits, and then tick_handle_periodic is called:

timer int ->
tick_handle_periodic -> tick_periodic -> update_process_times ->
rcu_check_callbacks -> rcu_pending ->
__rcu_pending -> set_need_resched (this is called around line 1685 in
kernel/rcutree.c)

So what's happening is that once the timer interrupt starts, RCU is
coming in and marking current as needing to reschedule, and that in turn
causes the slab_pre_alloc_hook -> might_sleep_if -> might_sleep ->
might_resched -> _cond_resched chain to trigger when pidmap_init calls
kzalloc, producing the oops below later in the init sequence.  I believe
we see this because of all the debugging options we have enabled in the
kernel configs.
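
For reference, the branch in question (around line 1685 of kernel/rcutree.c
in this tree) looks roughly like the following paraphrase, and
set_need_resched() does nothing more than flag the current task:

	/* __rcu_pending(), paraphrased: this CPU owes RCU a quiescent state
	 * and force_quiescent_state() is due soon, so poke the scheduler. */
	if (rdp->qs_pending && !rdp->passed_quiesc) {
		rdp->n_rp_qs_pending++;
		if (!rdp->preemptible &&
		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
				 jiffies))
			set_need_resched();
	}

	/* include/linux/sched.h, roughly */
	static inline void set_need_resched(void)
	{
		set_thread_flag(TIF_NEED_RESCHED);
	}

Once that flag is set, the next _cond_resched() that notices it calls
schedule(), and because the scheduler still treats this early-boot context
as atomic, __schedule_bug() fires as seen in the trace below.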

This might be normal for all I know, but the oops is rather annoying.
It seems RCU isn't in a quiescent state, we aren't preemptible yet,
and it _really_ wants to reschedule things to make itself happy.
Anyone have any thoughts on how to either keep RCU from marking
current as needing to reschedule so early, or otherwise work around the
bug?

josh

[    0.007005] pid_max: default: 32768 minimum: 301
[    0.008619] BUG: scheduling while atomic: swapper/0/0x10000002
[    0.009004] no locks held by swapper/0.
[    0.010005] Modules linked in:
[    0.010628] Pid: 0, comm: swapper Not tainted 3.1.0-0.rc0.git18.1.fc17.x86_64 #17
[    0.011004] Call Trace:
[    0.012009]  [<ffffffff814e5c9e>] __schedule_bug+0x75/0x7a
[    0.013011]  [<ffffffff814edced>] schedule+0x95/0x7bb
[    0.013621]  [<ffffffff81077199>] ? kzalloc.constprop.3+0x29/0x2b
[    0.014007]  [<ffffffff81056cbf>] __cond_resched+0x28/0x34
[    0.015006]  [<ffffffff814ee62b>] _cond_resched+0x19/0x22
[    0.015630]  [<ffffffff8112d1b1>] slab_pre_alloc_hook+0x3b/0x54
[    0.016006]  [<ffffffff8112f9ea>] kmem_cache_alloc_trace+0x29/0xca
[    0.017006]  [<ffffffff81077199>] kzalloc.constprop.3+0x29/0x2b
[    0.017646]  [<ffffffff81d6793b>] pidmap_init+0xb7/0xf6
[    0.018007]  [<ffffffff81d4f05a>] start_kernel+0x898/0x92f
[    0.019006]  [<ffffffff81d4e2c4>] x86_64_start_reservations+0xaf/0xb3
[    0.019653]  [<ffffffff81d4e140>] ? early_idt_handlers+0x140/0x140
[    0.020006]  [<ffffffff81d4e3ca>] x86_64_start_kernel+0x102/0x111
[    0.021288] Security Framework initialized
[    0.022011] SELinux:  Initializing.


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-04 11:46 ` Josh Boyer
@ 2011-08-04 14:04   ` Paul E. McKenney
  2011-08-04 15:06     ` Josh Boyer
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-04 14:04 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Josh Boyer, linux-kernel, Andrew Morton

On Thu, Aug 04, 2011 at 07:46:03AM -0400, Josh Boyer wrote:
> On Mon, Aug 1, 2011 at 11:46 AM, Josh Boyer <jwboyer@redhat.com> wrote:
> > We're seeing a scheduling while atomic backtrace in rawhide from pidmap_init
> > (https://bugzilla.redhat.com/show_bug.cgi?id=726877).  While this seems
> > mostly harmless given that there isn't anything else to schedule to at
> > this point, I do wonder why things are marked as needing rescheduled so
> > early.
> >
> > We get to might_sleep through the might_sleep_if call in
> > slab_pre_alloc_hook because both kzalloc and KMEM_CACHE are called with
> > GFP_KERNEL.  That eventually has a call chain like:
> >
> > might_resched->_cond_resched->should_resched
> >
> > which apparently returns true.  Why the initial thread says it should
> > reschedule at this point, I'm not sure.
> >
> > I tried cheating by making the kzalloc call in pidmap_init use GFP_IOFS
> > instead of GFP_KERNEL to avoid the might_sleep_if call, and that worked
> > but I can't do the same for the kmalloc calls in kmem_cache_create, so
> > getting to the bottom of why should_resched is returning true seems to
> > be a better approach.
> 
> A bit more info on this.
> 
> What seems to be happening is that late_time_init is called, which
> gets around to calling hpet_time_init, which enables the HPET, and
> then calls setup_default_timer_irq.  setup_default_timer_irq in
> arch/x86/kernel/time.c calls setup_irq with the timer_interrupt
> handler.
> 
> At this point the timer interrupt hits, and then tick_handle_periodic is called
> 
> timer int
> tick_handle_periodic -> tick_periodic -> update_process_times ->
> rcu_check_callbacks -> rcu_pending ->
> __rcp_pending -> set_need_resched (this is called around line 1685 in
> kernel/rcutree.c)
> 
> So what's happening is that once the timer interrupt starts, RCU is
> coming in and marking current as needing reschedule, and that in turn
> makes the slab_pre_alloc_hook -> might_sleep_if -> might_sleep ->
> might_resched -> _cond_resched to trigger when pidmap_init calls
> kzalloc later on and produce the oops below later on in the init
> sequence.  I believe we see this because of all the debugging options
> we have enabled in the kernel configs.
> 
> This might be normal for all I know, but the oops is rather annoying.
> It seems RCU isn't in a quiescent state, we aren't preemptible yet,
> and it _really_ wants to reschedule things to make itself happy.
> Anyone have any thoughts on how to either keep RCU from marking
> current as needing reschdule so early, or otherwise working around the
> bug?

The deal is that RCU realizes that RCU needs a quiescent state from
this CPU.  The set_need_resched() is intended to cause one.  But there
is not much point this early in boot, because the scheduler isn't going
to do anything anyway.  I can prevent this with the following patch,
but isn't this same thing possible later at runtime?

You really do need to be able to handle set_need_resched() at random
times, and at first glance it appears to me that the warning could be
triggered at runtime as well.  If so, the real fix is elsewhere, right?
Especially given that the patch imposes extra cost at runtime...

							Thanx, Paul

------------------------------------------------------------------------

rcu: Prevent early boot set_need_resched() from __rcu_pending()

There isn't a whole lot of point in poking the scheduler before there
are other tasks to switch to.  This commit therefore adds a check
for rcu_scheduler_fully_active in __rcu_pending() to suppress any
pre-scheduler calls to set_need_resched().  The downside of this approach
is additional runtime overhead in a reasonably hot code path.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index d22dccb..9ccd19e 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1724,7 +1724,8 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 	check_cpu_stall(rsp, rdp);
 
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
-	if (rdp->qs_pending && !rdp->passed_quiesce) {
+	if (rcu_scheduler_fully_active &&
+	    rdp->qs_pending && !rdp->passed_quiesce) {
 
 		/*
 		 * If force_quiescent_state() coming soon and this CPU


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-04 14:04   ` Paul E. McKenney
@ 2011-08-04 15:06     ` Josh Boyer
  2011-08-04 16:26       ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-04 15:06 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, Andrew Morton

On Thu, Aug 04, 2011 at 07:04:38AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 04, 2011 at 07:46:03AM -0400, Josh Boyer wrote:
> > On Mon, Aug 1, 2011 at 11:46 AM, Josh Boyer <jwboyer@redhat.com> wrote:
> > > We're seeing a scheduling while atomic backtrace in rawhide from pidmap_init
> > > (https://bugzilla.redhat.com/show_bug.cgi?id=726877).  While this seems
> > > mostly harmless given that there isn't anything else to schedule to at
> > > this point, I do wonder why things are marked as needing rescheduled so
> > > early.
> > >
> > > We get to might_sleep through the might_sleep_if call in
> > > slab_pre_alloc_hook because both kzalloc and KMEM_CACHE are called with
> > > GFP_KERNEL.  That eventually has a call chain like:
> > >
> > > might_resched->_cond_resched->should_resched
> > >
> > > which apparently returns true.  Why the initial thread says it should
> > > reschedule at this point, I'm not sure.
> > >
> > > I tried cheating by making the kzalloc call in pidmap_init use GFP_IOFS
> > > instead of GFP_KERNEL to avoid the might_sleep_if call, and that worked
> > > but I can't do the same for the kmalloc calls in kmem_cache_create, so
> > > getting to the bottom of why should_resched is returning true seems to
> > > be a better approach.
> > 
> > A bit more info on this.
> > 
> > What seems to be happening is that late_time_init is called, which
> > gets around to calling hpet_time_init, which enables the HPET, and
> > then calls setup_default_timer_irq.  setup_default_timer_irq in
> > arch/x86/kernel/time.c calls setup_irq with the timer_interrupt
> > handler.
> > 
> > At this point the timer interrupt hits, and then tick_handle_periodic is called
> > 
> > timer int
> > tick_handle_periodic -> tick_periodic -> update_process_times ->
> > rcu_check_callbacks -> rcu_pending ->
> > __rcp_pending -> set_need_resched (this is called around line 1685 in
> > kernel/rcutree.c)
> > 
> > So what's happening is that once the timer interrupt starts, RCU is
> > coming in and marking current as needing reschedule, and that in turn
> > makes the slab_pre_alloc_hook -> might_sleep_if -> might_sleep ->
> > might_resched -> _cond_resched to trigger when pidmap_init calls
> > kzalloc later on and produce the oops below later on in the init
> > sequence.  I believe we see this because of all the debugging options
> > we have enabled in the kernel configs.
> > 
> > This might be normal for all I know, but the oops is rather annoying.
> > It seems RCU isn't in a quiescent state, we aren't preemptible yet,
> > and it _really_ wants to reschedule things to make itself happy.
> > Anyone have any thoughts on how to either keep RCU from marking
> > current as needing reschdule so early, or otherwise working around the
> > bug?
> 
> The deal is that RCU realizes that RCU needs a quiescent state from
> this CPU.  The set_need_resched() is intended to cause one.  But there
> is not much point this early in boot, because the scheduler isn't going
> to do anything anyway.  I can prevent this with the following patch,
> but isn't this same thing possible later at runtime?

Possibly, but I'm not sure at the moment.  The patch avoids the oops and
I haven't seen another one in some brief runtime testing.  (It needed a
trivial fixup to apply to current Linus' tree.)

> You really do need to be able to handle set_need_resched() at random
> times, and at first glance it appears to me that the warning could be
> triggered at runtime as well.  If so, the real fix is elsewhere, right?
> Especially given that the patch imposes extra cost at runtime...

After staring at it for a while, it seems to be a combination of being in
atomic context according to the scheduler and things in early boot using
GFP_KERNEL.  At this point in the boot that is perfectly legal, as it's
not being called from an interrupt handler and the mm subsystem should be
fully set up, but we're still really early in boot and preempt is
disabled.  As I mentioned before, KMEM_CACHE calls kmalloc with
GFP_KERNEL and I don't think we want to change that.
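
(For reference, KMEM_CACHE's only argument besides the struct is slab flags;
paraphrasing include/linux/slab.h, the GFP_KERNEL allocations happen inside
kmem_cache_create() itself and aren't controlled by the caller:)

	#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct, \
			sizeof(struct __struct), __alignof__(struct __struct), \
			(__flags), NULL)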

Once we're past early boot, I would expect that things running in true
atomic context won't be calling KMEM_CACHE or using GFP_KERNEL.  Or
maybe I hope?

I understand the desire to avoid another conditional, but I certainly
don't have any other suggestions at the moment.

josh


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-04 15:06     ` Josh Boyer
@ 2011-08-04 16:26       ` Paul E. McKenney
  2011-08-04 17:31         ` Josh Boyer
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-04 16:26 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linux-kernel, Andrew Morton

On Thu, Aug 04, 2011 at 11:06:03AM -0400, Josh Boyer wrote:
> On Thu, Aug 04, 2011 at 07:04:38AM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 04, 2011 at 07:46:03AM -0400, Josh Boyer wrote:
> > > On Mon, Aug 1, 2011 at 11:46 AM, Josh Boyer <jwboyer@redhat.com> wrote:
> > > > We're seeing a scheduling while atomic backtrace in rawhide from pidmap_init
> > > > (https://bugzilla.redhat.com/show_bug.cgi?id=726877).  While this seems
> > > > mostly harmless given that there isn't anything else to schedule to at
> > > > this point, I do wonder why things are marked as needing rescheduled so
> > > > early.
> > > >
> > > > We get to might_sleep through the might_sleep_if call in
> > > > slab_pre_alloc_hook because both kzalloc and KMEM_CACHE are called with
> > > > GFP_KERNEL.  That eventually has a call chain like:
> > > >
> > > > might_resched->_cond_resched->should_resched
> > > >
> > > > which apparently returns true.  Why the initial thread says it should
> > > > reschedule at this point, I'm not sure.
> > > >
> > > > I tried cheating by making the kzalloc call in pidmap_init use GFP_IOFS
> > > > instead of GFP_KERNEL to avoid the might_sleep_if call, and that worked
> > > > but I can't do the same for the kmalloc calls in kmem_cache_create, so
> > > > getting to the bottom of why should_resched is returning true seems to
> > > > be a better approach.
> > > 
> > > A bit more info on this.
> > > 
> > > What seems to be happening is that late_time_init is called, which
> > > gets around to calling hpet_time_init, which enables the HPET, and
> > > then calls setup_default_timer_irq.  setup_default_timer_irq in
> > > arch/x86/kernel/time.c calls setup_irq with the timer_interrupt
> > > handler.
> > > 
> > > At this point the timer interrupt hits, and then tick_handle_periodic is called
> > > 
> > > timer int
> > > tick_handle_periodic -> tick_periodic -> update_process_times ->
> > > rcu_check_callbacks -> rcu_pending ->
> > > __rcp_pending -> set_need_resched (this is called around line 1685 in
> > > kernel/rcutree.c)
> > > 
> > > So what's happening is that once the timer interrupt starts, RCU is
> > > coming in and marking current as needing reschedule, and that in turn
> > > makes the slab_pre_alloc_hook -> might_sleep_if -> might_sleep ->
> > > might_resched -> _cond_resched to trigger when pidmap_init calls
> > > kzalloc later on and produce the oops below later on in the init
> > > sequence.  I believe we see this because of all the debugging options
> > > we have enabled in the kernel configs.
> > > 
> > > This might be normal for all I know, but the oops is rather annoying.
> > > It seems RCU isn't in a quiescent state, we aren't preemptible yet,
> > > and it _really_ wants to reschedule things to make itself happy.
> > > Anyone have any thoughts on how to either keep RCU from marking
> > > current as needing reschdule so early, or otherwise working around the
> > > bug?
> > 
> > The deal is that RCU realizes that RCU needs a quiescent state from
> > this CPU.  The set_need_resched() is intended to cause one.  But there
> > is not much point this early in boot, because the scheduler isn't going
> > to do anything anyway.  I can prevent this with the following patch,
> > but isn't this same thing possible later at runtime?
> 
> Possibly, but I'm not sure at the moment.  The patch avoids the oops and
> I haven't seen another once in some brief runtime testing.  (Trivial
> fixing to make it apply to current linus.)
> 
> > You really do need to be able to handle set_need_resched() at random
> > times, and at first glance it appears to me that the warning could be
> > triggered at runtime as well.  If so, the real fix is elsewhere, right?
> > Especially given that the patch imposes extra cost at runtime...
> 
> In staring at it for a while it seems to be a combination of being in
> atomic context according to the scheduler but things in early boot using
> GFP_KERNEL.  At the point we're at in the boot, that is perfectly legal
> as it's not being called from an interrupt handler and the mm subsystem
> should be all setup, but we're still really early in boot and preempt is
> disabled.

Isn't preemption disabled at that point in boot?  And isn't GFP_KERNEL
illegal within preemption-disabled regions?

>            As I mentioned before, KMEM_CACHE calls kmalloc with
> GFP_KERNEL and I don't think we want to change that.
> 
> Once we're past early boot, I would expect that things running in true
> atomic context won't be calling KMEM_CACHE or using GFP_KERNEL.  Or
> maybe I hope?
> 
> I understand the desire to avoid another conditional, but I certainly
> don't have any other suggestions at the moment.

How about using GFP_ATOMIC for allocations done during that portion
of the boot path for which preemption is disabled?

							Thanx, Paul


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-04 16:26       ` Paul E. McKenney
@ 2011-08-04 17:31         ` Josh Boyer
  2011-08-05  1:19           ` Josh Boyer
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-04 17:31 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, Andrew Morton

On Thu, Aug 04, 2011 at 09:26:58AM -0700, Paul E. McKenney wrote:
> > > You really do need to be able to handle set_need_resched() at random
> > > times, and at first glance it appears to me that the warning could be
> > > triggered at runtime as well.  If so, the real fix is elsewhere, right?
> > > Especially given that the patch imposes extra cost at runtime...
> > 
> > In staring at it for a while it seems to be a combination of being in
> > atomic context according to the scheduler but things in early boot using
> > GFP_KERNEL.  At the point we're at in the boot, that is perfectly legal
> > as it's not being called from an interrupt handler and the mm subsystem
> > should be all setup, but we're still really early in boot and preempt is
> > disabled.
> 
> Isn't preemption disabled at that point in boot?  And isn't GFP_KERNEL
> illegal within preemption-disabled regions?

Yes, it's disabled.  I'm not sure if it's illegal or not.  pidmap_init
is called from start_kernel on line 598 of main.c.  local_irq_enable is
called on line 553, followed immediately by this comment:

	/* Interrupts are enabled now so all GFP allocations are safe. */
	gfp_allowed_mask = __GFP_BITS_MASK;

	kmem_cache_init_late();

So the comments there lead me to think I have no clue :).  That's mostly
why I'm asking for feedback here.  I don't have a huge amount of
experience in what is and isn't allowed in the early bootup path.
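
For context, the relevant ordering in init/main.c is roughly the following
(paraphrased, line numbers approximate):

	asmlinkage void __init start_kernel(void)
	{
		/* ... */
		local_irq_enable();			/* ~line 553 */

		/* Interrupts are enabled now so all GFP allocations are safe. */
		gfp_allowed_mask = __GFP_BITS_MASK;

		kmem_cache_init_late();
		/* ... console, lockdep, etc. ... */
		if (late_time_init)
			late_time_init();	/* timer irq starts firing here */
		sched_clock_init();
		calibrate_delay();
		pidmap_init();			/* ~line 598: kzalloc + KMEM_CACHE */
		anon_vma_init();
		/* ... */
	}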

> >            As I mentioned before, KMEM_CACHE calls kmalloc with
> > GFP_KERNEL and I don't think we want to change that.
> > 
> > Once we're past early boot, I would expect that things running in true
> > atomic context won't be calling KMEM_CACHE or using GFP_KERNEL.  Or
> > maybe I hope?
> > 
> > I understand the desire to avoid another conditional, but I certainly
> > don't have any other suggestions at the moment.
> 
> How about doing GFP_ATOMIC on allocations done during that portion
> of the boot patch for which preemption is disabled?

Well, in the pidmap_init case there are two spots relevant to this.  The
first is the kzalloc call on line 562 of kernel/pid.c.  I could change
that to use GFP_IOFS even, and it avoids the backtrace from there. (And
I did that originally.)

However, the call on line 567 to KMEM_CACHE calls into
kmem_cache_create.  There is a flags argument, but it's for slab flags,
and kmem_cache_create calls kmalloc internally with GFP_KERNEL.  I don't
see a kmem_cache_create_atomic or anything similar that would avoid this.
None of that code is new either, most of it dating back to 2008.
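
Paraphrasing kernel/pid.c (line numbers approximate), the two call sites
look like this:

	void __init pidmap_init(void)
	{
		/* ... pid_max / pid_max_min setup elided ... */

		/* ~line 562: hits might_sleep_if() via slab_pre_alloc_hook() */
		init_pid_ns.pidmap[0].page = kzalloc(PAGE_SIZE, GFP_KERNEL);
		/* Reserve PID 0; we never call free_pidmap(0) */
		set_bit(0, init_pid_ns.pidmap[0].page);
		atomic_dec(&init_pid_ns.pidmap[0].nr_free);

		/* ~line 567: the flags here are slab flags only; the internal
		 * allocations in kmem_cache_create() still use GFP_KERNEL */
		init_pid_ns.pid_cachep = KMEM_CACHE(pid,
				SLAB_HWCACHE_ALIGN | SLAB_PANIC);
	}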

The same issue exists in some of the next functions called in
start_kernel, like anon_vma_init, cred_init, fork_init, etc.  They all
call kmem_cache_create.

I could be missing something obvious, but I don't see a way to avoid
using GFP_KERNEL without a lot of rip-up in the rest of the init path.

josh


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-04 17:31         ` Josh Boyer
@ 2011-08-05  1:19           ` Josh Boyer
  2011-08-05  6:56             ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-05  1:19 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, Andrew Morton

On Thu, Aug 4, 2011 at 1:31 PM, Josh Boyer <jwboyer@redhat.com> wrote:
> On Thu, Aug 04, 2011 at 09:26:58AM -0700, Paul E. McKenney wrote:
>> > > You really do need to be able to handle set_need_resched() at random
>> > > times, and at first glance it appears to me that the warning could be
>> > > triggered at runtime as well.  If so, the real fix is elsewhere, right?
>> > > Especially given that the patch imposes extra cost at runtime...
>> >
>> > In staring at it for a while it seems to be a combination of being in
>> > atomic context according to the scheduler but things in early boot using
>> > GFP_KERNEL.  At the point we're at in the boot, that is perfectly legal
>> > as it's not being called from an interrupt handler and the mm subsystem
>> > should be all setup, but we're still really early in boot and preempt is
>> > disabled.
>>
>> Isn't preemption disabled at that point in boot?  And isn't GFP_KERNEL
>> illegal within preemption-disabled regions?
>
> Yes, it's disabled.  I'm not sure if it's illegal or not.  pidmap_init
> is called from start_kernel on line 598 of main.c.  local_irq_enable is
> called on line 553, followed immediately by this comment:
>
>        /* Interrupts are enabled now so all GFP allocations are safe. */
>        gfp_allowed_mask = __GFP_BITS_MASK;
>
>        kmem_cache_init_late();
>
> So the comments there lead me to think I have no clue :).  That's mostly
> why I'm asking for feedback here.  I don't have a huge amount of
> experience in what is and isn't allowed in the early bootup path.
>
>> >            As I mentioned before, KMEM_CACHE calls kmalloc with
>> > GFP_KERNEL and I don't think we want to change that.
>> >
>> > Once we're past early boot, I would expect that things running in true
>> > atomic context won't be calling KMEM_CACHE or using GFP_KERNEL.  Or
>> > maybe I hope?
>> >
>> > I understand the desire to avoid another conditional, but I certainly
>> > don't have any other suggestions at the moment.
>>
>> How about doing GFP_ATOMIC on allocations done during that portion
>> of the boot patch for which preemption is disabled?
>
> Well, in the pidmap_init case there are two spots relevant to this.  The
> first is the kzalloc call on line 562 of kernel/pid.c.  I could change
> that to use GFP_IOFS even, and it avoids the backtrace from there. (And
> I did that originally.)
>
> However, the call on line 567 to KMEM_CACHE calls into
> kmem_cache_create.  There is a flags variable, but it's for slab flags,
> and kmem_cache_create calls kmalloc internally with GFP_KERNEL.  I don't
> see kmem_cache_create_atomic or otherwise that would avoid this.  None
> of that code is new either, most if it dating back to 2008.
>
> The same issue exists in some of the next functions called in
> start_kernel, like anon_vma_init, cred_init, fork_init, etc.  They all
> call kmem_cache_create.
>
> I could be missing something obvious, but I don't see a way to avoid
> using GFP_KERNEL without a lot of rip-up in the rest of the init path.

As an aside, I bisected this back to:

e8f7c70f44f sched: Make sleeping inside spinlock detection working in
!CONFIG_PREEMPT

However, that doesn't seem all that helpful.  The
CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT.  At
first glance, it seems this commit just allowed an issue that's been
around for a while (benign or otherwise) to finally show up.

(The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
option did so.)
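
That matches the mechanics: DEBUG_ATOMIC_SLEEP selects PREEMPT_COUNT, and
without PREEMPT_COUNT the kernel doesn't maintain the preemption count that
the might_sleep() and "scheduling while atomic" checks rely on, so the
early-boot case simply couldn't be detected before.  Roughly, from
include/linux/preempt.h:

	#ifdef CONFIG_PREEMPT_COUNT

	#define preempt_disable() \
	do { \
		inc_preempt_count(); \
		barrier(); \
	} while (0)

	#else /* !CONFIG_PREEMPT_COUNT */

	/* no counting at all, so in_atomic()-style checks never see it */
	#define preempt_disable()	do { } while (0)

	#endif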

josh


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-05  1:19           ` Josh Boyer
@ 2011-08-05  6:56             ` Paul E. McKenney
  2011-08-05 14:22               ` Josh Boyer
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-05  6:56 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linux-kernel, Andrew Morton, fweisbec

On Thu, Aug 04, 2011 at 09:19:31PM -0400, Josh Boyer wrote:
> On Thu, Aug 4, 2011 at 1:31 PM, Josh Boyer <jwboyer@redhat.com> wrote:
> > On Thu, Aug 04, 2011 at 09:26:58AM -0700, Paul E. McKenney wrote:
> >> > > You really do need to be able to handle set_need_resched() at random
> >> > > times, and at first glance it appears to me that the warning could be
> >> > > triggered at runtime as well.  If so, the real fix is elsewhere, right?
> >> > > Especially given that the patch imposes extra cost at runtime...
> >> >
> >> > In staring at it for a while it seems to be a combination of being in
> >> > atomic context according to the scheduler but things in early boot using
> >> > GFP_KERNEL.  At the point we're at in the boot, that is perfectly legal
> >> > as it's not being called from an interrupt handler and the mm subsystem
> >> > should be all setup, but we're still really early in boot and preempt is
> >> > disabled.
> >>
> >> Isn't preemption disabled at that point in boot?  And isn't GFP_KERNEL
> >> illegal within preemption-disabled regions?
> >
> > Yes, it's disabled.  I'm not sure if it's illegal or not.  pidmap_init
> > is called from start_kernel on line 598 of main.c.  local_irq_enable is
> > called on line 553, followed immediately by this comment:
> >
> >        /* Interrupts are enabled now so all GFP allocations are safe. */
> >        gfp_allowed_mask = __GFP_BITS_MASK;
> >
> >        kmem_cache_init_late();
> >
> > So the comments there lead me to think I have no clue :).  That's mostly
> > why I'm asking for feedback here.  I don't have a huge amount of
> > experience in what is and isn't allowed in the early bootup path.
> >
> >> >            As I mentioned before, KMEM_CACHE calls kmalloc with
> >> > GFP_KERNEL and I don't think we want to change that.
> >> >
> >> > Once we're past early boot, I would expect that things running in true
> >> > atomic context won't be calling KMEM_CACHE or using GFP_KERNEL.  Or
> >> > maybe I hope?
> >> >
> >> > I understand the desire to avoid another conditional, but I certainly
> >> > don't have any other suggestions at the moment.
> >>
> >> How about doing GFP_ATOMIC on allocations done during that portion
> >> of the boot patch for which preemption is disabled?
> >
> > Well, in the pidmap_init case there are two spots relevant to this.  The
> > first is the kzalloc call on line 562 of kernel/pid.c.  I could change
> > that to use GFP_IOFS even, and it avoids the backtrace from there. (And
> > I did that originally.)
> >
> > However, the call on line 567 to KMEM_CACHE calls into
> > kmem_cache_create.  There is a flags variable, but it's for slab flags,
> > and kmem_cache_create calls kmalloc internally with GFP_KERNEL.  I don't
> > see kmem_cache_create_atomic or otherwise that would avoid this.  None
> > of that code is new either, most if it dating back to 2008.
> >
> > The same issue exists in some of the next functions called in
> > start_kernel, like anon_vma_init, cred_init, fork_init, etc.  They all
> > call kmem_cache_create.
> >
> > I could be missing something obvious, but I don't see a way to avoid
> > using GFP_KERNEL without a lot of rip-up in the rest of the init path.
> 
> As an aside, I bisected this back to:
> 
> e8f7c70f44f sched: Make sleeping inside spinlock detection working in
> !CONFIG_PREEMPT

OK, added Frederic on CC.

> However, that doesn't seem all that helpful.  The
> CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
> DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT.  At
> first glance, it seems this commit just allowed an issue that's been
> around for a while (benign or otherwise) to finally show up.
> 
> (The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
> CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
> option did so.)

Understood.  So my question is "what is the real way to fix this?"
Within RCU, I would probably wrapper the calls to set_need_resched()
so that it checks for the scheduler being fully alive.  Except for the
call from rcu_enter_nohz(), of course -- if that one is called before
the scheduler is ready, then that is a bug that needs to be fixed.

Nevertheless, I am wondering if all of this isn't really papering over
some real problem somewhere.  The way we get to this place is from people
registering RCU callbacks during early boot, which is OK in and of itself,
at least in moderation.  But if someone is expecting those callbacks to be
invoked before the scheduler is fully set up and running multiple tasks,
they are going to be disappointed.

							Thanx, Paul


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-05  6:56             ` Paul E. McKenney
@ 2011-08-05 14:22               ` Josh Boyer
  2011-08-05 17:08                 ` Frederic Weisbecker
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-05 14:22 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, Andrew Morton, fweisbec

On Thu, Aug 04, 2011 at 11:56:46PM -0700, Paul E. McKenney wrote:
> > > I could be missing something obvious, but I don't see a way to avoid
> > > using GFP_KERNEL without a lot of rip-up in the rest of the init path.
> > 
> > As an aside, I bisected this back to:
> > 
> > e8f7c70f44f sched: Make sleeping inside spinlock detection working in
> > !CONFIG_PREEMPT
> 
> OK, added Frederic on CC.
> 
> > However, that doesn't seem all that helpful.  The
> > CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
> > DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT.  At
> > first glance, it seems this commit just allowed an issue that's been
> > around for a while (benign or otherwise) to finally show up.
> > 
> > (The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
> > CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
> > option did so.)
> 
> Understood.  So my question is "what is the real way to fix this?"
> Within RCU, I would probably wrapper the calls to set_need_resched()
> so that it checks for the scheduler being fully alive.  Except for the
> call from rcu_enter_nohz(), of course -- if that one is called before
> the scheduler is ready, then that is a bug that needs to be fixed.

By the scheduler being fully alive, do you mean when rcu_scheduler_starting
is called?  Or do you mean the actual scheduler, because sched_init is
called well before any of this happens?

> Nevertheless, I am wondering if all of this isn't really papering over
> some real problem somewhere.  The way we get to this place is from people
> registering RCU callbacks during early boot, which is OK in and of itself,
> at least in moderation.  But if someone is expecting those callbacks to be
> invoked before the scheduler is fully set up and running multiple tasks,
> they are going to be disappointed.

Is there a way to dump what callbacks have been registered?  As far as I
can see, we call rcu_check_callbacks unconditionally when a timer
interrupt is taken, and that calls rcu_pending unconditionally as well.
Before that, rcu_init is called, which eventually sets up the per-cpu
data via rcu_init_percpu_data, and that sets rdp->qs_pending = 1.
Until a quiescent state is reached, __rcu_pending is going to try to
force it, which is where set_need_resched is called.

Basically, I took what you said about wrapping set_need_resched and came
up with the patch below.  It gets rid of the oops from pidmap_init, but
I need to test it a bit more.  Would be happy to have feedback.

josh

---

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index ba06207..8c6cb6e 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 		rdp->n_rp_qs_pending++;
 		if (!rdp->preemptible &&
 		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
-				 jiffies))
-			set_need_resched();
+				 jiffies)) {
+			/* Make sure we're ready to mark the task as needing
+			 * rescheduling, otherwise we can trigger oopses early
+			 * in the init path.
+			 */
+			if (rcu_scheduler_active)
+				set_need_resched();
+		}
 	} else if (rdp->qs_pending && rdp->passed_quiesc) {
 		rdp->n_rp_report_qs++;
 		return 1;


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-05 14:22               ` Josh Boyer
@ 2011-08-05 17:08                 ` Frederic Weisbecker
  2011-08-05 22:26                   ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2011-08-05 17:08 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Paul E. McKenney, linux-kernel, Andrew Morton

On Fri, Aug 05, 2011 at 10:22:45AM -0400, Josh Boyer wrote:
> On Thu, Aug 04, 2011 at 11:56:46PM -0700, Paul E. McKenney wrote:
> > > > I could be missing something obvious, but I don't see a way to avoid
> > > > using GFP_KERNEL without a lot of rip-up in the rest of the init path.
> > > 
> > > As an aside, I bisected this back to:
> > > 
> > > e8f7c70f44f sched: Make sleeping inside spinlock detection working in
> > > !CONFIG_PREEMPT
> > 
> > OK, added Frederic on CC.
> > 
> > > However, that doesn't seem all that helpful.  The
> > > CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
> > > DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT.  At
> > > first glance, it seems this commit just allowed an issue that's been
> > > around for a while (benign or otherwise) to finally show up.
> > > 
> > > (The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
> > > CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
> > > option did so.)
> > 
> > Understood.  So my question is "what is the real way to fix this?"
> > Within RCU, I would probably wrapper the calls to set_need_resched()
> > so that it checks for the scheduler being fully alive.  Except for the
> > call from rcu_enter_nohz(), of course -- if that one is called before
> > the scheduler is ready, then that is a bug that needs to be fixed.
> 
> By scheduler being fully alive, do you mean when rcu_scheduler_starting
> is called?  Or do you mean the actual scheduler, because sched_init is
> called well before any of this happens.
> 
> > Nevertheless, I am wondering if all of this isn't really papering over
> > some real problem somewhere.  The way we get to this place is from people
> > registering RCU callbacks during early boot, which is OK in and of itself,
> > at least in moderation.  But if someone is expecting those callbacks to be
> > invoked before the scheduler is fully set up and running multiple tasks,
> > they are going to be disappointed.
> 
> Is there a way to dump what callbacks have been registered?  As far as I
> can see, we call rcu_check_callbacks unconditionally when a timer
> interrupt is taken and that calls rcu_pending unconditionally as well.
> Before that, rcu_init is called which eventually sets up the per-cpu
> data via rcu_init_percpu_data and that sets rdp->qs_pending = 1.
> Until a quiescent state is reached __rcu_pending is going to try and
> force it, which is where the set_need_resched is called.
> 
> Basically, I took what you said about wrapping set_need_resched and came
> up with the patch below.  It gets rid of the oops from pidmap_init, but
> I need to test it a bit more.  Would be happy to have feedback.
> 
> josh
> 
> ---
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index ba06207..8c6cb6e 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
>  		rdp->n_rp_qs_pending++;
>  		if (!rdp->preemptible &&
>  		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> -				 jiffies))
> -			set_need_resched();
> +				 jiffies)) {
> +			/* Make sure we're ready to mark the task as needing
> + 			 * rescheduling otherwise we can trigger oopes early
> + 			 * in the init path
> + 			 */
> +			if (rcu_scheduler_active)
> +				set_need_resched();

What about we avoid setting rdp->qs_pending = 1 for the CPU
that handles the boot?

> +		}
>  	} else if (rdp->qs_pending && rdp->passed_quiesc) {
>  		rdp->n_rp_report_qs++;
>  		return 1;


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-05 17:08                 ` Frederic Weisbecker
@ 2011-08-05 22:26                   ` Paul E. McKenney
  2011-08-05 23:12                     ` Frederic Weisbecker
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-05 22:26 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Josh Boyer, linux-kernel, Andrew Morton

On Fri, Aug 05, 2011 at 07:08:08PM +0200, Frederic Weisbecker wrote:
> On Fri, Aug 05, 2011 at 10:22:45AM -0400, Josh Boyer wrote:
> > On Thu, Aug 04, 2011 at 11:56:46PM -0700, Paul E. McKenney wrote:
> > > > > I could be missing something obvious, but I don't see a way to avoid
> > > > > using GFP_KERNEL without a lot of rip-up in the rest of the init path.
> > > > 
> > > > As an aside, I bisected this back to:
> > > > 
> > > > e8f7c70f44f sched: Make sleeping inside spinlock detection working in
> > > > !CONFIG_PREEMPT
> > > 
> > > OK, added Frederic on CC.
> > > 
> > > > However, that doesn't seem all that helpful.  The
> > > > CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
> > > > DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT.  At
> > > > first glance, it seems this commit just allowed an issue that's been
> > > > around for a while (benign or otherwise) to finally show up.
> > > > 
> > > > (The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
> > > > CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
> > > > option did so.)
> > > 
> > > Understood.  So my question is "what is the real way to fix this?"
> > > Within RCU, I would probably wrapper the calls to set_need_resched()
> > > so that it checks for the scheduler being fully alive.  Except for the
> > > call from rcu_enter_nohz(), of course -- if that one is called before
> > > the scheduler is ready, then that is a bug that needs to be fixed.
> > 
> > By scheduler being fully alive, do you mean when rcu_scheduler_starting
> > is called?  Or do you mean the actual scheduler, because sched_init is
> > called well before any of this happens.
> > 
> > > Nevertheless, I am wondering if all of this isn't really papering over
> > > some real problem somewhere.  The way we get to this place is from people
> > > registering RCU callbacks during early boot, which is OK in and of itself,
> > > at least in moderation.  But if someone is expecting those callbacks to be
> > > invoked before the scheduler is fully set up and running multiple tasks,
> > > they are going to be disappointed.
> > 
> > Is there a way to dump what callbacks have been registered?  As far as I
> > can see, we call rcu_check_callbacks unconditionally when a timer
> > interrupt is taken and that calls rcu_pending unconditionally as well.
> > Before that, rcu_init is called which eventually sets up the per-cpu
> > data via rcu_init_percpu_data and that sets rdp->qs_pending = 1.
> > Until a quiescent state is reached __rcu_pending is going to try and
> > force it, which is where the set_need_resched is called.
> > 
> > Basically, I took what you said about wrapping set_need_resched and came
> > up with the patch below.  It gets rid of the oops from pidmap_init, but
> > I need to test it a bit more.  Would be happy to have feedback.
> > 
> > josh
> > 
> > ---
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index ba06207..8c6cb6e 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
> >  		rdp->n_rp_qs_pending++;
> >  		if (!rdp->preemptible &&
> >  		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> > -				 jiffies))
> > -			set_need_resched();
> > +				 jiffies)) {
> > +			/* Make sure we're ready to mark the task as needing
> > + 			 * rescheduling otherwise we can trigger oopes early
> > + 			 * in the init path
> > + 			 */
> > +			if (rcu_scheduler_active)
> > +				set_need_resched();
> 
> What about we avoid setting rdp->qs_pending = 1 for the CPU
> that handles the boot?

That sounds promising -- only checked at the beginning of a grace period,
so not too much overhead.  In contrast, __rcu_pending() is called
multiple times per transition to dyntick-idle state.

I will take a look at this.

							Thanx, Paul

> > +		}
> >  	} else if (rdp->qs_pending && rdp->passed_quiesc) {
> >  		rdp->n_rp_report_qs++;
> >  		return 1;


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-05 22:26                   ` Paul E. McKenney
@ 2011-08-05 23:12                     ` Frederic Weisbecker
  2011-08-08  2:09                       ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2011-08-05 23:12 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Josh Boyer, linux-kernel, Andrew Morton

On Fri, Aug 05, 2011 at 03:26:41PM -0700, Paul E. McKenney wrote:
> On Fri, Aug 05, 2011 at 07:08:08PM +0200, Frederic Weisbecker wrote:
> > On Fri, Aug 05, 2011 at 10:22:45AM -0400, Josh Boyer wrote:
> > > On Thu, Aug 04, 2011 at 11:56:46PM -0700, Paul E. McKenney wrote:
> > > > > > I could be missing something obvious, but I don't see a way to avoid
> > > > > > using GFP_KERNEL without a lot of rip-up in the rest of the init path.
> > > > > 
> > > > > As an aside, I bisected this back to:
> > > > > 
> > > > > e8f7c70f44f sched: Make sleeping inside spinlock detection working in
> > > > > !CONFIG_PREEMPT
> > > > 
> > > > OK, added Frederic on CC.
> > > > 
> > > > > However, that doesn't seem all that helpful.  The
> > > > > CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
> > > > > DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT.  At
> > > > > first glance, it seems this commit just allowed an issue that's been
> > > > > around for a while (benign or otherwise) to finally show up.
> > > > > 
> > > > > (The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
> > > > > CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
> > > > > option did so.)
> > > > 
> > > > Understood.  So my question is "what is the real way to fix this?"
> > > > Within RCU, I would probably wrapper the calls to set_need_resched()
> > > > so that it checks for the scheduler being fully alive.  Except for the
> > > > call from rcu_enter_nohz(), of course -- if that one is called before
> > > > the scheduler is ready, then that is a bug that needs to be fixed.
> > > 
> > > By scheduler being fully alive, do you mean when rcu_scheduler_starting
> > > is called?  Or do you mean the actual scheduler, because sched_init is
> > > called well before any of this happens.
> > > 
> > > > Nevertheless, I am wondering if all of this isn't really papering over
> > > > some real problem somewhere.  The way we get to this place is from people
> > > > registering RCU callbacks during early boot, which is OK in and of itself,
> > > > at least in moderation.  But if someone is expecting those callbacks to be
> > > > invoked before the scheduler is fully set up and running multiple tasks,
> > > > they are going to be disappointed.
> > > 
> > > Is there a way to dump what callbacks have been registered?  As far as I
> > > can see, we call rcu_check_callbacks unconditionally when a timer
> > > interrupt is taken and that calls rcu_pending unconditionally as well.
> > > Before that, rcu_init is called which eventually sets up the per-cpu
> > > data via rcu_init_percpu_data and that sets rdp->qs_pending = 1.
> > > Until a quiescent state is reached __rcu_pending is going to try and
> > > force it, which is where the set_need_resched is called.
> > > 
> > > Basically, I took what you said about wrapping set_need_resched and came
> > > up with the patch below.  It gets rid of the oops from pidmap_init, but
> > > I need to test it a bit more.  Would be happy to have feedback.
> > > 
> > > josh
> > > 
> > > ---
> > > 
> > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > index ba06207..8c6cb6e 100644
> > > --- a/kernel/rcutree.c
> > > +++ b/kernel/rcutree.c
> > > @@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
> > >  		rdp->n_rp_qs_pending++;
> > >  		if (!rdp->preemptible &&
> > >  		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> > > -				 jiffies))
> > > -			set_need_resched();
> > > +				 jiffies)) {
> > > +			/* Make sure we're ready to mark the task as needing
> > > + 			 * rescheduling otherwise we can trigger oopes early
> > > + 			 * in the init path
> > > + 			 */
> > > +			if (rcu_scheduler_active)
> > > +				set_need_resched();
> > 
> > What about we avoid setting rdp->qs_pending = 1 for the CPU
> > that handles the boot?
> 
> That sounds promising -- only checked at the beginning of a grace period,
> so not too much overhead.  In contrast, __rcu_pending() is called
> multiple times per transition to dyntick-idle state.
> 
> I will take a look at this.
> 
> 							Thanx, Paul

I actually thought it could be done from rcu_init_percpu_data(). This
is where we initialize qs_pending to 1, which I believe is responsible
for that set_need_resched() from RCU soon after on boot.

It's possible we also have secondary offenders in places that enqueue
RCU callbacks during boot. But if not, then we are fine with tweaking
that qs_pending at CPU bringup.

That's because the initial qs_pending = 1 in rcu_init_percpu_data()
is useful only if a grace period has already started, but by the time
rcu_init() is called I guess that can't be the case.
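
Paraphrasing the code being discussed (rcu_init_percpu_data() in
kernel/rcutree.c), plus a hypothetical, untested sketch of such a tweak:

	/* current code, roughly */
	rdp->passed_quiesc = 0;	/* We could be racing with new GP, */
	rdp->qs_pending = 1;	/* so set up to respond to current GP. */

	/* hypothetical variant: don't demand a quiescent state from the
	 * boot CPU before the scheduler is really running */
	rdp->qs_pending = rcu_scheduler_fully_active ? 1 : 0;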

> 
> > > +		}
> > >  	} else if (rdp->qs_pending && rdp->passed_quiesc) {
> > >  		rdp->n_rp_report_qs++;
> > >  		return 1;


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-05 23:12                     ` Frederic Weisbecker
@ 2011-08-08  2:09                       ` Paul E. McKenney
  2011-08-08  2:55                         ` Frederic Weisbecker
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-08  2:09 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Josh Boyer, linux-kernel, Andrew Morton

On Sat, Aug 06, 2011 at 01:12:18AM +0200, Frederic Weisbecker wrote:
> On Fri, Aug 05, 2011 at 03:26:41PM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 05, 2011 at 07:08:08PM +0200, Frederic Weisbecker wrote:
> > > On Fri, Aug 05, 2011 at 10:22:45AM -0400, Josh Boyer wrote:
> > > > On Thu, Aug 04, 2011 at 11:56:46PM -0700, Paul E. McKenney wrote:
> > > > > > > I could be missing something obvious, but I don't see a way to avoid
> > > > > > > using GFP_KERNEL without a lot of rip-up in the rest of the init path.
> > > > > > 
> > > > > > As an aside, I bisected this back to:
> > > > > > 
> > > > > > e8f7c70f44f sched: Make sleeping inside spinlock detection working in
> > > > > > !CONFIG_PREEMPT
> > > > > 
> > > > > OK, added Frederic on CC.
> > > > > 
> > > > > > However, that doesn't seem all that helpful.  The
> > > > > > CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
> > > > > > DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT.  At
> > > > > > first glance, it seems this commit just allowed an issue that's been
> > > > > > around for a while (benign or otherwise) to finally show up.
> > > > > > 
> > > > > > (The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
> > > > > > CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
> > > > > > option did so.)
> > > > > 
> > > > > Understood.  So my question is "what is the real way to fix this?"
> > > > > Within RCU, I would probably wrapper the calls to set_need_resched()
> > > > > so that it checks for the scheduler being fully alive.  Except for the
> > > > > call from rcu_enter_nohz(), of course -- if that one is called before
> > > > > the scheduler is ready, then that is a bug that needs to be fixed.
> > > > 
> > > > By scheduler being fully alive, do you mean when rcu_scheduler_starting
> > > > is called?  Or do you mean the actual scheduler, because sched_init is
> > > > called well before any of this happens.
> > > > 
> > > > > Nevertheless, I am wondering if all of this isn't really papering over
> > > > > some real problem somewhere.  The way we get to this place is from people
> > > > > registering RCU callbacks during early boot, which is OK in and of itself,
> > > > > at least in moderation.  But if someone is expecting those callbacks to be
> > > > > invoked before the scheduler is fully set up and running multiple tasks,
> > > > > they are going to be disappointed.
> > > > 
> > > > Is there a way to dump what callbacks have been registered?  As far as I
> > > > can see, we call rcu_check_callbacks unconditionally when a timer
> > > > interrupt is taken and that calls rcu_pending unconditionally as well.
> > > > Before that, rcu_init is called which eventually sets up the per-cpu
> > > > data via rcu_init_percpu_data and that sets rdp->qs_pending = 1.
> > > > Until a quiescent state is reached __rcu_pending is going to try and
> > > > force it, which is where the set_need_resched is called.
> > > > 
> > > > Basically, I took what you said about wrapping set_need_resched and came
> > > > up with the patch below.  It gets rid of the oops from pidmap_init, but
> > > > I need to test it a bit more.  Would be happy to have feedback.
> > > > 
> > > > josh
> > > > 
> > > > ---
> > > > 
> > > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > > index ba06207..8c6cb6e 100644
> > > > --- a/kernel/rcutree.c
> > > > +++ b/kernel/rcutree.c
> > > > @@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
> > > >  		rdp->n_rp_qs_pending++;
> > > >  		if (!rdp->preemptible &&
> > > >  		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> > > > -				 jiffies))
> > > > -			set_need_resched();
> > > > +				 jiffies)) {
> > > > +			/* Make sure we're ready to mark the task as needing
> > > > + 			 * rescheduling otherwise we can trigger oopes early
> > > > + 			 * in the init path
> > > > + 			 */
> > > > +			if (rcu_scheduler_active)
> > > > +				set_need_resched();
> > > 
> > > What about we avoid setting rdp->qs_pending = 1 for the CPU
> > > that handles the boot?
> > 
> > That sounds promising -- only checked at the beginning of a grace period,
> > so not too much overhead.  In contrast, __rcu_pending() is called
> > multiple times per transition to dyntick-idle state.
> > 
> > I will take a look at this.
> > 
> > 							Thanx, Paul
> 
> I actually thought it could be done from rcu_init_percpu_data(). This
> is where we initialize the qs_pending to 1, which I believe is responsible
> for that set_need_resched() from rcu soon after on boot.

That would cover some of the situations, but...

> It's possible we also have secondary offenders in places that enqueue
> rcu callbacks in the boot. But if not, then we are fine with tweaking
> that qs_pending on cpu boot.

The first time we take a scheduling-clock interrupt on a CPU with a
callback queued, we will also set qs_pending.  Hence the need to also
suppress the assignment in __note_new_gpnum().  Or better yet, just
prevent new grace periods in cpu_needs_another_gp().  I believe that doing
this will make it unnecessary to do anything in rcu_init_percpu_data().
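
A hypothetical, untested sketch of that last option, paraphrasing
cpu_needs_another_gp() in kernel/rcutree.c:

	/*
	 * Sketch only: suppress new grace periods until the scheduler is
	 * fully active, so callbacks registered during early boot simply
	 * stay queued until there is something to schedule.
	 */
	static int
	cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
	{
		return rcu_scheduler_fully_active &&
		       *rdp->nxttail[RCU_DONE_TAIL] && !rcu_gp_in_progress(rsp);
	}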

> Because the initial qs_pending = 1 in rcu_init_percpu_data()
> is useful only if a grace period has started, but by the time
> rcu_init() is called I guess it can't be the case.

Plus it is necessary to suppress RCU CPU stall warnings at boot time.

> > 
> > > > +		}
> > > >  	} else if (rdp->qs_pending && rdp->passed_quiesc) {
> > > >  		rdp->n_rp_report_qs++;
> > > >  		return 1;

							Thanx, Paul


* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-08  2:09                       ` Paul E. McKenney
@ 2011-08-08  2:55                         ` Frederic Weisbecker
  2011-08-08  3:10                           ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2011-08-08  2:55 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Josh Boyer, linux-kernel, Andrew Morton

On Sun, Aug 07, 2011 at 07:09:14PM -0700, Paul E. McKenney wrote:
> On Sat, Aug 06, 2011 at 01:12:18AM +0200, Frederic Weisbecker wrote:
> > On Fri, Aug 05, 2011 at 03:26:41PM -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 05, 2011 at 07:08:08PM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Aug 05, 2011 at 10:22:45AM -0400, Josh Boyer wrote:
> > > > > On Thu, Aug 04, 2011 at 11:56:46PM -0700, Paul E. McKenney wrote:
> > > > > > > > I could be missing something obvious, but I don't see a way to avoid
> > > > > > > > using GFP_KERNEL without a lot of rip-up in the rest of the init path.
> > > > > > > 
> > > > > > > As an aside, I bisected this back to:
> > > > > > > 
> > > > > > > e8f7c70f44f sched: Make sleeping inside spinlock detection working in
> > > > > > > !CONFIG_PREEMPT
> > > > > > 
> > > > > > OK, added Frederic on CC.
> > > > > > 
> > > > > > > However, that doesn't seem all that helpful.  The
> > > > > > > CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
> > > > > > > DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT.  At
> > > > > > > first glance, it seems this commit just allowed an issue that's been
> > > > > > > around for a while (benign or otherwise) to finally show up.
> > > > > > > 
> > > > > > > (The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
> > > > > > > CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
> > > > > > > option did so.)
> > > > > > 
> > > > > > Understood.  So my question is "what is the real way to fix this?"
> > > > > > Within RCU, I would probably wrapper the calls to set_need_resched()
> > > > > > so that it checks for the scheduler being fully alive.  Except for the
> > > > > > call from rcu_enter_nohz(), of course -- if that one is called before
> > > > > > the scheduler is ready, then that is a bug that needs to be fixed.
> > > > > 
> > > > > By scheduler being fully alive, do you mean when rcu_scheduler_starting
> > > > > is called?  Or do you mean the actual scheduler, because sched_init is
> > > > > called well before any of this happens.
> > > > > 
> > > > > > Nevertheless, I am wondering if all of this isn't really papering over
> > > > > > some real problem somewhere.  The way we get to this place is from people
> > > > > > registering RCU callbacks during early boot, which is OK in and of itself,
> > > > > > at least in moderation.  But if someone is expecting those callbacks to be
> > > > > > invoked before the scheduler is fully set up and running multiple tasks,
> > > > > > they are going to be disappointed.
> > > > > 
> > > > > Is there a way to dump what callbacks have been registered?  As far as I
> > > > > can see, we call rcu_check_callbacks unconditionally when a timer
> > > > > interrupt is taken and that calls rcu_pending unconditionally as well.
> > > > > Before that, rcu_init is called which eventually sets up the per-cpu
> > > > > data via rcu_init_percpu_data and that sets rdp->qs_pending = 1.
> > > > > Until a quiescent state is reached __rcu_pending is going to try and
> > > > > force it, which is where the set_need_resched is called.
> > > > > 
> > > > > Basically, I took what you said about wrapping set_need_resched and came
> > > > > up with the patch below.  It gets rid of the oops from pidmap_init, but
> > > > > I need to test it a bit more.  Would be happy to have feedback.
> > > > > 
> > > > > josh
> > > > > 
> > > > > ---
> > > > > 
> > > > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > > > index ba06207..8c6cb6e 100644
> > > > > --- a/kernel/rcutree.c
> > > > > +++ b/kernel/rcutree.c
> > > > > @@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
> > > > >  		rdp->n_rp_qs_pending++;
> > > > >  		if (!rdp->preemptible &&
> > > > >  		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> > > > > -				 jiffies))
> > > > > -			set_need_resched();
> > > > > +				 jiffies)) {
> > > > > +			/* Make sure we're ready to mark the task as needing
> > > > > + 			 * rescheduling otherwise we can trigger oopses early
> > > > > + 			 * in the init path
> > > > > + 			 */
> > > > > +			if (rcu_scheduler_active)
> > > > > +				set_need_resched();
> > > > 
> > > > What about we avoid setting rdp->qs_pending = 1 for the CPU
> > > > that handles the boot?
> > > 
> > > That sounds promising -- only checked at the beginning of a grace period,
> > > so not too much overhead.  In contrast, __rcu_pending() is called
> > > multiple times per transition to dyntick-idle state.
> > > 
> > > I will take a look at this.
> > > 
> > > 							Thanx, Paul
> > 
> > I actually thought it could be done from rcu_init_percpu_data(). This
> > is where we initialize the qs_pending to 1, which I believe is responsible
> > for that set_need_resched() from rcu soon after on boot.
> 
> That would cover some of the situations, but...
> 
> > It's possible we also have secondary offenders in places that enqueue
> > rcu callbacks in the boot. But if not, then we are fine with tweaking
> > that qs_pending on cpu boot.
> 
> The first time we take a scheduling-clock interrupt on a CPU with a
> callback queued, we will also set qs_pending.  Hence the need to also
> suppress the assignment in __note_new_gpnum().  Or better yet, just
> prevent new grace periods in cpu_needs_another_gp().  I believe that doing
> this will make it unnecessary to do anything in rcu_init_percpu_data().

Yeah if we have callbacks enqueued during the boot then we need to have
a check in cpu_needs_another_gp().

Now rcu_init_percpu_data() still sets rdp->qs_pending to 1, and that
is going to stay as is as long as preemption is disabled.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-08  2:55                         ` Frederic Weisbecker
@ 2011-08-08  3:10                           ` Paul E. McKenney
  2011-08-09 11:35                             ` Frederic Weisbecker
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-08  3:10 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Josh Boyer, linux-kernel, Andrew Morton

On Mon, Aug 08, 2011 at 04:55:07AM +0200, Frederic Weisbecker wrote:
> On Sun, Aug 07, 2011 at 07:09:14PM -0700, Paul E. McKenney wrote:
> > On Sat, Aug 06, 2011 at 01:12:18AM +0200, Frederic Weisbecker wrote:
> > > On Fri, Aug 05, 2011 at 03:26:41PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Aug 05, 2011 at 07:08:08PM +0200, Frederic Weisbecker wrote:
> > > > > On Fri, Aug 05, 2011 at 10:22:45AM -0400, Josh Boyer wrote:
> > > > > > On Thu, Aug 04, 2011 at 11:56:46PM -0700, Paul E. McKenney wrote:
> > > > > > > > > I could be missing something obvious, but I don't see a way to avoid
> > > > > > > > > using GFP_KERNEL without a lot of rip-up in the rest of the init path.
> > > > > > > > 
> > > > > > > > As an aside, I bisected this back to:
> > > > > > > > 
> > > > > > > > e8f7c70f44f sched: Make sleeping inside spinlock detection working in
> > > > > > > > !CONFIG_PREEMPT
> > > > > > > 
> > > > > > > OK, added Frederic on CC.
> > > > > > > 
> > > > > > > > However, that doesn't seem all that helpful.  The
> > > > > > > > CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
> > > > > > > > DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT.  At
> > > > > > > > first glance, it seems this commit just allowed an issue that's been
> > > > > > > > around for a while (benign or otherwise) to finally show up.
> > > > > > > > 
> > > > > > > > (The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
> > > > > > > > CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
> > > > > > > > option did so.)
> > > > > > > 
> > > > > > > Understood.  So my question is "what is the real way to fix this?"
> > > > > > > Within RCU, I would probably wrapper the calls to set_need_resched()
> > > > > > > so that it checks for the scheduler being fully alive.  Except for the
> > > > > > > call from rcu_enter_nohz(), of course -- if that one is called before
> > > > > > > the scheduler is ready, then that is a bug that needs to be fixed.
> > > > > > 
> > > > > > By scheduler being fully alive, do you mean when rcu_scheduler_starting
> > > > > > is called?  Or do you mean the actual scheduler, because sched_init is
> > > > > > called well before any of this happens.
> > > > > > 
> > > > > > > Nevertheless, I am wondering if all of this isn't really papering over
> > > > > > > some real problem somewhere.  The way we get to this place is from people
> > > > > > > registering RCU callbacks during early boot, which is OK in and of itself,
> > > > > > > at least in moderation.  But if someone is expecting those callbacks to be
> > > > > > > invoked before the scheduler is fully set up and running multiple tasks,
> > > > > > > they are going to be disappointed.
> > > > > > 
> > > > > > Is there a way to dump what callbacks have been registered?  As far as I
> > > > > > can see, we call rcu_check_callbacks unconditionally when a timer
> > > > > > interrupt is taken and that calls rcu_pending unconditionally as well.
> > > > > > Before that, rcu_init is called which eventually sets up the per-cpu
> > > > > > data via rcu_init_percpu_data and that sets rdp->qs_pending = 1.
> > > > > > Until a quiescent state is reached __rcu_pending is going to try and
> > > > > > force it, which is where the set_need_resched is called.
> > > > > > 
> > > > > > Basically, I took what you said about wrapping set_need_resched and came
> > > > > > up with the patch below.  It gets rid of the oops from pidmap_init, but
> > > > > > I need to test it a bit more.  Would be happy to have feedback.
> > > > > > 
> > > > > > josh
> > > > > > 
> > > > > > ---
> > > > > > 
> > > > > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > > > > index ba06207..8c6cb6e 100644
> > > > > > --- a/kernel/rcutree.c
> > > > > > +++ b/kernel/rcutree.c
> > > > > > @@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
> > > > > >  		rdp->n_rp_qs_pending++;
> > > > > >  		if (!rdp->preemptible &&
> > > > > >  		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> > > > > > -				 jiffies))
> > > > > > -			set_need_resched();
> > > > > > +				 jiffies)) {
> > > > > > +			/* Make sure we're ready to mark the task as needing
> > > > > > + 			 * rescheduling otherwise we can trigger oopses early
> > > > > > + 			 * in the init path
> > > > > > + 			 */
> > > > > > +			if (rcu_scheduler_active)
> > > > > > +				set_need_resched();
> > > > > 
> > > > > What about we avoid setting rdp->qs_pending = 1 for the CPU
> > > > > that handles the boot?
> > > > 
> > > > That sounds promising -- only checked at the beginning of a grace period,
> > > > so not too much overhead.  In contrast, __rcu_pending() is called
> > > > multiple times per transition to dyntick-idle state.
> > > > 
> > > > I will take a look at this.
> > > > 
> > > > 							Thanx, Paul
> > > 
> > > I actually thought it could be done from rcu_init_percpu_data(). This
> > > is where we initialize the qs_pending to 1, which I believe is responsible
> > > for that set_need_resched() from rcu soon after on boot.
> > 
> > That would cover some of the situations, but...
> > 
> > > It's possible we also have secondary offenders in places that enqueue
> > > rcu callbacks in the boot. But if not, then we are fine with tweaking
> > > that qs_pending on cpu boot.
> > 
> > The first time we take a scheduling-clock interrupt on a CPU with a
> > callback queued, we will also set qs_pending.  Hence the need to also
> > suppress the assignment in __note_new_gpnum().  Or better yet, just
> > prevent new grace periods in cpu_needs_another_gp().  I believe that doing
> > this will make it unnecessary to do anything in rcu_init_percpu_data().
> 
> Yeah if we have callbacks enqueued during the boot then we need to have
> a check in cpu_needs_another_gp().
> 
> Now rcu_init_percpu_data() still sets rdp->qs_pending to 1, and that
> is going to stay as is as long as preemption is disabled.

But setting rdp->qs_pending to 1 in rcu_init_percpu_data() has no effect
until a grace period starts.  So, if grace periods are prevented from
starting, no need to mess with rcu_init_percpu_data().  Especially given
that rcu_init_percpu_data() is also used at late boot and runtime for
CPU hotplug.

So I believe that it is sufficient to change cpu_needs_another_gp()
to check for boot being far enough along to allow grace periods.
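
Roughly something like the following sketch.  This is illustrative only:
the existing return expression is paraphrased from memory, and using
rcu_scheduler_active as the "far enough along" test is just a placeholder
for whatever flag we settle on.

	static int
	cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
	{
		/* Sketch: refuse to start grace periods until boot is far
		 * enough along, so nothing downstream arms jiffies_force_qs
		 * or asks the boot CPU to reschedule. */
		if (!rcu_scheduler_active)
			return 0;
		/* Otherwise: callbacks queued and no grace period running. */
		return *rdp->nxttail[RCU_DONE_TAIL] && !rcu_gp_in_progress(rsp);
	}

The exact flag to test is the open question; the point is only that the
gate has to sit where grace periods are started, not where the per-CPU
data is initialized.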

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-08  3:10                           ` Paul E. McKenney
@ 2011-08-09 11:35                             ` Frederic Weisbecker
  2011-08-10 12:45                               ` Josh Boyer
  0 siblings, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2011-08-09 11:35 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Josh Boyer, linux-kernel, Andrew Morton

On Sun, Aug 07, 2011 at 08:10:14PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 08, 2011 at 04:55:07AM +0200, Frederic Weisbecker wrote:
> > On Sun, Aug 07, 2011 at 07:09:14PM -0700, Paul E. McKenney wrote:
> > > On Sat, Aug 06, 2011 at 01:12:18AM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Aug 05, 2011 at 03:26:41PM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Aug 05, 2011 at 07:08:08PM +0200, Frederic Weisbecker wrote:
> > > > > > On Fri, Aug 05, 2011 at 10:22:45AM -0400, Josh Boyer wrote:
> > > > > > > On Thu, Aug 04, 2011 at 11:56:46PM -0700, Paul E. McKenney wrote:
> > > > > > > > > > I could be missing something obvious, but I don't see a way to avoid
> > > > > > > > > > using GFP_KERNEL without a lot of rip-up in the rest of the init path.
> > > > > > > > > 
> > > > > > > > > As an aside, I bisected this back to:
> > > > > > > > > 
> > > > > > > > > e8f7c70f44f sched: Make sleeping inside spinlock detection working in
> > > > > > > > > !CONFIG_PREEMPT
> > > > > > > > 
> > > > > > > > OK, added Frederic on CC.
> > > > > > > > 
> > > > > > > > > However, that doesn't seem all that helpful.  The
> > > > > > > > > CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
> > > > > > > > > DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT.  At
> > > > > > > > > first glance, it seems this commit just allowed an issue that's been
> > > > > > > > > around for a while (benign or otherwise) to finally show up.
> > > > > > > > > 
> > > > > > > > > (The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
> > > > > > > > > CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
> > > > > > > > > option did so.)
> > > > > > > > 
> > > > > > > > Understood.  So my question is "what is the real way to fix this?"
> > > > > > > > Within RCU, I would probably wrapper the calls to set_need_resched()
> > > > > > > > so that it checks for the scheduler being fully alive.  Except for the
> > > > > > > > call from rcu_enter_nohz(), of course -- if that one is called before
> > > > > > > > the scheduler is ready, then that is a bug that needs to be fixed.
> > > > > > > 
> > > > > > > By scheduler being fully alive, do you mean when rcu_scheduler_starting
> > > > > > > is called?  Or do you mean the actual scheduler, because sched_init is
> > > > > > > called well before any of this happens.
> > > > > > > 
> > > > > > > > Nevertheless, I am wondering if all of this isn't really papering over
> > > > > > > > some real problem somewhere.  The way we get to this place is from people
> > > > > > > > registering RCU callbacks during early boot, which is OK in and of itself,
> > > > > > > > at least in moderation.  But if someone is expecting those callbacks to be
> > > > > > > > invoked before the scheduler is fully set up and running multiple tasks,
> > > > > > > > they are going to be disappointed.
> > > > > > > 
> > > > > > > Is there a way to dump what callbacks have been registered?  As far as I
> > > > > > > can see, we call rcu_check_callbacks unconditionally when a timer
> > > > > > > interrupt is taken and that calls rcu_pending unconditionally as well.
> > > > > > > Before that, rcu_init is called which eventually sets up the per-cpu
> > > > > > > data via rcu_init_percpu_data and that sets rdp->qs_pending = 1.
> > > > > > > Until a quiescent state is reached __rcu_pending is going to try and
> > > > > > > force it, which is where the set_need_resched is called.
> > > > > > > 
> > > > > > > Basically, I took what you said about wrapping set_need_resched and came
> > > > > > > up with the patch below.  It gets rid of the oops from pidmap_init, but
> > > > > > > I need to test it a bit more.  Would be happy to have feedback.
> > > > > > > 
> > > > > > > josh
> > > > > > > 
> > > > > > > ---
> > > > > > > 
> > > > > > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > > > > > index ba06207..8c6cb6e 100644
> > > > > > > --- a/kernel/rcutree.c
> > > > > > > +++ b/kernel/rcutree.c
> > > > > > > @@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
> > > > > > >  		rdp->n_rp_qs_pending++;
> > > > > > >  		if (!rdp->preemptible &&
> > > > > > >  		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> > > > > > > -				 jiffies))
> > > > > > > -			set_need_resched();
> > > > > > > +				 jiffies)) {
> > > > > > > +			/* Make sure we're ready to mark the task as needing
> > > > > > > + 			 * rescheduling otherwise we can trigger oopses early
> > > > > > > + 			 * in the init path
> > > > > > > + 			 */
> > > > > > > +			if (rcu_scheduler_active)
> > > > > > > +				set_need_resched();
> > > > > > 
> > > > > > What about we avoid setting rdp->qs_pending = 1 for the CPU
> > > > > > that handles the boot?
> > > > > 
> > > > > That sounds promising -- only checked at the beginning of a grace period,
> > > > > so not too much overhead.  In contrast, __rcu_pending() is called
> > > > > multiple times per transition to dyntick-idle state.
> > > > > 
> > > > > I will take a look at this.
> > > > > 
> > > > > 							Thanx, Paul
> > > > 
> > > > I actually thought it could be done from rcu_init_percpu_data(). This
> > > > is where we initialize the qs_pending to 1, which I believe is responsible
> > > > for that set_need_resched() from rcu soon after on boot.
> > > 
> > > That would cover some of the situations, but...
> > > 
> > > > It's possible we also have secondary offenders in places that enqueue
> > > > rcu callbacks in the boot. But if not, then we are fine with tweaking
> > > > that qs_pending on cpu boot.
> > > 
> > > The first time we take a scheduling-clock interrupt on a CPU with a
> > > callback queued, we will also set qs_pending.  Hence the need to also
> > > suppress the assignment in __note_new_gpnum().  Or better yet, just
> > > prevent new grace periods in cpu_needs_another_gp().  I believe that doing
> > > this will make it unnecessary to do anything in rcu_init_percpu_data().
> > 
> > Yeah if we have callbacks enqueued during the boot then we need to have
> > a check in cpu_needs_another_gp().
> > 
> > Now rcu_init_percpu_data() still sets rdp->qs_pending to 1, and that
> > is going to stay as is as long as preemption is disabled.
> 
> But setting rdp->qs_pending to 1 in rcu_init_percpu_data() has no effect
> until a grace period starts.  So, if grace periods are prevented from
> starting, no need to mess with rcu_init_percpu_data().  Especially given
> that rcu_init_percpu_data() is also used at late boot and runtime for
> CPU hotplug.

Ok.

> 
> So I believe that it is sufficient to change cpu_needs_another_gp()
> to check for boot being far enough along to allow grace periods.

Yep, sounds good.

Thanks.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-09 11:35                             ` Frederic Weisbecker
@ 2011-08-10 12:45                               ` Josh Boyer
  2011-08-10 14:53                                 ` Frederic Weisbecker
  2011-08-14 23:04                                 ` Paul E. McKenney
  0 siblings, 2 replies; 42+ messages in thread
From: Josh Boyer @ 2011-08-10 12:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, Josh Boyer, linux-kernel, Andrew Morton

On Tue, Aug 09, 2011 at 01:35:18PM +0200, Frederic Weisbecker wrote:
> > > > > > > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > > > > > > index ba06207..8c6cb6e 100644
> > > > > > > > --- a/kernel/rcutree.c
> > > > > > > > +++ b/kernel/rcutree.c
> > > > > > > > @@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
> > > > > > > >  		rdp->n_rp_qs_pending++;
> > > > > > > >  		if (!rdp->preemptible &&
> > > > > > > >  		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> > > > > > > > -				 jiffies))
> > > > > > > > -			set_need_resched();
> > > > > > > > +				 jiffies)) {
> > > > > > > > +			/* Make sure we're ready to mark the task as needing
> > > > > > > > + 			 * rescheduling otherwise we can trigger oopses early
> > > > > > > > + 			 * in the init path
> > > > > > > > + 			 */
> > > > > > > > +			if (rcu_scheduler_active)
> > > > > > > > +				set_need_resched();
> > > > > > > 

<snip>

> > > > The first time we take a scheduling-clock interrupt on a CPU with a
> > > > callback queued, we will also set qs_pending.  Hence the need to also
> > > > suppress the assignment in __note_new_gpnum().  Or better yet, just
> > > > prevent new grace periods in cpu_needs_another_gp().  I believe that doing
> > > > this will make it unnecessary to do anything in rcu_init_percpu_data().
> > > 
> > > Yeah if we have callbacks enqueued during the boot then we need to have
> > > a check in cpu_needs_another_gp().
> > > 
> > > Now rcu_init_percpu_data() still sets rdp->qs_pending to 1, and that
> > > is going to stay as is as long as preemption is disabled.
> > 
> > But setting rdp->qs_pending to 1 in rcu_init_percpu_data() has no effect
> > until a grace period starts.  So, if grace periods are prevented from

Er... really?  Because it gets set and __rcu_pending looks at it
unconditionally in the case that is calling set_need_resched.  It
doesn't check if there is anything about a grace period going on or not.

> > starting, no need to mess with rcu_init_percpu_data().  Especially given
> > that rcu_init_percpu_data() is also used at late boot and runtime for
> > CPU hotplug.
> 
> Ok.
> 
> > 
> > So I believe that it is sufficient to change cpu_needs_another_gp()
> > to check for boot being far enough along to allow grace periods.
> 
> Yep, sounds good.

I looked at doing this but got lost as to 1) how it would help the
situation I've reported, and 2) exactly how to do that.

I'd be happy to test, but at the moment the proposed solution is
confusing to me.

josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-10 12:45                               ` Josh Boyer
@ 2011-08-10 14:53                                 ` Frederic Weisbecker
  2011-08-10 15:03                                   ` Josh Boyer
  2011-08-14 23:04                                 ` Paul E. McKenney
  1 sibling, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2011-08-10 14:53 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Paul E. McKenney, linux-kernel, Andrew Morton

On Wed, Aug 10, 2011 at 08:45:29AM -0400, Josh Boyer wrote:
> On Tue, Aug 09, 2011 at 01:35:18PM +0200, Frederic Weisbecker wrote:
> > > But setting rdp->qs_pending to 1 in rcu_init_percpu_data() has no effect
> > > until a grace period starts.  So, if grace periods are prevented from
> 
> Er... really?  Because it gets set and __rcu_pending looks at it
> unconditionally in the case that is calling set_need_resched.  It
> doesn't check if there is anything about a grace period going on or not.

You mean this?

	if (rdp->qs_pending && !rdp->passed_quiesc) {

		/*
		 * If force_quiescent_state() coming soon and this CPU
		 * needs a quiescent state, and this is either RCU-sched
		 * or RCU-bh, force a local reschedule.
		 */
		rdp->n_rp_qs_pending++;
		if (!rdp->preemptible &&
		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
				 jiffies))
			set_need_resched();
	}

On boot, if we don't start a grace period, we don't schedule a grace period forcing,
so rsp->jiffies_force_qs is 0.

With ULONG_CMP_LT making the (-1 < jiffies) comparison valid even with
unsigned longs, we should be fine, I guess.
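
For anyone puzzled by the unsigned arithmetic: ULONG_CMP_LT() is a
wraparound-aware comparison, so jiffies_force_qs - 1 underflowing to
ULONG_MAX still takes part in a meaningful ordering.  The general idiom
looks like the sketch below (a hypothetical helper shown only to
illustrate the trick; it is not the macro's actual definition):

	/* True when "a" precedes "b" on the wrapping unsigned-long circle,
	 * provided the two values are within half the range of each other. */
	static inline int ulong_before(unsigned long a, unsigned long b)
	{
		return (long)(a - b) < 0;
	}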

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-10 14:53                                 ` Frederic Weisbecker
@ 2011-08-10 15:03                                   ` Josh Boyer
  0 siblings, 0 replies; 42+ messages in thread
From: Josh Boyer @ 2011-08-10 15:03 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Paul E. McKenney, linux-kernel, Andrew Morton

On Wed, Aug 10, 2011 at 04:53:41PM +0200, Frederic Weisbecker wrote:
> On Wed, Aug 10, 2011 at 08:45:29AM -0400, Josh Boyer wrote:
> > On Tue, Aug 09, 2011 at 01:35:18PM +0200, Frederic Weisbecker wrote:
> > > > But setting rdp->qs_pending to 1 in rcu_init_percpu_data() has no effect
> > > > until a grace period starts.  So, if grace periods are prevented from
> > 
> > Er... really?  Because it gets set and __rcu_pending looks at it
> > unconditionally in the case that is calling set_need_resched.  It
> > doesn't check if there is anything about a grace period going on or not.
> 
> You mean this?
> 
> 	if (rdp->qs_pending && !rdp->passed_quiesc) {
> 
> 		/*
> 		 * If force_quiescent_state() coming soon and this CPU
> 		 * needs a quiescent state, and this is either RCU-sched
> 		 * or RCU-bh, force a local reschedule.
> 		 */
> 		rdp->n_rp_qs_pending++;
> 		if (!rdp->preemptible &&
> 		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> 				 jiffies))
> 			set_need_resched();
> 	}
> 

Yes.

> On boot, if we don't start a grace period, we don't schedule a grace period forcing,
> so rsp->jiffies_force_qs is 0.

Ok.  I missed that cpu_needs_another_gp() would somehow prevent
rsp->jiffies_force_qs from getting set.
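
The connection, as I understand it: the force-QS deadline is only armed
once a grace period actually starts, so if cpu_needs_another_gp() refuses
to start one, jiffies_force_qs never moves off 0.  Paraphrasing the
relevant fragment of rcu_start_gp() from memory (not a verbatim quote):

	/* In rcu_start_gp(), once a new grace period actually begins: */
	rsp->gpnum++;
	rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;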

> With ULONG_CMP_LT making the (-1 < jiffies) comparison valid even with
> unsigned longs, we should be fine, I guess.

I'd be happy to test a patch to make sure :)

josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-10 12:45                               ` Josh Boyer
  2011-08-10 14:53                                 ` Frederic Weisbecker
@ 2011-08-14 23:04                                 ` Paul E. McKenney
  2011-08-15 14:04                                   ` Josh Boyer
  1 sibling, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-14 23:04 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Frederic Weisbecker, linux-kernel, Andrew Morton

On Wed, Aug 10, 2011 at 08:45:29AM -0400, Josh Boyer wrote:
> On Tue, Aug 09, 2011 at 01:35:18PM +0200, Frederic Weisbecker wrote:
> > > > > > > > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > > > > > > > index ba06207..8c6cb6e 100644
> > > > > > > > > --- a/kernel/rcutree.c
> > > > > > > > > +++ b/kernel/rcutree.c
> > > > > > > > > @@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
> > > > > > > > >  		rdp->n_rp_qs_pending++;
> > > > > > > > >  		if (!rdp->preemptible &&
> > > > > > > > >  		    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> > > > > > > > > -				 jiffies))
> > > > > > > > > -			set_need_resched();
> > > > > > > > > +				 jiffies)) {
> > > > > > > > > +			/* Make sure we're ready to mark the task as needing
> > > > > > > > > + 			 * rescheduling otherwise we can trigger oopses early
> > > > > > > > > + 			 * in the init path
> > > > > > > > > + 			 */
> > > > > > > > > +			if (rcu_scheduler_active)
> > > > > > > > > +				set_need_resched();
> > > > > > > > 
> 
> <snip>
> 
> > > > > The first time we take a scheduling-clock interrupt on a CPU with a
> > > > > callback queued, we will also set qs_pending.  Hence the need to also
> > > > > suppress the assignment in __note_new_gpnum().  Or better yet, just
> > > > > prevent new grace periods in cpu_needs_another_gp().  I believe that doing
> > > > > this will make it unnecessary to do anything in rcu_init_percpu_data().
> > > > 
> > > > Yeah if we have callbacks enqueued during the boot then we need to have
> > > > a check in cpu_needs_another_gp().
> > > > 
> > > > Now rcu_init_percpu_data() still sets rdp->qs_pending to 1, and that
> > > > is going to stay as is as long as preemption is disabled.
> > > 
> > > But setting rdp->qs_pending to 1 in rcu_init_percpu_data() has no effect
> > > until a grace period starts.  So, if grace periods are prevented from
> 
> Er... really?  Because it gets set and __rcu_pending looks at it
> unconditionally in the case that is calling set_need_resched.  It
> doesn't check if there is anything about a grace period going on or not.

Frederic noted the condition that prevents this at boot time, but it
appears that newly onlined CPUs might send themselves needless resched
IPIs at runtime if RCU is idle.

> > > starting, no need to mess with rcu_init_percpu_data().  Especially given
> > > that rcu_init_percpu_data() is also used at late boot and runtime for
> > > CPU hotplug.
> > 
> > Ok.
> > 
> > > 
> > > So I believe that it is sufficient to change cpu_needs_another_gp()
> > > to check for boot being far enough along to allow grace periods.
> > 
> > Yep, sounds good.
> 
> I looked at doing this but got lost as to 1) how it would help the
> situation I've reported, and 2) exactly how to do that.

It would prevent control from reaching that point, and that might
well be needed for other reasons.  (This bit about RCU needing to
work differently at boot time is, well, "interesting".)

> I'd be happy to test, but at the moment the proposed solution is
> confusing to me.

Please see the attached.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Avoid having just-onlined CPU resched itself when RCU is idle

CPUs set rdp->qs_pending when coming online to resolve races with
grace-period start.  However, this means that if RCU is idle, the
just-onlined CPU might needlessly send itself resched IPIs.  Adjust
the online-CPU initialization to avoid this, and also to correctly
cause the CPU to respond to the current grace period if needed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index daa2e62..23ce24e 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1915,8 +1915,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
 
 	/* Set up local state, ensuring consistent view of global state. */
 	raw_spin_lock_irqsave(&rnp->lock, flags);
-	rdp->passed_quiesce = 0;  /* We could be racing with new GP, */
-	rdp->qs_pending = 1;	 /*  so set up to respond to current GP. */
 	rdp->beenonline = 1;	 /* We have now been online. */
 	rdp->preemptible = preemptible;
 	rdp->qlen_last_fqs_check = 0;
@@ -1941,8 +1939,15 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
 		rnp->qsmaskinit |= mask;
 		mask = rnp->grpmask;
 		if (rnp == rdp->mynode) {
-			rdp->gpnum = rnp->completed; /* if GP in progress... */
+			/*
+			 * If there is a grace period in progress, we will
+			 * set up to wait for it next time we run the
+			 * RCU core code.
+			 */
+			rdp->gpnum = rnp->completed;
 			rdp->completed = rnp->completed;
+			rdp->passed_quiesce = 0;
+			rdp->qs_pending = 0;
 			rdp->passed_quiesce_gpnum = rnp->gpnum - 1;
 			trace_rcu_grace_period(rsp->name, rdp->gpnum, "cpuonl");
 		}

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-14 23:04                                 ` Paul E. McKenney
@ 2011-08-15 14:04                                   ` Josh Boyer
  2011-08-15 15:20                                     ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-15 14:04 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Frederic Weisbecker, linux-kernel, Andrew Morton

On Sun, Aug 14, 2011 at 04:04:53PM -0700, Paul E. McKenney wrote:
> > > > > Now rcu_init_percpu_data() still sets rdp->qs_pending to 1, and that
> > > > > is going to stay as is as long as preemption is disabled.
> > > > 
> > > > But setting rdp->qs_pending to 1 in rcu_init_percpu_data() has no effect
> > > > until a grace period starts.  So, if grace periods are prevented from
> > 
> > Er... really?  Because it gets set and __rcu_pending looks at it
> > unconditionally in the case that is calling set_need_resched.  It
> > doesn't check if there is anything about a grace period going on or not.
> 
> Frederic noted the condition that prevents this at boot time, but it
> appears that newly onlined CPUs might send themselves needless resched
> IPIs at runtime if RCU is idle.
> 
> > > > starting, no need to mess with rcu_init_percpu_data().  Especially given
> > > > that rcu_init_percpu_data() is also used at late boot and runtime for
> > > > CPU hotplug.
> > > 
> > > Ok.
> > > 
> > > > 
> > > > So I believe that it is sufficient to change cpu_needs_another_gp()
> > > > to check for boot being far enough along to allow grace periods.
> > > 
> > > Yep, sounds good.
> > 
> > I looked at doing this but got lost as to 1) how it would help the
> > situation I've reported, and 2) exactly how to do that.
> 
> It would prevent control from reaching that point, and that might
> well be needed for other reasons.  (This bit about RCU needing to
> work differently at boot time is, well, "interesting".)
> 
> > I'd be happy to test, but at the moment the proposed solution is
> > confusing to me.
> 
> Please see the attached.

Fixed it up quickly to apply on top of -rc2 and it seems to solve the
problem nicely.  Thanks for the patch.

josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-15 14:04                                   ` Josh Boyer
@ 2011-08-15 15:20                                     ` Paul E. McKenney
  2011-08-17 22:37                                       ` Josh Boyer
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-15 15:20 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Frederic Weisbecker, linux-kernel, Andrew Morton

On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> On Sun, Aug 14, 2011 at 04:04:53PM -0700, Paul E. McKenney wrote:
> > > > > > Now rcu_init_percpu_data() still sets rdp->qs_pending to 1, and that
> > > > > > is going to stay as is as long as preemption is disabled.
> > > > > 
> > > > > But setting rdp->qs_pending to 1 in rcu_init_percpu_data() has no effect
> > > > > until a grace period starts.  So, if grace periods are prevented from
> > > 
> > > Er... really?  Because it gets set and __rcu_pending looks at it
> > > unconditionally in the case that is calling set_need_resched.  It
> > > doesn't check if there is anything about a grace period going on or not.
> > 
> > Frederic noted the condition that prevents this at boot time, but it
> > appears that newly onlined CPUs might send themselves needless resched
> > IPIs at runtime if RCU is idle.
> > 
> > > > > starting, no need to mess with rcu_init_percpu_data().  Especially given
> > > > > that rcu_init_percpu_data() is also used at late boot and runtime for
> > > > > CPU hotplug.
> > > > 
> > > > Ok.
> > > > 
> > > > > 
> > > > > So I believe that it is sufficient to change cpu_needs_another_gp()
> > > > > to check for boot being far enough along to allow grace periods.
> > > > 
> > > > Yep, sounds good.
> > > 
> > > I looked at doing this but got lost as to 1) how it would help the
> > > situation I've reported, and 2) exactly how to do that.
> > 
> > It would prevent control from reaching that point, and that might
> > well be needed for other reasons.  (This bit about RCU needing to
> > work differently at boot time is, well, "interesting".)
> > 
> > > I'd be happy to test, but at the moment the proposed solution is
> > > confusing to me.
> > 
> > Please see the attached.
> 
> Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> problem nicely.  Thanks for the patch.

Good to hear!  I guess I should keep it, then.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-15 15:20                                     ` Paul E. McKenney
@ 2011-08-17 22:37                                       ` Josh Boyer
  2011-08-17 22:49                                         ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-17 22:37 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Frederic Weisbecker, linux-kernel, Andrew Morton

On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > Please see the attached.
> > 
> > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > problem nicely.  Thanks for the patch.
> 
> Good to hear!  I guess I should keep it, then.  ;-)

Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
it come across LKML yet.

josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-17 22:37                                       ` Josh Boyer
@ 2011-08-17 22:49                                         ` Paul E. McKenney
  2011-08-17 23:02                                           ` Josh Boyer
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-17 22:49 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Frederic Weisbecker, linux-kernel, Andrew Morton

On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > Please see the attached.
> > > 
> > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > problem nicely.  Thanks for the patch.
> > 
> > Good to hear!  I guess I should keep it, then.  ;-)
> 
> Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> it come across LKML yet.

I might...  But does it qualify as a regression?  That part of the
code hasn't changed for some time now.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-17 22:49                                         ` Paul E. McKenney
@ 2011-08-17 23:02                                           ` Josh Boyer
  2011-08-17 23:06                                             ` Frederic Weisbecker
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-17 23:02 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Frederic Weisbecker, linux-kernel, Andrew Morton

On Wed, Aug 17, 2011 at 03:49:16PM -0700, Paul E. McKenney wrote:
> On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> > On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > > Please see the attached.
> > > > 
> > > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > > problem nicely.  Thanks for the patch.
> > > 
> > > Good to hear!  I guess I should keep it, then.  ;-)
> > 
> > Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> > it come across LKML yet.
> 
> I might...  But does it qualify as a regression?  That part of the
> code hasn't changed for some time now.

It's a fix for a problem that is newly surfaced in 3.1.  A regression,
likely not since it's been there forever, but new debugging options
uncovered it.  I'm pretty sure the -rc stage takes fixes even if they
aren't regressions.

josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-17 23:02                                           ` Josh Boyer
@ 2011-08-17 23:06                                             ` Frederic Weisbecker
  2011-08-17 23:17                                               ` Josh Boyer
  0 siblings, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2011-08-17 23:06 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Paul E. McKenney, linux-kernel, Andrew Morton

On Wed, Aug 17, 2011 at 07:02:19PM -0400, Josh Boyer wrote:
> On Wed, Aug 17, 2011 at 03:49:16PM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> > > On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > > > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > > > Please see the attached.
> > > > > 
> > > > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > > > problem nicely.  Thanks for the patch.
> > > > 
> > > > Good to hear!  I guess I should keep it, then.  ;-)
> > > 
> > > Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> > > it come across LKML yet.
> > 
> > I might...  But does it qualify as a regression?  That part of the
> > code hasn't changed for some time now.
> 
> It's a fix for a problem that is newly surfaced in 3.1.  A regression,
> likely not since it's been there forever, but new debugging options
> uncovered it.  I'm pretty sure the -rc stage takes fixes even if they
> aren't regressions.

Nope, after -rc1 only regression fixes are taken (most of the time).

> 
> josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-17 23:06                                             ` Frederic Weisbecker
@ 2011-08-17 23:17                                               ` Josh Boyer
  2011-08-18 18:35                                                 ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-17 23:17 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Paul E. McKenney, linux-kernel, Andrew Morton

On Thu, Aug 18, 2011 at 01:06:44AM +0200, Frederic Weisbecker wrote:
> On Wed, Aug 17, 2011 at 07:02:19PM -0400, Josh Boyer wrote:
> > On Wed, Aug 17, 2011 at 03:49:16PM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> > > > On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > > > > Please see the attached.
> > > > > > 
> > > > > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > > > > problem nicely.  Thanks for the patch.
> > > > > 
> > > > > Good to hear!  I guess I should keep it, then.  ;-)
> > > > 
> > > > Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> > > > it come across LKML yet.
> > > 
> > > I might...  But does it qualify as a regression?  That part of the
> > > code hasn't changed for some time now.
> > 
> > It's a fix for a problem that is newly surfaced in 3.1.  A regression,
> > likely not since it's been there forever, but new debugging options
> > uncovered it.  I'm pretty sure the -rc stage takes fixes even if they
> > aren't regressions.
> 
> Nope, after -rc1 only regression fixes are taken (most of the time).

Sigh.

Look, either way I'm carrying this patch in Fedora because it fixes
a bug that is actually being reported by users (and by abrtd as well).
If you both want to wait until 3.2 to actually submit it to Linus,
then OK.

Honestly, I'm just glad we actually run with the debug options enabled
(which seems to be a rare thing) so bugs like this are actually found.
Thanks for the fix.

josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-17 23:17                                               ` Josh Boyer
@ 2011-08-18 18:35                                                 ` Paul E. McKenney
  2011-08-18 19:11                                                   ` Josh Boyer
  2011-08-18 21:00                                                   ` Andrew Morton
  0 siblings, 2 replies; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-18 18:35 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Frederic Weisbecker, linux-kernel, Andrew Morton

On Wed, Aug 17, 2011 at 07:17:50PM -0400, Josh Boyer wrote:
> On Thu, Aug 18, 2011 at 01:06:44AM +0200, Frederic Weisbecker wrote:
> > On Wed, Aug 17, 2011 at 07:02:19PM -0400, Josh Boyer wrote:
> > > On Wed, Aug 17, 2011 at 03:49:16PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> > > > > On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > > > > > Please see the attached.
> > > > > > > 
> > > > > > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > > > > > problem nicely.  Thanks for the patch.
> > > > > > 
> > > > > > Good to hear!  I guess I should keep it, then.  ;-)
> > > > > 
> > > > > Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> > > > > it come across LKML yet.
> > > > 
> > > > I might...  But does it qualify as a regression?  That part of the
> > > > code hasn't changed for some time now.
> > > 
> > > It's a fix for a problem that is newly surfaced in 3.1.  A regression,
> > > likely not since it's been there forever, but new debugging options
> > > uncovered it.  I'm pretty sure the -rc stage takes fixes even if they
> > > aren't regressions.
> > 
> > Nope, after -rc1 only regression fixes are taken (most of the time).
> 
> Sigh.
> 
> Look, either way I'm carrying this patch in Fedora because it fixes
> a bug that is actually being reported by users (and by abrtd as well).
> If you both want to wait until 3.2 to actually submit it to Linus,
> then OK.
> 
> Honestly, I'm just glad we actually run with the debug options enabled
> (which seems to be a rare thing) so bugs like this are actually found.
> Thanks for the fix.

I am sorry, but I didn't make the rules!  And I must carry the fix
longer as well, if that makes you feel any better.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 18:35                                                 ` Paul E. McKenney
@ 2011-08-18 19:11                                                   ` Josh Boyer
  2011-08-18 21:00                                                   ` Andrew Morton
  1 sibling, 0 replies; 42+ messages in thread
From: Josh Boyer @ 2011-08-18 19:11 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Frederic Weisbecker, linux-kernel, Andrew Morton

On Thu, Aug 18, 2011 at 11:35:23AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 17, 2011 at 07:17:50PM -0400, Josh Boyer wrote:
> > On Thu, Aug 18, 2011 at 01:06:44AM +0200, Frederic Weisbecker wrote:
> > > On Wed, Aug 17, 2011 at 07:02:19PM -0400, Josh Boyer wrote:
> > > > On Wed, Aug 17, 2011 at 03:49:16PM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> > > > > > On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > > > > > > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > > > > > > Please see the attached.
> > > > > > > > 
> > > > > > > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > > > > > > problem nicely.  Thanks for the patch.
> > > > > > > 
> > > > > > > Good to hear!  I guess I should keep it, then.  ;-)
> > > > > > 
> > > > > > Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> > > > > > it come across LKML yet.
> > > > > 
> > > > > I might...  But does it qualify as a regression?  That part of the
> > > > > code hasn't changed for some time now.
> > > > 
> > > > It's a fix for a problem that is newly surfaced in 3.1.  A regression,
> > > > likely not since it's been there forever, but new debugging options
> > > > uncovered it.  I'm pretty sure the -rc stage takes fixes even if they
> > > > aren't regressions.
> > > 
> > > Nope, after -rc1 only regression fixes are taken (most of the time).
> > 
> > Sigh.
> > 
> > Look, either way I'm carrying this patch in Fedora because it fixes
> > a bug that is actually being reported by users (and by abrtd as well).
> > If you both want to wait until 3.2 to actually submit it to Linus,
> > then OK.
> > 
> > Honestly, I'm just glad we actually run with the debug options enabled
> > (which seems to be a rare thing) so bugs like this are actually found.
> > Thanks for the fix.
> 
> I am sorry, but I didn't make the rules!  And I must carry the fix
> longer as well, if that makes you feel any better.

My thank you was meant to be sincere, not snarky.  It's a two-line
addition to a spec file, so it's not as if the burden is unbearable.
Especially when doing it prevents me from duping a large number of bugs
for most of a release ;)

josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 18:35                                                 ` Paul E. McKenney
  2011-08-18 19:11                                                   ` Josh Boyer
@ 2011-08-18 21:00                                                   ` Andrew Morton
  2011-08-18 21:23                                                     ` Paul E. McKenney
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2011-08-18 21:00 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, Frederic Weisbecker, linux-kernel

On Thu, 18 Aug 2011 11:35:23 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Wed, Aug 17, 2011 at 07:17:50PM -0400, Josh Boyer wrote:
> > On Thu, Aug 18, 2011 at 01:06:44AM +0200, Frederic Weisbecker wrote:
> > > On Wed, Aug 17, 2011 at 07:02:19PM -0400, Josh Boyer wrote:
> > > > On Wed, Aug 17, 2011 at 03:49:16PM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> > > > > > On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > > > > > > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > > > > > > Please see the attached.
> > > > > > > > 
> > > > > > > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > > > > > > problem nicely.  Thanks for the patch.
> > > > > > > 
> > > > > > > Good to hear!  I guess I should keep it, then.  ;-)
> > > > > > 
> > > > > > Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> > > > > > it come across LKML yet.
> > > > > 
> > > > > I might...  But does it qualify as a regression?  That part of the
> > > > > code hasn't changed for some time now.
> > > > 
> > > > It's a fix for a problem that is newly surfaced in 3.1.  A regression,
> > > > likely not since it's been there forever, but new debugging options
> > > > uncovered it.  I'm pretty sure the -rc stage takes fixes even if they
> > > > aren't regressions.
> > > 
> > > Nope, after -rc1 only regression fixes are taken (most of the time).
> > 
> > Sigh.
> > 
> > Look, either way I'm carrying this patch in Fedora because it fixes
> > a bug that is actually being reported by users (and by abrtd as well).
> > If you both want to wait until 3.2 to actually submit it to Linus,
> > then OK.
> > 
> > Honestly, I'm just glad we actually run with the debug options enabled
> > (which seems to be a rare thing) so bugs like this are actually found.
> > Thanks for the fix.
> 
> I am sorry, but I didn't make the rules!  And I must carry the fix
> longer as well, if that makes you feel any better.
> 

bah, we're not that anal.  The patch fixes a bug and prevents a nasty
warning spew.  Please, send it to Linus.

We appear to be referring to the patch "rcu: Avoid having just-onlined
CPU resched itself when RCU is idle"?  If so, the changelog doesn't
even mention that the patch fixes a scheduling-while-atomic warning and
the changelog fails to refer to the redhat bug report.  These omissions
should be repaired, please.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 21:00                                                   ` Andrew Morton
@ 2011-08-18 21:23                                                     ` Paul E. McKenney
  2011-08-18 21:55                                                       ` Paul E. McKenney
  2011-08-18 22:19                                                       ` Josh Boyer
  0 siblings, 2 replies; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-18 21:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Josh Boyer, Frederic Weisbecker, linux-kernel

On Thu, Aug 18, 2011 at 02:00:34PM -0700, Andrew Morton wrote:
> On Thu, 18 Aug 2011 11:35:23 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Wed, Aug 17, 2011 at 07:17:50PM -0400, Josh Boyer wrote:
> > > On Thu, Aug 18, 2011 at 01:06:44AM +0200, Frederic Weisbecker wrote:
> > > > On Wed, Aug 17, 2011 at 07:02:19PM -0400, Josh Boyer wrote:
> > > > > On Wed, Aug 17, 2011 at 03:49:16PM -0700, Paul E. McKenney wrote:
> > > > > > On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> > > > > > > On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > > > > > > > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > > > > > > > Please see the attached.
> > > > > > > > > 
> > > > > > > > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > > > > > > > problem nicely.  Thanks for the patch.
> > > > > > > > 
> > > > > > > > Good to hear!  I guess I should keep it, then.  ;-)
> > > > > > > 
> > > > > > > Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> > > > > > > it come across LKML yet.
> > > > > > 
> > > > > > I might...  But does it qualify as a regression?  That part of the
> > > > > > code hasn't changed for some time now.
> > > > > 
> > > > > It's a fix for a problem that is newly surfaced in 3.1.  A regression,
> > > > > likely not since it's been there forever, but new debugging options
> > > > > uncovered it.  I'm pretty sure the -rc stage takes fixes even if they
> > > > > aren't regressions.
> > > > 
> > > > Nope, after -rc1 only regression fixes are taken (most of the time).
> > > 
> > > Sigh.
> > > 
> > > Look, either way I'm carrying this patch in Fedora because it fixes
> > > a bug that is actually being reported by users (and by abrtd as well).
> > > If you both want to wait until 3.2 to actually submit it to Linus,
> > > then OK.
> > > 
> > > Honestly, I'm just glad we actually run with the debug options enabled
> > > (which seems to be a rare thing) so bugs like this are actually found.
> > > Thanks for the fix.
> > 
> > I am sorry, but I didn't make the rules!  And I must carry the fix
> > longer as well, if that makes you feel any better.
> 
> bah, we're not that anal.  The patch fixes a bug and prevents a nasty
> warning spew.  Please, send it to Linus.

Given your Acked-by and Josh's Tested-by I might consider it.  ;-)

Speaking of which, Josh, does this patch help Nicolas and Michal?

> We appear to be referring to the patch "rcu: Avoid having just-onlined
> CPU resched itself when RCU is idle"?  If so, the changelog doesn't
> even mention that the patch fixes a scheduling-while-atomic warning and
> the changelog fails to refer to the redhat bug report.  These omissions
> should be repaired, please.

OK...  But I cannot bring myself to believe that my fix does more than
hide some other bug.  Which is OK, I will just say that in the changelog.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 21:23                                                     ` Paul E. McKenney
@ 2011-08-18 21:55                                                       ` Paul E. McKenney
  2011-08-18 22:21                                                         ` Josh Boyer
  2011-08-24 22:45                                                         ` Frederic Weisbecker
  2011-08-18 22:19                                                       ` Josh Boyer
  1 sibling, 2 replies; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-18 21:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Josh Boyer, Frederic Weisbecker, linux-kernel

On Thu, Aug 18, 2011 at 02:23:34PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 18, 2011 at 02:00:34PM -0700, Andrew Morton wrote:
> > On Thu, 18 Aug 2011 11:35:23 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > On Wed, Aug 17, 2011 at 07:17:50PM -0400, Josh Boyer wrote:
> > > > On Thu, Aug 18, 2011 at 01:06:44AM +0200, Frederic Weisbecker wrote:
> > > > > On Wed, Aug 17, 2011 at 07:02:19PM -0400, Josh Boyer wrote:
> > > > > > On Wed, Aug 17, 2011 at 03:49:16PM -0700, Paul E. McKenney wrote:
> > > > > > > On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> > > > > > > > On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > > > > > > > > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > > > > > > > > Please see the attached.
> > > > > > > > > > 
> > > > > > > > > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > > > > > > > > problem nicely.  Thanks for the patch.
> > > > > > > > > 
> > > > > > > > > Good to hear!  I guess I should keep it, then.  ;-)
> > > > > > > > 
> > > > > > > > Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> > > > > > > > it come across LKML yet.
> > > > > > > 
> > > > > > > I might...  But does it qualify as a regression?  That part of the
> > > > > > > code hasn't changed for some time now.
> > > > > > 
> > > > > > It's a fix for a problem that is newly surfaced in 3.1.  A regression,
> > > > > > likely not since it's been there forever, but new debugging options
> > > > > > uncovered it.  I'm pretty sure the -rc stage takes fixes even if they
> > > > > > aren't regressions.
> > > > > 
> > > > > Nope, after -rc1 only regressions fixes are taken (most of the time).
> > > > 
> > > > Sigh.
> > > > 
> > > > Look, either way I'm carrying this patch in Fedora because it fixes
> > > > a bug that is actually being reported by users (and by abrtd as well).
> > > > If you both want to wait until 3.2 to actually submit it to Linus,
> > > > then OK.
> > > > 
> > > > Honestly, I'm just glad we actually run with the debug options enabled
> > > > (which seems to be a rare thing) so bugs like this are actually found.
> > > > Thanks for the fix.
> > > 
> > > I am sorry, but I didn't make the rules!  And I must carry the fix
> > > longer as well, if that makes you feel any better.
> > 
> > bah, we're not that anal.  The patch fixes a bug and prevents a nasty
> > warning spew.  Please, send it to Linus.
> 
> Given your Acked-by and Josh's Tested-by I might consider it.  ;-)
> 
> Speaking of which, Josh, does this patch help Nicolas and Michal?
> 
> > We appear to be referring to the patch "rcu: Avoid having just-onlined
> > CPU resched itself when RCU is idle"?  If so, the changelog doesn't
> > even mention that the patch fixes a scheduling-while-atomic warning and
> > the changelog fails to refer to the redhat bug report.  These omissions
> > should be repaired, please.
> 
> OK...  But I cannot bring myself to believe that my fix does more than
> hide some other bug.  Which is OK, I will just say that in the changelog.

And here is this patch ported to v3.1-rc2, FYI.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Avoid having just-onlined CPU resched itself when RCU is idle

CPUs set rdp->qs_pending when coming online to resolve races with
grace-period start.  However, this means that if RCU is idle, the
just-onlined CPU might needlessly send itself resched IPIs.  Adjust the
online-CPU initialization to avoid this, and also to correctly cause
the CPU to respond to the current grace period if needed.

This patch is believed to fix or otherwise suppress problems in
https://bugzilla.redhat.com/show_bug.cgi?id=726877, however, the
relationship is not apparent to this patch's author.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index ba06207..6986d34 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1865,8 +1865,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
 
 	/* Set up local state, ensuring consistent view of global state. */
 	raw_spin_lock_irqsave(&rnp->lock, flags);
-	rdp->passed_quiesc = 0;  /* We could be racing with new GP, */
-	rdp->qs_pending = 1;	 /*  so set up to respond to current GP. */
 	rdp->beenonline = 1;	 /* We have now been online. */
 	rdp->preemptible = preemptible;
 	rdp->qlen_last_fqs_check = 0;
@@ -1891,8 +1889,15 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
 		rnp->qsmaskinit |= mask;
 		mask = rnp->grpmask;
 		if (rnp == rdp->mynode) {
-			rdp->gpnum = rnp->completed; /* if GP in progress... */
+			/*
+			 * If there is a grace period in progress, we will
+			 * set up to wait for it next time we run the
+			 * RCU core code.
+			 */
+			rdp->gpnum = rnp->completed;
 			rdp->completed = rnp->completed;
+			rdp->passed_quiesc = 0;
+			rdp->qs_pending = 1;
 			rdp->passed_quiesc_completed = rnp->completed - 1;
 		}
 		raw_spin_unlock(&rnp->lock); /* irqs already disabled. */
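
To make the failure mode in the changelog concrete, the following is a small
standalone C model of the behaviour being described; it is illustrative only.
The struct and helper names (model_rcu_data, tick_wants_resched, and so on)
are invented for the model, and the logic is a paraphrase of the discussion,
not the kernel's actual rcutree.c code.

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the per-CPU and per-node fields discussed above. */
struct model_rcu_data {
	bool qs_pending;     /* RCU believes it is waiting on this CPU */
	bool passed_quiesc;  /* this CPU already reported a quiescent state */
};

struct model_rcu_node {
	unsigned long gpnum;     /* most recently started grace period */
	unsigned long completed; /* most recently completed grace period */
};

/* RCU is idle when the last grace period it started has completed. */
static bool gp_in_progress(const struct model_rcu_node *rnp)
{
	return rnp->gpnum != rnp->completed;
}

/*
 * Rough model of the per-tick "does RCU need anything from this CPU?"
 * test: a CPU that thinks it owes a quiescent state but has not passed
 * one yet asks to be run through the scheduler.
 */
static bool tick_wants_resched(const struct model_rcu_data *rdp)
{
	return rdp->qs_pending && !rdp->passed_quiesc;
}

int main(void)
{
	/* RCU idle, yet the just-onlined CPU has qs_pending set. */
	struct model_rcu_node rnp = { .gpnum = 100, .completed = 100 };
	struct model_rcu_data rdp = { .qs_pending = true, .passed_quiesc = false };

	printf("grace period in progress: %s\n",
	       gp_in_progress(&rnp) ? "yes" : "no");
	printf("tick asks CPU to reschedule: %s\n",
	       tick_wants_resched(&rdp) ? "yes" : "no");
	return 0;
}

Running it prints that the tick-time test fires even though no grace period
is in progress, which is the needless self-resched the patch aims to avoid.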

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 21:23                                                     ` Paul E. McKenney
  2011-08-18 21:55                                                       ` Paul E. McKenney
@ 2011-08-18 22:19                                                       ` Josh Boyer
  2011-08-18 23:16                                                         ` Paul E. McKenney
  1 sibling, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-18 22:19 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Andrew Morton, Frederic Weisbecker, linux-kernel

On Thu, Aug 18, 2011 at 02:23:34PM -0700, Paul E. McKenney wrote:
> > > I am sorry, but I didn't make the rules!  And I must carry the fix
> > > longer as well, if that makes you feel any better.
> > 
> > bah, we're not that anal.  The patch fixes a bug and prevents a nasty
> > warning spew.  Please, send it to Linus.
> 
> Given your Acked-by and Josh's Tested-by I might consider it.  ;-)
> 
> Speaking of which, Josh, does this patch help Nicolas and Michal?

Yes, it did.  The bug is closed because I included the patch in the Fedora
kernel.  (I could recreate the problem myself as well.)

> > We appear to be referring to the patch "rcu: Avoid having just-onlined
> > CPU resched itself when RCU is idle"?  If so, the changelog doesn't
> > even mention that the patch fixes a scheduling-while-atomic warning and
> > the changelog fails to refer to the redhat bug report.  These omissions
> > should be repaired, please.
> 
> OK...  But I cannot bring myself to believe that my fix does more than
> hide some other bug.  Which is OK, I will just say that in the changelog.

I looked for this other bug myself for a while.  I couldn't find it.

josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 21:55                                                       ` Paul E. McKenney
@ 2011-08-18 22:21                                                         ` Josh Boyer
  2011-08-18 23:01                                                           ` Paul E. McKenney
  2011-08-24 22:45                                                         ` Frederic Weisbecker
  1 sibling, 1 reply; 42+ messages in thread
From: Josh Boyer @ 2011-08-18 22:21 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Andrew Morton, Frederic Weisbecker, linux-kernel

On Thu, Aug 18, 2011 at 02:55:40PM -0700, Paul E. McKenney wrote:
> And here is this patch ported to v3.1-rc2, FYI.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> rcu: Avoid having just-onlined CPU resched itself when RCU is idle
> 
> CPUs set rdp->qs_pending when coming online to resolve races with
> grace-period start.  However, this means that if RCU is idle, the
> just-onlined CPU might needlessly send itself resched IPIs.  Adjust the
> online-CPU initialization to avoid this, and also to correctly cause
> the CPU to respond to the current grace period if needed.
> 
> This patch is believed to fix or otherwise suppress problems in
> https://bugzilla.redhat.com/show_bug.cgi?id=726877, however, the
> relationship is not apparent to this patch's author.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

This is what has been in rawhide for a while now.

Tested-by: Josh Boyer <jwboyer@redhat.com>

> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index ba06207..6986d34 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1865,8 +1865,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
>  
>  	/* Set up local state, ensuring consistent view of global state. */
>  	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	rdp->passed_quiesc = 0;  /* We could be racing with new GP, */
> -	rdp->qs_pending = 1;	 /*  so set up to respond to current GP. */
>  	rdp->beenonline = 1;	 /* We have now been online. */
>  	rdp->preemptible = preemptible;
>  	rdp->qlen_last_fqs_check = 0;
> @@ -1891,8 +1889,15 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
>  		rnp->qsmaskinit |= mask;
>  		mask = rnp->grpmask;
>  		if (rnp == rdp->mynode) {
> -			rdp->gpnum = rnp->completed; /* if GP in progress... */
> +			/*
> +			 * If there is a grace period in progress, we will
> +			 * set up to wait for it next time we run the
> +			 * RCU core code.
> +			 */
> +			rdp->gpnum = rnp->completed;
>  			rdp->completed = rnp->completed;
> +			rdp->passed_quiesc = 0;
> +			rdp->qs_pending = 1;
>  			rdp->passed_quiesc_completed = rnp->completed - 1;
>  		}
>  		raw_spin_unlock(&rnp->lock); /* irqs already disabled. */

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 22:21                                                         ` Josh Boyer
@ 2011-08-18 23:01                                                           ` Paul E. McKenney
  0 siblings, 0 replies; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-18 23:01 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Andrew Morton, Frederic Weisbecker, linux-kernel

On Thu, Aug 18, 2011 at 06:21:08PM -0400, Josh Boyer wrote:
> On Thu, Aug 18, 2011 at 02:55:40PM -0700, Paul E. McKenney wrote:
> > And here is this patch ported to v3.1-rc2, FYI.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > rcu: Avoid having just-onlined CPU resched itself when RCU is idle
> > 
> > CPUs set rdp->qs_pending when coming online to resolve races with
> > grace-period start.  However, this means that if RCU is idle, the
> > just-onlined CPU might needlessly send itself resched IPIs.  Adjust the
> > online-CPU initialization to avoid this, and also to correctly cause
> > the CPU to respond to the current grace period if needed.
> > 
> > This patch is believed to fix or otherwise suppress problems in
> > https://bugzilla.redhat.com/show_bug.cgi?id=726877, however, the
> > relationship is not apparent to this patch's author.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> This is what has been in rawhide for a while now.
> 
> Tested-by: Josh Boyer <jwboyer@redhat.com>

Thank you, I have added your Tested-by.

							Thanx, Paul

> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index ba06207..6986d34 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -1865,8 +1865,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
> >  
> >  	/* Set up local state, ensuring consistent view of global state. */
> >  	raw_spin_lock_irqsave(&rnp->lock, flags);
> > -	rdp->passed_quiesc = 0;  /* We could be racing with new GP, */
> > -	rdp->qs_pending = 1;	 /*  so set up to respond to current GP. */
> >  	rdp->beenonline = 1;	 /* We have now been online. */
> >  	rdp->preemptible = preemptible;
> >  	rdp->qlen_last_fqs_check = 0;
> > @@ -1891,8 +1889,15 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
> >  		rnp->qsmaskinit |= mask;
> >  		mask = rnp->grpmask;
> >  		if (rnp == rdp->mynode) {
> > -			rdp->gpnum = rnp->completed; /* if GP in progress... */
> > +			/*
> > +			 * If there is a grace period in progress, we will
> > +			 * set up to wait for it next time we run the
> > +			 * RCU core code.
> > +			 */
> > +			rdp->gpnum = rnp->completed;
> >  			rdp->completed = rnp->completed;
> > +			rdp->passed_quiesc = 0;
> > +			rdp->qs_pending = 1;
> >  			rdp->passed_quiesc_completed = rnp->completed - 1;
> >  		}
> >  		raw_spin_unlock(&rnp->lock); /* irqs already disabled. */

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 22:19                                                       ` Josh Boyer
@ 2011-08-18 23:16                                                         ` Paul E. McKenney
  2011-08-18 23:27                                                           ` Andrew Morton
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-18 23:16 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Andrew Morton, Frederic Weisbecker, linux-kernel

On Thu, Aug 18, 2011 at 06:19:20PM -0400, Josh Boyer wrote:
> On Thu, Aug 18, 2011 at 02:23:34PM -0700, Paul E. McKenney wrote:
> > > > I am sorry, but I didn't make the rules!  And I must carry the fix
> > > > longer as well, if that makes you feel any better.
> > > 
> > > bah, we're not that anal.  The patch fixes a bug and prevents a nasty
> > > warning spew.  Please, send it to Linus.
> > 
> > Given your Acked-by and Josh's Tested-by I might consider it.  ;-)
> > 
> > Speaking of which, Josh, does this patch help Nicolas and Michal?
> 
> Yes, it did.  The bug is closed because I included the patch in the Fedora
> kernel.  (I could recreate the problem myself as well.)

OK, good to know!

> > > We appear to be referring to the patch "rcu: Avoid having just-onlined
> > > CPU resched itself when RCU is idle"?  If so, the changelog doesn't
> > > even mention that the patch fixes a scheduling-while-atomic warning and
> > > the changelog fails to refer to the redhat bug report.  These omissions
> > > should be repaired, please.
> > 
> > OK...  But I cannot bring myself to believe that my fix does more than
> > hide some other bug.  Which is OK, I will just say that in the changelog.
> 
> I looked for this other bug myself for a while.  I couldn't find it.

Well, if there really is some other bug, it will make its continued
presence known soon enough.

Andrew, are you willing to ack this?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 23:16                                                         ` Paul E. McKenney
@ 2011-08-18 23:27                                                           ` Andrew Morton
  2011-08-19  0:38                                                             ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2011-08-18 23:27 UTC (permalink / raw)
  To: paulmck; +Cc: Josh Boyer, Frederic Weisbecker, linux-kernel

On Thu, 18 Aug 2011 16:16:46 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Andrew, are you willing to ack this?

WhoMeNope.  That would imply that I understood it, but my brain is far
too small to understand rcutree.c - that's what we have paulmcks for.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 23:27                                                           ` Andrew Morton
@ 2011-08-19  0:38                                                             ` Paul E. McKenney
  0 siblings, 0 replies; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-19  0:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Josh Boyer, Frederic Weisbecker, linux-kernel

On Thu, Aug 18, 2011 at 04:27:23PM -0700, Andrew Morton wrote:
> On Thu, 18 Aug 2011 16:16:46 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Andrew, are you willing to ack this?
> 
> WhoMeNope.  That would imply that I understood it, but my brain is far
> too small to understand rcutree.c - that's what we have paulmcks for.

Yeah, well, the only reason my brain can encompass rcutree.c is that
there is not much else there.  I will CC you when submitting the patch.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-18 21:55                                                       ` Paul E. McKenney
  2011-08-18 22:21                                                         ` Josh Boyer
@ 2011-08-24 22:45                                                         ` Frederic Weisbecker
  2011-08-24 23:12                                                           ` Paul E. McKenney
  1 sibling, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2011-08-24 22:45 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Andrew Morton, Josh Boyer, linux-kernel

On Thu, Aug 18, 2011 at 02:55:40PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 18, 2011 at 02:23:34PM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 18, 2011 at 02:00:34PM -0700, Andrew Morton wrote:
> > > On Thu, 18 Aug 2011 11:35:23 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > > > On Wed, Aug 17, 2011 at 07:17:50PM -0400, Josh Boyer wrote:
> > > > > On Thu, Aug 18, 2011 at 01:06:44AM +0200, Frederic Weisbecker wrote:
> > > > > > On Wed, Aug 17, 2011 at 07:02:19PM -0400, Josh Boyer wrote:
> > > > > > > On Wed, Aug 17, 2011 at 03:49:16PM -0700, Paul E. McKenney wrote:
> > > > > > > > On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> > > > > > > > > On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > > > > > > > > > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > > > > > > > > > Please see the attached.
> > > > > > > > > > > 
> > > > > > > > > > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > > > > > > > > > problem nicely.  Thanks for the patch.
> > > > > > > > > > 
> > > > > > > > > > Good to hear!  I guess I should keep it, then.  ;-)
> > > > > > > > > 
> > > > > > > > > Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> > > > > > > > > it come across LKML yet.
> > > > > > > > 
> > > > > > > > I might...  But does it qualify as a regression?  That part of the
> > > > > > > > code hasn't changed for some time now.
> > > > > > > 
> > > > > > > It's a fix for a problem that is newly surfaced in 3.1.  A regression,
> > > > > > > likely not since it's been there forever, but new debugging options
> > > > > > > uncovered it.  I'm pretty sure the -rc stage takes fixes even if they
> > > > > > > aren't regressions.
> > > > > > 
> > > > > > Nope, after -rc1 only regressions fixes are taken (most of the time).
> > > > > 
> > > > > Sigh.
> > > > > 
> > > > > Look, either way I'm carrying this patch in Fedora because it fixes
> > > > > a bug that is actually being reported by users (and by abrtd as well).
> > > > > If you both want to wait until 3.2 to actually submit it to Linus,
> > > > > then OK.
> > > > > 
> > > > > Honestly, I'm just glad we actually run with the debug options enabled
> > > > > (which seems to be a rare thing) so bugs like this are actually found.
> > > > > Thanks for the fix.
> > > > 
> > > > I am sorry, but I didn't make the rules!  And I must carry the fix
> > > > longer as well, if that makes you feel any better.
> > > 
> > > bah, we're not that anal.  The patch fixes a bug and prevents a nasty
> > > warning spew.  Please, send it to Linus.
> > 
> > Given your Acked-by and Josh's Tested-by I might consider it.  ;-)
> > 
> > Speaking of which, Josh, does this patch help Nicolas and Michal?
> > 
> > > We appear to be referring to the patch "rcu: Avoid having just-onlined
> > > CPU resched itself when RCU is idle"?  If so, the changelog doesn't
> > > even mention that the patch fixes a scheduling-while-atomic warning and
> > > the changelog fails to refer to the redhat bug report.  These omissions
> > > should be repaired, please.
> > 
> > OK...  But I cannot bring myself to believe that my fix does more than
> > hide some other bug.  Which is OK, I will just say that in the changelog.
> 
> And here is this patch ported to v3.1-rc2, FYI.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> rcu: Avoid having just-onlined CPU resched itself when RCU is idle
> 
> CPUs set rdp->qs_pending when coming online to resolve races with
> grace-period start.  However, this means that if RCU is idle, the
> just-onlined CPU might needlessly send itself resched IPIs.  Adjust the
> online-CPU initialization to avoid this, and also to correctly cause
> the CPU to respond to the current grace period if needed.
> 
> This patch is believed to fix or otherwise suppress problems in
> https://bugzilla.redhat.com/show_bug.cgi?id=726877, however, the
> relationship is not apparent to this patch's author.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index ba06207..6986d34 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1865,8 +1865,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
>  
>  	/* Set up local state, ensuring consistent view of global state. */
>  	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	rdp->passed_quiesc = 0;  /* We could be racing with new GP, */
> -	rdp->qs_pending = 1;	 /*  so set up to respond to current GP. */
>  	rdp->beenonline = 1;	 /* We have now been online. */
>  	rdp->preemptible = preemptible;
>  	rdp->qlen_last_fqs_check = 0;
> @@ -1891,8 +1889,15 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
>  		rnp->qsmaskinit |= mask;
>  		mask = rnp->grpmask;
>  		if (rnp == rdp->mynode) {
> -			rdp->gpnum = rnp->completed; /* if GP in progress... */
> +			/*
> +			 * If there is a grace period in progress, we will
> +			 * set up to wait for it next time we run the
> +			 * RCU core code.
> +			 */
> +			rdp->gpnum = rnp->completed;
>  			rdp->completed = rnp->completed;
> +			rdp->passed_quiesc = 0;
> +			rdp->qs_pending = 1;

In the previous version you had rdp->qs_pending = 0 here.
If it's set to 0, I can understand how it fixes the problem.  But set to 1,
I don't see how it fixes anything.

Should it perhaps set it to 1 only if we have rnp->gpnum > rnp->completed ?
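
As a quick standalone illustration of the point (a model only, with invented
names such as check_fires, not kernel code): with RCU idle at online time,
the tick-time test stays quiet only if qs_pending is left clear, whether
unconditionally or via the conditional form suggested above.

#include <stdbool.h>
#include <stdio.h>

struct node { unsigned long gpnum, completed; };
struct data { bool qs_pending, passed_quiesc; };

/* Model of the tick-time "RCU is waiting on this CPU" test. */
static bool check_fires(const struct data *d)
{
	return d->qs_pending && !d->passed_quiesc;
}

int main(void)
{
	struct node rnp = { .gpnum = 42, .completed = 42 };  /* RCU idle */
	struct data d = { .passed_quiesc = false };

	d.qs_pending = true;   /* as in the hunk quoted above */
	printf("qs_pending = 1: check fires = %d\n", check_fires(&d));

	d.qs_pending = false;  /* as in the earlier version of the patch */
	printf("qs_pending = 0: check fires = %d\n", check_fires(&d));

	d.qs_pending = rnp.gpnum > rnp.completed;  /* conditional variant asked about */
	printf("conditional:    check fires = %d\n", check_fires(&d));
	return 0;
}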

>  			rdp->passed_quiesc_completed = rnp->completed - 1;
>  		}
>  		raw_spin_unlock(&rnp->lock); /* irqs already disabled. */

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-24 22:45                                                         ` Frederic Weisbecker
@ 2011-08-24 23:12                                                           ` Paul E. McKenney
  2011-08-24 23:34                                                             ` Frederic Weisbecker
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-24 23:12 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Andrew Morton, Josh Boyer, linux-kernel

On Thu, Aug 25, 2011 at 12:45:00AM +0200, Frederic Weisbecker wrote:
> On Thu, Aug 18, 2011 at 02:55:40PM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 18, 2011 at 02:23:34PM -0700, Paul E. McKenney wrote:
> > > On Thu, Aug 18, 2011 at 02:00:34PM -0700, Andrew Morton wrote:
> > > > On Thu, 18 Aug 2011 11:35:23 -0700
> > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > > 
> > > > > On Wed, Aug 17, 2011 at 07:17:50PM -0400, Josh Boyer wrote:
> > > > > > On Thu, Aug 18, 2011 at 01:06:44AM +0200, Frederic Weisbecker wrote:
> > > > > > > On Wed, Aug 17, 2011 at 07:02:19PM -0400, Josh Boyer wrote:
> > > > > > > > On Wed, Aug 17, 2011 at 03:49:16PM -0700, Paul E. McKenney wrote:
> > > > > > > > > On Wed, Aug 17, 2011 at 06:37:35PM -0400, Josh Boyer wrote:
> > > > > > > > > > On Mon, Aug 15, 2011 at 08:20:52AM -0700, Paul E. McKenney wrote:
> > > > > > > > > > > On Mon, Aug 15, 2011 at 10:04:17AM -0400, Josh Boyer wrote:
> > > > > > > > > > > > > Please see the attached.
> > > > > > > > > > > > 
> > > > > > > > > > > > Fixed it up quickly to apply on top of -rc2 and it seems to solve the
> > > > > > > > > > > > problem nicely.  Thanks for the patch.
> > > > > > > > > > > 
> > > > > > > > > > > Good to hear!  I guess I should keep it, then.  ;-)
> > > > > > > > > > 
> > > > > > > > > > Hey Paul, were you going to send this to Linus for -rc3?  I haven't seen
> > > > > > > > > > it come across LKML yet.
> > > > > > > > > 
> > > > > > > > > I might...  But does it qualify as a regression?  That part of the
> > > > > > > > > code hasn't changed for some time now.
> > > > > > > > 
> > > > > > > > It's a fix for a problem that is newly surfaced in 3.1.  A regression,
> > > > > > > > likely not since it's been there forever, but new debugging options
> > > > > > > > uncovered it.  I'm pretty sure the -rc stage takes fixes even if they
> > > > > > > > aren't regressions.
> > > > > > > 
> > > > > > > Nope, after -rc1 only regressions fixes are taken (most of the time).
> > > > > > 
> > > > > > Sigh.
> > > > > > 
> > > > > > Look, either way I'm carrying this patch in Fedora because it fixes
> > > > > > a bug that is actually being reported by users (and by abrtd as well).
> > > > > > If you both want to wait until 3.2 to actually submit it to Linus,
> > > > > > then OK.
> > > > > > 
> > > > > > Honestly, I'm just glad we actually run with the debug options enabled
> > > > > > (which seems to be a rare thing) so bugs like this are actually found.
> > > > > > Thanks for the fix.
> > > > > 
> > > > > I am sorry, but I didn't make the rules!  And I must carry the fix
> > > > > longer as well, if that makes you feel any better.
> > > > 
> > > > bah, we're not that anal.  The patch fixes a bug and prevents a nasty
> > > > warning spew.  Please, send it to Linus.
> > > 
> > > Given your Acked-by and Josh's Tested-by I might consider it.  ;-)
> > > 
> > > Speaking of which, Josh, does this patch help Nicolas and Michal?
> > > 
> > > > We appear to be referring to the patch "rcu: Avoid having just-onlined
> > > > CPU resched itself when RCU is idle"?  If so, the changelog doesn't
> > > > even mention that the patch fixes a scheduling-while-atomic warning and
> > > > the changelog fails to refer to the redhat bug report.  These omissions
> > > > should be repaired, please.
> > > 
> > > OK...  But I cannot bring myself to believe that my fix does more than
> > > hide some other bug.  Which is OK, I will just say that in the changelog.
> > 
> > And here is this patch ported to v3.1-rc2, FYI.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > rcu: Avoid having just-onlined CPU resched itself when RCU is idle
> > 
> > CPUs set rdp->qs_pending when coming online to resolve races with
> > grace-period start.  However, this means that if RCU is idle, the
> > just-onlined CPU might needlessly send itself resched IPIs.  Adjust the
> > online-CPU initialization to avoid this, and also to correctly cause
> > the CPU to respond to the current grace period if needed.
> > 
> > This patch is believed to fix or otherwise suppress problems in
> > https://bugzilla.redhat.com/show_bug.cgi?id=726877, however, the
> > relationship is not apparent to this patch's author.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index ba06207..6986d34 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -1865,8 +1865,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
> >  
> >  	/* Set up local state, ensuring consistent view of global state. */
> >  	raw_spin_lock_irqsave(&rnp->lock, flags);
> > -	rdp->passed_quiesc = 0;  /* We could be racing with new GP, */
> > -	rdp->qs_pending = 1;	 /*  so set up to respond to current GP. */
> >  	rdp->beenonline = 1;	 /* We have now been online. */
> >  	rdp->preemptible = preemptible;
> >  	rdp->qlen_last_fqs_check = 0;
> > @@ -1891,8 +1889,15 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
> >  		rnp->qsmaskinit |= mask;
> >  		mask = rnp->grpmask;
> >  		if (rnp == rdp->mynode) {
> > -			rdp->gpnum = rnp->completed; /* if GP in progress... */
> > +			/*
> > +			 * If there is a grace period in progress, we will
> > +			 * set up to wait for it next time we run the
> > +			 * RCU core code.
> > +			 */
> > +			rdp->gpnum = rnp->completed;
> >  			rdp->completed = rnp->completed;
> > +			rdp->passed_quiesc = 0;
> > +			rdp->qs_pending = 1;
> 
> In the previous version you had rdp->qs_pending = 0 here.
> If it's set to 0, I can understand how it fixes the problem.  But set to 1,
> I don't see how it fixes anything.

Indeed, this was a bogus patch.  The version I posted on the -rcu git
tree a few days ago has the correct "rdp->qs_pending = 0".  I thought
that I had chased down all the bogus copies, but obviously not.  :-(

> Should it perhaps set it to 1 only if we have rnp->gpnum > rnp->completed ?

I would rather keep it simple.  If rnp->gpnum > rnp->completed, then
the newly onlined CPU will notice and adjust appropriately soon enough.

							Thanx, Paul

> >  			rdp->passed_quiesc_completed = rnp->completed - 1;
> >  		}
> >  		raw_spin_unlock(&rnp->lock); /* irqs already disabled. */

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-24 23:12                                                           ` Paul E. McKenney
@ 2011-08-24 23:34                                                             ` Frederic Weisbecker
  2011-08-24 23:57                                                               ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2011-08-24 23:34 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Andrew Morton, Josh Boyer, linux-kernel

On Wed, Aug 24, 2011 at 04:12:55PM -0700, Paul E. McKenney wrote:
> Indeed, this was a bogus patch.  The version I posted on the -rcu git
> tree a few days ago has the correct "rdp->qs_pending = 0".  I thought
> that I had chased down all the bogus copies, but obviously not.  :-(

Ah ok.
 
> > Should it perhaps set it to 1 only if we have rnp->gpnum > rnp->completed ?
> 
> I would rather keep it simple.  If rnp->gpnum > rnp->completed, then
> the newly onlined CPU will notice and adjust appropriately soon enough.

Ah, just to make sure I understood correctly: is that because rcu_pending()
(called from the tick) would detect rnp->gpnum != rdp->gpnum and thus trigger
the softirq that notes the new gpnum and toggles qs_pending?
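
If it helps to see that spelled out, here is a rough standalone sketch of the
detection (a model with invented helper names, not the actual rcu_pending()
or softirq code):

#include <stdbool.h>
#include <stdio.h>

struct node { unsigned long gpnum; };
struct data { unsigned long gpnum; bool qs_pending; };

/* Model of "is there RCU work pending for this CPU?" */
static bool pending(const struct node *rnp, const struct data *rdp)
{
	return rdp->gpnum != rnp->gpnum;
}

/* Model of what the softirq then does: catch up and start waiting. */
static void note_new_gp(const struct node *rnp, struct data *rdp)
{
	rdp->gpnum = rnp->gpnum;
	rdp->qs_pending = true;
}

int main(void)
{
	struct node rnp = { .gpnum = 8 };                       /* a new GP has started */
	struct data rdp = { .gpnum = 7, .qs_pending = false };  /* CPU onlined while idle */

	if (pending(&rnp, &rdp))         /* the tick notices the lag ...  */
		note_new_gp(&rnp, &rdp); /* ... and the softirq catches up */

	printf("rdp->gpnum = %lu, qs_pending = %d\n", rdp.gpnum, rdp.qs_pending);
	return 0;
}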

> 							Thanx, Paul
> 
> > >  			rdp->passed_quiesc_completed = rnp->completed - 1;
> > >  		}
> > >  		raw_spin_unlock(&rnp->lock); /* irqs already disabled. */

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: 3.0-git15 Atomic scheduling in pidmap_init
  2011-08-24 23:34                                                             ` Frederic Weisbecker
@ 2011-08-24 23:57                                                               ` Paul E. McKenney
  0 siblings, 0 replies; 42+ messages in thread
From: Paul E. McKenney @ 2011-08-24 23:57 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Andrew Morton, Josh Boyer, linux-kernel

On Thu, Aug 25, 2011 at 01:34:24AM +0200, Frederic Weisbecker wrote:
> On Wed, Aug 24, 2011 at 04:12:55PM -0700, Paul E. McKenney wrote:
> > Indeed, this was a bogus patch.  The version I posted on the -rcu git
> > tree a few days ago has the correct "rdp->qs_pending = 0".  I thought
> > that I had chased down all the bogus copies, but obviously not.  :-(
> 
> Ah ok.
> 
> > > Should it perhaps set it to 1 only if we have rnp->gpnum > rnp->completed ?
> > 
> > I would rather keep it simple.  If rnp->gpnum > rnp->completed, then
> > the newly onlined CPU will notice and adjust appropriately soon enough.
> 
> Ah, just to make sure I understood correctly: is that because rcu_pending()
> (called from the tick) would detect rnp->gpnum != rdp->gpnum and thus trigger
> the softirq that notes the new gpnum and toggles qs_pending?

Yep, that is the theory, anyway.  ;-)

							Thanx, Paul

> > > >  			rdp->passed_quiesc_completed = rnp->completed - 1;
> > > >  		}
> > > >  		raw_spin_unlock(&rnp->lock); /* irqs already disabled. */

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2011-08-24 23:57 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-01 15:46 3.0-git15 Atomic scheduling in pidmap_init Josh Boyer
2011-08-04 11:46 ` Josh Boyer
2011-08-04 14:04   ` Paul E. McKenney
2011-08-04 15:06     ` Josh Boyer
2011-08-04 16:26       ` Paul E. McKenney
2011-08-04 17:31         ` Josh Boyer
2011-08-05  1:19           ` Josh Boyer
2011-08-05  6:56             ` Paul E. McKenney
2011-08-05 14:22               ` Josh Boyer
2011-08-05 17:08                 ` Frederic Weisbecker
2011-08-05 22:26                   ` Paul E. McKenney
2011-08-05 23:12                     ` Frederic Weisbecker
2011-08-08  2:09                       ` Paul E. McKenney
2011-08-08  2:55                         ` Frederic Weisbecker
2011-08-08  3:10                           ` Paul E. McKenney
2011-08-09 11:35                             ` Frederic Weisbecker
2011-08-10 12:45                               ` Josh Boyer
2011-08-10 14:53                                 ` Frederic Weisbecker
2011-08-10 15:03                                   ` Josh Boyer
2011-08-14 23:04                                 ` Paul E. McKenney
2011-08-15 14:04                                   ` Josh Boyer
2011-08-15 15:20                                     ` Paul E. McKenney
2011-08-17 22:37                                       ` Josh Boyer
2011-08-17 22:49                                         ` Paul E. McKenney
2011-08-17 23:02                                           ` Josh Boyer
2011-08-17 23:06                                             ` Frederic Weisbecker
2011-08-17 23:17                                               ` Josh Boyer
2011-08-18 18:35                                                 ` Paul E. McKenney
2011-08-18 19:11                                                   ` Josh Boyer
2011-08-18 21:00                                                   ` Andrew Morton
2011-08-18 21:23                                                     ` Paul E. McKenney
2011-08-18 21:55                                                       ` Paul E. McKenney
2011-08-18 22:21                                                         ` Josh Boyer
2011-08-18 23:01                                                           ` Paul E. McKenney
2011-08-24 22:45                                                         ` Frederic Weisbecker
2011-08-24 23:12                                                           ` Paul E. McKenney
2011-08-24 23:34                                                             ` Frederic Weisbecker
2011-08-24 23:57                                                               ` Paul E. McKenney
2011-08-18 22:19                                                       ` Josh Boyer
2011-08-18 23:16                                                         ` Paul E. McKenney
2011-08-18 23:27                                                           ` Andrew Morton
2011-08-19  0:38                                                             ` Paul E. McKenney
