All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
@ 2015-02-26  3:22 Preeti U Murthy
  2015-03-02 14:53 ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Preeti U Murthy @ 2015-02-26  3:22 UTC (permalink / raw)
  To: peterz, tglx; +Cc: linux-kernel

The hrtimer mode of broadcast queues hrtimers in the idle entry
path so as to wakeup cpus in deep idle states. hrtimer_{start/cancel}
functions call into tracing which uses RCU. But it is not legal to call
into RCU in cpuidle because it is one of the quiescent states. Hence
protect this region with RCU_NONIDLE which informs RCU that the cpu
is momentarily non-idle.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 kernel/time/tick-broadcast-hrtimer.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index eb682d5..6aac4be 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -49,6 +49,7 @@ static void bc_set_mode(enum clock_event_mode mode,
  */
 static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 {
+	int bc_moved;
 	/*
 	 * We try to cancel the timer first. If the callback is on
 	 * flight on some other cpu then we let it handle it. If we
@@ -60,9 +61,15 @@ static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 	 * restart the timer because we are in the callback, but we
 	 * can set the expiry time and let the callback return
 	 * HRTIMER_RESTART.
+	 *
+	 * Since we are in the idle loop at this point and because
+	 * hrtimer_{start/cancel} functions call into tracing,
+	 * calls to these functions must be bound within RCU_NONIDLE.
 	 */
-	if (hrtimer_try_to_cancel(&bctimer) >= 0) {
-		hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED);
+	RCU_NONIDLE(bc_moved = (hrtimer_try_to_cancel(&bctimer) >= 0) ?
+		!hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED) :
+			0);
+	if (bc_moved) {
 		/* Bind the "device" to the cpu */
 		bc->bound_on = smp_processor_id();
 	} else if (bc->bound_on == smp_processor_id()) {


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
  2015-02-26  3:22 [PATCH] tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop Preeti U Murthy
@ 2015-03-02 14:53 ` Peter Zijlstra
  2015-03-05  4:36   ` Preeti U Murthy
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2015-03-02 14:53 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: tglx, linux-kernel

On Thu, Feb 26, 2015 at 08:52:02AM +0530, Preeti U Murthy wrote:
> The hrtimer mode of broadcast queues hrtimers in the idle entry
> path so as to wakeup cpus in deep idle states. 

Callgraph please...

> hrtimer_{start/cancel}
> functions call into tracing which uses RCU. But it is not legal to call
> into RCU in cpuidle because it is one of the quiescent states. Hence
> protect this region with RCU_NONIDLE which informs RCU that the cpu
> is momentarily non-idle.

It it not clear to me that every user of bc_set_next() is from IDLE.
>From what I can tell it ends up being clockevents_program_event() and
that is called quite a lot.

Why is bc_set_next() a good function to annotate?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
  2015-03-02 14:53 ` Peter Zijlstra
@ 2015-03-05  4:36   ` Preeti U Murthy
  2015-03-16 14:56     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Preeti U Murthy @ 2015-03-05  4:36 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, linux-kernel


On 03/02/2015 08:23 PM, Peter Zijlstra wrote:
> On Thu, Feb 26, 2015 at 08:52:02AM +0530, Preeti U Murthy wrote:
>> The hrtimer mode of broadcast queues hrtimers in the idle entry
>> path so as to wakeup cpus in deep idle states. 
> 
> Callgraph please...

cpuidle_idle_call()
|____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....))
     |_____tick_broadcast_set_event()
           |____clockevents_program_event()
                |____bc_set_next()
> 
>> hrtimer_{start/cancel}
>> functions call into tracing which uses RCU. But it is not legal to call
>> into RCU in cpuidle because it is one of the quiescent states. Hence
>> protect this region with RCU_NONIDLE which informs RCU that the cpu
>> is momentarily non-idle.
> 
> It it not clear to me that every user of bc_set_next() is from IDLE.
> From what I can tell it ends up being clockevents_program_event() and
> that is called quite a lot.

bc_set_next() is called from at places:
1. Idle entry : It is called when a cpu in its idle entry path finds the
need to reset the broadcast hrtimer.
2. CPU offline operations : When the cpu on which the broadcast hrtimer
is being queued goes offline.

So you see that almost all the time, it is called in idle entry path.

Regards
Preeti U Murthy

> 
> Why is bc_set_next() a good function to annotate?
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
  2015-03-05  4:36   ` Preeti U Murthy
@ 2015-03-16 14:56     ` Peter Zijlstra
  2015-03-17  3:29       ` Preeti U Murthy
  2015-03-17  4:19       ` Preeti U Murthy
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2015-03-16 14:56 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: tglx, linux-kernel

On Thu, Mar 05, 2015 at 10:06:30AM +0530, Preeti U Murthy wrote:
> 
> On 03/02/2015 08:23 PM, Peter Zijlstra wrote:
> > On Thu, Feb 26, 2015 at 08:52:02AM +0530, Preeti U Murthy wrote:
> >> The hrtimer mode of broadcast queues hrtimers in the idle entry
> >> path so as to wakeup cpus in deep idle states. 
> > 
> > Callgraph please...
> 
> cpuidle_idle_call()
> |____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....))
>      |_____tick_broadcast_set_event()
>            |____clockevents_program_event()
>                 |____bc_set_next()
> > 
> >> hrtimer_{start/cancel}
> >> functions call into tracing which uses RCU. But it is not legal to call
> >> into RCU in cpuidle because it is one of the quiescent states. Hence
> >> protect this region with RCU_NONIDLE which informs RCU that the cpu
> >> is momentarily non-idle.
> > 
> > It it not clear to me that every user of bc_set_next() is from IDLE.
> > From what I can tell it ends up being clockevents_program_event() and
> > that is called quite a lot.
> 
> bc_set_next() is called from at places:
> 1. Idle entry : It is called when a cpu in its idle entry path finds the
> need to reset the broadcast hrtimer.
> 2. CPU offline operations : When the cpu on which the broadcast hrtimer
> is being queued goes offline.
> 
> So you see that almost all the time, it is called in idle entry path.

How about:

	hrtimer_reprogram()
	  tick_program_event()
	    clockevents_program_event()
	      ->set_next_ktime()

That is called from !idle loads of times. I guess I'm not seeing what
avoids &ce_broadcast_hrtimer from being the 'normal' clock event.

Sure; it might be that for power you only end up with that broadcast
crap enabled on idle/hotplug, but is this always so?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
  2015-03-16 14:56     ` Peter Zijlstra
@ 2015-03-17  3:29       ` Preeti U Murthy
  2015-03-17  4:19       ` Preeti U Murthy
  1 sibling, 0 replies; 8+ messages in thread
From: Preeti U Murthy @ 2015-03-17  3:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, linux-kernel


On 03/16/2015 08:26 PM, Peter Zijlstra wrote:
> On Thu, Mar 05, 2015 at 10:06:30AM +0530, Preeti U Murthy wrote:
>>
>> On 03/02/2015 08:23 PM, Peter Zijlstra wrote:
>>> On Thu, Feb 26, 2015 at 08:52:02AM +0530, Preeti U Murthy wrote:
>>>> The hrtimer mode of broadcast queues hrtimers in the idle entry
>>>> path so as to wakeup cpus in deep idle states. 
>>>
>>> Callgraph please...
>>
>> cpuidle_idle_call()
>> |____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....))
>>      |_____tick_broadcast_set_event()
>>            |____clockevents_program_event()
>>                 |____bc_set_next()
>>>
>>>> hrtimer_{start/cancel}
>>>> functions call into tracing which uses RCU. But it is not legal to call
>>>> into RCU in cpuidle because it is one of the quiescent states. Hence
>>>> protect this region with RCU_NONIDLE which informs RCU that the cpu
>>>> is momentarily non-idle.
>>>
>>> It it not clear to me that every user of bc_set_next() is from IDLE.
>>> From what I can tell it ends up being clockevents_program_event() and
>>> that is called quite a lot.
>>
>> bc_set_next() is called from at places:
>> 1. Idle entry : It is called when a cpu in its idle entry path finds the
>> need to reset the broadcast hrtimer.
>> 2. CPU offline operations : When the cpu on which the broadcast hrtimer
>> is being queued goes offline.
>>
>> So you see that almost all the time, it is called in idle entry path.
> 
> How about:
> 
> 	hrtimer_reprogram()
> 	  tick_program_event()
> 	    clockevents_program_event()
> 	      ->set_next_ktime()
> 
> That is called from !idle loads of times. I guess I'm not seeing what
> avoids &ce_broadcast_hrtimer from being the 'normal' clock event.

It is a normal clock event. In the above context, this hrtimer is being
moved from CPUx to the CPU executing that code. Hence it needs to be
enqueued onto the new CPU. Any hrtimer enqueue calls into tracing.
A hrtimer_reprogram() alone will not suffice.
Moreover hrtimer_reprogram() cannot be called directly, can it? nor is
it safe. Or am I missing your point ?

> 
> Sure; it might be that for power you only end up with that broadcast
> crap enabled on idle/hotplug, but is this always so?

The hrtimer broadcast framework gets invoked only during idle. This is
platform agnostic.

Regards
Preeti U Murthy
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
  2015-03-16 14:56     ` Peter Zijlstra
  2015-03-17  3:29       ` Preeti U Murthy
@ 2015-03-17  4:19       ` Preeti U Murthy
  2015-03-17  7:24         ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Preeti U Murthy @ 2015-03-17  4:19 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, linux-kernel, Ingo Molnar

On 03/16/2015 08:26 PM, Peter Zijlstra wrote:
> On Thu, Mar 05, 2015 at 10:06:30AM +0530, Preeti U Murthy wrote:
>>
>> On 03/02/2015 08:23 PM, Peter Zijlstra wrote:
>>> On Thu, Feb 26, 2015 at 08:52:02AM +0530, Preeti U Murthy wrote:
>>>> The hrtimer mode of broadcast queues hrtimers in the idle entry
>>>> path so as to wakeup cpus in deep idle states. 
>>>
>>> Callgraph please...
>>
>> cpuidle_idle_call()
>> |____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....))
>>      |_____tick_broadcast_set_event()
>>            |____clockevents_program_event()
>>                 |____bc_set_next()
>>>
>>>> hrtimer_{start/cancel}
>>>> functions call into tracing which uses RCU. But it is not legal to call
>>>> into RCU in cpuidle because it is one of the quiescent states. Hence
>>>> protect this region with RCU_NONIDLE which informs RCU that the cpu
>>>> is momentarily non-idle.
>>>
>>> It it not clear to me that every user of bc_set_next() is from IDLE.
>>> From what I can tell it ends up being clockevents_program_event() and
>>> that is called quite a lot.
>>
>> bc_set_next() is called from at places:
>> 1. Idle entry : It is called when a cpu in its idle entry path finds the
>> need to reset the broadcast hrtimer.
>> 2. CPU offline operations : When the cpu on which the broadcast hrtimer
>> is being queued goes offline.
>>
>> So you see that almost all the time, it is called in idle entry path.
> 
> How about:
> 
> 	hrtimer_reprogram()
> 	  tick_program_event()
> 	    clockevents_program_event()
> 	      ->set_next_ktime()
> 
> That is called from !idle loads of times. I guess I'm not seeing what
> avoids &ce_broadcast_hrtimer from being the 'normal' clock event.

Ok I see your point now. Sorry about having misinterpreted it
previously. ce_broadcast_hrtimer is not the per-cpu clock device. It is
not a real clock device. It is a pseudo clock device, which is called
only from the guts of the broadcast framework.
When it is programmed, it queues a hrtimer and programs the per-cpu
clock device. in the fashion mentioned above.

No hrtimer programming/starting/canceling will get routed through
bc_set_next(). The broadcast framework makes use of a separate broadcast
clock device, which is never the per-cpu clock device to wake cpus from
idle. This device is programmed explicitly when required and not
indirectly via timer queueing. *Only* when this broadcast clock device
needs to reprogrammed, bc_set_next() gets called on those archs which
*do not have a real broadcast clock device*. And the whole thing kicks
in when cpus go idle only, not just for PowerPC but for ARM as well.

Regards
Preeti U Murthy
> 
> Sure; it might be that for power you only end up with that broadcast
> crap enabled on idle/hotplug, but is this always so?
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
  2015-03-17  4:19       ` Preeti U Murthy
@ 2015-03-17  7:24         ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2015-03-17  7:24 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: tglx, linux-kernel, Ingo Molnar

On Tue, Mar 17, 2015 at 09:49:45AM +0530, Preeti U Murthy wrote:
> Ok I see your point now. Sorry about having misinterpreted it
> previously. ce_broadcast_hrtimer is not the per-cpu clock device. It is
> not a real clock device. It is a pseudo clock device, which is called
> only from the guts of the broadcast framework.
> When it is programmed, it queues a hrtimer and programs the per-cpu
> clock device. in the fashion mentioned above.
> 
> No hrtimer programming/starting/canceling will get routed through
> bc_set_next(). The broadcast framework makes use of a separate broadcast
> clock device, which is never the per-cpu clock device to wake cpus from
> idle. This device is programmed explicitly when required and not
> indirectly via timer queueing. *Only* when this broadcast clock device
> needs to reprogrammed, bc_set_next() gets called on those archs which
> *do not have a real broadcast clock device*. And the whole thing kicks
> in when cpus go idle only, not just for PowerPC but for ARM as well.

Ah, I see, tick_check_new_device() will never select it.

CLOCK_EVT_FEAT_PERCPU seems a rather pointless thing though, maybe a
remnant of something gone.

OK now it makes sense.. some of this could use a few comments, but I
suppose that's true of most things ;-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
@ 2015-02-16  6:24 Preeti U Murthy
  0 siblings, 0 replies; 8+ messages in thread
From: Preeti U Murthy @ 2015-02-16  6:24 UTC (permalink / raw)
  To: tglx, paulmck, sam.bobroff; +Cc: linuxppc-dev

The hrtimer mode of broadcast queues hrtimers in the idle entry
path so as to wakeup cpus in deep idle states. hrtimer_{start/cancel}
functions call into tracing which uses RCU. But it is not legal to call
into RCU in cpuidle because it is one of the quiescent states. Hence
protect this region with RCU_NONIDLE which informs RCU that the cpu
is momentarily non-idle.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 kernel/time/tick-broadcast-hrtimer.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index eb682d5..6aac4be 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -49,6 +49,7 @@ static void bc_set_mode(enum clock_event_mode mode,
  */
 static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 {
+	int bc_moved;
 	/*
 	 * We try to cancel the timer first. If the callback is on
 	 * flight on some other cpu then we let it handle it. If we
@@ -60,9 +61,15 @@ static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 	 * restart the timer because we are in the callback, but we
 	 * can set the expiry time and let the callback return
 	 * HRTIMER_RESTART.
+	 *
+	 * Since we are in the idle loop at this point and because
+	 * hrtimer_{start/cancel} functions call into tracing,
+	 * calls to these functions must be bound within RCU_NONIDLE.
 	 */
-	if (hrtimer_try_to_cancel(&bctimer) >= 0) {
-		hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED);
+	RCU_NONIDLE(bc_moved = (hrtimer_try_to_cancel(&bctimer) >= 0) ?
+		!hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED) :
+			0);
+	if (bc_moved) {
 		/* Bind the "device" to the cpu */
 		bc->bound_on = smp_processor_id();
 	} else if (bc->bound_on == smp_processor_id()) {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-03-17  7:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-26  3:22 [PATCH] tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop Preeti U Murthy
2015-03-02 14:53 ` Peter Zijlstra
2015-03-05  4:36   ` Preeti U Murthy
2015-03-16 14:56     ` Peter Zijlstra
2015-03-17  3:29       ` Preeti U Murthy
2015-03-17  4:19       ` Preeti U Murthy
2015-03-17  7:24         ` Peter Zijlstra
  -- strict thread matches above, loose matches on Subject: below --
2015-02-16  6:24 Preeti U Murthy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.