All of lore.kernel.org
 help / color / mirror / Atom feed
* Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit
@ 2019-01-10  1:38 Mathieu Desnoyers
  2019-01-10  4:13 ` Paul E. McKenney
  2019-01-10 16:08 ` Steven Rostedt
  0 siblings, 2 replies; 14+ messages in thread
From: Mathieu Desnoyers @ 2019-01-10  1:38 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, Peter Zijlstra, rostedt

Hi Paul,

I've had a user report that trace_sched_waking() appears to be
invoked while !rcu_is_watching() in some situation, so I started
digging into the scheduler idle code.

It appears that interrupts are re-enabled before rcu_eqs_exit() is
invoked when exiting idle code from the scheduler.

I wonder what happens if an interrupt handler (including scheduler code)
happens to issue a RCU read-side critical section before rcu_eqs_exit()
is called ? Is there some code on interrupt entry that ensures rcu eqs
state is exited in such scenario ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit
  2019-01-10  1:38 Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit Mathieu Desnoyers
@ 2019-01-10  4:13 ` Paul E. McKenney
  2019-01-10  6:30   ` Mathieu Desnoyers
  2019-01-10 16:08 ` Steven Rostedt
  1 sibling, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2019-01-10  4:13 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: linux-kernel, Peter Zijlstra, rostedt

On Wed, Jan 09, 2019 at 08:38:51PM -0500, Mathieu Desnoyers wrote:
> Hi Paul,
> 
> I've had a user report that trace_sched_waking() appears to be
> invoked while !rcu_is_watching() in some situation, so I started
> digging into the scheduler idle code.
> 
> It appears that interrupts are re-enabled before rcu_eqs_exit() is
> invoked when exiting idle code from the scheduler.
> 
> I wonder what happens if an interrupt handler (including scheduler code)
> happens to issue a RCU read-side critical section before rcu_eqs_exit()
> is called ? Is there some code on interrupt entry that ensures rcu eqs
> state is exited in such scenario ?

Interrupt handlers are supposed to invoke irq_enter(), which will in
turn invoke rcu_irq_enter(), which should take care of things.

However, there are cases where a given architecture knows that a given
interrupt handler does not contain RCU readers, and in this case, the
architecture might omit the rcu_irq_enter() or maybe even the whole
irq_enter().  And then it is all fun and games until someone adds an
RCU read-side critical section.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit
  2019-01-10  4:13 ` Paul E. McKenney
@ 2019-01-10  6:30   ` Mathieu Desnoyers
  2019-01-10 14:19     ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2019-01-10  6:30 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, Peter Zijlstra, rostedt

----- On Jan 9, 2019, at 8:13 PM, paulmck paulmck@linux.ibm.com wrote:

> On Wed, Jan 09, 2019 at 08:38:51PM -0500, Mathieu Desnoyers wrote:
>> Hi Paul,
>> 
>> I've had a user report that trace_sched_waking() appears to be
>> invoked while !rcu_is_watching() in some situation, so I started
>> digging into the scheduler idle code.
>> 
>> It appears that interrupts are re-enabled before rcu_eqs_exit() is
>> invoked when exiting idle code from the scheduler.
>> 
>> I wonder what happens if an interrupt handler (including scheduler code)
>> happens to issue a RCU read-side critical section before rcu_eqs_exit()
>> is called ? Is there some code on interrupt entry that ensures rcu eqs
>> state is exited in such scenario ?
> 
> Interrupt handlers are supposed to invoke irq_enter(), which will in
> turn invoke rcu_irq_enter(), which should take care of things.
> 
> However, there are cases where a given architecture knows that a given
> interrupt handler does not contain RCU readers, and in this case, the
> architecture might omit the rcu_irq_enter() or maybe even the whole
> irq_enter().  And then it is all fun and games until someone adds an
> RCU read-side critical section.  ;-)

Even if an irq handler does not contain any RCU read-side critical
section, won't it end by possibly invoking the scheduler before
returning ? Considering that the scheduler has tracepoints which
use RCU, this might be related to the issue that has been brought
to my attention.

Do you have examples of such interrupt handlers which do not invoke
rcu_irq_enter() ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit
  2019-01-10  6:30   ` Mathieu Desnoyers
@ 2019-01-10 14:19     ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2019-01-10 14:19 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: linux-kernel, Peter Zijlstra, rostedt

On Thu, Jan 10, 2019 at 01:30:05AM -0500, Mathieu Desnoyers wrote:
> ----- On Jan 9, 2019, at 8:13 PM, paulmck paulmck@linux.ibm.com wrote:
> 
> > On Wed, Jan 09, 2019 at 08:38:51PM -0500, Mathieu Desnoyers wrote:
> >> Hi Paul,
> >> 
> >> I've had a user report that trace_sched_waking() appears to be
> >> invoked while !rcu_is_watching() in some situation, so I started
> >> digging into the scheduler idle code.
> >> 
> >> It appears that interrupts are re-enabled before rcu_eqs_exit() is
> >> invoked when exiting idle code from the scheduler.
> >> 
> >> I wonder what happens if an interrupt handler (including scheduler code)
> >> happens to issue a RCU read-side critical section before rcu_eqs_exit()
> >> is called ? Is there some code on interrupt entry that ensures rcu eqs
> >> state is exited in such scenario ?
> > 
> > Interrupt handlers are supposed to invoke irq_enter(), which will in
> > turn invoke rcu_irq_enter(), which should take care of things.
> > 
> > However, there are cases where a given architecture knows that a given
> > interrupt handler does not contain RCU readers, and in this case, the
> > architecture might omit the rcu_irq_enter() or maybe even the whole
> > irq_enter().  And then it is all fun and games until someone adds an
> > RCU read-side critical section.  ;-)
> 
> Even if an irq handler does not contain any RCU read-side critical
> section, won't it end by possibly invoking the scheduler before
> returning ? Considering that the scheduler has tracepoints which
> use RCU, this might be related to the issue that has been brought
> to my attention.

Most interrupt handlers just return, but yes, scheduler state is often
checked during return from interrupt.  But in that case, the interrupt
handler needs to have invoked irq_enter().

> Do you have examples of such interrupt handlers which do not invoke
> rcu_irq_enter() ?

Mostly examples of lightweight interrupts handlers that used to not invoke
irq_enter() and thus not rcu_irq_enter(), but which later started using
RCU readers.  Which means that they are no longer examples that do not
invoke rcu_irq_enter().  ;-)

Some of them just invoked rcu_irq_enter(), others had to do the full
irq_enter() call (which in turn invokes rcu_irq_enter()).

These interrupt handlers were very light-weight.  Page-table walkers,
hardware events, and the like.  Take an interrupt, look at a hardware
register, update a data structure, maybe write to a hardware register,
return from interrupt.

If there is only one such tracepoint, one approach is to use _rcuidle,
that is, instead of trace_blarvitz(), trace_blarvitz_rcuidle().  This can
add overhead, so this might not be appropriate for any of the scheduler's
fastpaths.  Which brings me back to the interrupt handler invoking
either irq_enter() or rcu_irq_enter().  Or moving the tracepoints to
a nearby region of code that RCU is already watching.

So, is it reasonably to add the rcu_irq_enter()?  If you do change this,
please test with CONFIG_RCU_EQS_DEBUG=y.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit
  2019-01-10  1:38 Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit Mathieu Desnoyers
  2019-01-10  4:13 ` Paul E. McKenney
@ 2019-01-10 16:08 ` Steven Rostedt
  2019-01-10 16:44   ` Mathieu Desnoyers
  1 sibling, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2019-01-10 16:08 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra

On Wed, 9 Jan 2019 20:38:51 -0500 (EST)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Hi Paul,
> 
> I've had a user report that trace_sched_waking() appears to be
> invoked while !rcu_is_watching() in some situation, so I started
> digging into the scheduler idle code.

I'm wondering if this isn't a bug. Do you have the backtrace for where
trace_sched_waking() was called without rcu watching?

-- Steve

> 
> It appears that interrupts are re-enabled before rcu_eqs_exit() is
> invoked when exiting idle code from the scheduler.
> 
> I wonder what happens if an interrupt handler (including scheduler code)
> happens to issue a RCU read-side critical section before rcu_eqs_exit()
> is called ? Is there some code on interrupt entry that ensures rcu eqs
> state is exited in such scenario ?
> 
> Thanks,
> 
> Mathieu
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit
  2019-01-10 16:08 ` Steven Rostedt
@ 2019-01-10 16:44   ` Mathieu Desnoyers
  2019-01-10 17:11     ` Mathieu Desnoyers
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2019-01-10 16:44 UTC (permalink / raw)
  To: rostedt; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra

----- On Jan 10, 2019, at 8:08 AM, rostedt rostedt@goodmis.org wrote:

> On Wed, 9 Jan 2019 20:38:51 -0500 (EST)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> Hi Paul,
>> 
>> I've had a user report that trace_sched_waking() appears to be
>> invoked while !rcu_is_watching() in some situation, so I started
>> digging into the scheduler idle code.
> 
> I'm wondering if this isn't a bug. Do you have the backtrace for where
> trace_sched_waking() was called without rcu watching?

I strongly suspect a bug as well. I'm awaiting a reproducer from the
user whom reported this issue so I can add a WARN_ON_ONCE(!rcu_is_watching())
in the scheduler code near trace_sched_waking() and gather a backtrace.

It still has to be confirmed, but I suspect this have been triggered
within a HyperV guest. It may therefore be related to a virtualized environment.

I'll try to ask more specifically on which environment this was encountered.

Thanks,

Mathieu

> 
> -- Steve
> 
>> 
>> It appears that interrupts are re-enabled before rcu_eqs_exit() is
>> invoked when exiting idle code from the scheduler.
>> 
>> I wonder what happens if an interrupt handler (including scheduler code)
>> happens to issue a RCU read-side critical section before rcu_eqs_exit()
>> is called ? Is there some code on interrupt entry that ensures rcu eqs
>> state is exited in such scenario ?
>> 
>> Thanks,
>> 
>> Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit
  2019-01-10 16:44   ` Mathieu Desnoyers
@ 2019-01-10 17:11     ` Mathieu Desnoyers
  2019-01-10 17:23       ` Steven Rostedt
  2019-01-10 17:25       ` Mathieu Desnoyers
  0 siblings, 2 replies; 14+ messages in thread
From: Mathieu Desnoyers @ 2019-01-10 17:11 UTC (permalink / raw)
  To: rostedt; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra

----- On Jan 10, 2019, at 8:44 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Jan 10, 2019, at 8:08 AM, rostedt rostedt@goodmis.org wrote:
> 
>> On Wed, 9 Jan 2019 20:38:51 -0500 (EST)
>> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>> 
>>> Hi Paul,
>>> 
>>> I've had a user report that trace_sched_waking() appears to be
>>> invoked while !rcu_is_watching() in some situation, so I started
>>> digging into the scheduler idle code.
>> 
>> I'm wondering if this isn't a bug. Do you have the backtrace for where
>> trace_sched_waking() was called without rcu watching?
> 
> I strongly suspect a bug as well. I'm awaiting a reproducer from the
> user whom reported this issue so I can add a WARN_ON_ONCE(!rcu_is_watching())
> in the scheduler code near trace_sched_waking() and gather a backtrace.
> 
> It still has to be confirmed, but I suspect this have been triggered
> within a HyperV guest. It may therefore be related to a virtualized environment.
> 
> I'll try to ask more specifically on which environment this was encountered.

So it ends up it happens directly on hardware on a Linux laptop. Here is
the stacktrace:

vmlinux!try_to_wake_up
vmlinux!default_wake_function
vmlinux!pollwake
vmlinux!__wake_up_common
vmlinux!__wake_up_common_lock
vmlinux!__wake_up
vmlinux!perf_event_wakeup
vmlinux!perf_pending_event
vmlinux!irq_work_run_list
vmlinux!irq_work_run
vmlinux!smp_irq_work_iterrupt
vmlinux!irq_work_interrupt
vmlinux!finish_task_switch
vmlinux!__schedule
vmlinux!schedule_idle
vmlinux!do_idle
vmlinux!cpu_startup_entry
vmlinux!start_secondary
vmlinux!secondary_startup_64

Does it raise any red flag ?

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
>> 
>> -- Steve
>> 
>>> 
>>> It appears that interrupts are re-enabled before rcu_eqs_exit() is
>>> invoked when exiting idle code from the scheduler.
>>> 
>>> I wonder what happens if an interrupt handler (including scheduler code)
>>> happens to issue a RCU read-side critical section before rcu_eqs_exit()
>>> is called ? Is there some code on interrupt entry that ensures rcu eqs
>>> state is exited in such scenario ?
>>> 
>>> Thanks,
>>> 
>>> Mathieu
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit
  2019-01-10 17:11     ` Mathieu Desnoyers
@ 2019-01-10 17:23       ` Steven Rostedt
  2019-01-10 17:25       ` Mathieu Desnoyers
  1 sibling, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2019-01-10 17:23 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra

On Thu, 10 Jan 2019 12:11:55 -0500 (EST)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Jan 10, 2019, at 8:44 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
> 
> > ----- On Jan 10, 2019, at 8:08 AM, rostedt rostedt@goodmis.org wrote:
> >   
> >> On Wed, 9 Jan 2019 20:38:51 -0500 (EST)
> >> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >>   
> >>> Hi Paul,
> >>> 
> >>> I've had a user report that trace_sched_waking() appears to be
> >>> invoked while !rcu_is_watching() in some situation, so I started
> >>> digging into the scheduler idle code.  
> >> 
> >> I'm wondering if this isn't a bug. Do you have the backtrace for where
> >> trace_sched_waking() was called without rcu watching?  
> > 
> > I strongly suspect a bug as well. I'm awaiting a reproducer from the
> > user whom reported this issue so I can add a WARN_ON_ONCE(!rcu_is_watching())
> > in the scheduler code near trace_sched_waking() and gather a backtrace.
> > 
> > It still has to be confirmed, but I suspect this have been triggered
> > within a HyperV guest. It may therefore be related to a virtualized environment.
> > 
> > I'll try to ask more specifically on which environment this was encountered.  
> 
> So it ends up it happens directly on hardware on a Linux laptop. Here is
> the stacktrace:
> 
> vmlinux!try_to_wake_up
> vmlinux!default_wake_function
> vmlinux!pollwake
> vmlinux!__wake_up_common
> vmlinux!__wake_up_common_lock
> vmlinux!__wake_up
> vmlinux!perf_event_wakeup
> vmlinux!perf_pending_event
> vmlinux!irq_work_run_list
> vmlinux!irq_work_run

> vmlinux!smp_irq_work_iterrupt

The first thing this calls is ipi_entering_ack_irq(), which calls
irq_enter(), which calls rcu_irq_enter().

So why isn't rcu watching here?

-- Steve


> vmlinux!irq_work_interrupt
> vmlinux!finish_task_switch
> vmlinux!__schedule
> vmlinux!schedule_idle
> vmlinux!do_idle
> vmlinux!cpu_startup_entry
> vmlinux!start_secondary
> vmlinux!secondary_startup_64
> 
> Does it raise any red flag ?
> 
> Thanks,
> 
> Mathieu
> 
> > 
> > Thanks,
> > 
> > Mathieu
> >   
> >> 
> >> -- Steve
> >>   
> >>> 
> >>> It appears that interrupts are re-enabled before rcu_eqs_exit() is
> >>> invoked when exiting idle code from the scheduler.
> >>> 
> >>> I wonder what happens if an interrupt handler (including scheduler code)
> >>> happens to issue a RCU read-side critical section before rcu_eqs_exit()
> >>> is called ? Is there some code on interrupt entry that ensures rcu eqs
> >>> state is exited in such scenario ?
> >>> 
> >>> Thanks,
> >>> 
> >>> Mathieu  
> > 
> > --
> > Mathieu Desnoyers
> > EfficiOS Inc.
> > http://www.efficios.com  
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit
  2019-01-10 17:11     ` Mathieu Desnoyers
  2019-01-10 17:23       ` Steven Rostedt
@ 2019-01-10 17:25       ` Mathieu Desnoyers
  2019-01-10 17:45         ` Perf: event wakeup discards sched_waking events Mathieu Desnoyers
  1 sibling, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2019-01-10 17:25 UTC (permalink / raw)
  To: rostedt; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra

----- On Jan 10, 2019, at 9:11 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Jan 10, 2019, at 8:44 AM, Mathieu Desnoyers
> mathieu.desnoyers@efficios.com wrote:
> 
>> ----- On Jan 10, 2019, at 8:08 AM, rostedt rostedt@goodmis.org wrote:
>> 
>>> On Wed, 9 Jan 2019 20:38:51 -0500 (EST)
>>> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>> 
>>>> Hi Paul,
>>>> 
>>>> I've had a user report that trace_sched_waking() appears to be
>>>> invoked while !rcu_is_watching() in some situation, so I started
>>>> digging into the scheduler idle code.
>>> 
>>> I'm wondering if this isn't a bug. Do you have the backtrace for where
>>> trace_sched_waking() was called without rcu watching?
>> 
>> I strongly suspect a bug as well. I'm awaiting a reproducer from the
>> user whom reported this issue so I can add a WARN_ON_ONCE(!rcu_is_watching())
>> in the scheduler code near trace_sched_waking() and gather a backtrace.
>> 
>> It still has to be confirmed, but I suspect this have been triggered
>> within a HyperV guest. It may therefore be related to a virtualized environment.
>> 
>> I'll try to ask more specifically on which environment this was encountered.
> 
> So it ends up it happens directly on hardware on a Linux laptop. Here is
> the stacktrace:
> 
> vmlinux!try_to_wake_up
> vmlinux!default_wake_function
> vmlinux!pollwake
> vmlinux!__wake_up_common
> vmlinux!__wake_up_common_lock
> vmlinux!__wake_up
> vmlinux!perf_event_wakeup
> vmlinux!perf_pending_event
> vmlinux!irq_work_run_list
> vmlinux!irq_work_run
> vmlinux!smp_irq_work_iterrupt
> vmlinux!irq_work_interrupt
> vmlinux!finish_task_switch
> vmlinux!__schedule
> vmlinux!schedule_idle
> vmlinux!do_idle
> vmlinux!cpu_startup_entry
> vmlinux!start_secondary
> vmlinux!secondary_startup_64
> 
> Does it raise any red flag ?

Based on this backtrace, I think I start to get a better understanding
of the situation.

The initial problem reported to me was that ftrace was showing some sched_waking
events in its trace that were missed by perf.

I presumed this was because of the !rcu_is_watching() check, but I think I was
wrong.

This backtrace seems to show that perf is itself triggering a sched_waking()
event. It there is probably a check that discards nested events within perf,
which would discard this from perf traces, but ftrace (and lttng) would trace
it just fine.

Thoughts ?

Thanks,

Mathieu


> 
> Thanks,
> 
> Mathieu
> 
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>>> 
>>> -- Steve
>>> 
>>>> 
>>>> It appears that interrupts are re-enabled before rcu_eqs_exit() is
>>>> invoked when exiting idle code from the scheduler.
>>>> 
>>>> I wonder what happens if an interrupt handler (including scheduler code)
>>>> happens to issue a RCU read-side critical section before rcu_eqs_exit()
>>>> is called ? Is there some code on interrupt entry that ensures rcu eqs
>>>> state is exited in such scenario ?
>>>> 
>>>> Thanks,
>>>> 
>>>> Mathieu
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Perf: event wakeup discards sched_waking events
  2019-01-10 17:25       ` Mathieu Desnoyers
@ 2019-01-10 17:45         ` Mathieu Desnoyers
  2019-01-14 13:09           ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2019-01-10 17:45 UTC (permalink / raw)
  To: rostedt; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra

----- On Jan 10, 2019, at 9:25 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Jan 10, 2019, at 9:11 AM, Mathieu Desnoyers
> mathieu.desnoyers@efficios.com wrote:
> 
>> ----- On Jan 10, 2019, at 8:44 AM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>> 
>>> ----- On Jan 10, 2019, at 8:08 AM, rostedt rostedt@goodmis.org wrote:
>>> 
>>>> On Wed, 9 Jan 2019 20:38:51 -0500 (EST)
>>>> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>> 
>>>>> Hi Paul,
>>>>> 
>>>>> I've had a user report that trace_sched_waking() appears to be
>>>>> invoked while !rcu_is_watching() in some situation, so I started
>>>>> digging into the scheduler idle code.
>>>> 
>>>> I'm wondering if this isn't a bug. Do you have the backtrace for where
>>>> trace_sched_waking() was called without rcu watching?
>>> 
>>> I strongly suspect a bug as well. I'm awaiting a reproducer from the
>>> user whom reported this issue so I can add a WARN_ON_ONCE(!rcu_is_watching())
>>> in the scheduler code near trace_sched_waking() and gather a backtrace.
>>> 
>>> It still has to be confirmed, but I suspect this have been triggered
>>> within a HyperV guest. It may therefore be related to a virtualized environment.
>>> 
>>> I'll try to ask more specifically on which environment this was encountered.
>> 
>> So it ends up it happens directly on hardware on a Linux laptop. Here is
>> the stacktrace:
>> 
>> vmlinux!try_to_wake_up
>> vmlinux!default_wake_function
>> vmlinux!pollwake
>> vmlinux!__wake_up_common
>> vmlinux!__wake_up_common_lock
>> vmlinux!__wake_up
>> vmlinux!perf_event_wakeup
>> vmlinux!perf_pending_event
>> vmlinux!irq_work_run_list
>> vmlinux!irq_work_run
>> vmlinux!smp_irq_work_iterrupt
>> vmlinux!irq_work_interrupt
>> vmlinux!finish_task_switch
>> vmlinux!__schedule
>> vmlinux!schedule_idle
>> vmlinux!do_idle
>> vmlinux!cpu_startup_entry
>> vmlinux!start_secondary
>> vmlinux!secondary_startup_64
>> 
>> Does it raise any red flag ?
> 
> Based on this backtrace, I think I start to get a better understanding
> of the situation.
> 
> The initial problem reported to me was that ftrace was showing some sched_waking
> events in its trace that were missed by perf.
> 
> I presumed this was because of the !rcu_is_watching() check, but I think I was
> wrong.
> 
> This backtrace seems to show that perf is itself triggering a sched_waking()
> event. It there is probably a check that discards nested events within perf,
> which would discard this from perf traces, but ftrace (and lttng) would trace
> it just fine.

Looking at:

static void perf_pending_event(struct irq_work *entry)
{
        struct perf_event *event = container_of(entry,
                        struct perf_event, pending);
        int rctx;

        rctx = perf_swevent_get_recursion_context();
        /*
         * If we 'fail' here, that's OK, it means recursion is already disabled
         * and we won't recurse 'further'.
         */

        if (event->pending_disable) {
                event->pending_disable = 0;
                perf_event_disable_local(event);
        }

        if (event->pending_wakeup) {
                event->pending_wakeup = 0;
                perf_event_wakeup(event);
        }

        if (rctx >= 0)
                perf_swevent_put_recursion_context(rctx);
}

One side-effect of perf_event_wakeup() is to generate a sched_waking
event. But I suspect it won't be traced by perf because it is invoked before
putting the recursion context.

Is there a reason why the wakeup is done before putting the recursion
context ?

Thanks,

Mathieu




-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Perf: event wakeup discards sched_waking events
  2019-01-10 17:45         ` Perf: event wakeup discards sched_waking events Mathieu Desnoyers
@ 2019-01-14 13:09           ` Peter Zijlstra
  2019-01-14 21:36             ` Mathieu Desnoyers
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2019-01-14 13:09 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: rostedt, Paul E. McKenney, linux-kernel

On Thu, Jan 10, 2019 at 12:45:27PM -0500, Mathieu Desnoyers wrote:
 
> static void perf_pending_event(struct irq_work *entry)
> {
>         struct perf_event *event = container_of(entry,
>                         struct perf_event, pending);
>         int rctx;
> 
>         rctx = perf_swevent_get_recursion_context();
>         /*
>          * If we 'fail' here, that's OK, it means recursion is already disabled
>          * and we won't recurse 'further'.
>          */
> 
>         if (event->pending_disable) {
>                 event->pending_disable = 0;
>                 perf_event_disable_local(event);
>         }
> 
>         if (event->pending_wakeup) {
>                 event->pending_wakeup = 0;
>                 perf_event_wakeup(event);
>         }
> 
>         if (rctx >= 0)
>                 perf_swevent_put_recursion_context(rctx);
> }
> 
> One side-effect of perf_event_wakeup() is to generate a sched_waking
> event. But I suspect it won't be traced by perf because it is invoked before
> putting the recursion context.
> 
> Is there a reason why the wakeup is done before putting the recursion
> context ?

d525211f9d1b ("perf: Fix irq_work 'tail' recursion")

If we were to allow perf_event_wakeup() to generate its tracepoint, we'd
be back to square #1, no?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Perf: event wakeup discards sched_waking events
  2019-01-14 13:09           ` Peter Zijlstra
@ 2019-01-14 21:36             ` Mathieu Desnoyers
  2019-01-14 22:04               ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2019-01-14 21:36 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: rostedt, Paul E. McKenney, linux-kernel

----- On Jan 14, 2019, at 8:09 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Thu, Jan 10, 2019 at 12:45:27PM -0500, Mathieu Desnoyers wrote:
> 
>> static void perf_pending_event(struct irq_work *entry)
>> {
>>         struct perf_event *event = container_of(entry,
>>                         struct perf_event, pending);
>>         int rctx;
>> 
>>         rctx = perf_swevent_get_recursion_context();
>>         /*
>>          * If we 'fail' here, that's OK, it means recursion is already disabled
>>          * and we won't recurse 'further'.
>>          */
>> 
>>         if (event->pending_disable) {
>>                 event->pending_disable = 0;
>>                 perf_event_disable_local(event);
>>         }
>> 
>>         if (event->pending_wakeup) {
>>                 event->pending_wakeup = 0;
>>                 perf_event_wakeup(event);
>>         }
>> 
>>         if (rctx >= 0)
>>                 perf_swevent_put_recursion_context(rctx);
>> }
>> 
>> One side-effect of perf_event_wakeup() is to generate a sched_waking
>> event. But I suspect it won't be traced by perf because it is invoked before
>> putting the recursion context.
>> 
>> Is there a reason why the wakeup is done before putting the recursion
>> context ?
> 
> d525211f9d1b ("perf: Fix irq_work 'tail' recursion")
> 
> If we were to allow perf_event_wakeup() to generate its tracepoint, we'd
> be back to square #1, no?

Considering that perf tracing code has side-effects that generate additional
events, it's indeed best not to trace them, otherwise you indeed end up with
tail recursion.

Can ftrace end up in the same situation through rb_wake_up_waiters() ? I suspect
the tail recursion would be hard to trigger if the wakeup only happens once per
page though, unless the events generated end up filling up a page.

FWIW, LTTng avoids this entire issue by using a timer-based polling mechanism
to ensure the tracing code does not call into the scheduler wakeup.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Perf: event wakeup discards sched_waking events
  2019-01-14 21:36             ` Mathieu Desnoyers
@ 2019-01-14 22:04               ` Steven Rostedt
  2019-01-14 22:31                 ` Mathieu Desnoyers
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2019-01-14 22:04 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Peter Zijlstra, Paul E. McKenney, linux-kernel

On Mon, 14 Jan 2019 16:36:59 -0500 (EST)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Can ftrace end up in the same situation through rb_wake_up_waiters() ? I suspect
> the tail recursion would be hard to trigger if the wakeup only happens once per
> page though, unless the events generated end up filling up a page.

And only events from the irq work that was caused by the ftrace wakeup,
which is highly unlikely.

Note, the lastest kernel only wakes up after half the buffer is full
(by default, but that can be changed), as I found that it gives the
best performance to keeping up with traces. I can actually trace small
loads and get all events now. Before, the waking of the tracer would
cause its own events to fill up the buffer and not be able to keep up
even on simple loads.

> 
> FWIW, LTTng avoids this entire issue by using a timer-based polling mechanism
> to ensure the tracing code does not call into the scheduler wakeup.

Does your timer stop if the system is idle and tracing is enabled?

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Perf: event wakeup discards sched_waking events
  2019-01-14 22:04               ` Steven Rostedt
@ 2019-01-14 22:31                 ` Mathieu Desnoyers
  0 siblings, 0 replies; 14+ messages in thread
From: Mathieu Desnoyers @ 2019-01-14 22:31 UTC (permalink / raw)
  To: rostedt; +Cc: Peter Zijlstra, Paul E. McKenney, linux-kernel, Thomas Gleixner



----- On Jan 14, 2019, at 5:04 PM, rostedt rostedt@goodmis.org wrote:

> On Mon, 14 Jan 2019 16:36:59 -0500 (EST)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> Can ftrace end up in the same situation through rb_wake_up_waiters() ? I suspect
>> the tail recursion would be hard to trigger if the wakeup only happens once per
>> page though, unless the events generated end up filling up a page.
> 
> And only events from the irq work that was caused by the ftrace wakeup,
> which is highly unlikely.
> 
> Note, the lastest kernel only wakes up after half the buffer is full
> (by default, but that can be changed), as I found that it gives the
> best performance to keeping up with traces. I can actually trace small
> loads and get all events now. Before, the waking of the tracer would
> cause its own events to fill up the buffer and not be able to keep up
> even on simple loads.
> 
>> 
>> FWIW, LTTng avoids this entire issue by using a timer-based polling mechanism
>> to ensure the tracing code does not call into the scheduler wakeup.
> 
> Does your timer stop if the system is idle and tracing is enabled?

LTTng exposes three functions for integration with CONFIG_NO_HZ:

lib_ring_buffer_tick_nohz_flush()
lib_ring_buffer_tick_nohz_stop()
lib_ring_buffer_tick_nohz_restart()

Unfortunately, last time I checked, the Linux kernel did not expose any
modules API to register nohz notifiers to call those functions when going
to idle and getting out of idle. Therefore, the plumbing is there
in LTTng (it has been done when I implemented libringbuffer about 6 years
ago intending to contribute it to the Linux kernel), but this code has
never been reachable within lttng-modules due to lack of kernel support.

Thomas Gleixner expressed strong feelings against introducing tick nohz
notifiers for kernel modules in the past, so I never pushed in that
direction.

Perhaps there is now a way to properly wire this up that I've missed ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-01-14 22:31 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-10  1:38 Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit Mathieu Desnoyers
2019-01-10  4:13 ` Paul E. McKenney
2019-01-10  6:30   ` Mathieu Desnoyers
2019-01-10 14:19     ` Paul E. McKenney
2019-01-10 16:08 ` Steven Rostedt
2019-01-10 16:44   ` Mathieu Desnoyers
2019-01-10 17:11     ` Mathieu Desnoyers
2019-01-10 17:23       ` Steven Rostedt
2019-01-10 17:25       ` Mathieu Desnoyers
2019-01-10 17:45         ` Perf: event wakeup discards sched_waking events Mathieu Desnoyers
2019-01-14 13:09           ` Peter Zijlstra
2019-01-14 21:36             ` Mathieu Desnoyers
2019-01-14 22:04               ` Steven Rostedt
2019-01-14 22:31                 ` Mathieu Desnoyers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.