* Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit @ 2019-01-10 1:38 Mathieu Desnoyers 2019-01-10 4:13 ` Paul E. McKenney 2019-01-10 16:08 ` Steven Rostedt 0 siblings, 2 replies; 14+ messages in thread From: Mathieu Desnoyers @ 2019-01-10 1:38 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel, Peter Zijlstra, rostedt Hi Paul, I've had a user report that trace_sched_waking() appears to be invoked while !rcu_is_watching() in some situation, so I started digging into the scheduler idle code. It appears that interrupts are re-enabled before rcu_eqs_exit() is invoked when exiting idle code from the scheduler. I wonder what happens if an interrupt handler (including scheduler code) happens to issue a RCU read-side critical section before rcu_eqs_exit() is called ? Is there some code on interrupt entry that ensures rcu eqs state is exited in such scenario ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit 2019-01-10 1:38 Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit Mathieu Desnoyers @ 2019-01-10 4:13 ` Paul E. McKenney 2019-01-10 6:30 ` Mathieu Desnoyers 2019-01-10 16:08 ` Steven Rostedt 1 sibling, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2019-01-10 4:13 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: linux-kernel, Peter Zijlstra, rostedt On Wed, Jan 09, 2019 at 08:38:51PM -0500, Mathieu Desnoyers wrote: > Hi Paul, > > I've had a user report that trace_sched_waking() appears to be > invoked while !rcu_is_watching() in some situation, so I started > digging into the scheduler idle code. > > It appears that interrupts are re-enabled before rcu_eqs_exit() is > invoked when exiting idle code from the scheduler. > > I wonder what happens if an interrupt handler (including scheduler code) > happens to issue a RCU read-side critical section before rcu_eqs_exit() > is called ? Is there some code on interrupt entry that ensures rcu eqs > state is exited in such scenario ? Interrupt handlers are supposed to invoke irq_enter(), which will in turn invoke rcu_irq_enter(), which should take care of things. However, there are cases where a given architecture knows that a given interrupt handler does not contain RCU readers, and in this case, the architecture might omit the rcu_irq_enter() or maybe even the whole irq_enter(). And then it is all fun and games until someone adds an RCU read-side critical section. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit 2019-01-10 4:13 ` Paul E. McKenney @ 2019-01-10 6:30 ` Mathieu Desnoyers 2019-01-10 14:19 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Mathieu Desnoyers @ 2019-01-10 6:30 UTC (permalink / raw) To: paulmck; +Cc: linux-kernel, Peter Zijlstra, rostedt ----- On Jan 9, 2019, at 8:13 PM, paulmck paulmck@linux.ibm.com wrote: > On Wed, Jan 09, 2019 at 08:38:51PM -0500, Mathieu Desnoyers wrote: >> Hi Paul, >> >> I've had a user report that trace_sched_waking() appears to be >> invoked while !rcu_is_watching() in some situation, so I started >> digging into the scheduler idle code. >> >> It appears that interrupts are re-enabled before rcu_eqs_exit() is >> invoked when exiting idle code from the scheduler. >> >> I wonder what happens if an interrupt handler (including scheduler code) >> happens to issue a RCU read-side critical section before rcu_eqs_exit() >> is called ? Is there some code on interrupt entry that ensures rcu eqs >> state is exited in such scenario ? > > Interrupt handlers are supposed to invoke irq_enter(), which will in > turn invoke rcu_irq_enter(), which should take care of things. > > However, there are cases where a given architecture knows that a given > interrupt handler does not contain RCU readers, and in this case, the > architecture might omit the rcu_irq_enter() or maybe even the whole > irq_enter(). And then it is all fun and games until someone adds an > RCU read-side critical section. ;-) Even if an irq handler does not contain any RCU read-side critical section, won't it end by possibly invoking the scheduler before returning ? Considering that the scheduler has tracepoints which use RCU, this might be related to the issue that has been brought to my attention. Do you have examples of such interrupt handlers which do not invoke rcu_irq_enter() ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit 2019-01-10 6:30 ` Mathieu Desnoyers @ 2019-01-10 14:19 ` Paul E. McKenney 0 siblings, 0 replies; 14+ messages in thread From: Paul E. McKenney @ 2019-01-10 14:19 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: linux-kernel, Peter Zijlstra, rostedt On Thu, Jan 10, 2019 at 01:30:05AM -0500, Mathieu Desnoyers wrote: > ----- On Jan 9, 2019, at 8:13 PM, paulmck paulmck@linux.ibm.com wrote: > > > On Wed, Jan 09, 2019 at 08:38:51PM -0500, Mathieu Desnoyers wrote: > >> Hi Paul, > >> > >> I've had a user report that trace_sched_waking() appears to be > >> invoked while !rcu_is_watching() in some situation, so I started > >> digging into the scheduler idle code. > >> > >> It appears that interrupts are re-enabled before rcu_eqs_exit() is > >> invoked when exiting idle code from the scheduler. > >> > >> I wonder what happens if an interrupt handler (including scheduler code) > >> happens to issue a RCU read-side critical section before rcu_eqs_exit() > >> is called ? Is there some code on interrupt entry that ensures rcu eqs > >> state is exited in such scenario ? > > > > Interrupt handlers are supposed to invoke irq_enter(), which will in > > turn invoke rcu_irq_enter(), which should take care of things. > > > > However, there are cases where a given architecture knows that a given > > interrupt handler does not contain RCU readers, and in this case, the > > architecture might omit the rcu_irq_enter() or maybe even the whole > > irq_enter(). And then it is all fun and games until someone adds an > > RCU read-side critical section. ;-) > > Even if an irq handler does not contain any RCU read-side critical > section, won't it end by possibly invoking the scheduler before > returning ? Considering that the scheduler has tracepoints which > use RCU, this might be related to the issue that has been brought > to my attention. Most interrupt handlers just return, but yes, scheduler state is often checked during return from interrupt. But in that case, the interrupt handler needs to have invoked irq_enter(). > Do you have examples of such interrupt handlers which do not invoke > rcu_irq_enter() ? Mostly examples of lightweight interrupts handlers that used to not invoke irq_enter() and thus not rcu_irq_enter(), but which later started using RCU readers. Which means that they are no longer examples that do not invoke rcu_irq_enter(). ;-) Some of them just invoked rcu_irq_enter(), others had to do the full irq_enter() call (which in turn invokes rcu_irq_enter()). These interrupt handlers were very light-weight. Page-table walkers, hardware events, and the like. Take an interrupt, look at a hardware register, update a data structure, maybe write to a hardware register, return from interrupt. If there is only one such tracepoint, one approach is to use _rcuidle, that is, instead of trace_blarvitz(), trace_blarvitz_rcuidle(). This can add overhead, so this might not be appropriate for any of the scheduler's fastpaths. Which brings me back to the interrupt handler invoking either irq_enter() or rcu_irq_enter(). Or moving the tracepoints to a nearby region of code that RCU is already watching. So, is it reasonably to add the rcu_irq_enter()? If you do change this, please test with CONFIG_RCU_EQS_DEBUG=y. Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit 2019-01-10 1:38 Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit Mathieu Desnoyers 2019-01-10 4:13 ` Paul E. McKenney @ 2019-01-10 16:08 ` Steven Rostedt 2019-01-10 16:44 ` Mathieu Desnoyers 1 sibling, 1 reply; 14+ messages in thread From: Steven Rostedt @ 2019-01-10 16:08 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra On Wed, 9 Jan 2019 20:38:51 -0500 (EST) Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > Hi Paul, > > I've had a user report that trace_sched_waking() appears to be > invoked while !rcu_is_watching() in some situation, so I started > digging into the scheduler idle code. I'm wondering if this isn't a bug. Do you have the backtrace for where trace_sched_waking() was called without rcu watching? -- Steve > > It appears that interrupts are re-enabled before rcu_eqs_exit() is > invoked when exiting idle code from the scheduler. > > I wonder what happens if an interrupt handler (including scheduler code) > happens to issue a RCU read-side critical section before rcu_eqs_exit() > is called ? Is there some code on interrupt entry that ensures rcu eqs > state is exited in such scenario ? > > Thanks, > > Mathieu > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit 2019-01-10 16:08 ` Steven Rostedt @ 2019-01-10 16:44 ` Mathieu Desnoyers 2019-01-10 17:11 ` Mathieu Desnoyers 0 siblings, 1 reply; 14+ messages in thread From: Mathieu Desnoyers @ 2019-01-10 16:44 UTC (permalink / raw) To: rostedt; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra ----- On Jan 10, 2019, at 8:08 AM, rostedt rostedt@goodmis.org wrote: > On Wed, 9 Jan 2019 20:38:51 -0500 (EST) > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >> Hi Paul, >> >> I've had a user report that trace_sched_waking() appears to be >> invoked while !rcu_is_watching() in some situation, so I started >> digging into the scheduler idle code. > > I'm wondering if this isn't a bug. Do you have the backtrace for where > trace_sched_waking() was called without rcu watching? I strongly suspect a bug as well. I'm awaiting a reproducer from the user whom reported this issue so I can add a WARN_ON_ONCE(!rcu_is_watching()) in the scheduler code near trace_sched_waking() and gather a backtrace. It still has to be confirmed, but I suspect this have been triggered within a HyperV guest. It may therefore be related to a virtualized environment. I'll try to ask more specifically on which environment this was encountered. Thanks, Mathieu > > -- Steve > >> >> It appears that interrupts are re-enabled before rcu_eqs_exit() is >> invoked when exiting idle code from the scheduler. >> >> I wonder what happens if an interrupt handler (including scheduler code) >> happens to issue a RCU read-side critical section before rcu_eqs_exit() >> is called ? Is there some code on interrupt entry that ensures rcu eqs >> state is exited in such scenario ? >> >> Thanks, >> >> Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit 2019-01-10 16:44 ` Mathieu Desnoyers @ 2019-01-10 17:11 ` Mathieu Desnoyers 2019-01-10 17:23 ` Steven Rostedt 2019-01-10 17:25 ` Mathieu Desnoyers 0 siblings, 2 replies; 14+ messages in thread From: Mathieu Desnoyers @ 2019-01-10 17:11 UTC (permalink / raw) To: rostedt; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra ----- On Jan 10, 2019, at 8:44 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: > ----- On Jan 10, 2019, at 8:08 AM, rostedt rostedt@goodmis.org wrote: > >> On Wed, 9 Jan 2019 20:38:51 -0500 (EST) >> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: >> >>> Hi Paul, >>> >>> I've had a user report that trace_sched_waking() appears to be >>> invoked while !rcu_is_watching() in some situation, so I started >>> digging into the scheduler idle code. >> >> I'm wondering if this isn't a bug. Do you have the backtrace for where >> trace_sched_waking() was called without rcu watching? > > I strongly suspect a bug as well. I'm awaiting a reproducer from the > user whom reported this issue so I can add a WARN_ON_ONCE(!rcu_is_watching()) > in the scheduler code near trace_sched_waking() and gather a backtrace. > > It still has to be confirmed, but I suspect this have been triggered > within a HyperV guest. It may therefore be related to a virtualized environment. > > I'll try to ask more specifically on which environment this was encountered. So it ends up it happens directly on hardware on a Linux laptop. Here is the stacktrace: vmlinux!try_to_wake_up vmlinux!default_wake_function vmlinux!pollwake vmlinux!__wake_up_common vmlinux!__wake_up_common_lock vmlinux!__wake_up vmlinux!perf_event_wakeup vmlinux!perf_pending_event vmlinux!irq_work_run_list vmlinux!irq_work_run vmlinux!smp_irq_work_iterrupt vmlinux!irq_work_interrupt vmlinux!finish_task_switch vmlinux!__schedule vmlinux!schedule_idle vmlinux!do_idle vmlinux!cpu_startup_entry vmlinux!start_secondary vmlinux!secondary_startup_64 Does it raise any red flag ? Thanks, Mathieu > > Thanks, > > Mathieu > >> >> -- Steve >> >>> >>> It appears that interrupts are re-enabled before rcu_eqs_exit() is >>> invoked when exiting idle code from the scheduler. >>> >>> I wonder what happens if an interrupt handler (including scheduler code) >>> happens to issue a RCU read-side critical section before rcu_eqs_exit() >>> is called ? Is there some code on interrupt entry that ensures rcu eqs >>> state is exited in such scenario ? >>> >>> Thanks, >>> >>> Mathieu > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit 2019-01-10 17:11 ` Mathieu Desnoyers @ 2019-01-10 17:23 ` Steven Rostedt 2019-01-10 17:25 ` Mathieu Desnoyers 1 sibling, 0 replies; 14+ messages in thread From: Steven Rostedt @ 2019-01-10 17:23 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra On Thu, 10 Jan 2019 12:11:55 -0500 (EST) Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > ----- On Jan 10, 2019, at 8:44 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: > > > ----- On Jan 10, 2019, at 8:08 AM, rostedt rostedt@goodmis.org wrote: > > > >> On Wed, 9 Jan 2019 20:38:51 -0500 (EST) > >> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >> > >>> Hi Paul, > >>> > >>> I've had a user report that trace_sched_waking() appears to be > >>> invoked while !rcu_is_watching() in some situation, so I started > >>> digging into the scheduler idle code. > >> > >> I'm wondering if this isn't a bug. Do you have the backtrace for where > >> trace_sched_waking() was called without rcu watching? > > > > I strongly suspect a bug as well. I'm awaiting a reproducer from the > > user whom reported this issue so I can add a WARN_ON_ONCE(!rcu_is_watching()) > > in the scheduler code near trace_sched_waking() and gather a backtrace. > > > > It still has to be confirmed, but I suspect this have been triggered > > within a HyperV guest. It may therefore be related to a virtualized environment. > > > > I'll try to ask more specifically on which environment this was encountered. > > So it ends up it happens directly on hardware on a Linux laptop. Here is > the stacktrace: > > vmlinux!try_to_wake_up > vmlinux!default_wake_function > vmlinux!pollwake > vmlinux!__wake_up_common > vmlinux!__wake_up_common_lock > vmlinux!__wake_up > vmlinux!perf_event_wakeup > vmlinux!perf_pending_event > vmlinux!irq_work_run_list > vmlinux!irq_work_run > vmlinux!smp_irq_work_iterrupt The first thing this calls is ipi_entering_ack_irq(), which calls irq_enter(), which calls rcu_irq_enter(). So why isn't rcu watching here? -- Steve > vmlinux!irq_work_interrupt > vmlinux!finish_task_switch > vmlinux!__schedule > vmlinux!schedule_idle > vmlinux!do_idle > vmlinux!cpu_startup_entry > vmlinux!start_secondary > vmlinux!secondary_startup_64 > > Does it raise any red flag ? > > Thanks, > > Mathieu > > > > > Thanks, > > > > Mathieu > > > >> > >> -- Steve > >> > >>> > >>> It appears that interrupts are re-enabled before rcu_eqs_exit() is > >>> invoked when exiting idle code from the scheduler. > >>> > >>> I wonder what happens if an interrupt handler (including scheduler code) > >>> happens to issue a RCU read-side critical section before rcu_eqs_exit() > >>> is called ? Is there some code on interrupt entry that ensures rcu eqs > >>> state is exited in such scenario ? > >>> > >>> Thanks, > >>> > >>> Mathieu > > > > -- > > Mathieu Desnoyers > > EfficiOS Inc. > > http://www.efficios.com > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit 2019-01-10 17:11 ` Mathieu Desnoyers 2019-01-10 17:23 ` Steven Rostedt @ 2019-01-10 17:25 ` Mathieu Desnoyers 2019-01-10 17:45 ` Perf: event wakeup discards sched_waking events Mathieu Desnoyers 1 sibling, 1 reply; 14+ messages in thread From: Mathieu Desnoyers @ 2019-01-10 17:25 UTC (permalink / raw) To: rostedt; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra ----- On Jan 10, 2019, at 9:11 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: > ----- On Jan 10, 2019, at 8:44 AM, Mathieu Desnoyers > mathieu.desnoyers@efficios.com wrote: > >> ----- On Jan 10, 2019, at 8:08 AM, rostedt rostedt@goodmis.org wrote: >> >>> On Wed, 9 Jan 2019 20:38:51 -0500 (EST) >>> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: >>> >>>> Hi Paul, >>>> >>>> I've had a user report that trace_sched_waking() appears to be >>>> invoked while !rcu_is_watching() in some situation, so I started >>>> digging into the scheduler idle code. >>> >>> I'm wondering if this isn't a bug. Do you have the backtrace for where >>> trace_sched_waking() was called without rcu watching? >> >> I strongly suspect a bug as well. I'm awaiting a reproducer from the >> user whom reported this issue so I can add a WARN_ON_ONCE(!rcu_is_watching()) >> in the scheduler code near trace_sched_waking() and gather a backtrace. >> >> It still has to be confirmed, but I suspect this have been triggered >> within a HyperV guest. It may therefore be related to a virtualized environment. >> >> I'll try to ask more specifically on which environment this was encountered. > > So it ends up it happens directly on hardware on a Linux laptop. Here is > the stacktrace: > > vmlinux!try_to_wake_up > vmlinux!default_wake_function > vmlinux!pollwake > vmlinux!__wake_up_common > vmlinux!__wake_up_common_lock > vmlinux!__wake_up > vmlinux!perf_event_wakeup > vmlinux!perf_pending_event > vmlinux!irq_work_run_list > vmlinux!irq_work_run > vmlinux!smp_irq_work_iterrupt > vmlinux!irq_work_interrupt > vmlinux!finish_task_switch > vmlinux!__schedule > vmlinux!schedule_idle > vmlinux!do_idle > vmlinux!cpu_startup_entry > vmlinux!start_secondary > vmlinux!secondary_startup_64 > > Does it raise any red flag ? Based on this backtrace, I think I start to get a better understanding of the situation. The initial problem reported to me was that ftrace was showing some sched_waking events in its trace that were missed by perf. I presumed this was because of the !rcu_is_watching() check, but I think I was wrong. This backtrace seems to show that perf is itself triggering a sched_waking() event. It there is probably a check that discards nested events within perf, which would discard this from perf traces, but ftrace (and lttng) would trace it just fine. Thoughts ? Thanks, Mathieu > > Thanks, > > Mathieu > >> >> Thanks, >> >> Mathieu >> >>> >>> -- Steve >>> >>>> >>>> It appears that interrupts are re-enabled before rcu_eqs_exit() is >>>> invoked when exiting idle code from the scheduler. >>>> >>>> I wonder what happens if an interrupt handler (including scheduler code) >>>> happens to issue a RCU read-side critical section before rcu_eqs_exit() >>>> is called ? Is there some code on interrupt entry that ensures rcu eqs >>>> state is exited in such scenario ? >>>> >>>> Thanks, >>>> >>>> Mathieu >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Perf: event wakeup discards sched_waking events 2019-01-10 17:25 ` Mathieu Desnoyers @ 2019-01-10 17:45 ` Mathieu Desnoyers 2019-01-14 13:09 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Mathieu Desnoyers @ 2019-01-10 17:45 UTC (permalink / raw) To: rostedt; +Cc: Paul E. McKenney, linux-kernel, Peter Zijlstra ----- On Jan 10, 2019, at 9:25 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: > ----- On Jan 10, 2019, at 9:11 AM, Mathieu Desnoyers > mathieu.desnoyers@efficios.com wrote: > >> ----- On Jan 10, 2019, at 8:44 AM, Mathieu Desnoyers >> mathieu.desnoyers@efficios.com wrote: >> >>> ----- On Jan 10, 2019, at 8:08 AM, rostedt rostedt@goodmis.org wrote: >>> >>>> On Wed, 9 Jan 2019 20:38:51 -0500 (EST) >>>> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: >>>> >>>>> Hi Paul, >>>>> >>>>> I've had a user report that trace_sched_waking() appears to be >>>>> invoked while !rcu_is_watching() in some situation, so I started >>>>> digging into the scheduler idle code. >>>> >>>> I'm wondering if this isn't a bug. Do you have the backtrace for where >>>> trace_sched_waking() was called without rcu watching? >>> >>> I strongly suspect a bug as well. I'm awaiting a reproducer from the >>> user whom reported this issue so I can add a WARN_ON_ONCE(!rcu_is_watching()) >>> in the scheduler code near trace_sched_waking() and gather a backtrace. >>> >>> It still has to be confirmed, but I suspect this have been triggered >>> within a HyperV guest. It may therefore be related to a virtualized environment. >>> >>> I'll try to ask more specifically on which environment this was encountered. >> >> So it ends up it happens directly on hardware on a Linux laptop. Here is >> the stacktrace: >> >> vmlinux!try_to_wake_up >> vmlinux!default_wake_function >> vmlinux!pollwake >> vmlinux!__wake_up_common >> vmlinux!__wake_up_common_lock >> vmlinux!__wake_up >> vmlinux!perf_event_wakeup >> vmlinux!perf_pending_event >> vmlinux!irq_work_run_list >> vmlinux!irq_work_run >> vmlinux!smp_irq_work_iterrupt >> vmlinux!irq_work_interrupt >> vmlinux!finish_task_switch >> vmlinux!__schedule >> vmlinux!schedule_idle >> vmlinux!do_idle >> vmlinux!cpu_startup_entry >> vmlinux!start_secondary >> vmlinux!secondary_startup_64 >> >> Does it raise any red flag ? > > Based on this backtrace, I think I start to get a better understanding > of the situation. > > The initial problem reported to me was that ftrace was showing some sched_waking > events in its trace that were missed by perf. > > I presumed this was because of the !rcu_is_watching() check, but I think I was > wrong. > > This backtrace seems to show that perf is itself triggering a sched_waking() > event. It there is probably a check that discards nested events within perf, > which would discard this from perf traces, but ftrace (and lttng) would trace > it just fine. Looking at: static void perf_pending_event(struct irq_work *entry) { struct perf_event *event = container_of(entry, struct perf_event, pending); int rctx; rctx = perf_swevent_get_recursion_context(); /* * If we 'fail' here, that's OK, it means recursion is already disabled * and we won't recurse 'further'. */ if (event->pending_disable) { event->pending_disable = 0; perf_event_disable_local(event); } if (event->pending_wakeup) { event->pending_wakeup = 0; perf_event_wakeup(event); } if (rctx >= 0) perf_swevent_put_recursion_context(rctx); } One side-effect of perf_event_wakeup() is to generate a sched_waking event. But I suspect it won't be traced by perf because it is invoked before putting the recursion context. Is there a reason why the wakeup is done before putting the recursion context ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Perf: event wakeup discards sched_waking events 2019-01-10 17:45 ` Perf: event wakeup discards sched_waking events Mathieu Desnoyers @ 2019-01-14 13:09 ` Peter Zijlstra 2019-01-14 21:36 ` Mathieu Desnoyers 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2019-01-14 13:09 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: rostedt, Paul E. McKenney, linux-kernel On Thu, Jan 10, 2019 at 12:45:27PM -0500, Mathieu Desnoyers wrote: > static void perf_pending_event(struct irq_work *entry) > { > struct perf_event *event = container_of(entry, > struct perf_event, pending); > int rctx; > > rctx = perf_swevent_get_recursion_context(); > /* > * If we 'fail' here, that's OK, it means recursion is already disabled > * and we won't recurse 'further'. > */ > > if (event->pending_disable) { > event->pending_disable = 0; > perf_event_disable_local(event); > } > > if (event->pending_wakeup) { > event->pending_wakeup = 0; > perf_event_wakeup(event); > } > > if (rctx >= 0) > perf_swevent_put_recursion_context(rctx); > } > > One side-effect of perf_event_wakeup() is to generate a sched_waking > event. But I suspect it won't be traced by perf because it is invoked before > putting the recursion context. > > Is there a reason why the wakeup is done before putting the recursion > context ? d525211f9d1b ("perf: Fix irq_work 'tail' recursion") If we were to allow perf_event_wakeup() to generate its tracepoint, we'd be back to square #1, no? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Perf: event wakeup discards sched_waking events 2019-01-14 13:09 ` Peter Zijlstra @ 2019-01-14 21:36 ` Mathieu Desnoyers 2019-01-14 22:04 ` Steven Rostedt 0 siblings, 1 reply; 14+ messages in thread From: Mathieu Desnoyers @ 2019-01-14 21:36 UTC (permalink / raw) To: Peter Zijlstra; +Cc: rostedt, Paul E. McKenney, linux-kernel ----- On Jan 14, 2019, at 8:09 AM, Peter Zijlstra peterz@infradead.org wrote: > On Thu, Jan 10, 2019 at 12:45:27PM -0500, Mathieu Desnoyers wrote: > >> static void perf_pending_event(struct irq_work *entry) >> { >> struct perf_event *event = container_of(entry, >> struct perf_event, pending); >> int rctx; >> >> rctx = perf_swevent_get_recursion_context(); >> /* >> * If we 'fail' here, that's OK, it means recursion is already disabled >> * and we won't recurse 'further'. >> */ >> >> if (event->pending_disable) { >> event->pending_disable = 0; >> perf_event_disable_local(event); >> } >> >> if (event->pending_wakeup) { >> event->pending_wakeup = 0; >> perf_event_wakeup(event); >> } >> >> if (rctx >= 0) >> perf_swevent_put_recursion_context(rctx); >> } >> >> One side-effect of perf_event_wakeup() is to generate a sched_waking >> event. But I suspect it won't be traced by perf because it is invoked before >> putting the recursion context. >> >> Is there a reason why the wakeup is done before putting the recursion >> context ? > > d525211f9d1b ("perf: Fix irq_work 'tail' recursion") > > If we were to allow perf_event_wakeup() to generate its tracepoint, we'd > be back to square #1, no? Considering that perf tracing code has side-effects that generate additional events, it's indeed best not to trace them, otherwise you indeed end up with tail recursion. Can ftrace end up in the same situation through rb_wake_up_waiters() ? I suspect the tail recursion would be hard to trigger if the wakeup only happens once per page though, unless the events generated end up filling up a page. FWIW, LTTng avoids this entire issue by using a timer-based polling mechanism to ensure the tracing code does not call into the scheduler wakeup. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Perf: event wakeup discards sched_waking events 2019-01-14 21:36 ` Mathieu Desnoyers @ 2019-01-14 22:04 ` Steven Rostedt 2019-01-14 22:31 ` Mathieu Desnoyers 0 siblings, 1 reply; 14+ messages in thread From: Steven Rostedt @ 2019-01-14 22:04 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Peter Zijlstra, Paul E. McKenney, linux-kernel On Mon, 14 Jan 2019 16:36:59 -0500 (EST) Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > Can ftrace end up in the same situation through rb_wake_up_waiters() ? I suspect > the tail recursion would be hard to trigger if the wakeup only happens once per > page though, unless the events generated end up filling up a page. And only events from the irq work that was caused by the ftrace wakeup, which is highly unlikely. Note, the lastest kernel only wakes up after half the buffer is full (by default, but that can be changed), as I found that it gives the best performance to keeping up with traces. I can actually trace small loads and get all events now. Before, the waking of the tracer would cause its own events to fill up the buffer and not be able to keep up even on simple loads. > > FWIW, LTTng avoids this entire issue by using a timer-based polling mechanism > to ensure the tracing code does not call into the scheduler wakeup. Does your timer stop if the system is idle and tracing is enabled? -- Steve ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Perf: event wakeup discards sched_waking events 2019-01-14 22:04 ` Steven Rostedt @ 2019-01-14 22:31 ` Mathieu Desnoyers 0 siblings, 0 replies; 14+ messages in thread From: Mathieu Desnoyers @ 2019-01-14 22:31 UTC (permalink / raw) To: rostedt; +Cc: Peter Zijlstra, Paul E. McKenney, linux-kernel, Thomas Gleixner ----- On Jan 14, 2019, at 5:04 PM, rostedt rostedt@goodmis.org wrote: > On Mon, 14 Jan 2019 16:36:59 -0500 (EST) > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >> Can ftrace end up in the same situation through rb_wake_up_waiters() ? I suspect >> the tail recursion would be hard to trigger if the wakeup only happens once per >> page though, unless the events generated end up filling up a page. > > And only events from the irq work that was caused by the ftrace wakeup, > which is highly unlikely. > > Note, the lastest kernel only wakes up after half the buffer is full > (by default, but that can be changed), as I found that it gives the > best performance to keeping up with traces. I can actually trace small > loads and get all events now. Before, the waking of the tracer would > cause its own events to fill up the buffer and not be able to keep up > even on simple loads. > >> >> FWIW, LTTng avoids this entire issue by using a timer-based polling mechanism >> to ensure the tracing code does not call into the scheduler wakeup. > > Does your timer stop if the system is idle and tracing is enabled? LTTng exposes three functions for integration with CONFIG_NO_HZ: lib_ring_buffer_tick_nohz_flush() lib_ring_buffer_tick_nohz_stop() lib_ring_buffer_tick_nohz_restart() Unfortunately, last time I checked, the Linux kernel did not expose any modules API to register nohz notifiers to call those functions when going to idle and getting out of idle. Therefore, the plumbing is there in LTTng (it has been done when I implemented libringbuffer about 6 years ago intending to contribute it to the Linux kernel), but this code has never been reachable within lttng-modules due to lack of kernel support. Thomas Gleixner expressed strong feelings against introducing tick nohz notifiers for kernel modules in the past, so I never pushed in that direction. Perhaps there is now a way to properly wire this up that I've missed ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-01-14 22:31 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-01-10 1:38 Possible use of RCU while in extended QS: idle vs RCU read-side in interrupt vs rcu_eqs_exit Mathieu Desnoyers 2019-01-10 4:13 ` Paul E. McKenney 2019-01-10 6:30 ` Mathieu Desnoyers 2019-01-10 14:19 ` Paul E. McKenney 2019-01-10 16:08 ` Steven Rostedt 2019-01-10 16:44 ` Mathieu Desnoyers 2019-01-10 17:11 ` Mathieu Desnoyers 2019-01-10 17:23 ` Steven Rostedt 2019-01-10 17:25 ` Mathieu Desnoyers 2019-01-10 17:45 ` Perf: event wakeup discards sched_waking events Mathieu Desnoyers 2019-01-14 13:09 ` Peter Zijlstra 2019-01-14 21:36 ` Mathieu Desnoyers 2019-01-14 22:04 ` Steven Rostedt 2019-01-14 22:31 ` Mathieu Desnoyers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).