* [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
@ 2020-02-10 22:06 Steven Rostedt
  2020-02-11  0:30 ` Mathieu Desnoyers
  ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread

From: Steven Rostedt @ 2020-02-10 22:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Joel Fernandes (Google),
      Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
      Paul E. McKenney, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Commit e6753f23d961d ("tracepoint: Make rcuidle tracepoint callers use
SRCU") removed the calls to rcu_irq_enter/exit_irqson() and replaced them
with srcu callbacks, as that is much faster for the rcuidle cases. But this
caused an issue with perf, because perf only uses rcu to synchronize trace
points.

The issue was that if perf traced one of the "rcuidle" paths, that path no
longer enabled RCU if it was not watching, and this caused lockdep to
complain when the perf code used rcu_read_lock() and RCU was not "watching".

Commit 865e63b04e9b2 ("tracing: Add back in rcu_irq_enter/exit_irqson() for
rcuidle tracepoints") added back the rcu_irq_enter/exit_irqson() code, but
this made the srcu changes no longer applicable.

As perf is the only callback that needs the heavier weight
"rcu_irq_enter/exit_irqson()" calls, move it to the perf-specific code so
as not to bog down those that do not require it.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/tracepoint.h |  8 ++------
 include/trace/perf.h       | 17 +++++++++++++++--
 kernel/rcu/tree.c          |  2 ++
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 1fb11daa5c53..a83fd076a312 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -179,10 +179,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		 * For rcuidle callers, use srcu since sched-rcu	\
 		 * doesn't work from the idle path.			\
 		 */							\
-		if (rcuidle) {						\
+		if (rcuidle)						\
 			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
-			rcu_irq_enter_irqson();				\
-		}							\
 									\
 		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
 									\
@@ -194,10 +192,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 			} while ((++it_func_ptr)->func);		\
 		}							\
 									\
-		if (rcuidle) {						\
-			rcu_irq_exit_irqson();				\
+		if (rcuidle)						\
 			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
-		}							\
 									\
 		preempt_enable_notrace();				\
 	} while (0)
diff --git a/include/trace/perf.h b/include/trace/perf.h
index dbc6c74defc3..1c94ce0cd4e2 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -39,17 +39,27 @@ perf_trace_##call(void *__data, proto)				\
 	u64 __count = 1;						\
 	struct task_struct *__task = NULL;				\
 	struct hlist_head *head;					\
+	bool rcu_watching;						\
 	int __entry_size;						\
 	int __data_size;						\
 	int rctx;							\
 									\
+	rcu_watching = rcu_is_watching();				\
+									\
 	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
 									\
+	if (!rcu_watching) {						\
+		/* Can not use RCU if rcu is not watching and in NMI */	\
+		if (in_nmi())						\
+			return;						\
+		rcu_irq_enter_irqson();					\
+	}								\
+									\
 	head = this_cpu_ptr(event_call->perf_events);			\
 	if (!bpf_prog_array_valid(event_call) &&			\
 	    __builtin_constant_p(!__task) && !__task &&			\
 	    hlist_empty(head))						\
-		return;							\
+		goto out;						\
 									\
 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
 			     sizeof(u64));				\
@@ -57,7 +67,7 @@ perf_trace_##call(void *__data, proto)				\
 									\
 	entry = perf_trace_buf_alloc(__entry_size, &__regs, &rctx);	\
 	if (!entry)							\
-		return;							\
+		goto out;						\
 									\
 	perf_fetch_caller_regs(__regs);					\
 									\
@@ -68,6 +78,9 @@ perf_trace_##call(void *__data, proto)				\
 	perf_trace_run_bpf_submit(entry, __entry_size, rctx,		\
 				  event_call, __count, __regs,		\
 				  head, __task);			\
+out:									\
+	if (!rcu_watching)						\
+		rcu_irq_exit_irqson();					\
 }

 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1694a6b57ad8..3e6f07b62515 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -719,6 +719,7 @@ void rcu_irq_exit_irqson(void)
 	rcu_irq_exit();
 	local_irq_restore(flags);
 }
+EXPORT_SYMBOL_GPL(rcu_irq_exit_irqson);

 /*
  * Exit an RCU extended quiescent state, which can be either the
@@ -890,6 +891,7 @@ void rcu_irq_enter_irqson(void)
 	rcu_irq_enter();
 	local_irq_restore(flags);
 }
+EXPORT_SYMBOL_GPL(rcu_irq_enter_irqson);

 /*
  * If any sort of urgency was applied to the current CPU (for example,
--
2.20.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread
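[ Editor's sketch: the control flow the patch adds to perf_trace_##call()
  can be modeled in plain C. Everything below — the functions, flags, and
  counters — is a stand-in invented for the sketch (the real primitives
  live in kernel/rcu/), so it models only the enter/exit pairing and the
  NMI bail-out, not actual RCU behavior: ]

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for kernel state: invented for this sketch. */
static bool rcu_watching;      /* models "RCU is watching" outside any irq */
static bool nmi_context;       /* models in_nmi() */
static int  rcu_irq_depth;     /* models rcu_irq_enter/exit nesting */

static bool rcu_is_watching(void) { return rcu_watching || rcu_irq_depth > 0; }
static bool in_nmi(void)          { return nmi_context; }
static void rcu_irq_enter_irqson(void) { rcu_irq_depth++; }
static void rcu_irq_exit_irqson(void)  { rcu_irq_depth--; }

/*
 * Mirrors the control flow of perf_trace_##call() after the patch:
 * sample rcu_is_watching() once, pay for rcu_irq_enter/exit_irqson()
 * only when RCU was not already watching, and drop the event entirely
 * when that happens inside an NMI (where the irqson calls are
 * forbidden).  Returns true if the event was recorded.
 */
static bool perf_trace_hook(void)
{
    bool watching = rcu_is_watching();

    if (!watching) {
        /* Can not use RCU if rcu is not watching and in NMI */
        if (in_nmi())
            return false;
        rcu_irq_enter_irqson();
    }

    /* ... buffer allocation and perf_trace_run_bpf_submit() go here ... */
    assert(rcu_is_watching());  /* RCU-managed data is now safe to touch */

    if (!watching)
        rcu_irq_exit_irqson();
    return true;
}
```

Note how the single `watching` snapshot guarantees that enter and exit stay
paired even though the function has two early-exit paths (`goto out` in the
real macro).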
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-10 22:06 [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook Steven Rostedt
@ 2020-02-11  0:30 ` Mathieu Desnoyers
  2020-02-11  2:22   ` Steven Rostedt
  2020-02-11 12:00   ` Peter Zijlstra
  2020-02-11 11:49 ` Peter Zijlstra
  2020-02-11 12:21 ` Peter Zijlstra
  2 siblings, 2 replies; 23+ messages in thread

From: Mathieu Desnoyers @ 2020-02-11  0:30 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Joel Fernandes, Google,
      Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, paulmck,
      Josh Triplett, Lai Jiangshan

----- On Feb 10, 2020, at 5:06 PM, rostedt rostedt@goodmis.org wrote:

> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Hi Steven,

I agree with the general direction taken by this patch, but I would like
to bring a clarification to the changelog, and I'm worried about its
handling of NMI handlers nesting over rcuidle context.

> Commit e6753f23d961d ("tracepoint: Make rcuidle tracepoint callers use
> SRCU") removed the calls to rcu_irq_enter/exit_irqson() and replaced them
> with srcu callbacks, as that is much faster for the rcuidle cases. But
> this caused an issue with perf,

So far, so good.

> because perf only uses rcu to synchronize trace points.

That last part seems inaccurate. The tracepoint synchronization is two-fold:
one part is internal to tracepoint.c (see rcu_free_old_probes()), and the
other is only needed if the probes are within modules which can be unloaded
(see tracepoint_synchronize_unregister()). AFAIK, perf never implements
probe callbacks within modules, so the latter is not needed by perf.

The culprit of the problem here is that perf issues "rcu_read_lock()" and
"rcu_read_unlock()" within the probe callbacks it registers to the
tracepoints, including the rcuidle ones. Those require that RCU is
"watching", which is triggering the regression when we remove the calls to
rcu_irq_enter/exit_irqson() from the rcuidle tracepoint instrumentation
sites.

> The issue was that if perf traced one of the "rcuidle" paths, that path no
> longer enabled RCU if it was not watching, and this caused lockdep to
> complain when the perf code used rcu_read_lock() and RCU was not
> "watching".

Yes.

> Commit 865e63b04e9b2 ("tracing: Add back in rcu_irq_enter/exit_irqson()
> for rcuidle tracepoints") added back the rcu_irq_enter/exit_irqson() code,
> but this made the srcu changes no longer applicable.
>
> As perf is the only callback that needs the heavier weight
> "rcu_irq_enter/exit_irqson()" calls, move it to the perf-specific code so
> as not to bog down those that do not require it.

Yes. Which brings a question about handling of NMIs: in the proposed patch,
if an NMI nests over rcuidle context, AFAIU it will be in a state
!rcu_is_watching() && in_nmi(), which is handled by this patch with a simple
"return", meaning important NMIs doing hardware event sampling can be
completely lost.

Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI context,
is it at all valid to use rcu_read_lock/unlock() as perf does from NMI
handlers, considering that those can be nested on top of rcuidle context?

Thanks,

Mathieu

[...]

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11  0:30 ` Mathieu Desnoyers
@ 2020-02-11  2:22   ` Steven Rostedt
  2020-02-11  2:32     ` joel
  2020-02-11 15:19     ` Mathieu Desnoyers
  2020-02-11 12:00   ` Peter Zijlstra
  1 sibling, 2 replies; 23+ messages in thread

From: Steven Rostedt @ 2020-02-11  2:22 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Joel Fernandes, Google,
      Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, paulmck,
      Josh Triplett, Lai Jiangshan

On Mon, 10 Feb 2020 19:30:32 -0500 (EST)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Feb 10, 2020, at 5:06 PM, rostedt rostedt@goodmis.org wrote:
>
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
>
> Hi Steven,

Hi Mathieu!

> > because perf only uses rcu to synchronize trace points.
>
> That last part seems inaccurate. The tracepoint synchronization is
> two-fold: one part is internal to tracepoint.c (see rcu_free_old_probes()),
> and the other is only needed if the probes are within modules which can be
> unloaded (see tracepoint_synchronize_unregister()). AFAIK, perf never
> implements probe callbacks within modules, so the latter is not needed by
> perf.
>
> The culprit of the problem here is that perf issues "rcu_read_lock()" and
> "rcu_read_unlock()" within the probe callbacks it registers to the
> tracepoints, including the rcuidle ones. Those require that RCU is
> "watching", which is triggering the regression when we remove the calls to
> rcu_irq_enter/exit_irqson() from the rcuidle tracepoint instrumentation
> sites.

100% agree. I guess I need to clarify what I meant with "rcu to synchronize
trace points". I meant its trace point *callbacks*, not the trace point code
itself.

> Which brings a question about handling of NMIs: in the proposed patch, if
> an NMI nests over rcuidle context, AFAIU it will be in a state
> !rcu_is_watching() && in_nmi(), which is handled by this patch with a
> simple "return", meaning important NMIs doing hardware event sampling can
> be completely lost.
>
> Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI
> context, is it at all valid to use rcu_read_lock/unlock() as perf does
> from NMI handlers, considering that those can be nested on top of rcuidle
> context?

Note, in the __DO_TRACE macro, we've had this for a long time:

		/* srcu can't be used from NMI */			\
		WARN_ON_ONCE(rcuidle && in_nmi());			\

With nothing triggering.

-- Steve
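[ Editor's sketch: the shape of __DO_TRACE() being discussed here can be
  modeled as a small C program. All names below are stand-ins invented for
  the sketch (the real SRCU lives in kernel/rcu/), and the kernel's
  WARN_ON_ONCE() warns and continues, whereas the model returns an error
  so the behavior is testable: ]

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins invented for this sketch. */
static int  srcu_readers;
static bool nmi_context;
static int  probe_calls;

static int  srcu_read_lock_notrace(void)      { srcu_readers++; return 0; }
static void srcu_read_unlock_notrace(int idx) { (void)idx; srcu_readers--; }
static bool in_nmi(void)                      { return nmi_context; }
static void sample_probe(void)                { probe_calls++; }

/*
 * Minimal model of __DO_TRACE() after the patch: rcuidle callers take
 * only the SRCU read lock (no rcu_irq_enter_irqson() any more), and the
 * rcuidle-from-NMI combination is rejected, since SRCU read-side
 * critical sections cannot be used from NMI context.
 */
static int do_trace(bool rcuidle, void (*func)(void))
{
    int idx = 0;

    if (rcuidle && in_nmi())
        return -1;              /* srcu can't be used from NMI */

    if (rcuidle)
        idx = srcu_read_lock_notrace();

    func();                     /* the kernel iterates all registered probes */

    if (rcuidle)
        srcu_read_unlock_notrace(idx);
    return 0;
}
```

The model makes Steven's observation concrete: the `rcuidle && in_nmi()`
check guards the SRCU path only; a non-rcuidle tracepoint fired from NMI
context goes through untouched.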
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11  2:22   ` Steven Rostedt
@ 2020-02-11  2:32     ` joel
  2020-02-11 15:19     ` Mathieu Desnoyers
  1 sibling, 0 replies; 23+ messages in thread

From: joel @ 2020-02-11  2:32 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Greg Kroah-Hartman,
      Gustavo A. R. Silva, Thomas Gleixner, paulmck, Josh Triplett,
      Lai Jiangshan

On February 10, 2020 9:22:22 PM EST, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Which brings a question about handling of NMIs: in the proposed patch,
> > if an NMI nests over rcuidle context, AFAIU it will be in a state
> > !rcu_is_watching() && in_nmi(), which is handled by this patch with a
> > simple "return", meaning important NMIs doing hardware event sampling
> > can be completely lost.
> >
> > Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI
> > context, is it at all valid to use rcu_read_lock/unlock() as perf does
> > from NMI handlers, considering that those can be nested on top of
> > rcuidle context?
>
> Note, in the __DO_TRACE macro, we've had this for a long time:
>
>		/* srcu can't be used from NMI */			\
>		WARN_ON_ONCE(rcuidle && in_nmi());			\
>
> With nothing triggering.

I did not understand Mathieu's question; AFAIK the perf event sampling code
in the NMI handler does not invoke trace_..._rcuidle functions anywhere.
That, AFAIR, is dealt with independently within perf and does not involve
tracepoint code.

And if an NMI has interrupted any code currently running in __DO_TRACE,
that's OK, because the NMI is higher priority and will run to completion
before resuming the interrupted code. Did I miss something? I am not
surprised the warning doesn't ever trigger.

Thanks,

Joel

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11  2:22   ` Steven Rostedt
  2020-02-11  2:32     ` joel
@ 2020-02-11 15:19     ` Mathieu Desnoyers
  1 sibling, 0 replies; 23+ messages in thread

From: Mathieu Desnoyers @ 2020-02-11 15:19 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Joel Fernandes, Google,
      Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, paulmck,
      Josh Triplett, Lai Jiangshan

----- On Feb 10, 2020, at 9:22 PM, rostedt rostedt@goodmis.org wrote:

> On Mon, 10 Feb 2020 19:30:32 -0500 (EST)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
> > ----- On Feb 10, 2020, at 5:06 PM, rostedt rostedt@goodmis.org wrote:
> >
> > > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> >
> > Hi Steven,
>
> Hi Mathieu!

[...]

> > Which brings a question about handling of NMIs: in the proposed patch,
> > if an NMI nests over rcuidle context, AFAIU it will be in a state
> > !rcu_is_watching() && in_nmi(), which is handled by this patch with a
> > simple "return", meaning important NMIs doing hardware event sampling
> > can be completely lost.
> >
> > Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI
> > context, is it at all valid to use rcu_read_lock/unlock() as perf does
> > from NMI handlers, considering that those can be nested on top of
> > rcuidle context?
>
> Note, in the __DO_TRACE macro, we've had this for a long time:
>
>		/* srcu can't be used from NMI */			\
>		WARN_ON_ONCE(rcuidle && in_nmi());			\
>
> With nothing triggering.

The "rcuidle" argument is only true for tracepoints which are declared to
be used within the rcuidle code. AFAIK, it does not cover tracepoints which
can be placed in NMI handlers. The state I am concerned about is really:

	WARN_ON_ONCE(!rcu_is_watching() && in_nmi())

as pointed out by Peter further down in this thread.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11  0:30 ` Mathieu Desnoyers
  2020-02-11  2:22   ` Steven Rostedt
@ 2020-02-11 12:00   ` Peter Zijlstra
  2020-02-11 13:03     ` Paul E. McKenney
  2020-02-11 14:10     ` Steven Rostedt
  1 sibling, 2 replies; 23+ messages in thread

From: Peter Zijlstra @ 2020-02-11 12:00 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: rostedt, linux-kernel, Ingo Molnar, Joel Fernandes, Google,
      Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, paulmck,
      Josh Triplett, Lai Jiangshan

On Mon, Feb 10, 2020 at 07:30:32PM -0500, Mathieu Desnoyers wrote:

> > because perf only uses rcu to synchronize trace points.
>
> That last part seems inaccurate. The tracepoint synchronization is
> two-fold: one part is internal to tracepoint.c (see rcu_free_old_probes()),
> and the other is only needed if the probes are within modules which can be
> unloaded (see tracepoint_synchronize_unregister()). AFAIK, perf never
> implements probe callbacks within modules, so the latter is not needed by
> perf.
>
> The culprit of the problem here is that perf issues "rcu_read_lock()" and
> "rcu_read_unlock()" within the probe callbacks it registers to the
> tracepoints, including the rcuidle ones. Those require that RCU is
> "watching", which is triggering the regression when we remove the calls to
> rcu_irq_enter/exit_irqson() from the rcuidle tracepoint instrumentation
> sites.

It is not the fact that perf issues rcu_read_lock() that is the problem.
As we established yesterday, I can probably remove most rcu_read_lock()
calls from perf today (yay RCU flavour unification).

The problem is that the core perf code uses RCU managed data; and we
need an existence guarantee for it. It would be BAD (TM) if the
ring-buffer we're writing data to were to suddenly disappear under our
feet etc..

> Which brings a question about handling of NMIs: in the proposed patch, if
> an NMI nests over rcuidle context, AFAIU it will be in a state
> !rcu_is_watching() && in_nmi(), which is handled by this patch with a
> simple "return", meaning important NMIs doing hardware event sampling can
> be completely lost.
>
> Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI
> context, is it at all valid to use rcu_read_lock/unlock() as perf does
> from NMI handlers,

Again, rcu_read_lock() itself really isn't the problem. But we need
NMIs, just like regular interrupts, to imply rcu_read_lock(). That is,
any observable (RCU managed) pointer must stay valid during the NMI/IRQ
execution.

> considering that those can be nested on top of rcuidle context ?

As per nmi_enter() calling rcu_nmi_enter() I've always assumed that NMIs
are fully covered by RCU.

If this isn't so, RCU is terminally broken :-)
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 12:00   ` Peter Zijlstra
@ 2020-02-11 13:03     ` Paul E. McKenney
  2020-02-11 13:16       ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread

From: Paul E. McKenney @ 2020-02-11 13:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, rostedt, linux-kernel, Ingo Molnar,
      Joel Fernandes, Google, Greg Kroah-Hartman, Gustavo A. R. Silva,
      Thomas Gleixner, Josh Triplett, Lai Jiangshan

On Tue, Feb 11, 2020 at 01:00:15PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 10, 2020 at 07:30:32PM -0500, Mathieu Desnoyers wrote:
> > [...]
>
> It is not the fact that perf issues rcu_read_lock() that is the problem.
> As we established yesterday, I can probably remove most rcu_read_lock()
> calls from perf today (yay RCU flavour unification).

Glad some aspect of this unification is actually helping you. ;-)

> The problem is that the core perf code uses RCU managed data; and we
> need an existence guarantee for it. It would be BAD (TM) if the
> ring-buffer we're writing data to were to suddenly disappear under our
> feet etc..
>
> > Which brings a question about handling of NMIs: [...]
> > Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI
> > context, is it at all valid to use rcu_read_lock/unlock() as perf does
> > from NMI handlers,
>
> Again, rcu_read_lock() itself really isn't the problem. But we need
> NMIs, just like regular interrupts, to imply rcu_read_lock(). That is,
> any observable (RCU managed) pointer must stay valid during the NMI/IRQ
> execution.
>
> > considering that those can be nested on top of rcuidle context ?
>
> As per nmi_enter() calling rcu_nmi_enter() I've always assumed that NMIs
> are fully covered by RCU.
>
> If this isn't so, RCU is terminally broken :-)

All RCU can do is respond to calls to rcu_nmi_enter() and rcu_nmi_exit().
It has not yet figured out how to force people to add these calls where
they are needed. ;-)

But yes, it would be very nice if architectures arranged things so
that all NMI handlers were visible to RCU. And we no longer have
half-interrupts, so maybe there is hope...

							Thanx, Paul
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 13:03     ` Paul E. McKenney
@ 2020-02-11 13:16       ` Peter Zijlstra
  2020-02-11 13:23         ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread

From: Peter Zijlstra @ 2020-02-11 13:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mathieu Desnoyers, rostedt, linux-kernel, Ingo Molnar,
      Joel Fernandes, Google, Greg Kroah-Hartman, Gustavo A. R. Silva,
      Thomas Gleixner, Josh Triplett, Lai Jiangshan

On Tue, Feb 11, 2020 at 05:03:01AM -0800, Paul E. McKenney wrote:
> > It is not the fact that perf issues rcu_read_lock() that is the problem.
> > As we established yesterday, I can probably remove most rcu_read_lock()
> > calls from perf today (yay RCU flavour unification).
>
> Glad some aspect of this unification is actually helping you. ;-)

rcu_read_lock() is exceedingly cheap though, so I never really worried
about it. But now that RCU includes RCU-sched (again) we can go and
remove a bunch of them.

> > As per nmi_enter() calling rcu_nmi_enter() I've always assumed that NMIs
> > are fully covered by RCU.
> >
> > If this isn't so, RCU is terminally broken :-)
>
> All RCU can do is respond to calls to rcu_nmi_enter() and rcu_nmi_exit().
> It has not yet figured out how to force people to add these calls where
> they are needed. ;-)
>
> But yes, it would be very nice if architectures arranged things so
> that all NMI handlers were visible to RCU. And we no longer have
> half-interrupts, so maybe there is hope...

Well,.. you could go back to simply _always_ watching :-)
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 13:16       ` Peter Zijlstra
@ 2020-02-11 13:23         ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread

From: Paul E. McKenney @ 2020-02-11 13:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, rostedt, linux-kernel, Ingo Molnar,
      Joel Fernandes, Google, Greg Kroah-Hartman, Gustavo A. R. Silva,
      Thomas Gleixner, Josh Triplett, Lai Jiangshan

On Tue, Feb 11, 2020 at 02:16:37PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 11, 2020 at 05:03:01AM -0800, Paul E. McKenney wrote:
> > [...]
> > But yes, it would be very nice if architectures arranged things so
> > that all NMI handlers were visible to RCU. And we no longer have
> > half-interrupts, so maybe there is hope...
>
> Well,.. you could go back to simply _always_ watching :-)

The idle loop always was unwatched, even back in DYNIX/ptx. And watching
the idle loop requires waking up idle CPUs, which makes lots of people
quite unhappy. But it could be done with something sort of like
synchronize_rcu_tasks(), as long as this didn't need to be used in
production on battery-powered systems.

							Thanx, Paul
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 12:00   ` Peter Zijlstra
  2020-02-11 13:03     ` Paul E. McKenney
@ 2020-02-11 14:10     ` Steven Rostedt
  1 sibling, 0 replies; 23+ messages in thread

From: Steven Rostedt @ 2020-02-11 14:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Joel Fernandes,
      Google, Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
      paulmck, Josh Triplett, Lai Jiangshan

On Tue, 11 Feb 2020 13:00:15 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> As per nmi_enter() calling rcu_nmi_enter() I've always assumed that NMIs
> are fully covered by RCU.
>
> If this isn't so, RCU is terminally broken :-)

Most of the time it is. But for tracing that injects callbacks at arbitrary
points of code, it can break. I've always said that tracing is a more
sensitive context than NMI itself. The reason NMIs are sensitive is that
they can happen pretty much anywhere. But tracing can also happen in the
context switch that enters NMI.

This is why function tracing uses the "rude" RCU flavor (yes Paul, I'd love
you to add that flavor!), which performs a schedule_on_each_cpu() before
freeing anything, because it traces when RCU is not watching. But RCU
really shouldn't have to bend over backward for tracing, as tracing is the
exception and not the norm.

-- Steve
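[ Editor's sketch: the "rude" synchronization Steven describes can be
  modeled as a toy single-threaded C program. The arrays and function
  names are invented for the sketch; it only captures the invariant that
  probes run with preemption disabled, so forcing every CPU through a
  schedule (schedule_on_each_cpu() in the kernel) proves no CPU is still
  inside an old probe before it is freed: ]

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Per-"CPU" flag: true while that CPU is inside a preempt-disabled
 * trace callback.  Invented for this sketch. */
static bool in_probe[NR_CPUS];

static void probe_enter(int cpu) { in_probe[cpu] = true; }
static void probe_exit(int cpu)  { in_probe[cpu] = false; }

/*
 * Stand-in for the "rude" grace period: a CPU can only schedule once it
 * has left its preempt-disabled probe section, so observing that every
 * CPU has scheduled implies all pre-existing probe invocations are done.
 * The real schedule_on_each_cpu() blocks; this model just reports
 * whether the grace period could complete right now.
 */
static bool rude_synchronize(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (in_probe[cpu])
            return false;   /* would wait for this CPU to schedule */
    return true;            /* all CPUs scheduled: safe to free old probes */
}
```

The point of the model is why this works even where RCU is not watching:
it relies only on preemption being disabled across the probe, not on the
CPU being visible to RCU.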
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-10 22:06 [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook Steven Rostedt
  2020-02-11  0:30 ` Mathieu Desnoyers
@ 2020-02-11 11:49 ` Peter Zijlstra
  2020-02-11 12:59   ` Paul E. McKenney
  2020-02-11 14:05   ` Steven Rostedt
  2020-02-11 12:21 ` Peter Zijlstra
  2 siblings, 2 replies; 23+ messages in thread

From: Peter Zijlstra @ 2020-02-11 11:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman,
      Gustavo A. R. Silva, Thomas Gleixner, Paul E. McKenney,
      Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> +	if (!rcu_watching) {						\
> +		/* Can not use RCU if rcu is not watching and in NMI */	\
> +		if (in_nmi())						\
> +			return;						\
> +		rcu_irq_enter_irqson();					\
> +	}								\

I saw the same weirdness in __trace_stack(), and I'm confused by it.

How can we ever get to: in_nmi() && !rcu_watching() ? That should be a
BUG. In particular, nmi_enter() has rcu_nmi_enter().

Paul, can that really happen?
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 11:49 ` Peter Zijlstra
@ 2020-02-11 12:59   ` Paul E. McKenney
  2020-02-11 13:10     ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread

From: Paul E. McKenney @ 2020-02-11 12:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, LKML, Ingo Molnar, Joel Fernandes (Google),
      Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
      Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 12:49:54PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> > +	if (!rcu_watching) {						\
> > +		/* Can not use RCU if rcu is not watching and in NMI */	\
> > +		if (in_nmi())						\
> > +			return;						\
> > +		rcu_irq_enter_irqson();					\
> > +	}								\
>
> I saw the same weirdness in __trace_stack(), and I'm confused by it.
>
> How can we ever get to: in_nmi() && !rcu_watching() ? That should be a
> BUG. In particular, nmi_enter() has rcu_nmi_enter().
>
> Paul, can that really happen?

Not sure what the current situation is, but if I remember correctly it
used to be possible to get to an NMI handler without RCU being informed.
If NMI handlers now unconditionally inform RCU, then like you, I don't
see that the "if (in_nmi()) return" is needed.

However, a quick grep for NMI_MASK didn't show me the NMI_MASK bit
getting set. Help?

							Thanx, Paul
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 12:59   ` Paul E. McKenney
@ 2020-02-11 13:10     ` Peter Zijlstra
  2020-02-11 13:20       ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread

From: Peter Zijlstra @ 2020-02-11 13:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, LKML, Ingo Molnar, Joel Fernandes (Google),
      Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
      Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 04:59:29AM -0800, Paul E. McKenney wrote:

> However, a quick grep for NMI_MASK didn't show me the NMI_MASK bit
> getting set. Help?

| #define nmi_enter()						\
| 	do {							\
| 		arch_nmi_enter();				\
| 		printk_nmi_enter();				\
| 		lockdep_off();					\
| 		ftrace_nmi_enter();				\
| 		BUG_ON(in_nmi());				\
| 		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\

		^^^^ right there

| 		rcu_nmi_enter();				\
| 		trace_hardirq_enter();				\
| 	} while (0)
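[ Editor's sketch: the bookkeeping Peter points at can be modeled
  directly. The shift values below are meant to mirror the layout of
  include/linux/preempt.h from kernels of this era, but this is a
  userspace illustration with invented stand-ins, not the kernel header: ]

```c
#include <assert.h>

/* preempt_count bit layout (sketch): preempt bits 0-7, softirq 8-15,
 * hardirq 16-19, one NMI bit at 20. */
#define HARDIRQ_SHIFT   16
#define NMI_SHIFT       20
#define HARDIRQ_OFFSET  (1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET      (1UL << NMI_SHIFT)
#define HARDIRQ_MASK    (0xFUL << HARDIRQ_SHIFT)
#define NMI_MASK        (0x1UL << NMI_SHIFT)

static unsigned long preempt_count;   /* per-task in the kernel */

static int in_nmi(void) { return (preempt_count & NMI_MASK) != 0; }
static int in_irq(void) { return (preempt_count & HARDIRQ_MASK) != 0; }

/* Model of nmi_enter(): one preempt_count_add() sets both the NMI and
 * hardirq bits, which is exactly what makes in_nmi() true; the kernel
 * macro then calls rcu_nmi_enter(), so RCU watches the NMI handler. */
static void nmi_enter(void)
{
    assert(!in_nmi());                        /* BUG_ON(in_nmi()) */
    preempt_count += NMI_OFFSET + HARDIRQ_OFFSET;
    /* rcu_nmi_enter() would follow here */
}

static void nmi_exit(void)
{
    /* rcu_nmi_exit() would come first here */
    preempt_count -= NMI_OFFSET + HARDIRQ_OFFSET;
}
```

This is the answer to the grep puzzle above: nothing ever sets NMI_MASK
by name; the bit appears as a side effect of adding NMI_OFFSET.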
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
From: Paul E. McKenney @ 2020-02-11 13:20 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 02:10:46PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 11, 2020 at 04:59:29AM -0800, Paul E. McKenney wrote:
>
> > However, a quick grep for NMI_MASK didn't show me the NMI_MASK bit
> > getting set.  Help?
>
> | #define nmi_enter()						\
> |	do {							\
> |		arch_nmi_enter();				\
> |		printk_nmi_enter();				\
> |		lockdep_off();					\
> |		ftrace_nmi_enter();				\
> |		BUG_ON(in_nmi());				\
> |		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
>
> 		^^^^ right there
>
> |		rcu_nmi_enter();				\
> |		trace_hardirq_enter();				\
> |	} while (0)

Color me blind, and thank you!

							Thanx, Paul
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
From: Steven Rostedt @ 2020-02-11 14:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, Paul E. McKenney, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, 11 Feb 2020 12:49:54 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> > +	if (!rcu_watching) {					\
> > +		/* Can not use RCU if rcu is not watching and in NMI */ \
> > +		if (in_nmi())					\
> > +			return;					\
> > +		rcu_irq_enter_irqson();				\
> > +	}							\
>
> I saw the same weirdness in __trace_stack(), and I'm confused by it.
>
> How can we ever get to: in_nmi() && !rcu_watching() ? That should be a
> BUG. In particular, nmi_enter() has rcu_nmi_enter().
>
> Paul, can that really happen?

The stack tracer connects to the function tracer and is called at all
the places that function tracing can be called from. As I like being
able to trace RCU internal functions (especially as they are complex),
I don't want to set them all to notrace. But, for callbacks that
require RCU to be watching, we need this check, because there is some
internal state where we can be in an NMI while RCU is not watching (as
there are some places in nmi_enter() that can be traced!).

And if we are tracing preempt_enable and preempt_disable (as Joel added
trace events there), it may be the case for trace events too.

-- Steve
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
From: Peter Zijlstra @ 2020-02-11 15:05 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, Paul E. McKenney, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 09:05:03AM -0500, Steven Rostedt wrote:
> On Tue, 11 Feb 2020 12:49:54 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> > > +	if (!rcu_watching) {					\
> > > +		/* Can not use RCU if rcu is not watching and in NMI */ \
> > > +		if (in_nmi())					\
> > > +			return;					\
> > > +		rcu_irq_enter_irqson();				\
> > > +	}							\
> >
> > I saw the same weirdness in __trace_stack(), and I'm confused by it.
> >
> > How can we ever get to: in_nmi() && !rcu_watching() ? That should be a
> > BUG. In particular, nmi_enter() has rcu_nmi_enter().
> >
> > Paul, can that really happen?
>
> The stack tracer connects to the function tracer and is called at all
> the places that function tracing can be called from. As I like being
> able to trace RCU internal functions (especially as they are complex),
> I don't want to set them all to notrace. But, for callbacks that
> require RCU to be watching, we need this check, because there's some
> internal state that we can be in an NMI and RCU is not watching (as
> there's some places in nmi_enter that can be traced!).
>
> And if we are tracing preempt_enable and preempt_disable (as Joel added
> trace events there), it may be the case for trace events too.

Bloody hell; what a trainwreck. Luckily there's comments around that
explain this!

So we haz:

| #define nmi_enter()						\
| do {								\
|	arch_nmi_enter();					\

arm64 only, lets ignore for now

|	printk_nmi_enter();					\

notrace

|	lockdep_off();						\

notrace

|	ftrace_nmi_enter();					\

!notrace !!!

|	BUG_ON(in_nmi());					\
|	preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);		\

lets make this __preempt_count_add() ASAP !

|	rcu_nmi_enter();					\

are you _really_ sure you want to go trace that ?!?

|	trace_hardirq_enter();					\
| } while (0)
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
From: Peter Zijlstra @ 2020-02-11 15:29 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, Paul E. McKenney, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 04:05:32PM +0100, Peter Zijlstra wrote:
> So we haz:
>
> | #define nmi_enter()					\
> | do {							\
> |	arch_nmi_enter();				\
>
> arm64 only, lets ignore for now
>
> |	printk_nmi_enter();				\
>
> notrace
>
> |	lockdep_off();					\
>
> notrace
>
> |	ftrace_nmi_enter();				\
>
> !notrace !!!
>
> |	BUG_ON(in_nmi());				\
> |	preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
>
> lets make this __preempt_count_add() ASAP !

preempt_count_add() first frobs the actual preempt_count and then does
the trace, so that might just work. But it does need a notrace
annotation, I'm thinking, because calling into the function tracer
_before_ we do the preempt_count increment is irrecoverable crap.

> |	rcu_nmi_enter();				\
>
> are you _really_ sure you want to go trace that ?!?
>
> |	trace_hardirq_enter();				\
> | } while (0)
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
From: Steven Rostedt @ 2020-02-11 16:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, Paul E. McKenney, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, 11 Feb 2020 16:29:45 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> > |	ftrace_nmi_enter();				\
> >
> > !notrace !!!

Note, all inline is "notrace" by default, and ftrace_nmi_enter() is
inline.

In include/linux/compiler_types.h:

#if !defined(CONFIG_OPTIMIZE_INLINING)
#define inline inline __attribute__((__always_inline__)) __gnu_inline \
	__inline_maybe_unused notrace
#else
#define inline inline __gnu_inline \
	__inline_maybe_unused notrace
#endif

-- Steve
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
From: Paul E. McKenney @ 2020-02-11 15:06 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra, LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 09:05:03AM -0500, Steven Rostedt wrote:
> On Tue, 11 Feb 2020 12:49:54 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> > > +	if (!rcu_watching) {					\
> > > +		/* Can not use RCU if rcu is not watching and in NMI */ \
> > > +		if (in_nmi())					\
> > > +			return;					\
> > > +		rcu_irq_enter_irqson();				\
> > > +	}							\
> >
> > I saw the same weirdness in __trace_stack(), and I'm confused by it.
> >
> > How can we ever get to: in_nmi() && !rcu_watching() ? That should be a
> > BUG. In particular, nmi_enter() has rcu_nmi_enter().
> >
> > Paul, can that really happen?
>
> The stack tracer connects to the function tracer and is called at all
> the places that function tracing can be called from. As I like being
> able to trace RCU internal functions (especially as they are complex),
> I don't want to set them all to notrace. But, for callbacks that
> require RCU to be watching, we need this check, because there's some
> internal state that we can be in an NMI and RCU is not watching (as
> there's some places in nmi_enter that can be traced!).
>
> And if we are tracing preempt_enable and preempt_disable (as Joel added
> trace events there), it may be the case for trace events too.

Ah, thank you for the reminder! Should
Documentation/RCU/Design/Requirements/Requirements.rst be updated to
include this?

And I have to ask... What happens if we are very early in from-idle
NMI entry (or very late in NMI exit), such that both in_nmi() and
rcu_is_watching() are returning false? Or did I miss a turn somewhere?

							Thanx, Paul
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
From: Peter Zijlstra @ 2020-02-11 15:31 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Steven Rostedt, LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 07:06:15AM -0800, Paul E. McKenney wrote:

> And I have to ask... What happens if we are very early in from-idle
> NMI entry (or very late in NMI exit), such that both in_nmi() and
> rcu_is_watching() are returning false? Or did I miss a turn somewhere?

We must, by very careful inspection, ensure that doesn't happen.

No tracing must happen before preempt_count increment / after
preempt_count decrement. Otherwise we can no longer recover.
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
From: Paul E. McKenney @ 2020-02-11 15:40 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 04:31:24PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 11, 2020 at 07:06:15AM -0800, Paul E. McKenney wrote:
>
> > And I have to ask... What happens if we are very early in from-idle
> > NMI entry (or very late in NMI exit), such that both in_nmi() and
> > rcu_is_watching() are returning false? Or did I miss a turn somewhere?
>
> We must, by very careful inspection, ensure that doesn't happen.
>
> No tracing must happen before preempt_count increment / after
> preempt_count decrement. Otherwise we can no longer recover.

I was afraid of that, but agreed.  ;-)

							Thanx, Paul
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
From: Peter Zijlstra @ 2020-02-11 12:21 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, Paul E. McKenney, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:

> diff --git a/include/trace/perf.h b/include/trace/perf.h
> index dbc6c74defc3..1c94ce0cd4e2 100644
> --- a/include/trace/perf.h
> +++ b/include/trace/perf.h
> @@ -39,17 +39,27 @@ perf_trace_##call(void *__data, proto)		\
>  	u64 __count = 1;					\
>  	struct task_struct *__task = NULL;			\
>  	struct hlist_head *head;				\
> +	bool rcu_watching;					\
>  	int __entry_size;					\
>  	int __data_size;					\
>  	int rctx;						\
>  								\
> +	rcu_watching = rcu_is_watching();			\
> +								\
>  	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
>  								\
> +	if (!rcu_watching) {					\
> +		/* Can not use RCU if rcu is not watching and in NMI */ \
> +		if (in_nmi())					\
> +			return;					\
> +		rcu_irq_enter_irqson();				\
> +	}							\
> +								\
>  	head = this_cpu_ptr(event_call->perf_events);		\
>  	if (!bpf_prog_array_valid(event_call) &&		\
>  	    __builtin_constant_p(!__task) && !__task &&		\
>  	    hlist_empty(head))					\
> -		return;						\
> +		goto out;					\
>  								\
>  	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
>  			     sizeof(u64));			\
> @@ -57,7 +67,7 @@ perf_trace_##call(void *__data, proto)		\
>  								\
>  	entry = perf_trace_buf_alloc(__entry_size, &__regs, &rctx); \
>  	if (!entry)						\
> -		return;						\
> +		goto out;					\
>  								\
>  	perf_fetch_caller_regs(__regs);				\
>  								\
> @@ -68,6 +78,9 @@ perf_trace_##call(void *__data, proto)		\
>  	perf_trace_run_bpf_submit(entry, __entry_size, rctx,	\
>  				  event_call, __count, __regs,	\
>  				  head, __task);		\
> +out:								\
> +	if (!rcu_watching)					\
> +		rcu_irq_exit_irqson();				\
>  }

It is probably okay to move that into perf_tp_event(), then this:

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1694a6b57ad8..3e6f07b62515 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -719,6 +719,7 @@ void rcu_irq_exit_irqson(void)
>  	rcu_irq_exit();
>  	local_irq_restore(flags);
>  }
> +EXPORT_SYMBOL_GPL(rcu_irq_exit_irqson);
>
>  /*
>   * Exit an RCU extended quiescent state, which can be either the
> @@ -890,6 +891,7 @@ void rcu_irq_enter_irqson(void)
>  	rcu_irq_enter();
>  	local_irq_restore(flags);
>  }
> +EXPORT_SYMBOL_GPL(rcu_irq_enter_irqson);
>
>  /*
>   * If any sort of urgency was applied to the current CPU (for example,

can go too. Those things really should not be exported.
* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
From: Steven Rostedt @ 2020-02-11 14:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner, Paul E. McKenney, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, 11 Feb 2020 13:21:20 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> > @@ -68,6 +78,9 @@ perf_trace_##call(void *__data, proto)	\
> >  	perf_trace_run_bpf_submit(entry, __entry_size, rctx,	\
> >  				  event_call, __count, __regs,	\
> >  				  head, __task);		\
> > +out:								\
> > +	if (!rcu_watching)					\
> > +		rcu_irq_exit_irqson();				\
> >  }
>
> It is probably okay to move that into perf_tp_event(), then this:
>
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 1694a6b57ad8..3e6f07b62515 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -719,6 +719,7 @@ void rcu_irq_exit_irqson(void)
> >  	rcu_irq_exit();
> >  	local_irq_restore(flags);
> >  }
> > +EXPORT_SYMBOL_GPL(rcu_irq_exit_irqson);
> >
> > @@ -890,6 +891,7 @@ void rcu_irq_enter_irqson(void)
> >  	rcu_irq_enter();
> >  	local_irq_restore(flags);
> >  }
> > +EXPORT_SYMBOL_GPL(rcu_irq_enter_irqson);
>
> can go too. Those things really should not be exported.

Thanks, I'll send an updated patch.

-- Steve
end of thread, other threads: [~2020-02-11 16:16 UTC | newest]

Thread overview: 23+ messages
2020-02-10 22:06 [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook Steven Rostedt
2020-02-11  0:30 ` Mathieu Desnoyers
2020-02-11  2:22   ` Steven Rostedt
2020-02-11  2:32     ` joel
2020-02-11 15:19       ` Mathieu Desnoyers
2020-02-11 12:00     ` Peter Zijlstra
2020-02-11 13:03       ` Paul E. McKenney
2020-02-11 13:16         ` Peter Zijlstra
2020-02-11 13:23           ` Paul E. McKenney
2020-02-11 14:10         ` Steven Rostedt
2020-02-11 11:49 ` Peter Zijlstra
2020-02-11 12:59   ` Paul E. McKenney
2020-02-11 13:10     ` Peter Zijlstra
2020-02-11 13:20       ` Paul E. McKenney
2020-02-11 14:05   ` Steven Rostedt
2020-02-11 15:05     ` Peter Zijlstra
2020-02-11 15:29       ` Peter Zijlstra
2020-02-11 16:16         ` Steven Rostedt
2020-02-11 15:06     ` Paul E. McKenney
2020-02-11 15:31       ` Peter Zijlstra
2020-02-11 15:40         ` Paul E. McKenney
2020-02-11 12:21 ` Peter Zijlstra
2020-02-11 14:10   ` Steven Rostedt