* [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
@ 2020-02-10 22:06 Steven Rostedt
  2020-02-11  0:30 ` Mathieu Desnoyers
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Steven Rostedt @ 2020-02-10 22:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan


From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Commit e6753f23d961d ("tracepoint: Make rcuidle tracepoint callers use
SRCU") removed the calls to rcu_irq_enter/exit_irqson() and replaced it with
srcu callbacks as that much faster for the rcuidle cases. But this caused an
issue with perf, because perf only uses rcu to synchronize trace points.

The issue was that if perf traced one of the "rcuidle" paths, that path no
longer enabled RCU if it was not watching, and this caused lockdep to
complain when the perf code used rcu_read_lock() and RCU was not "watching".

Commit 865e63b04e9b2 ("tracing: Add back in rcu_irq_enter/exit_irqson() for
rcuidle tracepoints") added back the rcu_irq_enter/exit_irqson() code, but
this negated the benefit of the SRCU changes.

As perf is the only callback that needs the heavier-weight
"rcu_irq_enter/exit_irqson()" calls, move them into the perf-specific code
so as not to bog down the callbacks that do not require them.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/tracepoint.h |  8 ++------
 include/trace/perf.h       | 17 +++++++++++++++--
 kernel/rcu/tree.c          |  2 ++
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 1fb11daa5c53..a83fd076a312 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -179,10 +179,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		 * For rcuidle callers, use srcu since sched-rcu	\
 		 * doesn't work from the idle path.			\
 		 */							\
-		if (rcuidle) {						\
+		if (rcuidle)						\
 			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
-			rcu_irq_enter_irqson();				\
-		}							\
 									\
 		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
 									\
@@ -194,10 +192,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 			} while ((++it_func_ptr)->func);		\
 		}							\
 									\
-		if (rcuidle) {						\
-			rcu_irq_exit_irqson();				\
+		if (rcuidle)						\
 			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
-		}							\
 									\
 		preempt_enable_notrace();				\
 	} while (0)
diff --git a/include/trace/perf.h b/include/trace/perf.h
index dbc6c74defc3..1c94ce0cd4e2 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -39,17 +39,27 @@ perf_trace_##call(void *__data, proto)					\
 	u64 __count = 1;						\
 	struct task_struct *__task = NULL;				\
 	struct hlist_head *head;					\
+	bool rcu_watching;						\
 	int __entry_size;						\
 	int __data_size;						\
 	int rctx;							\
 									\
+	rcu_watching = rcu_is_watching();				\
+									\
 	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
 									\
+	if (!rcu_watching) {						\
+		/* Can not use RCU if rcu is not watching and in NMI */	\
+		if (in_nmi())						\
+			return;						\
+		rcu_irq_enter_irqson();					\
+	}								\
+									\
 	head = this_cpu_ptr(event_call->perf_events);			\
 	if (!bpf_prog_array_valid(event_call) &&			\
 	    __builtin_constant_p(!__task) && !__task &&			\
 	    hlist_empty(head))						\
-		return;							\
+		goto out;						\
 									\
 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
 			     sizeof(u64));				\
@@ -57,7 +67,7 @@ perf_trace_##call(void *__data, proto)					\
 									\
 	entry = perf_trace_buf_alloc(__entry_size, &__regs, &rctx);	\
 	if (!entry)							\
-		return;							\
+		goto out;						\
 									\
 	perf_fetch_caller_regs(__regs);					\
 									\
@@ -68,6 +78,9 @@ perf_trace_##call(void *__data, proto)					\
 	perf_trace_run_bpf_submit(entry, __entry_size, rctx,		\
 				  event_call, __count, __regs,		\
 				  head, __task);			\
+out:									\
+	if (!rcu_watching)						\
+		rcu_irq_exit_irqson();					\
 }
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1694a6b57ad8..3e6f07b62515 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -719,6 +719,7 @@ void rcu_irq_exit_irqson(void)
 	rcu_irq_exit();
 	local_irq_restore(flags);
 }
+EXPORT_SYMBOL_GPL(rcu_irq_exit_irqson);
 
 /*
  * Exit an RCU extended quiescent state, which can be either the
@@ -890,6 +891,7 @@ void rcu_irq_enter_irqson(void)
 	rcu_irq_enter();
 	local_irq_restore(flags);
 }
+EXPORT_SYMBOL_GPL(rcu_irq_enter_irqson);
 
 /*
  * If any sort of urgency was applied to the current CPU (for example,
-- 
2.20.1



* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-10 22:06 [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook Steven Rostedt
@ 2020-02-11  0:30 ` Mathieu Desnoyers
  2020-02-11  2:22   ` Steven Rostedt
  2020-02-11 12:00   ` Peter Zijlstra
  2020-02-11 11:49 ` Peter Zijlstra
  2020-02-11 12:21 ` Peter Zijlstra
  2 siblings, 2 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2020-02-11  0:30 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	paulmck, Josh Triplett, Lai Jiangshan

----- On Feb 10, 2020, at 5:06 PM, rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Hi Steven,

I agree with the general direction taken by this patch, but I would
like to bring a clarification to the changelog, and I'm worried about
its handling of NMI handlers nesting over rcuidle context.

> 
> Commit e6753f23d961d ("tracepoint: Make rcuidle tracepoint callers use
> SRCU") removed the calls to rcu_irq_enter/exit_irqson() and replaced it with
> srcu callbacks as that much faster for the rcuidle cases. But this caused an
> issue with perf,

so far, so good.

> because perf only uses rcu to synchronize trace points.

That last part seems inaccurate. The tracepoint synchronization is two-fold:
one part is internal to tracepoint.c (see rcu_free_old_probes()), and the other
is only needed if the probes are within modules which can be unloaded (see
tracepoint_synchronize_unregister()). AFAIK, perf never implements probe callbacks
within modules, so the latter is not needed by perf.

The culprit of the problem here is that perf issues "rcu_read_lock()" and
"rcu_read_unlock()" within the probe callbacks it registers to the tracepoints,
including the rcuidle ones. Those require that RCU is "watching", which is
triggering the regression when we remove the calls to rcu_irq_enter/exit_irqson()
from the rcuidle tracepoint instrumentation sites.
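
For illustration, a minimal sketch of the pattern in question (this is a
hypothetical probe callback, not the actual perf code):

#include <linux/rcupdate.h>

/* Registered as a tracepoint probe; may fire from an rcuidle site. */
static void example_probe(void *data)
{
	/* lockdep rightly complains here unless rcu_is_watching() */
	rcu_read_lock();
	/* ... dereference RCU-managed state, emit a sample ... */
	rcu_read_unlock();
}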

> 
> The issue was that if perf traced one of the "rcuidle" paths, that path no
> longer enabled RCU if it was not watching, and this caused lockdep to
> complain when the perf code used rcu_read_lock() and RCU was not "watching".

Yes.

> 
> Commit 865e63b04e9b2 ("tracing: Add back in rcu_irq_enter/exit_irqson() for
> rcuidle tracepoints") added back the rcu_irq_enter/exit_irqson() code, but
> this negated the benefit of the SRCU changes.
> 
> As perf is the only callback that needs the heavier-weight
> "rcu_irq_enter/exit_irqson()" calls, move them into the perf-specific code
> so as not to bog down the callbacks that do not require them.

Yes.

Which brings a question about handling of NMIs: in the proposed patch, if
an NMI nests over rcuidle context, AFAIU it will be in a state
!rcu_is_watching() && in_nmi(), which is handled by this patch with a simple
"return", meaning important NMIs doing hardware event sampling can be
completely lost.

Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI context,
is it at all valid to use rcu_read_lock/unlock() as perf does from NMI handlers,
considering that those can be nested on top of rcuidle context?

Thanks,

Mathieu


> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
> include/linux/tracepoint.h |  8 ++------
> include/trace/perf.h       | 17 +++++++++++++++--
> kernel/rcu/tree.c          |  2 ++
> 3 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 1fb11daa5c53..a83fd076a312 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -179,10 +179,8 @@ static inline struct tracepoint
> *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> 		 * For rcuidle callers, use srcu since sched-rcu	\
> 		 * doesn't work from the idle path.			\
> 		 */							\
> -		if (rcuidle) {						\
> +		if (rcuidle)						\
> 			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
> -			rcu_irq_enter_irqson();				\
> -		}							\
> 									\
> 		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
> 									\
> @@ -194,10 +192,8 @@ static inline struct tracepoint
> *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> 			} while ((++it_func_ptr)->func);		\
> 		}							\
> 									\
> -		if (rcuidle) {						\
> -			rcu_irq_exit_irqson();				\
> +		if (rcuidle)						\
> 			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
> -		}							\
> 									\
> 		preempt_enable_notrace();				\
> 	} while (0)
> diff --git a/include/trace/perf.h b/include/trace/perf.h
> index dbc6c74defc3..1c94ce0cd4e2 100644
> --- a/include/trace/perf.h
> +++ b/include/trace/perf.h
> @@ -39,17 +39,27 @@ perf_trace_##call(void *__data, proto)					\
> 	u64 __count = 1;						\
> 	struct task_struct *__task = NULL;				\
> 	struct hlist_head *head;					\
> +	bool rcu_watching;						\
> 	int __entry_size;						\
> 	int __data_size;						\
> 	int rctx;							\
> 									\
> +	rcu_watching = rcu_is_watching();				\
> +									\
> 	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
> 									\
> +	if (!rcu_watching) {						\
> +		/* Can not use RCU if rcu is not watching and in NMI */	\
> +		if (in_nmi())						\
> +			return;						\
> +		rcu_irq_enter_irqson();					\
> +	}								\
> +									\
> 	head = this_cpu_ptr(event_call->perf_events);			\
> 	if (!bpf_prog_array_valid(event_call) &&			\
> 	    __builtin_constant_p(!__task) && !__task &&			\
> 	    hlist_empty(head))						\
> -		return;							\
> +		goto out;						\
> 									\
> 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
> 			     sizeof(u64));				\
> @@ -57,7 +67,7 @@ perf_trace_##call(void *__data, proto)					\
> 									\
> 	entry = perf_trace_buf_alloc(__entry_size, &__regs, &rctx);	\
> 	if (!entry)							\
> -		return;							\
> +		goto out;						\
> 									\
> 	perf_fetch_caller_regs(__regs);					\
> 									\
> @@ -68,6 +78,9 @@ perf_trace_##call(void *__data, proto)					\
> 	perf_trace_run_bpf_submit(entry, __entry_size, rctx,		\
> 				  event_call, __count, __regs,		\
> 				  head, __task);			\
> +out:									\
> +	if (!rcu_watching)						\
> +		rcu_irq_exit_irqson();					\
> }
> 
> /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1694a6b57ad8..3e6f07b62515 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -719,6 +719,7 @@ void rcu_irq_exit_irqson(void)
> 	rcu_irq_exit();
> 	local_irq_restore(flags);
> }
> +EXPORT_SYMBOL_GPL(rcu_irq_exit_irqson);
> 
> /*
>  * Exit an RCU extended quiescent state, which can be either the
> @@ -890,6 +891,7 @@ void rcu_irq_enter_irqson(void)
> 	rcu_irq_enter();
> 	local_irq_restore(flags);
> }
> +EXPORT_SYMBOL_GPL(rcu_irq_enter_irqson);
> 
> /*
>  * If any sort of urgency was applied to the current CPU (for example,
> --
> 2.20.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11  0:30 ` Mathieu Desnoyers
@ 2020-02-11  2:22   ` Steven Rostedt
  2020-02-11  2:32     ` joel
  2020-02-11 15:19     ` Mathieu Desnoyers
  2020-02-11 12:00   ` Peter Zijlstra
  1 sibling, 2 replies; 23+ messages in thread
From: Steven Rostedt @ 2020-02-11  2:22 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	paulmck, Josh Triplett, Lai Jiangshan

On Mon, 10 Feb 2020 19:30:32 -0500 (EST)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Feb 10, 2020, at 5:06 PM, rostedt <rostedt@goodmis.org> wrote:
> 
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>  
> 
> Hi Steven,

Hi Mathieu!

> 
> > because perf only uses rcu to synchronize trace points.  
> 
> That last part seems inaccurate. The tracepoint synchronization is two-fold:
> one part is internal to tracepoint.c (see rcu_free_old_probes()), and the other
> is only needed if the probes are within modules which can be unloaded (see
> tracepoint_synchronize_unregister()). AFAIK, perf never implements probe callbacks
> within modules, so the latter is not needed by perf.
> 
> The culprit of the problem here is that perf issues "rcu_read_lock()" and
> "rcu_read_unlock()" within the probe callbacks it registers to the tracepoints,
> including the rcuidle ones. Those require that RCU is "watching", which is
> triggering the regression when we remove the calls to rcu_irq_enter/exit_irqson()
> from the rcuidle tracepoint instrumentation sites.

100% agree. I guess I need to clarify what I meant by "rcu to
synchronize trace points". I meant perf's trace point *callbacks*, not the
trace point code itself.

> 
> Which brings a question about handling of NMIs: in the proposed patch, if
> an NMI nests over rcuidle context, AFAIU it will be in a state
> !rcu_is_watching() && in_nmi(), which is handled by this patch with a simple
> "return", meaning important NMIs doing hardware event sampling can be
> completely lost.
> 
> Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI context,
> is it at all valid to use rcu_read_lock/unlock() as perf does from NMI handlers,
> considering that those can be nested on top of rcuidle context?
> 

Note, in the __DO_TRACE macro, we've had this for a long time:

		/* srcu can't be used from NMI */			\
		WARN_ON_ONCE(rcuidle && in_nmi());			\

With nothing triggering.

-- Steve



* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11  2:22   ` Steven Rostedt
@ 2020-02-11  2:32     ` joel
  2020-02-11 15:19     ` Mathieu Desnoyers
  1 sibling, 0 replies; 23+ messages in thread
From: joel @ 2020-02-11  2:32 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Greg Kroah-Hartman,
	Gustavo A. R. Silva, Thomas Gleixner, paulmck, Josh Triplett,
	Lai Jiangshan



On February 10, 2020 9:22:22 PM EST, Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> Which brings a question about handling of NMIs: in the proposed patch,
>> if an NMI nests over rcuidle context, AFAIU it will be in a state
>> !rcu_is_watching() && in_nmi(), which is handled by this patch with a
>> simple "return", meaning important NMIs doing hardware event sampling
>> can be completely lost.
>> 
>> Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI
>> context, is it at all valid to use rcu_read_lock/unlock() as perf does
>> from NMI handlers, considering that those can be nested on top of
>> rcuidle context?
>> 
>
>Note, in the __DO_TRACE macro, we've had this for a long time:
>
>		/* srcu can't be used from NMI */			\
>		WARN_ON_ONCE(rcuidle && in_nmi());			\
>
>With nothing triggering.

I did not understand Mathieu's question; afaik the perf event sampling
code in the NMI handler does not invoke trace_..._rcuidle functions
anywhere. That, afair, is dealt with independently within perf and does
not involve the tracepoint code. And if the NMI has interrupted code
currently running in __DO_TRACE, that's ok, because the NMI is higher
priority and will run to completion before the interrupted code resumes.
Did I miss something? I am not surprised the warning doesn't ever
trigger.

Thanks,
Joel.


>
>-- Steve

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-10 22:06 [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook Steven Rostedt
  2020-02-11  0:30 ` Mathieu Desnoyers
@ 2020-02-11 11:49 ` Peter Zijlstra
  2020-02-11 12:59   ` Paul E. McKenney
  2020-02-11 14:05   ` Steven Rostedt
  2020-02-11 12:21 ` Peter Zijlstra
  2 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2020-02-11 11:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> +	if (!rcu_watching) {						\
> +		/* Can not use RCU if rcu is not watching and in NMI */	\
> +		if (in_nmi())						\
> +			return;						\
> +		rcu_irq_enter_irqson();					\
> +	}								\

I saw the same weirdness in __trace_stack(), and I'm confused by it.

How can we ever get to: in_nmi() && !rcu_watching()? That should be a
BUG.  In particular, nmi_enter() has rcu_nmi_enter().

Paul, can that really happen?


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11  0:30 ` Mathieu Desnoyers
  2020-02-11  2:22   ` Steven Rostedt
@ 2020-02-11 12:00   ` Peter Zijlstra
  2020-02-11 13:03     ` Paul E. McKenney
  2020-02-11 14:10     ` Steven Rostedt
  1 sibling, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2020-02-11 12:00 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: rostedt, linux-kernel, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	paulmck, Josh Triplett, Lai Jiangshan

On Mon, Feb 10, 2020 at 07:30:32PM -0500, Mathieu Desnoyers wrote:

> > because perf only uses rcu to synchronize trace points.
> 
> That last part seems inaccurate. The tracepoint synchronization is two-fold:
> one part is internal to tracepoint.c (see rcu_free_old_probes()), and the other
> is only needed if the probes are within modules which can be unloaded (see
> tracepoint_synchronize_unregister()). AFAIK, perf never implements probe callbacks
> within modules, so the latter is not needed by perf.
> 
> The culprit of the problem here is that perf issues "rcu_read_lock()" and
> "rcu_read_unlock()" within the probe callbacks it registers to the tracepoints,
> including the rcuidle ones. Those require that RCU is "watching", which is
> triggering the regression when we remove the calls to rcu_irq_enter/exit_irqson()
> from the rcuidle tracepoint instrumentation sites.

It is not the fact that perf issues rcu_read_lock() that is the problem.
As we established yesterday, I can probably remove most rcu_read_lock()
calls from perf today (yay RCU flavour unification).

The problem is that the core perf code uses RCU-managed data, and we
need an existence guarantee for it. It would be BAD (TM) if the
ring-buffer we're writing data to were to suddenly disappear under our
feet, etc.
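
Concretely, a minimal sketch of that existence guarantee (all names here
are placeholders; the real perf ring-buffer handling is more involved):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct buf;				/* placeholder type */
static struct buf __rcu *event_buf;

static void emit_sample(struct buf *b) { /* ... write into b ... */ }

static void irq_or_nmi_path(void)
{
	struct buf *b;

	rcu_read_lock();
	b = rcu_dereference(event_buf);
	if (b)
		emit_sample(b);		/* b cannot be freed while here */
	rcu_read_unlock();
}

static void teardown_path(void)
{
	struct buf *old = rcu_replace_pointer(event_buf, NULL, 1);

	synchronize_rcu();		/* every reader of 'old' is done */
	kfree(old);
}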

> Which brings a question about handling of NMIs: in the proposed patch, if
> > an NMI nests over rcuidle context, AFAIU it will be in a state
> !rcu_is_watching() && in_nmi(), which is handled by this patch with a simple
> "return", meaning important NMIs doing hardware event sampling can be
> completely lost.
> 
> Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI context,
> is it at all valid to use rcu_read_lock/unlock() as perf does from NMI handlers,

Again, rcu_read_lock() itself really isn't the problem. But we need
NMIs, just like regular interrupts, to imply rcu_read_lock(). That is,
any observable (RCU managed) pointer must stay valid during the NMI/IRQ
execution.

> > considering that those can be nested on top of rcuidle context?

As per nmi_enter() calling rcu_nmi_enter() I've always assumed that NMIs
are fully covered by RCU.

If this isn't so, RCU is terminally broken :-)


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-10 22:06 [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook Steven Rostedt
  2020-02-11  0:30 ` Mathieu Desnoyers
  2020-02-11 11:49 ` Peter Zijlstra
@ 2020-02-11 12:21 ` Peter Zijlstra
  2020-02-11 14:10   ` Steven Rostedt
  2 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-02-11 12:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> diff --git a/include/trace/perf.h b/include/trace/perf.h
> index dbc6c74defc3..1c94ce0cd4e2 100644
> --- a/include/trace/perf.h
> +++ b/include/trace/perf.h
> @@ -39,17 +39,27 @@ perf_trace_##call(void *__data, proto)					\
>  	u64 __count = 1;						\
>  	struct task_struct *__task = NULL;				\
>  	struct hlist_head *head;					\
> +	bool rcu_watching;						\
>  	int __entry_size;						\
>  	int __data_size;						\
>  	int rctx;							\
>  									\
> +	rcu_watching = rcu_is_watching();				\
> +									\
>  	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
>  									\
> +	if (!rcu_watching) {						\
> +		/* Can not use RCU if rcu is not watching and in NMI */	\
> +		if (in_nmi())						\
> +			return;						\
> +		rcu_irq_enter_irqson();					\
> +	}								\
> +									\
>  	head = this_cpu_ptr(event_call->perf_events);			\
>  	if (!bpf_prog_array_valid(event_call) &&			\
>  	    __builtin_constant_p(!__task) && !__task &&			\
>  	    hlist_empty(head))						\
> -		return;							\
> +		goto out;						\
>  									\
>  	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
>  			     sizeof(u64));				\
> @@ -57,7 +67,7 @@ perf_trace_##call(void *__data, proto)					\
>  									\
>  	entry = perf_trace_buf_alloc(__entry_size, &__regs, &rctx);	\
>  	if (!entry)							\
> -		return;							\
> +		goto out;						\
>  									\
>  	perf_fetch_caller_regs(__regs);					\
>  									\
> @@ -68,6 +78,9 @@ perf_trace_##call(void *__data, proto)					\
>  	perf_trace_run_bpf_submit(entry, __entry_size, rctx,		\
>  				  event_call, __count, __regs,		\
>  				  head, __task);			\
> +out:									\
> +	if (!rcu_watching)						\
> +		rcu_irq_exit_irqson();					\
>  }

It is probably okay to move that into perf_tp_event(), then this:

>  /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1694a6b57ad8..3e6f07b62515 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -719,6 +719,7 @@ void rcu_irq_exit_irqson(void)
>  	rcu_irq_exit();
>  	local_irq_restore(flags);
>  }
> +EXPORT_SYMBOL_GPL(rcu_irq_exit_irqson);
>  
>  /*
>   * Exit an RCU extended quiescent state, which can be either the
> @@ -890,6 +891,7 @@ void rcu_irq_enter_irqson(void)
>  	rcu_irq_enter();
>  	local_irq_restore(flags);
>  }
> +EXPORT_SYMBOL_GPL(rcu_irq_enter_irqson);
>  
>  /*
>   * If any sort of urgency was applied to the current CPU (for example,

can go too. Those things really should not be exported.


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 11:49 ` Peter Zijlstra
@ 2020-02-11 12:59   ` Paul E. McKenney
  2020-02-11 13:10     ` Peter Zijlstra
  2020-02-11 14:05   ` Steven Rostedt
  1 sibling, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2020-02-11 12:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 12:49:54PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> > +	if (!rcu_watching) {						\
> > +		/* Can not use RCU if rcu is not watching and in NMI */	\
> > +		if (in_nmi())						\
> > +			return;						\
> > +		rcu_irq_enter_irqson();					\
> > +	}								\
> 
> I saw the same weirdness in __trace_stack(), and I'm confused by it.
> 
> How can we ever get to: in_nmi() && !rcu_watching()? That should be a
> BUG.  In particular, nmi_enter() has rcu_nmi_enter().
> 
> Paul, can that really happen?

Not sure what the current situation is, but if I remember correctly it
used to be possible to get to an NMI handler without RCU being informed.
If NMI handlers now unconditionally inform RCU, then like you, I don't
see that the "if (in_nmi()) return" is needed.

However, a quick grep for NMI_MASK didn't show me the NMI_MASK bit
getting set.  Help?

							Thanx, Paul


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 12:00   ` Peter Zijlstra
@ 2020-02-11 13:03     ` Paul E. McKenney
  2020-02-11 13:16       ` Peter Zijlstra
  2020-02-11 14:10     ` Steven Rostedt
  1 sibling, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2020-02-11 13:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, rostedt, linux-kernel, Ingo Molnar,
	Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva,
	Thomas Gleixner, Josh Triplett, Lai Jiangshan

On Tue, Feb 11, 2020 at 01:00:15PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 10, 2020 at 07:30:32PM -0500, Mathieu Desnoyers wrote:
> 
> > > because perf only uses rcu to synchronize trace points.
> > 
> > That last part seems inaccurate. The tracepoint synchronization is two-fold:
> > one part is internal to tracepoint.c (see rcu_free_old_probes()), and the other
> > is only needed if the probes are within modules which can be unloaded (see
> > tracepoint_synchronize_unregister()). AFAIK, perf never implements probe callbacks
> > within modules, so the latter is not needed by perf.
> > 
> > The culprit of the problem here is that perf issues "rcu_read_lock()" and
> > "rcu_read_unlock()" within the probe callbacks it registers to the tracepoints,
> > including the rcuidle ones. Those require that RCU is "watching", which is
> > triggering the regression when we remove the calls to rcu_irq_enter/exit_irqson()
> > from the rcuidle tracepoint instrumentation sites.
> 
> It is not the fact that perf issues rcu_read_lock() that is the problem.
> As we established yesterday, I can probably remove most rcu_read_lock()
> calls from perf today (yay RCU flavour unification).

Glad some aspect of this unification is actually helping you.  ;-)

> The problem is that the core perf code uses RCU-managed data, and we
> need an existence guarantee for it. It would be BAD (TM) if the
> ring-buffer we're writing data to were to suddenly disappear under our
> feet, etc.
> 
> > Which brings a question about handling of NMIs: in the proposed patch, if
> > an NMI nests over rcuidle context, AFAIU it will be in a state
> > !rcu_is_watching() && in_nmi(), which is handled by this patch with a simple
> > "return", meaning important NMIs doing hardware event sampling can be
> > completely lost.
> > 
> > Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI context,
> > is it at all valid to use rcu_read_lock/unlock() as perf does from NMI handlers,
> 
> Again, rcu_read_lock() itself really isn't the problem. But we need
> NMIs, just like regular interrupts, to imply rcu_read_lock(). That is,
> any observable (RCU managed) pointer must stay valid during the NMI/IRQ
> execution.
> 
> > considering that those can be nested on top of rcuidle context?
> 
> As per nmi_enter() calling rcu_nmi_enter() I've always assumed that NMIs
> are fully covered by RCU.
> 
> If this isn't so, RCU is terminally broken :-)

All RCU can do is respond to calls to rcu_nmi_enter() and rcu_nmi_exit().
It has not yet figured out how to force people to add these calls where
they are needed.  ;-)

But yes, it would be very nice if architectures arranged things so
that all NMI handlers were visible to RCU.  And we no longer have
half-interrupts, so maybe there is hope...

							Thanx, Paul


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 12:59   ` Paul E. McKenney
@ 2020-02-11 13:10     ` Peter Zijlstra
  2020-02-11 13:20       ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-02-11 13:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 04:59:29AM -0800, Paul E. McKenney wrote:

> However, a quick grep for NMI_MASK didn't show me the NMI_MASK bit
> getting set.  Help?

| #define nmi_enter()						\
| 	do {							\
| 		arch_nmi_enter();				\
| 		printk_nmi_enter();				\
| 		lockdep_off();					\
| 		ftrace_nmi_enter();				\
| 		BUG_ON(in_nmi());				\
| 		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\

		^^^^ right there

| 		rcu_nmi_enter();				\
| 		trace_hardirq_enter();				\
| 	} while (0)


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 13:03     ` Paul E. McKenney
@ 2020-02-11 13:16       ` Peter Zijlstra
  2020-02-11 13:23         ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-02-11 13:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mathieu Desnoyers, rostedt, linux-kernel, Ingo Molnar,
	Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva,
	Thomas Gleixner, Josh Triplett, Lai Jiangshan

On Tue, Feb 11, 2020 at 05:03:01AM -0800, Paul E. McKenney wrote:

> > It is not the fact that perf issues rcu_read_lock() that is the problem.
> > As we established yesterday, I can probably remove most rcu_read_lock()
> > calls from perf today (yay RCU flavour unification).
> 
> Glad some aspect of this unification is actually helping you.  ;-)

rcu_read_lock() is exceedingly cheap though, so I never really worried
about it. But now that RCU includes RCU-sched (again) we can go and
remove a bunch of them.
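
For illustration, the kind of removal that permits (an illustrative
fragment; 'gp' and 'use()' are placeholders, not actual perf code). Since
the flavours were unified, synchronize_rcu() also waits for regions that
run with preemption or interrupts disabled, so the explicit marking
inside such a region is redundant:

	/* before: explicit read-side critical section */
	rcu_read_lock();
	use(rcu_dereference(gp));
	rcu_read_unlock();

	/* after: a context that already has preemption disabled
	 * (e.g. a hard-IRQ handler) is itself the read-side
	 * critical section */
	use(rcu_dereference_sched(gp));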

> > As per nmi_enter() calling rcu_nmi_enter() I've always assumed that NMIs
> > are fully covered by RCU.
> > 
> > If this isn't so, RCU is terminally broken :-)
> 
> All RCU can do is respond to calls to rcu_nmi_enter() and rcu_nmi_exit().
> It has not yet figured out how to force people to add these calls where
> they are needed.  ;-)
> 
> But yes, it would be very nice if architectures arranged things so
> that all NMI handlers were visible to RCU.  And we no longer have
> half-interrupts, so maybe there is hope...

Well,.. you could go back to simply _always_ watching :-)


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 13:10     ` Peter Zijlstra
@ 2020-02-11 13:20       ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2020-02-11 13:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 02:10:46PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 11, 2020 at 04:59:29AM -0800, Paul E. McKenney wrote:
> 
> > However, a quick grep for NMI_MASK didn't show me the NMI_MASK bit
> > getting set.  Help?
> 
> | #define nmi_enter()						\
> | 	do {							\
> | 		arch_nmi_enter();				\
> | 		printk_nmi_enter();				\
> | 		lockdep_off();					\
> | 		ftrace_nmi_enter();				\
> | 		BUG_ON(in_nmi());				\
> | 		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
> 
> 		^^^^ right there
> 
> | 		rcu_nmi_enter();				\
> | 		trace_hardirq_enter();				\
> | 	} while (0)

Color me blind, and thank you!

							Thanx, Paul


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 13:16       ` Peter Zijlstra
@ 2020-02-11 13:23         ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2020-02-11 13:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, rostedt, linux-kernel, Ingo Molnar,
	Joel Fernandes (Google), Greg Kroah-Hartman, Gustavo A. R. Silva,
	Thomas Gleixner, Josh Triplett, Lai Jiangshan

On Tue, Feb 11, 2020 at 02:16:37PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 11, 2020 at 05:03:01AM -0800, Paul E. McKenney wrote:
> 
> > > It is not the fact that perf issues rcu_read_lock() that is the problem.
> > > As we established yesterday, I can probably remove most rcu_read_lock()
> > > calls from perf today (yay RCU flavour unification).
> > 
> > Glad some aspect of this unification is actually helping you.  ;-)
> 
> rcu_read_lock() is exceedingly cheap though, so I never really worried
> about it. But now that RCU includes RCU-sched (again) we can go and
> remove a bunch of them.
> 
> > > As per nmi_enter() calling rcu_nmi_enter() I've always assumed that NMIs
> > > are fully covered by RCU.
> > > 
> > > If this isn't so, RCU is terminally broken :-)
> > 
> > All RCU can do is respond to calls to rcu_nmi_enter() and rcu_nmi_exit().
> > It has not yet figured out how to force people to add these calls where
> > they are needed.  ;-)
> > 
> > But yes, it would be very nice if architectures arranged things so
> > that all NMI handlers were visible to RCU.  And we no longer have
> > half-interrupts, so maybe there is hope...
> 
> Well,.. you could go back to simply _always_ watching :-)

The idle loop always was unwatched, even back in DYNIX/ptx.  And watching
the idle loop requires waking up idle CPUs, which makes lots of people
quite unhappy.  But it could be done with something sort of like
synchronize_rcu_tasks(), as long as this didn't need to be used in
production on battery-powered systems.

							Thanx, Paul


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 11:49 ` Peter Zijlstra
  2020-02-11 12:59   ` Paul E. McKenney
@ 2020-02-11 14:05   ` Steven Rostedt
  2020-02-11 15:05     ` Peter Zijlstra
  2020-02-11 15:06     ` Paul E. McKenney
  1 sibling, 2 replies; 23+ messages in thread
From: Steven Rostedt @ 2020-02-11 14:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

On Tue, 11 Feb 2020 12:49:54 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> > +	if (!rcu_watching) {						\
> > +		/* Can not use RCU if rcu is not watching and in NMI */	\
> > +		if (in_nmi())						\
> > +			return;						\
> > +		rcu_irq_enter_irqson();					\
> > +	}								\  
> 
> I saw the same weirdness in __trace_stack(), and I'm confused by it.
> 
> How can we ever get to: in_nmi() && !rcu_watching()? That should be a
> BUG.  In particular, nmi_enter() has rcu_nmi_enter().
> 
> Paul, can that really happen?

The stack tracer connects to the function tracer and is called at all
the places that function tracing can be called from. As I like being
able to trace RCU internal functions (especially as they are complex),
I don't want to set them all to notrace. But for callbacks that
require RCU to be watching, we need this check, because there are
states where we can be in an NMI while RCU is not watching (as there
are some places in nmi_enter that can be traced!).

And if we are tracing preempt_enable and preempt_disable (as Joel added
trace events there), it may be the case for trace events too.

-- Steve



* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 12:00   ` Peter Zijlstra
  2020-02-11 13:03     ` Paul E. McKenney
@ 2020-02-11 14:10     ` Steven Rostedt
  1 sibling, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2020-02-11 14:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	paulmck, Josh Triplett, Lai Jiangshan

On Tue, 11 Feb 2020 13:00:15 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> As per nmi_enter() calling rcu_nmi_enter() I've always assumed that NMIs
> are fully covered by RCU.
> 
> If this isn't so, RCU is terminally broken :-)

Most of the time it is. But for tracing that injects callbacks at
arbitrary points of code, it can break. I've always said that tracing
is a more sensitive context than NMI itself. The reason NMIs are
sensitive is that they can happen pretty much anywhere. But tracing
can also happen in the transition code that enters the NMI.

This is why function tracing does the "rude" RCU flavor (yes Paul, I'd
love you to add that flavor!), which performs a schedule_on_each_cpu()
before freeing anything, because it traces when RCU is not watching.
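
Roughly, that "rude" synchronization amounts to this (a simplified
sketch; the actual ftrace code differs and these names are made up):

#include <linux/workqueue.h>
#include <linux/slab.h>

static void rude_nop(struct work_struct *work)
{
	/* empty: forcing every CPU through the scheduler *is* the
	 * synchronization */
}

static void rude_synchronize_then_free(void *old_ops)
{
	/* after this, no CPU can still be running a tracing callback
	 * that started in a !rcu_is_watching() region */
	schedule_on_each_cpu(rude_nop);
	kfree(old_ops);
}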

But RCU really shouldn't have to bend over backward for tracing, as
tracing is the exception and not the norm.

-- Steve


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 12:21 ` Peter Zijlstra
@ 2020-02-11 14:10   ` Steven Rostedt
  0 siblings, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2020-02-11 14:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

On Tue, 11 Feb 2020 13:21:20 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> > @@ -68,6 +78,9 @@ perf_trace_##call(void *__data, proto)					\
> >  	perf_trace_run_bpf_submit(entry, __entry_size, rctx,		\
> >  				  event_call, __count, __regs,		\
> >  				  head, __task);			\
> > +out:									\
> > +	if (!rcu_watching)						\
> > +		rcu_irq_exit_irqson();					\
> >  }  
> 
> It is probably okay to move that into perf_tp_event(), then this:
> 
> >  /*
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 1694a6b57ad8..3e6f07b62515 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -719,6 +719,7 @@ void rcu_irq_exit_irqson(void)
> >  	rcu_irq_exit();
> >  	local_irq_restore(flags);
> >  }
> > +EXPORT_SYMBOL_GPL(rcu_irq_exit_irqson);
> >  
> >  /*
> >   * Exit an RCU extended quiescent state, which can be either the
> > @@ -890,6 +891,7 @@ void rcu_irq_enter_irqson(void)
> >  	rcu_irq_enter();
> >  	local_irq_restore(flags);
> >  }
> > +EXPORT_SYMBOL_GPL(rcu_irq_enter_irqson);
> >  
> >  /*
> >   * If any sort of urgency was applied to the current CPU (for example,  
> 
> can go too. Those things really should not be exported.

Thanks, I'll send an updated patch.

-- Steve


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 14:05   ` Steven Rostedt
@ 2020-02-11 15:05     ` Peter Zijlstra
  2020-02-11 15:29       ` Peter Zijlstra
  2020-02-11 15:06     ` Paul E. McKenney
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-02-11 15:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

On Tue, Feb 11, 2020 at 09:05:03AM -0500, Steven Rostedt wrote:
> On Tue, 11 Feb 2020 12:49:54 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> > > +	if (!rcu_watching) {						\
> > > +		/* Can not use RCU if rcu is not watching and in NMI */	\
> > > +		if (in_nmi())						\
> > > +			return;						\
> > > +		rcu_irq_enter_irqson();					\
> > > +	}								\  
> > 
> > I saw the same weirdness in __trace_stack(), and I'm confused by it.
> > 
> > How can we ever get to: in_nmi() && !rcu_watching()? That should be a
> > BUG.  In particular, nmi_enter() has rcu_nmi_enter().
> > 
> > Paul, can that really happen?
> 
> The stack tracer connects to the function tracer and is called at all
> the places that function tracing can be called from. As I like being
> able to trace RCU internal functions (especially as they are complex),
> I don't want to set them all to notrace. But for callbacks that
> require RCU to be watching, we need this check, because there are
> states where we can be in an NMI while RCU is not watching (as there
> are some places in nmi_enter that can be traced!).
> 
> And if we are tracing preempt_enable and preempt_disable (as Joel added
> trace events there), it may be the case for trace events too.

Bloody hell; what a trainwreck. Luckily there's comments around that
explain this!

So we haz:

| #define nmi_enter()						\
| 	do {							\
| 		arch_nmi_enter();				\

arm64 only, let's ignore for now

| 		printk_nmi_enter();				\

notrace

| 		lockdep_off();					\

notrace

| 		ftrace_nmi_enter();				\

!notrace !!!

| 		BUG_ON(in_nmi());				\
| 		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);\

let's make this __preempt_count_add() ASAP!

| 		rcu_nmi_enter();				\

are you _really_ sure you want to go trace that?!?

| 		trace_hardirq_enter();				\
| 	} while (0)




* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 14:05   ` Steven Rostedt
  2020-02-11 15:05     ` Peter Zijlstra
@ 2020-02-11 15:06     ` Paul E. McKenney
  2020-02-11 15:31       ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2020-02-11 15:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 09:05:03AM -0500, Steven Rostedt wrote:
> On Tue, 11 Feb 2020 12:49:54 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, Feb 10, 2020 at 05:06:43PM -0500, Steven Rostedt wrote:
> > > +	if (!rcu_watching) {						\
> > > +		/* Can not use RCU if rcu is not watching and in NMI */	\
> > > +		if (in_nmi())						\
> > > +			return;						\
> > > +		rcu_irq_enter_irqson();					\
> > > +	}								\  
> > 
> > I saw the same weirdness in __trace_stack(), and I'm confused by it.
> > 
> > How can we ever get to: in_nmi() && !rcu_watching()? That should be a
> > BUG.  In particular, nmi_enter() has rcu_nmi_enter().
> > 
> > Paul, can that really happen?
> 
> The stack tracer connects to the function tracer and is called at all
> the places that function tracing can be called from. As I like being
> able to trace RCU internal functions (especially as they are complex),
> I don't want to set them all to notrace. But for callbacks that
> require RCU to be watching, we need this check, because there are
> states where we can be in an NMI while RCU is not watching (as there
> are some places in nmi_enter that can be traced!).
> 
> And if we are tracing preempt_enable and preempt_disable (as Joel added
> trace events there), it may be the case for trace events too.

Ah, thank you for the reminder!

Should Documentation/RCU/Design/Requirements/Requirements.rst be
updated to include this?

And I have to ask...  What happens if we are very early in from-idle
NMI entry (or very late in NMI exit), such that both in_nmi() and
rcu_is_watching() are returning false?  Or did I miss a turn somewhere?

							Thanx, Paul


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11  2:22   ` Steven Rostedt
  2020-02-11  2:32     ` joel
@ 2020-02-11 15:19     ` Mathieu Desnoyers
  1 sibling, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2020-02-11 15:19 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	paulmck, Josh Triplett, Lai Jiangshan

----- On Feb 10, 2020, at 9:22 PM, rostedt <rostedt@goodmis.org> wrote:

> On Mon, 10 Feb 2020 19:30:32 -0500 (EST)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> ----- On Feb 10, 2020, at 5:06 PM, rostedt <rostedt@goodmis.org> wrote:
>> 
>> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
>> 
>> Hi Steven,
> 
> Hi Mathieu!
[...]
>> 
>> Which brings a question about handling of NMIs: in the proposed patch, if
>> an NMI nests over rcuidle context, AFAIU it will be in a state
>> !rcu_is_watching() && in_nmi(), which is handled by this patch with a simple
>> "return", meaning important NMIs doing hardware event sampling can be
>> completely lost.
>> 
>> Considering that we cannot use rcu_irq_enter/exit_irqson() from NMI context,
>> is it at all valid to use rcu_read_lock/unlock() as perf does from NMI handlers,
>> considering that those can be nested on top of rcuidle context?
>> 
> 
> Note, in the __DO_TRACE macro, we've had this for a long time:
> 
>		/* srcu can't be used from NMI */			\
>		WARN_ON_ONCE(rcuidle && in_nmi());			\
> 
> With nothing triggering.

The "rcuidle" argument is only true for tracepoints which are declared to be used
within the rcuidle code. AFAIK, it does not cover tracepoints which can be placed
in NMI handlers. The state I am concerned about is really:

WARN_ON_ONCE(!rcu_is_watching() && in_nmi())

As pointed out by Peter further down in this thread.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 15:05     ` Peter Zijlstra
@ 2020-02-11 15:29       ` Peter Zijlstra
  2020-02-11 16:16         ` Steven Rostedt
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-02-11 15:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

On Tue, Feb 11, 2020 at 04:05:32PM +0100, Peter Zijlstra wrote:

> So we haz:
> 
> | #define nmi_enter()						\
> | 	do {							\
> | 		arch_nmi_enter();				\
> 
> arm64 only, let's ignore for now
> 
> | 		printk_nmi_enter();				\
> 
> notrace
> 
> | 		lockdep_off();					\
> 
> notrace
> 
> | 		ftrace_nmi_enter();				\
> 
> !notrace !!!
> 
> | 		BUG_ON(in_nmi());				\
> | 		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);\
> 
> let's make this __preempt_count_add() ASAP!

preempt_count_add() first frobs the actual preempt_count and then does
the trace, so that might just work. But it does need a notrace
annotation, I'm thinking, because calling into the function tracer
_before_ we do the preempt_count increment is irrecoverable crap.
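
For reference, the shape being described (a simplified sketch of
preempt_count_add() from kernel/sched/core.c, details elided, with the
notrace annotation proposed above):

void notrace preempt_count_add(int val)
{
	__preempt_count_add(val);	/* the count is frobbed first... */
	preempt_latency_start(val);	/* ...tracing only happens after, */
					/* so in_nmi() is already visible */
}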

> | 		rcu_nmi_enter();				\
> 
> are you _really_ sure you want to go trace that?!?
> 
> | 		trace_hardirq_enter();				\
> | 	} while (0)
> 
> 


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 15:06     ` Paul E. McKenney
@ 2020-02-11 15:31       ` Peter Zijlstra
  2020-02-11 15:40         ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-02-11 15:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 07:06:15AM -0800, Paul E. McKenney wrote:
> And I have to ask...  What happens if we are very early in from-idle
> NMI entry (or very late in NMI exit), such that both in_nmi() and
> rcu_is_watching() are returning false?  Or did I miss a turn somewhere?

We must, by very careful inspection, ensure that doesn't happen.

No tracing must happen before preempt_count increment / after
preempt_count decrement. Otherwise we can no longer recover.


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 15:31       ` Peter Zijlstra
@ 2020-02-11 15:40         ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2020-02-11 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Tue, Feb 11, 2020 at 04:31:24PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 11, 2020 at 07:06:15AM -0800, Paul E. McKenney wrote:
> > And I have to ask...  What happens if we are very early in from-idle
> > NMI entry (or very late in NMI exit), such that both in_nmi() and
> > rcu_is_watching() are returning false?  Or did I miss a turn somewhere?
> 
> We must, by very careful inspection, ensure that doesn't happen.
> 
> No tracing must happen before preempt_count increment / after
> preempt_count decrement. Otherwise we can no longer recover.

I was afraid of that, but agreed.  ;-)

						Thanx, Paul


* Re: [PATCH] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
  2020-02-11 15:29       ` Peter Zijlstra
@ 2020-02-11 16:16         ` Steven Rostedt
  0 siblings, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2020-02-11 16:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

On Tue, 11 Feb 2020 16:29:45 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> > | 		ftrace_nmi_enter();				\
> > 
> > !notrace !!!

Note, all inline functions are "notrace" by default, and ftrace_nmi_enter()
is inline.

in include/linux/compiler_types.h:

#if !defined(CONFIG_OPTIMIZE_INLINING)
#define inline inline __attribute__((__always_inline__)) __gnu_inline \
	__inline_maybe_unused notrace
#else
#define inline inline                                    __gnu_inline \
	__inline_maybe_unused notrace
#endif

-- Steve

