* [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
@ 2018-04-27  4:26 Joel Fernandes
  2018-04-27 14:26 ` Mathieu Desnoyers
  2018-04-27 15:57 ` Paul E. McKenney
  0 siblings, 2 replies; 24+ messages in thread
From: Joel Fernandes @ 2018-04-27  4:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Steven Rostedt, Peter Zilstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Glexiner,
	Boqun Feng, Paul McKenney, Frederic Weisbecker, Randy Dunlap,
	Masami Hiramatsu, Fenguang Wu, Baohong Liu, Vedang Patel,
	kernel-team

In recent tests with IRQ on/off tracepoints, a large performance
overhead of ~10% is noticed when running hackbench. This is root caused
to calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that SRCU is a better alternative for use during RCU idle.
Although it does involve extra barriers, it is lighter than the
sched-rcu version, which has to make additional calls to notify RCU
about entry into read-side sections while idle.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has been shown to improve
performance a lot for the high-frequency irq enable/disable tracepoints.

In the future, we can add a new may_sleep API which can use this
infrastructure for callbacks that can actually sleep; this will support
Mathieu's use case of blocking probes.

Test: Tested idle and preempt/irq tracepoints.

[1] https://patchwork.kernel.org/patch/10344297/

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zilstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Glexiner <tglx@linutronix.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Fenguang Wu <fengguang.wu@intel.com>
Cc: Baohong Liu <baohong.liu@intel.com>
Cc: Vedang Patel <vedang.patel@intel.com>
Cc: kernel-team@android.com
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 include/linux/tracepoint.h | 37 +++++++++++++++++++++++++++++--------
 kernel/tracepoint.c        | 10 +++++++++-
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c94f466d57ef..a1c1987de423 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@
  */
 
 #include <linux/smp.h>
+#include <linux/srcu.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -33,6 +34,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO	10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
  */
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
 
@@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
+#define __DO_TRACE(tp, proto, args, cond, preempt_on)			\
 	do {								\
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
 		void *__data;						\
+		int __maybe_unused idx = 0;				\
 									\
 		if (!(cond))						\
 			return;						\
-		if (rcucheck)						\
-			rcu_irq_enter_irqson();				\
-		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+		if (preempt_on) {					\
+			WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */	\
+			idx = srcu_read_lock(&tracepoint_srcu);         \
+			it_func_ptr = srcu_dereference((tp)->funcs,	\
+						&tracepoint_srcu);	\
+		} else {						\
+			rcu_read_lock_sched_notrace();			\
+			it_func_ptr =					\
+				rcu_dereference_sched((tp)->funcs);	\
+		}							\
+									\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -148,12 +160,21 @@ extern void syscall_unregfunc(void);
 				((void(*)(proto))(it_func))(args);	\
 			} while ((++it_func_ptr)->func);		\
 		}							\
-		rcu_read_unlock_sched_notrace();			\
-		if (rcucheck)						\
-			rcu_irq_exit_irqson();				\
+									\
+		if (preempt_on)					\
+			srcu_read_unlock(&tracepoint_srcu, idx);	\
+		else							\
+			rcu_read_unlock_sched_notrace();		\
 	} while (0)
 
 #ifndef MODULE
+/*
+ * This is for tracepoints that may be called from an rcu idle path. To make
+ * sure it is safe in the idle path, use SRCU instead of sched-rcu by
+ * specifying the preempt_on flag below.  This has the obvious effect that any
+ * callback expecting preemption to be disabled must explicitly disable it
+ * itself, since with SRCU it runs with preemption on.
+ */
 #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
 	static inline void trace_##name##_rcuidle(proto)		\
 	{								\
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 671b13457387..b3b1d65a2460 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -31,6 +31,9 @@
 extern struct tracepoint * const __start___tracepoints_ptrs[];
 extern struct tracepoint * const __stop___tracepoints_ptrs[];
 
+DEFINE_SRCU(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
 
@@ -67,11 +70,16 @@ static inline void *allocate_probes(int count)
 	return p == NULL ? NULL : p->probes;
 }
 
-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }
 
+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
 static inline void release_probes(struct tracepoint_func *old)
 {
 	if (old) {
-- 
2.17.0.441.gb46fe60e1d-goog
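
For illustration, here is a rough hand-written sketch of what the
rcuidle dispatch described above does once the patch is applied (not
part of the posted patch; the helper name and the single unsigned long
argument are made up, and the real code is generated by
__DECLARE_TRACE_RCU/__DO_TRACE, which also checks the tracepoint's
static key and optional condition first):

#include <linux/tracepoint.h>

/* Sketch only: SRCU-based rcuidle probe dispatch for one tracepoint. */
typedef void (*probe_func_t)(void *data, unsigned long arg);

static void do_trace_rcuidle_sketch(struct tracepoint *tp, unsigned long arg)
{
        struct tracepoint_func *it_func_ptr;
        int idx;

        WARN_ON_ONCE(in_nmi());         /* SRCU must not be used from NMI */
        idx = srcu_read_lock(&tracepoint_srcu);
        it_func_ptr = srcu_dereference(tp->funcs, &tracepoint_srcu);
        if (it_func_ptr) {
                do {
                        probe_func_t fn;

                        fn = (probe_func_t)it_func_ptr->func;
                        fn(it_func_ptr->data, arg);
                } while ((++it_func_ptr)->func);
        }
        srcu_read_unlock(&tracepoint_srcu, idx);
}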


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27  4:26 [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on Joel Fernandes
@ 2018-04-27 14:26 ` Mathieu Desnoyers
  2018-04-27 14:47   ` Steven Rostedt
  2018-04-27 15:57 ` Paul E. McKenney
  1 sibling, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2018-04-27 14:26 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, rostedt, Peter Zijlstra, Ingo Molnar, Tom Zanussi,
	Namhyung Kim, Thomas Gleixner, Boqun Feng, Paul E. McKenney,
	fweisbec, Randy Dunlap, Masami Hiramatsu, kbuild test robot,
	baohong liu, vedang patel, kernel-team

----- On Apr 27, 2018, at 12:26 AM, Joel Fernandes joelaf@google.com wrote:

> In recent tests with IRQ on/off tracepoints, a large performance
> overhead ~10% is noticed when running hackbench. This is root caused to
> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that srcu is a better alternative for use during rcu idle.
> Although it does involve extra barriers, its lighter than the sched-rcu
> version which has to do additional RCU calls to notify RCU idle about
> entry into RCU sections.
> 
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has shown to improve performance
> alot for the high frequency irq enable/disable tracepoints.

The general approach and the implementation look fine, except for
one small detail: I would be tempted to explicitly disable preemption
around the call to the tracepoint callback for the rcuidle variant,
unless we plan to audit every tracer right away to remove any assumption
that preemption is disabled in the callback implementation.

That would be the main difference between an eventual "may_sleep" tracepoint
and a rcuidle tracepoint: "may_sleep" would use SRCU and leave preemption
enabled when invoking the callback. rcuidle uses SRCU too, but would
disable preemption when invoking the callback.

Thoughts ?

Thanks,

Mathieu


> 
> In the future, we can add a new may_sleep API which can use this
> infrastructure for callbacks that actually can sleep which will support
> Mathieu's usecase of blocking probes.
> 
> Test: Tested idle and preempt/irq tracepoints.
> 
> [1] https://patchwork.kernel.org/patch/10344297/
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Peter Zilstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Thomas Glexiner <tglx@linutronix.de>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Fenguang Wu <fengguang.wu@intel.com>
> Cc: Baohong Liu <baohong.liu@intel.com>
> Cc: Vedang Patel <vedang.patel@intel.com>
> Cc: kernel-team@android.com
> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
> include/linux/tracepoint.h | 37 +++++++++++++++++++++++++++++--------
> kernel/tracepoint.c        | 10 +++++++++-
> 2 files changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..a1c1987de423 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
>  */
> 
> #include <linux/smp.h>
> +#include <linux/srcu.h>
> #include <linux/errno.h>
> #include <linux/types.h>
> #include <linux/cpumask.h>
> @@ -33,6 +34,8 @@ struct trace_eval_map {
> 
> #define TRACEPOINT_DEFAULT_PRIO	10
> 
> +extern struct srcu_struct tracepoint_srcu;
> +
> extern int
> tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> extern int
> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct
> notifier_block *nb)
>  */
> static inline void tracepoint_synchronize_unregister(void)
> {
> +	synchronize_srcu(&tracepoint_srcu);
> 	synchronize_sched();
> }
> 
> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
>  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
>  */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
> +#define __DO_TRACE(tp, proto, args, cond, preempt_on)			\
> 	do {								\
> 		struct tracepoint_func *it_func_ptr;			\
> 		void *it_func;						\
> 		void *__data;						\
> +		int __maybe_unused idx = 0;				\
> 									\
> 		if (!(cond))						\
> 			return;						\
> -		if (rcucheck)						\
> -			rcu_irq_enter_irqson();				\
> -		rcu_read_lock_sched_notrace();				\
> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> +		if (preempt_on) {					\
> +			WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */	\
> +			idx = srcu_read_lock(&tracepoint_srcu);         \
> +			it_func_ptr = srcu_dereference((tp)->funcs,	\
> +						&tracepoint_srcu);	\
> +		} else {						\
> +			rcu_read_lock_sched_notrace();			\
> +			it_func_ptr =					\
> +				rcu_dereference_sched((tp)->funcs);	\
> +		}							\
> +									\
> 		if (it_func_ptr) {					\
> 			do {						\
> 				it_func = (it_func_ptr)->func;		\
> @@ -148,12 +160,21 @@ extern void syscall_unregfunc(void);
> 				((void(*)(proto))(it_func))(args);	\
> 			} while ((++it_func_ptr)->func);		\
> 		}							\
> -		rcu_read_unlock_sched_notrace();			\
> -		if (rcucheck)						\
> -			rcu_irq_exit_irqson();				\
> +									\
> +		if (preempt_on)					\
> +			srcu_read_unlock(&tracepoint_srcu, idx);	\
> +		else							\
> +			rcu_read_unlock_sched_notrace();		\
> 	} while (0)
> 
> #ifndef MODULE
> +/*
> + * This is for tracepoints that may be called from an rcu idle path. To make
> + * sure we its safe in the idle path, use the srcu instead of sched-rcu by
> + * specifying the preempt_on flag below.  This has the obvious effect that any
> + * callback expecting preemption to be disabled should explicitly do so since
> + * with srcu it'd run with preempt on.
> + */
> #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
> 	static inline void trace_##name##_rcuidle(proto)		\
> 	{								\
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 671b13457387..b3b1d65a2460 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -31,6 +31,9 @@
> extern struct tracepoint * const __start___tracepoints_ptrs[];
> extern struct tracepoint * const __stop___tracepoints_ptrs[];
> 
> +DEFINE_SRCU(tracepoint_srcu);
> +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> +
> /* Set to 1 to enable tracepoint debug output */
> static const int tracepoint_debug;
> 
> @@ -67,11 +70,16 @@ static inline void *allocate_probes(int count)
> 	return p == NULL ? NULL : p->probes;
> }
> 
> -static void rcu_free_old_probes(struct rcu_head *head)
> +static void srcu_free_old_probes(struct rcu_head *head)
> {
> 	kfree(container_of(head, struct tp_probes, rcu));
> }
> 
> +static void rcu_free_old_probes(struct rcu_head *head)
> +{
> +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> +}
> +
> static inline void release_probes(struct tracepoint_func *old)
> {
> 	if (old) {
> --
> 2.17.0.441.gb46fe60e1d-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
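
A minimal sketch of the variant suggested above (keep the SRCU
read-side section from the patch, but disable preemption, using the
notrace flavour, just around the probe calls). The helper name and the
void * argument shape are illustrative only, not from any posted patch:

#include <linux/tracepoint.h>

/* Sketch: call a probe array with preemption explicitly disabled. */
static void tracepoint_call_probes_preempt_off(struct tracepoint_func *it_func_ptr,
                                               void *args)
{
        preempt_disable_notrace();
        if (it_func_ptr) {
                do {
                        void (*fn)(void *, void *);

                        fn = (void (*)(void *, void *))it_func_ptr->func;
                        fn(it_func_ptr->data, args);
                } while ((++it_func_ptr)->func);
        }
        preempt_enable_notrace();
}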


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 14:26 ` Mathieu Desnoyers
@ 2018-04-27 14:47   ` Steven Rostedt
  2018-04-27 15:38     ` Paul E. McKenney
                       ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Steven Rostedt @ 2018-04-27 14:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Joel Fernandes, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Tom Zanussi, Namhyung Kim, Thomas Gleixner, Boqun Feng,
	Paul E. McKenney, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, kernel-team

On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> The general approach and the implementation look fine, except for
> one small detail: I would be tempted to explicitly disable preemption
> around the call to the tracepoint callback for the rcuidle variant,
> unless we plan to audit every tracer right away to remove any assumption
> that preemption is disabled in the callback implementation.

I'm thinking that we do that audit. There shouldn't be many instances
of it. I like the idea that a tracepoint callback gets called with
preemption enabled.

-- Steve


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 14:47   ` Steven Rostedt
@ 2018-04-27 15:38     ` Paul E. McKenney
  2018-04-27 15:40       ` Steven Rostedt
  2018-04-27 15:42     ` Mathieu Desnoyers
  2018-04-27 16:30     ` Joel Fernandes
  2 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2018-04-27 15:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Joel Fernandes, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Tom Zanussi, Namhyung Kim, Thomas Gleixner,
	Boqun Feng, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, kernel-team

On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> > The general approach and the implementation look fine, except for
> > one small detail: I would be tempted to explicitly disable preemption
> > around the call to the tracepoint callback for the rcuidle variant,
> > unless we plan to audit every tracer right away to remove any assumption
> > that preemption is disabled in the callback implementation.
> 
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

Are you really sure you want to increase your state space that much?

							Thanx, Paul


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 15:38     ` Paul E. McKenney
@ 2018-04-27 15:40       ` Steven Rostedt
  2018-04-27 15:43         ` Mathieu Desnoyers
  2018-04-27 15:58         ` Paul E. McKenney
  0 siblings, 2 replies; 24+ messages in thread
From: Steven Rostedt @ 2018-04-27 15:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mathieu Desnoyers, Joel Fernandes, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Tom Zanussi, Namhyung Kim, Thomas Gleixner,
	Boqun Feng, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, kernel-team

On Fri, 27 Apr 2018 08:38:26 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >   
> > > The general approach and the implementation look fine, except for
> > > one small detail: I would be tempted to explicitly disable preemption
> > > around the call to the tracepoint callback for the rcuidle variant,
> > > unless we plan to audit every tracer right away to remove any assumption
> > > that preemption is disabled in the callback implementation.  
> > 
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.  
> 
> Are you really sure you want to increase your state space that much?

Why not? The code I have in callbacks already deals with all sorts of
context - normal, softirq, irq, NMI, preemption disabled, irq
disabled.

-- Steve
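
For reference, the context bookkeeping described above can be done with
the usual preempt_count() helpers; a rough sketch of how a callback
might classify its caller (illustrative only, this is not the actual
ring-buffer recursion-protection code):

#include <linux/preempt.h>
#include <linux/hardirq.h>

enum cb_ctx { CB_CTX_NMI, CB_CTX_IRQ, CB_CTX_SOFTIRQ, CB_CTX_NORMAL };

/* Sketch: bucket the calling context the way a tracer callback might. */
static enum cb_ctx callback_context(void)
{
        if (in_nmi())
                return CB_CTX_NMI;
        if (in_irq())
                return CB_CTX_IRQ;
        if (in_softirq())
                return CB_CTX_SOFTIRQ;
        return CB_CTX_NORMAL;
}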


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 14:47   ` Steven Rostedt
  2018-04-27 15:38     ` Paul E. McKenney
@ 2018-04-27 15:42     ` Mathieu Desnoyers
  2018-04-27 16:07       ` Steven Rostedt
  2018-04-27 16:30     ` Joel Fernandes
  2 siblings, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2018-04-27 15:42 UTC (permalink / raw)
  To: rostedt
  Cc: Joel Fernandes, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Tom Zanussi, Namhyung Kim, Thomas Gleixner, Boqun Feng,
	Paul E. McKenney, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, kernel-team

----- On Apr 27, 2018, at 10:47 AM, rostedt rostedt@goodmis.org wrote:

> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> The general approach and the implementation look fine, except for
>> one small detail: I would be tempted to explicitly disable preemption
>> around the call to the tracepoint callback for the rcuidle variant,
>> unless we plan to audit every tracer right away to remove any assumption
>> that preemption is disabled in the callback implementation.
> 
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

I see that ftrace explicitly disables preemption in its ring buffer
code. FWIW, this is redundant when called from sched-rcu tracepoints
and from kprobes which adds unnecessary performance overhead.

LTTng expects preemption to be disabled when invoked. I can adapt on my
side as needed, but would prefer not to have redundant preemption disabling
for probes hooking on sched-rcu tracepoints (which is the common case).

Do perf callbacks expect preemption to be disabled ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
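
To make the adaptation mentioned above concrete: a probe that relies on
staying on one CPU could take care of preemption itself when attached
to a preempt-on tracepoint. A sketch with a made-up probe name (this is
not LTTng code; the prototype merely mirrors the irq on/off
tracepoints, and the probe would be attached with
tracepoint_probe_register() as usual):

#include <linux/preempt.h>

/* Sketch: probe that no longer assumes the tracepoint disabled preemption. */
static void my_irq_disable_probe(void *data, unsigned long ip,
                                 unsigned long parent_ip)
{
        preempt_disable_notrace();      /* keep the per-cpu buffer work on one CPU */
        /* ... write the event into a per-cpu buffer here ... */
        preempt_enable_notrace();
}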


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 15:40       ` Steven Rostedt
@ 2018-04-27 15:43         ` Mathieu Desnoyers
  2018-04-27 16:08           ` Steven Rostedt
  2018-04-27 15:58         ` Paul E. McKenney
  1 sibling, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2018-04-27 15:43 UTC (permalink / raw)
  To: rostedt
  Cc: Paul E. McKenney, Joel Fernandes, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Tom Zanussi, Namhyung Kim, Thomas Gleixner,
	Boqun Feng, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, kernel-team

----- On Apr 27, 2018, at 11:40 AM, rostedt rostedt@goodmis.org wrote:

> On Fri, 27 Apr 2018 08:38:26 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
>> On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>> >   
>> > > The general approach and the implementation look fine, except for
>> > > one small detail: I would be tempted to explicitly disable preemption
>> > > around the call to the tracepoint callback for the rcuidle variant,
>> > > unless we plan to audit every tracer right away to remove any assumption
>> > > that preemption is disabled in the callback implementation.
>> > 
>> > I'm thinking that we do that audit. There shouldn't be many instances
>> > of it. I like the idea that a tracepoint callback gets called with
>> > preemption enabled.
>> 
>> Are you really sure you want to increase your state space that much?
> 
> Why not? The code I have in callbacks already deals with all sorts of
> context - normal, softirq, irq, NMI, preemption disabled, irq
> disabled.

It does so by disabling preemption in the callbacks, even when it's
redundant with the guarantees already provided by tracepoint-sched-rcu
and by kprobes. It's not that great for a fast-path.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27  4:26 [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on Joel Fernandes
  2018-04-27 14:26 ` Mathieu Desnoyers
@ 2018-04-27 15:57 ` Paul E. McKenney
  2018-04-27 16:13   ` Steven Rostedt
  2018-04-27 16:14   ` Joel Fernandes
  1 sibling, 2 replies; 24+ messages in thread
From: Paul E. McKenney @ 2018-04-27 15:57 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Steven Rostedt, Peter Zilstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Glexiner,
	Boqun Feng, Frederic Weisbecker, Randy Dunlap, Masami Hiramatsu,
	Fenguang Wu, Baohong Liu, Vedang Patel, kernel-team

On Thu, Apr 26, 2018 at 09:26:56PM -0700, Joel Fernandes wrote:
> In recent tests with IRQ on/off tracepoints, a large performance
> overhead ~10% is noticed when running hackbench. This is root caused to
> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that srcu is a better alternative for use during rcu idle.
> Although it does involve extra barriers, its lighter than the sched-rcu
> version which has to do additional RCU calls to notify RCU idle about
> entry into RCU sections.
> 
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has shown to improve performance
> alot for the high frequency irq enable/disable tracepoints.
> 
> In the future, we can add a new may_sleep API which can use this
> infrastructure for callbacks that actually can sleep which will support
> Mathieu's usecase of blocking probes.
> 
> Test: Tested idle and preempt/irq tracepoints.

Looks good overall!  One question and a few comments below.

							Thanx, Paul

> [1] https://patchwork.kernel.org/patch/10344297/
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Peter Zilstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Thomas Glexiner <tglx@linutronix.de>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Fenguang Wu <fengguang.wu@intel.com>
> Cc: Baohong Liu <baohong.liu@intel.com>
> Cc: Vedang Patel <vedang.patel@intel.com>
> Cc: kernel-team@android.com
> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
>  include/linux/tracepoint.h | 37 +++++++++++++++++++++++++++++--------
>  kernel/tracepoint.c        | 10 +++++++++-
>  2 files changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..a1c1987de423 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
>   */
> 
>  #include <linux/smp.h>
> +#include <linux/srcu.h>
>  #include <linux/errno.h>
>  #include <linux/types.h>
>  #include <linux/cpumask.h>
> @@ -33,6 +34,8 @@ struct trace_eval_map {
> 
>  #define TRACEPOINT_DEFAULT_PRIO	10
> 
> +extern struct srcu_struct tracepoint_srcu;
> +
>  extern int
>  tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
>  extern int
> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
>   */
>  static inline void tracepoint_synchronize_unregister(void)
>  {
> +	synchronize_srcu(&tracepoint_srcu);
>  	synchronize_sched();
>  }
> 
> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
>   * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>   * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
>   */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
> +#define __DO_TRACE(tp, proto, args, cond, preempt_on)			\
>  	do {								\
>  		struct tracepoint_func *it_func_ptr;			\
>  		void *it_func;						\
>  		void *__data;						\
> +		int __maybe_unused idx = 0;				\
>  									\
>  		if (!(cond))						\
>  			return;						\
> -		if (rcucheck)						\
> -			rcu_irq_enter_irqson();				\
> -		rcu_read_lock_sched_notrace();				\
> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> +		if (preempt_on) {					\
> +			WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */	\

Very good on this check, thank you!

> +			idx = srcu_read_lock(&tracepoint_srcu);         \

Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
and srcu_read_unlock()?

> +			it_func_ptr = srcu_dereference((tp)->funcs,	\
> +						&tracepoint_srcu);	\
> +		} else {						\
> +			rcu_read_lock_sched_notrace();			\
> +			it_func_ptr =					\
> +				rcu_dereference_sched((tp)->funcs);	\
> +		}							\
> +									\
>  		if (it_func_ptr) {					\
>  			do {						\
>  				it_func = (it_func_ptr)->func;		\
> @@ -148,12 +160,21 @@ extern void syscall_unregfunc(void);
>  				((void(*)(proto))(it_func))(args);	\
>  			} while ((++it_func_ptr)->func);		\
>  		}							\
> -		rcu_read_unlock_sched_notrace();			\
> -		if (rcucheck)						\
> -			rcu_irq_exit_irqson();				\
> +									\
> +		if (preempt_on)					\
> +			srcu_read_unlock(&tracepoint_srcu, idx);	\
> +		else							\
> +			rcu_read_unlock_sched_notrace();		\
>  	} while (0)
> 
>  #ifndef MODULE
> +/*
> + * This is for tracepoints that may be called from an rcu idle path. To make
> + * sure we its safe in the idle path, use the srcu instead of sched-rcu by
> + * specifying the preempt_on flag below.  This has the obvious effect that any
> + * callback expecting preemption to be disabled should explicitly do so since
> + * with srcu it'd run with preempt on.
> + */
>  #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
>  	static inline void trace_##name##_rcuidle(proto)		\
>  	{								\
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 671b13457387..b3b1d65a2460 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -31,6 +31,9 @@
>  extern struct tracepoint * const __start___tracepoints_ptrs[];
>  extern struct tracepoint * const __stop___tracepoints_ptrs[];
> 
> +DEFINE_SRCU(tracepoint_srcu);
> +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> +
>  /* Set to 1 to enable tracepoint debug output */
>  static const int tracepoint_debug;
> 
> @@ -67,11 +70,16 @@ static inline void *allocate_probes(int count)
>  	return p == NULL ? NULL : p->probes;
>  }
> 
> -static void rcu_free_old_probes(struct rcu_head *head)
> +static void srcu_free_old_probes(struct rcu_head *head)
>  {
>  	kfree(container_of(head, struct tp_probes, rcu));
>  }
> 
> +static void rcu_free_old_probes(struct rcu_head *head)
> +{
> +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);

Chaining them through the two callbacks works, good!

> +}
> +
>  static inline void release_probes(struct tracepoint_func *old)
>  {
>  	if (old) {
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 
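
The chaining noted above ("Chaining them through the two callbacks
works"), spelled out: release_probes(), whose body is cut off in the
hunk above, still queues the old probe array behind a sched-RCU grace
period, and rcu_free_old_probes() now re-queues it behind an SRCU grace
period, so the final kfree() waits for both kinds of readers. Roughly
(a paraphrase of kernel/tracepoint.c, not the literal file contents):

struct tp_probes {
        struct rcu_head rcu;
        struct tracepoint_func probes[0];
};

static inline void release_probes(struct tracepoint_func *old)
{
        if (old) {
                struct tp_probes *tp_probes = container_of(old,
                        struct tp_probes, probes[0]);

                /* 1) wait for rcu_dereference_sched() readers ... */
                call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
                /*
                 * 2) ... then rcu_free_old_probes() calls
                 *    call_srcu(&tracepoint_srcu, ..., srcu_free_old_probes),
                 * 3) ... and srcu_free_old_probes() kfree()s the array once
                 *    the SRCU readers are done as well.
                 */
        }
}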


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 15:40       ` Steven Rostedt
  2018-04-27 15:43         ` Mathieu Desnoyers
@ 2018-04-27 15:58         ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2018-04-27 15:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Joel Fernandes, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Tom Zanussi, Namhyung Kim, Thomas Gleixner,
	Boqun Feng, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, kernel-team

On Fri, Apr 27, 2018 at 11:40:05AM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 08:38:26 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> > > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> > >   
> > > > The general approach and the implementation look fine, except for
> > > > one small detail: I would be tempted to explicitly disable preemption
> > > > around the call to the tracepoint callback for the rcuidle variant,
> > > > unless we plan to audit every tracer right away to remove any assumption
> > > > that preemption is disabled in the callback implementation.  
> > > 
> > > I'm thinking that we do that audit. There shouldn't be many instances
> > > of it. I like the idea that a tracepoint callback gets called with
> > > preemption enabled.  
> > 
> > Are you really sure you want to increase your state space that much?
> 
> Why not? The code I have in callbacks already deals with all sorts of
> context - normal, softirq, irq, NMI, preemption disabled, irq
> disabled.

But why?  Do people really expect good real-time response on systems
instrumented with lots of tracing?

								Thanx, Paul


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 15:42     ` Mathieu Desnoyers
@ 2018-04-27 16:07       ` Steven Rostedt
  0 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2018-04-27 16:07 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Joel Fernandes, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Tom Zanussi, Namhyung Kim, Thomas Gleixner, Boqun Feng,
	Paul E. McKenney, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, kernel-team

On Fri, 27 Apr 2018 11:42:15 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Apr 27, 2018, at 10:47 AM, rostedt rostedt@goodmis.org wrote:
> 
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >   
> >> The general approach and the implementation look fine, except for
> >> one small detail: I would be tempted to explicitly disable preemption
> >> around the call to the tracepoint callback for the rcuidle variant,
> >> unless we plan to audit every tracer right away to remove any assumption
> >> that preemption is disabled in the callback implementation.  
> > 
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.  
> 
> I see that ftrace explicitly disables preemption in its ring buffer
> code. FWIW, this is redundant when called from sched-rcu tracepoints
> and from kprobes which adds unnecessary performance overhead.

Sure, but that code is called from other locations that do not have
preemption disabled. Calling preempt_disable() is far from the biggest
overhead of that code path.

> 
> LTTng expects preemption to be disabled when invoked. I can adapt on my
> side as needed, but would prefer not to have redundant preemption disabling
> for probes hooking on sched-rcu tracepoints (which is the common case).

Why not? Really, preempt_disable() is simply a per-cpu counter
increment, with only a compiler barrier added on top.

> 
> Do perf callbacks expect preemption to be disabled ?

I'll have to look, but wouldn't be hard to change.

-- Steve


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 15:43         ` Mathieu Desnoyers
@ 2018-04-27 16:08           ` Steven Rostedt
  0 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2018-04-27 16:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, Joel Fernandes, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Tom Zanussi, Namhyung Kim, Thomas Gleixner,
	Boqun Feng, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, kernel-team

On Fri, 27 Apr 2018 11:43:41 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> It does so by disabling preemption in the callbacks, even when it's
> redundant with the guarantees already provided by tracepoint-sched-rcu
> and by kprobes. It's not that great for a fast-path.

Really, preempt_disable() is not bad for a fast path. It's far better
than a local_irq_disable().

-- Steve
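
For reference, with CONFIG_PREEMPT_COUNT=y the primitive discussed
above boils down to a per-CPU counter bump plus a compiler barrier,
roughly (paraphrased from include/linux/preempt.h, not a verbatim
copy):

#define preempt_disable()                                       \
do {                                                            \
        preempt_count_inc();    /* per-CPU preempt counter */   \
        barrier();              /* compiler barrier only */     \
} while (0)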


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 15:57 ` Paul E. McKenney
@ 2018-04-27 16:13   ` Steven Rostedt
  2018-04-27 16:22     ` Joel Fernandes
  2018-04-27 16:44     ` Paul E. McKenney
  2018-04-27 16:14   ` Joel Fernandes
  1 sibling, 2 replies; 24+ messages in thread
From: Steven Rostedt @ 2018-04-27 16:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, linux-kernel, Peter Zilstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Glexiner,
	Boqun Feng, Frederic Weisbecker, Randy Dunlap, Masami Hiramatsu,
	Fenguang Wu, Baohong Liu, Vedang Patel, kernel-team

On Fri, 27 Apr 2018 08:57:01 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > +		if (preempt_on) {					\
> > +			WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */	\  
> 
> Very good on this check, thank you!

I think you need to return and not call the read lock.

			if (WARN_ON_ONCE(in_nmi()))
				return;

> 
> > +			idx = srcu_read_lock(&tracepoint_srcu);         \  
> 
> Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> and srcu_read_unlock()?

I think so.

-- Steve

> 
> > +			it_func_ptr = srcu_dereference((tp)->funcs,	\
> > +						&tracepoint_srcu);	\
> > +		} else {						\
> > +			rcu_read_lock_sched_notrace();			\
> > +			it_func_ptr =					\
> > +				rcu_dereference_sched((tp)->funcs);	\
> > +		}							\
> > +									\


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 15:57 ` Paul E. McKenney
  2018-04-27 16:13   ` Steven Rostedt
@ 2018-04-27 16:14   ` Joel Fernandes
  2018-04-27 16:22     ` Steven Rostedt
  1 sibling, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2018-04-27 16:14 UTC (permalink / raw)
  To: Paul McKenney
  Cc: LKML, Steven Rostedt, Peter Zilstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Glexiner,
	Boqun Feng, Frederic Weisbecker, Randy Dunlap, Masami Hiramatsu,
	Fenguang Wu, Baohong Liu, Vedang Patel, Cc: Android Kernel

Hi Paul,

On Fri, Apr 27, 2018 at 8:57 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, Apr 26, 2018 at 09:26:56PM -0700, Joel Fernandes wrote:
>> In recent tests with IRQ on/off tracepoints, a large performance
>> overhead ~10% is noticed when running hackbench. This is root caused to
>> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
>> tracepoint code. Following a long discussion on the list [1] about this,
>> we concluded that srcu is a better alternative for use during rcu idle.
>> Although it does involve extra barriers, its lighter than the sched-rcu
>> version which has to do additional RCU calls to notify RCU idle about
>> entry into RCU sections.
>>
>> In this patch, we change the underlying implementation of the
>> trace_*_rcuidle API to use SRCU. This has shown to improve performance
>> alot for the high frequency irq enable/disable tracepoints.
>>
>> In the future, we can add a new may_sleep API which can use this
>> infrastructure for callbacks that actually can sleep which will support
>> Mathieu's usecase of blocking probes.
>>
>> Test: Tested idle and preempt/irq tracepoints.
>
> Looks good overall!  One question and a few comments below.
>
>                                                         Thanx, Paul
>
>> [1] https://patchwork.kernel.org/patch/10344297/
>>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Peter Zilstra <peterz@infradead.org>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
>> Cc: Namhyung Kim <namhyung@kernel.org>
>> Cc: Thomas Glexiner <tglx@linutronix.de>
>> Cc: Boqun Feng <boqun.feng@gmail.com>
>> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Cc: Randy Dunlap <rdunlap@infradead.org>
>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>> Cc: Fenguang Wu <fengguang.wu@intel.com>
>> Cc: Baohong Liu <baohong.liu@intel.com>
>> Cc: Vedang Patel <vedang.patel@intel.com>
>> Cc: kernel-team@android.com
>> Signed-off-by: Joel Fernandes <joelaf@google.com>
>> ---
>>  include/linux/tracepoint.h | 37 +++++++++++++++++++++++++++++--------
>>  kernel/tracepoint.c        | 10 +++++++++-
>>  2 files changed, 38 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
>> index c94f466d57ef..a1c1987de423 100644
>> --- a/include/linux/tracepoint.h
>> +++ b/include/linux/tracepoint.h
>> @@ -15,6 +15,7 @@
>>   */
>>
>>  #include <linux/smp.h>
>> +#include <linux/srcu.h>
>>  #include <linux/errno.h>
>>  #include <linux/types.h>
>>  #include <linux/cpumask.h>
>> @@ -33,6 +34,8 @@ struct trace_eval_map {
>>
>>  #define TRACEPOINT_DEFAULT_PRIO      10
>>
>> +extern struct srcu_struct tracepoint_srcu;
>> +
>>  extern int
>>  tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
>>  extern int
>> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
>>   */
>>  static inline void tracepoint_synchronize_unregister(void)
>>  {
>> +     synchronize_srcu(&tracepoint_srcu);
>>       synchronize_sched();
>>  }
>>
>> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
>>   * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>>   * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
>>   */
>> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)                  \
>> +#define __DO_TRACE(tp, proto, args, cond, preempt_on)                        \
>>       do {                                                            \
>>               struct tracepoint_func *it_func_ptr;                    \
>>               void *it_func;                                          \
>>               void *__data;                                           \
>> +             int __maybe_unused idx = 0;                             \
>>                                                                       \
>>               if (!(cond))                                            \
>>                       return;                                         \
>> -             if (rcucheck)                                           \
>> -                     rcu_irq_enter_irqson();                         \
>> -             rcu_read_lock_sched_notrace();                          \
>> -             it_func_ptr = rcu_dereference_sched((tp)->funcs);       \
>> +             if (preempt_on) {                                       \
>> +                     WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \
>
> Very good on this check, thank you!

Sure thing :-)

>
>> +                     idx = srcu_read_lock(&tracepoint_srcu);         \
>
> Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> and srcu_read_unlock()?

That shouldn't be needed. For the rcu_read_lock_sched case, there is a
preempt_disable which needs to be a notrace, but for the srcu one,
since we don't do that, I think it should be fine.

Thanks!

- Joel
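
For reference, the sched-RCU detail pointed at above: the plain
rcu_read_lock_sched() disables preemption through a traceable helper,
while the _notrace flavour looks roughly like this (paraphrased from
include/linux/rcupdate.h):

static inline notrace void rcu_read_lock_sched_notrace(void)
{
        preempt_disable_notrace();
        __acquire(RCU_SCHED);   /* sparse annotation only, no generated code */
}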


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 16:14   ` Joel Fernandes
@ 2018-04-27 16:22     ` Steven Rostedt
  2018-04-27 16:45       ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2018-04-27 16:22 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul McKenney, LKML, Peter Zilstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Glexiner,
	Boqun Feng, Frederic Weisbecker, Randy Dunlap, Masami Hiramatsu,
	Fenguang Wu, Baohong Liu, Vedang Patel, Cc: Android Kernel

On Fri, 27 Apr 2018 09:14:30 -0700
Joel Fernandes <joelaf@google.com> wrote:

> > Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> > and srcu_read_unlock()?  
> 
> That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> preempt_disable which needs to be a notrace, but for the srcu one,
> since we don't do that, I think it should be fine.

Actually, I think I may agree here too, because the _notrace is for
function tracing, and it shouldn't affect it. If people don't want
those functions traced, they could add them to the list in the notrace
file.

-- Steve


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 16:13   ` Steven Rostedt
@ 2018-04-27 16:22     ` Joel Fernandes
  2018-04-27 16:44     ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Joel Fernandes @ 2018-04-27 16:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, LKML, Peter Zilstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Glexiner,
	Boqun Feng, Frederic Weisbecker, Randy Dunlap, Masami Hiramatsu,
	Fenguang Wu, Baohong Liu, Vedang Patel, Cc: Android Kernel

On Fri, Apr 27, 2018 at 9:13 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 27 Apr 2018 08:57:01 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
>> > +           if (preempt_on) {                                       \
>> > +                   WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \
>>
>> Very good on this check, thank you!
>
> I think you need to return and not call the read lock.
>
>                         if (WARN_ON_ONCE(in_nmi()))
>                                 return;

Cool, I'll do that.

>>
>> > +                   idx = srcu_read_lock(&tracepoint_srcu);         \
>>
>> Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
>> and srcu_read_unlock()?
>
> I think so.

Oh yes, since otherwise we call into lockdep.

Paul, then I think it's true that we'd need srcu _notrace variants that
don't do the rcu_lock_acquire. Sorry for my earlier email saying it's
not needed. Thanks,

- Joel
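
The recursion concern above ("otherwise we call into lockdep"), for
reference: the stock srcu_read_lock() wraps __srcu_read_lock() with a
lockdep annotation, and lockdep itself has tracepoints, so calling it
from inside tracepoint dispatch can recurse. Roughly (paraphrased from
include/linux/srcu.h):

static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
{
        int retval;

        retval = __srcu_read_lock(sp);
        rcu_lock_acquire(&(sp)->dep_map);       /* lockdep hook, may recurse into tracing */
        return retval;
}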


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 14:47   ` Steven Rostedt
  2018-04-27 15:38     ` Paul E. McKenney
  2018-04-27 15:42     ` Mathieu Desnoyers
@ 2018-04-27 16:30     ` Joel Fernandes
  2018-04-27 16:37       ` Steven Rostedt
  2 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2018-04-27 16:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Tom Zanussi, Namhyung Kim, Thomas Gleixner, Boqun Feng,
	Paul E. McKenney, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, Cc: Android Kernel

On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
>> The general approach and the implementation look fine, except for
>> one small detail: I would be tempted to explicitly disable preemption
>> around the call to the tracepoint callback for the rcuidle variant,
>> unless we plan to audit every tracer right away to remove any assumption
>> that preemption is disabled in the callback implementation.
>
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

Here is the list of all callers of the _rcuidle :

trace_clk_disable_complete_rcuidle
trace_clk_disable_rcuidle
trace_clk_enable_complete_rcuidle
trace_clk_enable_rcuidle
trace_console_rcuidle
trace_cpu_idle_rcuidle
trace_ipi_entry_rcuidle
trace_ipi_exit_rcuidle
trace_ipi_raise_rcuidle
trace_irq_disable_rcuidle
trace_irq_enable_rcuidle
trace_power_domain_target_rcuidle
trace_preempt_disable_rcuidle
trace_preempt_enable_rcuidle
trace_rpm_idle_rcuidle
trace_rpm_resume_rcuidle
trace_rpm_return_int_rcuidle
trace_rpm_suspend_rcuidle
trace_tlb_flush_rcuidle

All of these are called either from irq context or with preemption
already disabled. So I think it should be fine to keep preemption on.
But I'm OK with disabling it before callback execution if we agree that
we want that.

(and the ring buffer code is able to cope anyway, as Steven pointed out).

thanks,

 - Joel
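
If preemption is kept on, one cheap way to back up the observation that
every current caller already runs with irqs or preemption disabled,
while the audit happens, would be a hypothetical debug assertion in the
dispatch path, e.g.:

/* Hypothetical audit aid, not part of the patch: complain if an
 * rcuidle (preempt_on) tracepoint fires from a context that is
 * actually preemptible, i.e. neither preemption nor irqs disabled. */
static inline void tracepoint_check_rcuidle_context(void)
{
        WARN_ON_ONCE(preemptible());
}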


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 16:30     ` Joel Fernandes
@ 2018-04-27 16:37       ` Steven Rostedt
  2018-04-27 18:11         ` Joel Fernandes
  0 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2018-04-27 16:37 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Mathieu Desnoyers, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Tom Zanussi, Namhyung Kim, Thomas Gleixner, Boqun Feng,
	Paul E. McKenney, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, Cc: Android Kernel

On Fri, 27 Apr 2018 09:30:05 -0700
Joel Fernandes <joelaf@google.com> wrote:

> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >  
> >> The general approach and the implementation look fine, except for
> >> one small detail: I would be tempted to explicitly disable preemption
> >> around the call to the tracepoint callback for the rcuidle variant,
> >> unless we plan to audit every tracer right away to remove any assumption
> >> that preemption is disabled in the callback implementation.  
> >
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.  
> 
> Here is the list of all callers of the _rcuidle :

I was thinking of auditing who registers callbacks to any tracepoints.

-- Steve

> 
> trace_clk_disable_complete_rcuidle
> trace_clk_disable_rcuidle
> trace_clk_enable_complete_rcuidle
> trace_clk_enable_rcuidle
> trace_console_rcuidle
> trace_cpu_idle_rcuidle
> trace_ipi_entry_rcuidle
> trace_ipi_exit_rcuidle
> trace_ipi_raise_rcuidle
> trace_irq_disable_rcuidle
> trace_irq_enable_rcuidle
> trace_power_domain_target_rcuidle
> trace_preempt_disable_rcuidle
> trace_preempt_enable_rcuidle
> trace_rpm_idle_rcuidle
> trace_rpm_resume_rcuidle
> trace_rpm_return_int_rcuidle
> trace_rpm_suspend_rcuidle
> trace_tlb_flush_rcuidle
> 
> All of these are either called from irqs or preemption disabled
> already. So I think it should be fine to keep preemption on. But I'm
> Ok with disabling it before callback execution if we agree that we
> want that.
> 
> (and the ring buffer code is able to cope anyway Steven pointed).
> 
> thanks,
> 
>  - Joel


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 16:13   ` Steven Rostedt
  2018-04-27 16:22     ` Joel Fernandes
@ 2018-04-27 16:44     ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2018-04-27 16:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, linux-kernel, Peter Zilstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Glexiner,
	Boqun Feng, Frederic Weisbecker, Randy Dunlap, Masami Hiramatsu,
	Fenguang Wu, Baohong Liu, Vedang Patel, kernel-team

On Fri, Apr 27, 2018 at 12:13:30PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 08:57:01 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > +		if (preempt_on) {					\
> > > +			WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */	\  
> > 
> > Very good on this check, thank you!
> 
> I think you need to return and not call the read lock.

Works for me either way, at least assuming that the splat actually gets
printed.  ;-)

> 			if (WARN_ON_ONCE(in_nmi()))
> 				return;
> 
> > 
> > > +			idx = srcu_read_lock(&tracepoint_srcu);         \  
> > 
> > Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> > and srcu_read_unlock()?
> 
> I think so.

OK, please see the (untested) patch below.  Of course,
srcu_read_lock_notrace() invokes __srcu_read_lock(), which looks as
follows:

	int __srcu_read_lock(struct srcu_struct *sp)
	{
		int idx;

		idx = READ_ONCE(sp->srcu_idx) & 0x1;
		this_cpu_inc(sp->sda->srcu_lock_count[idx]);
		smp_mb(); /* B */  /* Avoid leaking the critical section. */
		return idx;
	}

Do I also need to make a notrace version of __srcu_read_lock()?
Same question for __srcu_read_unlock(), which is similar.  If so,
assuming that there is no need for a notrace variant of this_cpu_inc()
and smp_mb(), I suppose I could simply macro-ize the internals in both
cases, but perhaps you have a better approach.

								Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 91494d7e8e41..e2e2cf05a6eb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -195,6 +195,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 	return retval;
 }
 
+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+	int retval;
+
+	retval = __srcu_read_lock(sp);
+	return retval;
+}
+
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
  * @sp: srcu_struct in which to unregister the old reader.
@@ -205,6 +215,13 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
 	__releases(sp)
 {
+	__srcu_read_unlock(sp, idx);
+}
+
+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
 	rcu_lock_release(&(sp)->dep_map);
 	__srcu_read_unlock(sp, idx);
 }
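
For completeness, a sketch of how the tracepoint fast path from the
original patch could consume these variants (illustrative only, not a
real hunk; the probe walk is elided):

static void example_probe_dispatch(struct tracepoint *tp)
{
        struct tracepoint_func *it_func_ptr;
        int idx;

        idx = srcu_read_lock_notrace(&tracepoint_srcu);
        it_func_ptr = srcu_dereference(tp->funcs, &tracepoint_srcu);
        if (it_func_ptr) {
                /* ... walk the array and invoke the probes here ... */
        }
        srcu_read_unlock_notrace(&tracepoint_srcu, idx);
}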


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 16:22     ` Steven Rostedt
@ 2018-04-27 16:45       ` Paul E. McKenney
  2018-04-27 16:46         ` Steven Rostedt
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2018-04-27 16:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, LKML, Peter Zilstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Glexiner,
	Boqun Feng, Frederic Weisbecker, Randy Dunlap, Masami Hiramatsu,
	Fenguang Wu, Baohong Liu, Vedang Patel, Cc: Android Kernel

On Fri, Apr 27, 2018 at 12:22:01PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 09:14:30 -0700
> Joel Fernandes <joelaf@google.com> wrote:
> 
> > > Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> > > and srcu_read_unlock()?  
> > 
> > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > preempt_disable which needs to be a notrace, but for the srcu one,
> > since we don't do that, I think it should be fine.
> 
> Actually, I think I may agree here too. Because the _notrace is for
> function tracing, and it shouldn't affect it. If people don't want it
> traced, they could add those functions to the list in the notrace file.

OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)

							Thanx, Paul


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 16:45       ` Paul E. McKenney
@ 2018-04-27 16:46         ` Steven Rostedt
  2018-04-27 17:00           ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2018-04-27 16:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, LKML, Peter Zilstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Glexiner,
	Boqun Feng, Frederic Weisbecker, Randy Dunlap, Masami Hiramatsu,
	Fenguang Wu, Baohong Liu, Vedang Patel, Cc: Android Kernel

On Fri, 27 Apr 2018 09:45:54 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > since we don't do that, I think it should be fine.  
> > 
> > Actually, I think I may agree here too. Because the _notrace is for
> > function tracing, and it shouldn't affect it. If people don't want it
> > traced, they could add those functions to the list in the notrace file.  
> 
> OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)

Of course I wasn't thinking about the lockdep tracepoints that Joel
mentioned, which happens to be the reason for all this discussion in
the first place :-)  Now I think we do need it. (OK, I can keep
changing my mind, can't I?).

-- Steve


* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 16:46         ` Steven Rostedt
@ 2018-04-27 17:00           ` Paul E. McKenney
  2018-04-27 17:05             ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2018-04-27 17:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, LKML, Peter Zilstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Glexiner,
	Boqun Feng, Frederic Weisbecker, Randy Dunlap, Masami Hiramatsu,
	Fenguang Wu, Baohong Liu, Vedang Patel, Cc: Android Kernel

On Fri, Apr 27, 2018 at 12:46:41PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 09:45:54 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > > since we don't do that, I think it should be fine.  
> > > 
> > > Actually, I think I may agree here too. Because the _notrace is for
> > > function tracing, and it shouldn't affect it. If people don't want it
> > > traced, they could add those functions to the list in the notrace file.  
> > 
> > OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)
> 
> Of course I wasn't thinking about the lockdep tracepoints that Joel
> mentioned, which happens to be the reason for all this discussion in
> the first place :-)  Now I think we do need it. (OK, I can keep
> changing my mind, can't I?).

You can, but at some point I start applying heavy-duty hysteresis.  ;-)

So the current thought (as of your having sent the above email) is that
we need notrace versions of srcu_read_lock() and srcu_read_unlock(),
but not for __srcu_read_lock() and __srcu_read_unlock(), correct?

							Thanx, Paul
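
For reference, a hedged sketch of why that split makes sense, based on the
srcu.h of that era (treat the details as an assumption): the lockdep
annotation lives in the srcu_read_lock() wrapper, while __srcu_read_lock()
is the raw per-CPU counter operation, so only the wrappers need
_notrace/no-lockdep variants.

#include <linux/srcu.h>

/* Sketch of the existing wrapper, not a proposed change. */
static inline int sketch_srcu_read_lock(struct srcu_struct *sp)
{
	int retval;

	retval = __srcu_read_lock(sp);	/* raw per-CPU counter increment */
	rcu_lock_acquire(&sp->dep_map);	/* lockdep hook; this is the part that
					 * can recurse once lockdep itself has
					 * tracepoints */
	return retval;
}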

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 17:00           ` Paul E. McKenney
@ 2018-04-27 17:05             ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2018-04-27 17:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, LKML, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Tom Zanussi, Namhyung Kim, Thomas Gleixner,
	Boqun Feng, Frederic Weisbecker, Randy Dunlap, Masami Hiramatsu,
	Fengguang Wu, Baohong Liu, Vedang Patel, Cc: Android Kernel

On Fri, Apr 27, 2018 at 10:00:48AM -0700, Paul E. McKenney wrote:
> On Fri, Apr 27, 2018 at 12:46:41PM -0400, Steven Rostedt wrote:
> > On Fri, 27 Apr 2018 09:45:54 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > > > since we don't do that, I think it should be fine.  
> > > > 
> > > > Actually, I think I may agree here too. Because the _notrace is for
> > > > function tracing, and it shouldn't affect it. If people don't want it
> > > > traced, they could add those functions to the list in the notrace file.  
> > > 
> > > OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)
> > 
> > Of course I wasn't thinking about the lockdep tracepoints that Joel
> > mentioned, which happens to be the reason for all this discussion in
> > the first place :-)  Now I think we do need it. (OK, I can keep
> > changing my mind, can't I?).
> 
> You can, but at some point I start applying heavy-duty hysteresis.  ;-)
> 
> So the current thought (as of your having sent the above email) is that
> we need notrace versions of srcu_read_lock() and srcu_read_unlock(),
> but not for __srcu_read_lock() and __srcu_read_unlock(), correct?

And Joel noted offline that I messed up srcu_read_unlock_notrace(),
so here is an updated patch with that fixed.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 91494d7e8e41..3e72a291c401 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -195,6 +195,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 	return retval;
 }
 
+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+	int retval;
+
+	retval = __srcu_read_lock(sp);
+	return retval;
+}
+
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
  * @sp: srcu_struct in which to unregister the old reader.
@@ -209,6 +219,13 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
 	__srcu_read_unlock(sp, idx);
 }
 
+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
+	__srcu_read_unlock(sp, idx);
+}
+
 /**
  * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
  *

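For reference, a hedged sketch of how a tracepoint fast path could use
these notrace variants together with the tracepoint_srcu domain from the
RFC; the function name and the probe-walk details below are illustrative
only and are not part of the patch above.

#include <linux/srcu.h>
#include <linux/tracepoint.h>

static void example_rcuidle_probe_walk(struct tracepoint *tp)
{
	struct tracepoint_func *it_func_ptr;
	int idx;

	idx = srcu_read_lock_notrace(&tracepoint_srcu);
	it_func_ptr = rcu_dereference_raw(tp->funcs);
	if (it_func_ptr) {
		do {
			/* cast and invoke it_func_ptr->func with it_func_ptr->data */
		} while ((++it_func_ptr)->func);
	}
	srcu_read_unlock_notrace(&tracepoint_srcu, idx);
}
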
^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 16:37       ` Steven Rostedt
@ 2018-04-27 18:11         ` Joel Fernandes
  2018-04-27 18:42           ` Mathieu Desnoyers
  0 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2018-04-27 18:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Tom Zanussi, Namhyung Kim, Thomas Gleixner, Boqun Feng,
	Paul E. McKenney, fweisbec, Randy Dunlap, Masami Hiramatsu,
	kbuild test robot, baohong liu, vedang patel, Cc: Android Kernel

On Fri, Apr 27, 2018 at 9:37 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 27 Apr 2018 09:30:05 -0700
> Joel Fernandes <joelaf@google.com> wrote:
>
>> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>> >
>> >> The general approach and the implementation look fine, except for
>> >> one small detail: I would be tempted to explicitly disable preemption
>> >> around the call to the tracepoint callback for the rcuidle variant,
>> >> unless we plan to audit every tracer right away to remove any assumption
>> >> that preemption is disabled in the callback implementation.
>> >
>> > I'm thinking that we do that audit. There shouldn't be many instances
>> > of it. I like the idea that a tracepoint callback gets called with
>> > preemption enabled.
>>
>> Here is the list of all callers of the _rcuidle :
>
> I was thinking of auditing who registers callbacks to any tracepoints.

OK. If you feel strongly about this, I think for now I could also just
wrap the callback execution with preempt_disable_notrace. And, when/if
we get to doing the blocking callbacks work, we can consider keeping
preemption on.
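
A rough sketch of that combination, assuming the SRCU-based rcuidle path
from the RFC (illustrative only, not the actual patch):

#include <linux/preempt.h>
#include <linux/srcu.h>
#include <linux/tracepoint.h>

static void example_do_trace_rcuidle(struct tracepoint *tp)
{
	int idx;

	idx = srcu_read_lock_notrace(&tracepoint_srcu);
	preempt_disable_notrace();
	/* ... iterate over and invoke tp's probe callbacks as today ... */
	preempt_enable_notrace();
	srcu_read_unlock_notrace(&tracepoint_srcu, idx);
}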

thanks,

- Joel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on
  2018-04-27 18:11         ` Joel Fernandes
@ 2018-04-27 18:42           ` Mathieu Desnoyers
  0 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2018-04-27 18:42 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rostedt, linux-kernel, Peter Zijlstra, Ingo Molnar, Tom Zanussi,
	Namhyung Kim, Thomas Gleixner, Boqun Feng, Paul E. McKenney,
	fweisbec, Randy Dunlap, Masami Hiramatsu, kbuild test robot,
	baohong liu, vedang patel, kernel-team

----- On Apr 27, 2018, at 2:11 PM, Joel Fernandes joelaf@google.com wrote:

> On Fri, Apr 27, 2018 at 9:37 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> On Fri, 27 Apr 2018 09:30:05 -0700
>> Joel Fernandes <joelaf@google.com> wrote:
>>
>>> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>>> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>> >
>>> >> The general approach and the implementation look fine, except for
>>> >> one small detail: I would be tempted to explicitly disable preemption
>>> >> around the call to the tracepoint callback for the rcuidle variant,
>>> >> unless we plan to audit every tracer right away to remove any assumption
>>> >> that preemption is disabled in the callback implementation.
>>> >
>>> > I'm thinking that we do that audit. There shouldn't be many instances
>>> > of it. I like the idea that a tracepoint callback gets called with
>>> > preemption enabled.
>>>
>>> Here is the list of all callers of the _rcuidle :
>>
>> I was thinking of auditing who registers callbacks to any tracepoints.
> 
> OK. If you feel strongly about this, I think for now I could also just
> wrap the callback execution with preempt_disable_notrace. And, when/if
> we get to doing the blocking callbacks work, we can consider keeping
> preemption on.

My main point here is to introduce the minimal change (keeping preemption
disabled) needed for the rcuidle variant, and only tackle the work of
dealing with preemptible callbacks when we really need it and when we can
properly test it (e.g. by using it for syscall entry/exit tracing).

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-04-27 18:42 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-27  4:26 [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on Joel Fernandes
2018-04-27 14:26 ` Mathieu Desnoyers
2018-04-27 14:47   ` Steven Rostedt
2018-04-27 15:38     ` Paul E. McKenney
2018-04-27 15:40       ` Steven Rostedt
2018-04-27 15:43         ` Mathieu Desnoyers
2018-04-27 16:08           ` Steven Rostedt
2018-04-27 15:58         ` Paul E. McKenney
2018-04-27 15:42     ` Mathieu Desnoyers
2018-04-27 16:07       ` Steven Rostedt
2018-04-27 16:30     ` Joel Fernandes
2018-04-27 16:37       ` Steven Rostedt
2018-04-27 18:11         ` Joel Fernandes
2018-04-27 18:42           ` Mathieu Desnoyers
2018-04-27 15:57 ` Paul E. McKenney
2018-04-27 16:13   ` Steven Rostedt
2018-04-27 16:22     ` Joel Fernandes
2018-04-27 16:44     ` Paul E. McKenney
2018-04-27 16:14   ` Joel Fernandes
2018-04-27 16:22     ` Steven Rostedt
2018-04-27 16:45       ` Paul E. McKenney
2018-04-27 16:46         ` Steven Rostedt
2018-04-27 17:00           ` Paul E. McKenney
2018-04-27 17:05             ` Paul E. McKenney
