* [RFC 0/3] Revert SRCU from tracepoint infrastructure
@ 2020-02-07 20:56 Joel Fernandes (Google)
  2020-02-07 20:56 ` [RFC 1/3] Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique" Joel Fernandes (Google)
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Joel Fernandes (Google) @ 2020-02-07 20:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Steven Rostedt, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

Hi,
These patches remove SRCU usage from tracepoints. The reason for proposing the
reverts is that the whole point of SRCU was to avoid having to call
rcu_irq_enter_irqson(). However, this call was added back in 865e63b04e9b2
("tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints")
because perf was breaking.

Further, it occurs to me that, by using SRCU for tracepoints, we forgot that RCU
is not really watching the tracepoint callbacks. This means that anyone doing
preempt_disable() in their tracepoint callback, and expecting RCU to listen to
them, is in for a big surprise: when RCU is not watching, it does not care about
preempt-disable sections on CPUs, as you can see in the forced-quiescent-state loop.

Since SRCU provides no benefit anyway because of 865e63b04e9b2, let us
revert the SRCU tracepoint code to preserve the sanity of potential
tracepoint callback registrants.

Joel Fernandes (Google) (3):
Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make
it unique"
Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle
tracepoints"
Revert "tracepoint: Make rcuidle tracepoint callers use SRCU"

include/linux/tracepoint.h | 40 ++++++--------------------------------
kernel/tracepoint.c        | 10 +---------
2 files changed, 7 insertions(+), 43 deletions(-)

--
2.25.0.341.g760bfbb309-goog


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 1/3] Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique"
  2020-02-07 20:56 [RFC 0/3] Revert SRCU from tracepoint infrastructure Joel Fernandes (Google)
@ 2020-02-07 20:56 ` Joel Fernandes (Google)
  2020-02-07 21:07   ` Steven Rostedt
  2020-02-07 20:56 ` [RFC 2/3] Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints" Joel Fernandes (Google)
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes (Google) @ 2020-02-07 20:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Steven Rostedt, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

This reverts commit 0c7a52e4d4b5c4d35b31f3c3ad32af814f1bf491.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/tracepoint.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 1fb11daa5c533..59463c90fdc3d 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -164,7 +164,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
 		void *__data;						\
-		int __maybe_unused __idx = 0;				\
+		int __maybe_unused idx = 0;				\
 									\
 		if (!(cond))						\
 			return;						\
@@ -180,7 +180,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		 * doesn't work from the idle path.			\
 		 */							\
 		if (rcuidle) {						\
-			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
 			rcu_irq_enter_irqson();				\
 		}							\
 									\
@@ -196,7 +196,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 									\
 		if (rcuidle) {						\
 			rcu_irq_exit_irqson();				\
-			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
 		}							\
 									\
 		preempt_enable_notrace();				\
-- 
2.25.0.341.g760bfbb309-goog



* [RFC 2/3] Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints"
  2020-02-07 20:56 [RFC 0/3] Revert SRCU from tracepoint infrastructure Joel Fernandes (Google)
  2020-02-07 20:56 ` [RFC 1/3] Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique" Joel Fernandes (Google)
@ 2020-02-07 20:56 ` Joel Fernandes (Google)
  2020-02-07 20:56 ` [RFC 3/3] Revert "tracepoint: Make rcuidle tracepoint callers use SRCU" Joel Fernandes (Google)
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Joel Fernandes (Google) @ 2020-02-07 20:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Steven Rostedt, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

This reverts commit 865e63b04e9b2a658d7f26bd13a71dcd964a9118.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/tracepoint.h | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 59463c90fdc3d..ab1f13b7f7d2c 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -179,10 +179,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		 * For rcuidle callers, use srcu since sched-rcu	\
 		 * doesn't work from the idle path.			\
 		 */							\
-		if (rcuidle) {						\
+		if (rcuidle)						\
 			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
-			rcu_irq_enter_irqson();				\
-		}							\
 									\
 		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
 									\
@@ -194,10 +192,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 			} while ((++it_func_ptr)->func);		\
 		}							\
 									\
-		if (rcuidle) {						\
-			rcu_irq_exit_irqson();				\
+		if (rcuidle)						\
 			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
-		}							\
 									\
 		preempt_enable_notrace();				\
 	} while (0)
-- 
2.25.0.341.g760bfbb309-goog



* [RFC 3/3] Revert "tracepoint: Make rcuidle tracepoint callers use SRCU"
  2020-02-07 20:56 [RFC 0/3] Revert SRCU from tracepoint infrastructure Joel Fernandes (Google)
  2020-02-07 20:56 ` [RFC 1/3] Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique" Joel Fernandes (Google)
  2020-02-07 20:56 ` [RFC 2/3] Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints" Joel Fernandes (Google)
@ 2020-02-07 20:56 ` Joel Fernandes (Google)
  2020-02-07 21:24 ` [RFC 0/3] Revert SRCU from tracepoint infrastructure Paul E. McKenney
  2020-02-08 16:31 ` Mathieu Desnoyers
  4 siblings, 0 replies; 24+ messages in thread
From: Joel Fernandes (Google) @ 2020-02-07 20:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Steven Rostedt, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan

This reverts commit e6753f23d961d601dbae50a2fc2a3975c9715b14.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/tracepoint.h | 40 ++++++++------------------------------
 kernel/tracepoint.c        | 10 +---------
 2 files changed, 9 insertions(+), 41 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index ab1f13b7f7d2c..b2b13cf03e092 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -13,7 +13,6 @@
  */
 
 #include <linux/smp.h>
-#include <linux/srcu.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -32,8 +31,6 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO	10
 
-extern struct srcu_struct tracepoint_srcu;
-
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -76,16 +73,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
  * probe unregistration and the end of module exit to make sure there is no
  * caller executing a probe when it is freed.
  */
-#ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
-	synchronize_srcu(&tracepoint_srcu);
 	synchronize_rcu();
 }
-#else
-static inline void tracepoint_synchronize_unregister(void)
-{ }
-#endif
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 extern int syscall_regfunc(void);
@@ -159,31 +150,18 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
+#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
 	do {								\
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
 		void *__data;						\
-		int __maybe_unused idx = 0;				\
 									\
 		if (!(cond))						\
 			return;						\
-									\
-		/* srcu can't be used from NMI */			\
-		WARN_ON_ONCE(rcuidle && in_nmi());			\
-									\
-		/* keep srcu and sched-rcu usage consistent */		\
-		preempt_disable_notrace();				\
-									\
-		/*							\
-		 * For rcuidle callers, use srcu since sched-rcu	\
-		 * doesn't work from the idle path.			\
-		 */							\
-		if (rcuidle)						\
-			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
-									\
-		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
-									\
+		if (rcucheck)						\
+			rcu_irq_enter_irqson();				\
+		rcu_read_lock_sched_notrace();				\
+		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -191,11 +169,9 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 				((void(*)(proto))(it_func))(args);	\
 			} while ((++it_func_ptr)->func);		\
 		}							\
-									\
-		if (rcuidle)						\
-			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
-									\
-		preempt_enable_notrace();				\
+		rcu_read_unlock_sched_notrace();			\
+		if (rcucheck)						\
+			rcu_irq_exit_irqson();				\
 	} while (0)
 
 #ifndef MODULE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 73956eaff8a9c..df956fc23ca57 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -18,9 +18,6 @@
 extern tracepoint_ptr_t __start___tracepoints_ptrs[];
 extern tracepoint_ptr_t __stop___tracepoints_ptrs[];
 
-DEFINE_SRCU(tracepoint_srcu);
-EXPORT_SYMBOL_GPL(tracepoint_srcu);
-
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
 
@@ -60,14 +57,9 @@ static inline void *allocate_probes(int count)
 	return p == NULL ? NULL : p->probes;
 }
 
-static void srcu_free_old_probes(struct rcu_head *head)
-{
-	kfree(container_of(head, struct tp_probes, rcu));
-}
-
 static void rcu_free_old_probes(struct rcu_head *head)
 {
-	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+	kfree(container_of(head, struct tp_probes, rcu));
 }
 
 static __init int release_early_probes(void)
-- 
2.25.0.341.g760bfbb309-goog



* Re: [RFC 1/3] Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique"
  2020-02-07 20:56 ` [RFC 1/3] Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique" Joel Fernandes (Google)
@ 2020-02-07 21:07   ` Steven Rostedt
  0 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2020-02-07 21:07 UTC (permalink / raw)
  To: Joel Fernandes (Google), linux-kernel
  Cc: Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Thomas Gleixner, Paul E. McKenney,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

[Top posting as I'm replying from an airport gate, from my phone ]

Although you have a cover letter explaining the revert, each patch must be standalone; otherwise, looking at the git history will give no explanation for the revert.

-- Steve


On February 7, 2020 3:56:54 PM EST, "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
>This reverts commit 0c7a52e4d4b5c4d35b31f3c3ad32af814f1bf491.
>
>Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>---
> include/linux/tracepoint.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
>index 1fb11daa5c533..59463c90fdc3d 100644
>--- a/include/linux/tracepoint.h
>+++ b/include/linux/tracepoint.h
>@@ -164,7 +164,7 @@ static inline struct tracepoint
>*tracepoint_ptr_deref(tracepoint_ptr_t *p)
> 		struct tracepoint_func *it_func_ptr;			\
> 		void *it_func;						\
> 		void *__data;						\
>-		int __maybe_unused __idx = 0;				\
>+		int __maybe_unused idx = 0;				\
> 									\
> 		if (!(cond))						\
> 			return;						\
>@@ -180,7 +180,7 @@ static inline struct tracepoint
>*tracepoint_ptr_deref(tracepoint_ptr_t *p)
> 		 * doesn't work from the idle path.			\
> 		 */							\
> 		if (rcuidle) {						\
>-			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
>+			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> 			rcu_irq_enter_irqson();				\
> 		}							\
> 									\
>@@ -196,7 +196,7 @@ static inline struct tracepoint
>*tracepoint_ptr_deref(tracepoint_ptr_t *p)
> 									\
> 		if (rcuidle) {						\
> 			rcu_irq_exit_irqson();				\
>-			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
>+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
> 		}							\
> 									\
> 		preempt_enable_notrace();				\

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity and top posting.


* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-07 20:56 [RFC 0/3] Revert SRCU from tracepoint infrastructure Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2020-02-07 20:56 ` [RFC 3/3] Revert "tracepoint: Make rcuidle tracepoint callers use SRCU" Joel Fernandes (Google)
@ 2020-02-07 21:24 ` Paul E. McKenney
  2020-02-07 21:43   ` Joel Fernandes
  2020-02-08 16:31 ` Mathieu Desnoyers
  4 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2020-02-07 21:24 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Greg Kroah-Hartman, Gustavo A. R. Silva,
	Ingo Molnar, Richard Fontana, Steven Rostedt, Thomas Gleixner,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Fri, Feb 07, 2020 at 03:56:53PM -0500, Joel Fernandes (Google) wrote:
> Hi,
> These patches remove SRCU usage from tracepoints. The reason for proposing the
> reverts is because the whole point of SRCU was to avoid having to call
> rcu_irq_enter_irqson(). However this was added back in 865e63b04e9b2 ("tracing:
> Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints") because perf
> was breaking..
> 
> Further it occurs to me that, by using SRCU for tracepoints, we forgot that RCU
> is not really watching the tracepoint callbacks. This means that anyone doing
> preempt_disable() in their tracepoint callback, and expecting RCU to listen to
> them is in for a big surprise. When RCU is not watching, it does not care about
> preempt-disable sections on CPUs as you can see in the forced-quiescent state loop.
> 
> Since SRCU is not providing any benefit because of 865e63b04e9b2 anyway, let us
> revert SRCU tracepoint code to maintain the sanity of potential
> tracepoint callback registerers.

For whatever it is worth, SRCU is the exception to the "RCU needs to
be watching" rule.  You can have SRCU readers on idle CPUs, offline
CPUs, CPUs executing in userspace, whatever.

							Thanx, Paul

> Joel Fernandes (Google) (3):
> Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make
> it unique"
> Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle
> tracepoints"
> Revert "tracepoint: Make rcuidle tracepoint callers use SRCU"
> 
> include/linux/tracepoint.h | 40 ++++++--------------------------------
> kernel/tracepoint.c        | 10 +---------
> 2 files changed, 7 insertions(+), 43 deletions(-)
> 
> --
> 2.25.0.341.g760bfbb309-goog
> 


* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-07 21:24 ` [RFC 0/3] Revert SRCU from tracepoint infrastructure Paul E. McKenney
@ 2020-02-07 21:43   ` Joel Fernandes
  2020-02-08 16:39     ` Mathieu Desnoyers
  0 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2020-02-07 21:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Greg Kroah-Hartman, Gustavo A. R. Silva,
	Ingo Molnar, Richard Fontana, Steven Rostedt, Thomas Gleixner,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan

On Fri, Feb 07, 2020 at 01:24:50PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 07, 2020 at 03:56:53PM -0500, Joel Fernandes (Google) wrote:
> > Hi,
> > These patches remove SRCU usage from tracepoints. The reason for proposing the
> > reverts is because the whole point of SRCU was to avoid having to call
> > rcu_irq_enter_irqson(). However this was added back in 865e63b04e9b2 ("tracing:
> > Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints") because perf
> > was breaking..
> > 
> > Further it occurs to me that, by using SRCU for tracepoints, we forgot that RCU
> > is not really watching the tracepoint callbacks. This means that anyone doing
> > preempt_disable() in their tracepoint callback, and expecting RCU to listen to
> > them is in for a big surprise. When RCU is not watching, it does not care about
> > preempt-disable sections on CPUs as you can see in the forced-quiescent state loop.
> > 
> > Since SRCU is not providing any benefit because of 865e63b04e9b2 anyway, let us
> > revert SRCU tracepoint code to maintain the sanity of potential
> > tracepoint callback registerers.
> 
> For whatever it is worth, SRCU is the exception to the "RCU needs to
> be watching" rule.  You can have SRCU readers on idle CPUs, offline
> CPUs, CPUs executing in userspace, whatever.

Yes, sure. My concern was that callbacks are still using regular RCU somewhere
while RCU isn't watching. I believe BPF is using RCU that way (not sure), but
there could be other out-of-tree kernel modules, etc.

thanks,

 - Joel

> 
> 							Thanx, Paul
> 
> > Joel Fernandes (Google) (3):
> > Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make
> > it unique"
> > Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle
> > tracepoints"
> > Revert "tracepoint: Make rcuidle tracepoint callers use SRCU"
> > 
> > include/linux/tracepoint.h | 40 ++++++--------------------------------
> > kernel/tracepoint.c        | 10 +---------
> > 2 files changed, 7 insertions(+), 43 deletions(-)
> > 
> > --
> > 2.25.0.341.g760bfbb309-goog
> > 


* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-07 20:56 [RFC 0/3] Revert SRCU from tracepoint infrastructure Joel Fernandes (Google)
                   ` (3 preceding siblings ...)
  2020-02-07 21:24 ` [RFC 0/3] Revert SRCU from tracepoint infrastructure Paul E. McKenney
@ 2020-02-08 16:31 ` Mathieu Desnoyers
  2020-02-10  9:46   ` Peter Zijlstra
  2020-02-10 16:59   ` Joel Fernandes
  4 siblings, 2 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2020-02-08 16:31 UTC (permalink / raw)
  To: Joel Fernandes, Google
  Cc: linux-kernel, Greg Kroah-Hartman, Gustavo A. R. Silva,
	Ingo Molnar, Richard Fontana, rostedt, Thomas Gleixner, paulmck,
	Josh Triplett, Lai Jiangshan, Peter Zijlstra,
	Arnaldo Carvalho de Melo

----- On Feb 7, 2020, at 3:56 PM, Joel Fernandes, Google joel@joelfernandes.org wrote:

> Hi,
> These patches remove SRCU usage from tracepoints. The reason for proposing the
> reverts is because the whole point of SRCU was to avoid having to call
> rcu_irq_enter_irqson(). However this was added back in 865e63b04e9b2 ("tracing:
> Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints") because perf
> was breaking..

I think the original patch re-enabling the rcu_irq_enter/exit_irqson() is a
tracepoint band-aid over what should actually have been fixed within perf instead.

Perf should not do rcu_read_lock/unlock()/synchronize_rcu(), but rather use
tracepoint_synchronize_unregister() to match the read-side provided by
tracepoints.

If perf can then just rely on the underlying synchronization provided by each
instrumentation provider (tracepoint, kprobe, ...) and not explicitly add its own
unneeded synchronization on top (e.g. rcu_read_lock/unlock), then it should simplify
all this.

> 
> Further it occurs to me that, by using SRCU for tracepoints, we forgot that RCU
> is not really watching the tracepoint callbacks. This means that anyone doing
> preempt_disable() in their tracepoint callback, and expecting RCU to listen to
> them is in for a big surprise. When RCU is not watching, it does not care about
> preempt-disable sections on CPUs as you can see in the forced-quiescent state
> loop.

As Paul noted, SRCU is the exception to the "RCU needs to be watching" rule.

> 
> Since SRCU is not providing any benefit because of 865e63b04e9b2 anyway, let us
> revert SRCU tracepoint code to maintain the sanity of potential
> tracepoint callback registerers.

Introducing SRCU was done to simplify handling of rcuidle, removing the
significant overhead that had been observed due to use of rcu_irq_enter/exit_irqson().

There is another longer-term goal in adding SRCU-synchronized tracepoints: adding
the ability to create tracepoint probes which will be allowed to handle page
faults properly. This is very relevant for the syscall tracepoints reading the
system call parameters from user-space. Currently, we are stuck with subpar
hacks such as filling the missing data with zeroes. Usually nobody notices, because
most syscall arguments are typically hot in the page cache, but it is still fragile.

I'd very much prefer to see commits moving syscall tracepoints to use of SRCU
(without preempt disable around the tracepoint calls) rather than a commit removing
tracepoint SRCU use because of something that needs to be fixed within perf.

Thanks,

Mathieu


> 
> Joel Fernandes (Google) (3):
> Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make
> it unique"
> Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle
> tracepoints"
> Revert "tracepoint: Make rcuidle tracepoint callers use SRCU"
> 
> include/linux/tracepoint.h | 40 ++++++--------------------------------
> kernel/tracepoint.c        | 10 +---------
> 2 files changed, 7 insertions(+), 43 deletions(-)
> 
> --
> 2.25.0.341.g760bfbb309-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-07 21:43   ` Joel Fernandes
@ 2020-02-08 16:39     ` Mathieu Desnoyers
  0 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2020-02-08 16:39 UTC (permalink / raw)
  To: Joel Fernandes, Google
  Cc: paulmck, linux-kernel, Greg Kroah-Hartman, Gustavo A. R. Silva,
	Ingo Molnar, Richard Fontana, rostedt, Thomas Gleixner,
	Josh Triplett, Lai Jiangshan

----- On Feb 7, 2020, at 4:43 PM, Joel Fernandes, Google joel@joelfernandes.org wrote:

> On Fri, Feb 07, 2020 at 01:24:50PM -0800, Paul E. McKenney wrote:
>> On Fri, Feb 07, 2020 at 03:56:53PM -0500, Joel Fernandes (Google) wrote:
>> > Hi,
>> > These patches remove SRCU usage from tracepoints. The reason for proposing the
>> > reverts is because the whole point of SRCU was to avoid having to call
>> > rcu_irq_enter_irqson(). However this was added back in 865e63b04e9b2 ("tracing:
>> > Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints") because perf
>> > was breaking..
>> > 
>> > Further it occurs to me that, by using SRCU for tracepoints, we forgot that RCU
>> > is not really watching the tracepoint callbacks. This means that anyone doing
>> > preempt_disable() in their tracepoint callback, and expecting RCU to listen to
>> > them is in for a big surprise. When RCU is not watching, it does not care about
>> > preempt-disable sections on CPUs as you can see in the forced-quiescent state
>> > loop.
>> > 
>> > Since SRCU is not providing any benefit because of 865e63b04e9b2 anyway, let us
>> > revert SRCU tracepoint code to maintain the sanity of potential
>> > tracepoint callback registerers.
>> 
>> For whatever it is worth, SRCU is the exception to the "RCU needs to
>> be watching" rule.  You can have SRCU readers on idle CPUs, offline
>> CPUs, CPUs executing in userspace, whatever.
> 
> Yes sure. My concern was that callbacks are still using regular RCU somewhere
> and RCU isn't watching. I believe BPF is using RCU that way (not sure). But
> could be other out-of-tree kernel modules etc.

Tracepoint users should issue "tracepoint_synchronize_unregister()" rather than
expect tracepoints to rely on RCU.

I cannot find a good reason for perf to issue a redundant rcu_read_lock/unlock from
a tracepoint probe already providing RCU or SRCU synchronization. Same goes for BPF.
If they _really_ need to do it, then they could implement their own idle probes
which do an additional rcu_irq_enter/exit_irqson(), but this work-around does not
belong in tracepoint.h.

If out-of-tree modules fail to use the API properly, the burden of getting fixed
is on their shoulders, as it has always been.

Thanks,

Mathieu

> 
> thanks,
> 
> - Joel
> 
>> 
>> 							Thanx, Paul
>> 
>> > Joel Fernandes (Google) (3):
>> > Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make
>> > it unique"
>> > Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle
>> > tracepoints"
>> > Revert "tracepoint: Make rcuidle tracepoint callers use SRCU"
>> > 
>> > include/linux/tracepoint.h | 40 ++++++--------------------------------
>> > kernel/tracepoint.c        | 10 +---------
>> > 2 files changed, 7 insertions(+), 43 deletions(-)
>> > 
>> > --
>> > 2.25.0.341.g760bfbb309-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-08 16:31 ` Mathieu Desnoyers
@ 2020-02-10  9:46   ` Peter Zijlstra
  2020-02-10 10:19     ` Peter Zijlstra
                       ` (2 more replies)
  2020-02-10 16:59   ` Joel Fernandes
  1 sibling, 3 replies; 24+ messages in thread
From: Peter Zijlstra @ 2020-02-10  9:46 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Joel Fernandes, Google, linux-kernel, Greg Kroah-Hartman,
	Gustavo A. R. Silva, Ingo Molnar, Richard Fontana, rostedt,
	Thomas Gleixner, paulmck, Josh Triplett, Lai Jiangshan,
	Arnaldo Carvalho de Melo

On Sat, Feb 08, 2020 at 11:31:25AM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 7, 2020, at 3:56 PM, Joel Fernandes, Google joel@joelfernandes.org wrote:
> 
> > Hi,
> > These patches remove SRCU usage from tracepoints. The reason for proposing the
> > reverts is because the whole point of SRCU was to avoid having to call
> > rcu_irq_enter_irqson(). However this was added back in 865e63b04e9b2 ("tracing:
> > Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints") because perf
> > was breaking..
> 
> I think the original patch re-enabling the rcu_irq_enter/exit_irqson() is a
> tracepoint band-aid over what should actually been fixed within perf instead.
> 
> Perf should not do rcu_read_lock/unlock()/synchronize_rcu(), but rather use
> tracepoint_synchronize_unregister() to match the read-side provided by
> tracepoints.
> 
> If perf can then just rely on the underlying synchronization provided by each
> instrumentation providers (tracepoint, kprobe, ...) and not explicitly add its own
> unneeded synchronization on top (e.g. rcu_read_lock/unlock), then it should simplify
> all this.

It can't. At this point it doesn't know where the event came from. Also,
the whole perf stuff is per definition non-preemptible, as it needs to
run from NMI context.

Furthermore, using srcu would be detrimental, because of how it has
smp_mb() in the read-side primitives.

The best we can do is move that rcu_irq_enter/exit_*() crud into the
perf tracepoint glue I suppose.


* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10  9:46   ` Peter Zijlstra
@ 2020-02-10 10:19     ` Peter Zijlstra
  2020-02-10 13:36     ` Paul E. McKenney
  2020-02-10 17:05     ` Steven Rostedt
  2 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2020-02-10 10:19 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Joel Fernandes, Google, linux-kernel, Greg Kroah-Hartman,
	Gustavo A. R. Silva, Ingo Molnar, Richard Fontana, rostedt,
	Thomas Gleixner, paulmck, Josh Triplett, Lai Jiangshan,
	Arnaldo Carvalho de Melo

On Mon, Feb 10, 2020 at 10:46:16AM +0100, Peter Zijlstra wrote:
> On Sat, Feb 08, 2020 at 11:31:25AM -0500, Mathieu Desnoyers wrote:
> > ----- On Feb 7, 2020, at 3:56 PM, Joel Fernandes, Google joel@joelfernandes.org wrote:
> > 
> > > Hi,
> > > These patches remove SRCU usage from tracepoints. The reason for proposing the
> > > reverts is because the whole point of SRCU was to avoid having to call
> > > rcu_irq_enter_irqson(). However this was added back in 865e63b04e9b2 ("tracing:
> > > Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints") because perf
> > > was breaking..
> > 
> > I think the original patch re-enabling the rcu_irq_enter/exit_irqson() is a
> > tracepoint band-aid over what should actually been fixed within perf instead.
> > 
> > Perf should not do rcu_read_lock/unlock()/synchronize_rcu(), but rather use
> > tracepoint_synchronize_unregister() to match the read-side provided by
> > tracepoints.
> > 
> > If perf can then just rely on the underlying synchronization provided by each
> > instrumentation providers (tracepoint, kprobe, ...) and not explicitly add its own
> > unneeded synchronization on top (e.g. rcu_read_lock/unlock), then it should simplify
> > all this.
> 
> It can't. At this point it doesn't know where the event came from. Also,
> the whole perf stuff is per definition non-preemptible, as it needs to
> run from NMI context.
> 
> Furthermore, using srcu would be detrimental, because of how it has
> smp_mb() in the read side primitives.
> 
> The best we can do is move that rcu_irq_enter/exit_*() crud into the
> perf tracepoint glue I suppose.

I can't even tell how to do that; the knowledge of this is long gone by
the time we get there. That is, the @rcuidle state is lost in
__DO_TRACE(); it is not passed further down the chain.

Just to clarify: perf doesn't care about the tracepoint synchronization
beyond the glue code. Perf uses RCU itself for its own purposes.


* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10  9:46   ` Peter Zijlstra
  2020-02-10 10:19     ` Peter Zijlstra
@ 2020-02-10 13:36     ` Paul E. McKenney
  2020-02-10 13:44       ` Peter Zijlstra
  2020-02-10 17:17       ` Joel Fernandes
  2020-02-10 17:05     ` Steven Rostedt
  2 siblings, 2 replies; 24+ messages in thread
From: Paul E. McKenney @ 2020-02-10 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, Joel Fernandes, Google, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, rostedt, Thomas Gleixner, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

On Mon, Feb 10, 2020 at 10:46:16AM +0100, Peter Zijlstra wrote:
> On Sat, Feb 08, 2020 at 11:31:25AM -0500, Mathieu Desnoyers wrote:
> > ----- On Feb 7, 2020, at 3:56 PM, Joel Fernandes, Google joel@joelfernandes.org wrote:
> > 
> > > Hi,
> > > These patches remove SRCU usage from tracepoints. The reason for proposing the
> > > reverts is because the whole point of SRCU was to avoid having to call
> > > rcu_irq_enter_irqson(). However this was added back in 865e63b04e9b2 ("tracing:
> > > Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints") because perf
> > > was breaking..
> > 
> > I think the original patch re-enabling the rcu_irq_enter/exit_irqson() is a
> > tracepoint band-aid over what should actually have been fixed within perf instead.
> > 
> > Perf should not do rcu_read_lock/unlock()/synchronize_rcu(), but rather use
> > tracepoint_synchronize_unregister() to match the read-side provided by
> > tracepoints.
> > 
> > If perf can then just rely on the underlying synchronization provided by each
> > instrumentation provider (tracepoint, kprobe, ...) and not explicitly add its own
> > unneeded synchronization on top (e.g. rcu_read_lock/unlock), then it should simplify
> > all this.
> 
> It can't. At this point it doesn't know where the event came from. Also,
> the whole perf stuff is per definition non-preemptible, as it needs to
> run from NMI context.
> 
> Furthermore, using srcu would be detrimental, because of how it has
> smp_mb() in the read side primitives.

Note that rcu_irq_enter() and rcu_irq_exit() also contain value-returning
atomics, which imply full memory barriers.

> The best we can do is move that rcu_irq_enter/exit_*() crud into the
> perf tracepoint glue I suppose.

One approach would be to define a synchronize_preempt_disable() that
waits only for pre-existing disabled-preemption regions (including,
of course, disabled-irq and NMI-handler regions).  Something like Steve
Rostedt's workqueue-based schedule_on_each_cpu(ftrace_sync) implementation
might work.

There are of course some plusses and minuses:

+	Works on preempt-disable regions in idle-loop code without
	the need to invoke rcu_idle_exit() and rcu_idle_enter().

+	Straightforward implementation.

-	Does not work on preempt-disable regions on offline CPUs.
	(I have no idea if this really matters.)

-	Schedules on idle CPUs, so usage needs to be restricted to
	avoid messing up energy-efficient systems.  (It should be
	just fine to use this for tracing.)
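
If it helps, here is a minimal sketch of the idea, assuming the
workqueue-based approach (the function names are made up for
illustration; this is not existing kernel API):

```c
/* Hypothetical sketch, not existing kernel API. */
static void sync_work_fn(struct work_struct *work)
{
	/* Empty on purpose: merely getting scheduled is the guarantee,
	 * since a workqueue item cannot run on a CPU until any
	 * preempt-disabled (or irq-disabled, or NMI) region that was
	 * executing on that CPU has completed.
	 */
}

static void synchronize_preempt_disable(void)
{
	/* Forces a context switch on every online CPU, and therefore
	 * waits for all pre-existing preempt-disable regions.  Only
	 * online CPUs are covered, matching the caveat in the list
	 * above, and idle CPUs get scheduled on.
	 */
	schedule_on_each_cpu(sync_work_fn);
}
```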

Thoughts?

							Thanx, Paul

* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10 13:36     ` Paul E. McKenney
@ 2020-02-10 13:44       ` Peter Zijlstra
  2020-02-10 13:57         ` Paul E. McKenney
  2020-02-10 17:17       ` Joel Fernandes
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2020-02-10 13:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mathieu Desnoyers, Joel Fernandes, Google, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, rostedt, Thomas Gleixner, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

On Mon, Feb 10, 2020 at 05:36:52AM -0800, Paul E. McKenney wrote:

> > Furthermore, using srcu would be detrimental, because of how it has
> > smp_mb() in the read side primitives.
> 
> Note that rcu_irq_enter() and rcu_irq_exit() also contain value-returning
> atomics, which imply full memory barriers.

There is a whole lot of perf that doesn't go through tracepoints. It
makes absolutely no sense to make all that more expensive just because
tracepoints are getting 'funny'.

> > The best we can do is move that rcu_irq_enter/exit_*() crud into the
> > perf tracepoint glue I suppose.
> 
> One approach would be to define a synchronize_preempt_disable() that
> waits only for pre-existing disabled-preemption regions (including,
> of course, disabled-irq and NMI-handler regions).  Something like Steve
> Rostedt's workqueue-based schedule_on_each_cpu(ftrace_sync) implementation
> might work.
> 
> There are of course some plusses and minuses:
> 
> +	Works on preempt-disable regions in idle-loop code without
> 	the need to invoke rcu_idle_exit() and rcu_idle_enter().
> 
> +	Straightforward implementation.
> 
> -	Does not work on preempt-disable regions on offline CPUs.
> 	(I have no idea if this really matters.)

I'd hope not ;-)

> -	Schedules on idle CPUs, so usage needs to be restricted to
> 	avoid messing up energy-efficient systems.  (It should be
> 	just fine to use this for tracing.)

Unless you're tracing energy usage -- weird some people actually do that
:-)

* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10 13:44       ` Peter Zijlstra
@ 2020-02-10 13:57         ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2020-02-10 13:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, Joel Fernandes, Google, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, rostedt, Thomas Gleixner, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

On Mon, Feb 10, 2020 at 02:44:32PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 10, 2020 at 05:36:52AM -0800, Paul E. McKenney wrote:
> 
> > > Furthermore, using srcu would be detrimental, because of how it has
> > > smp_mb() in the read side primitives.
> > 
> > Note that rcu_irq_enter() and rcu_irq_exit() also contain value-returning
> > atomics, which imply full memory barriers.
> 
> There is a whole lot of perf that doesn't go through tracepoints. It
> makes absolutely no sense to make all that more expensive just because
> tracepoints are getting 'funny'.

OK.

> > > The best we can do is move that rcu_irq_enter/exit_*() crud into the
> > > perf tracepoint glue I suppose.
> > 
> > One approach would be to define a synchronize_preempt_disable() that
> > waits only for pre-existing disabled-preemption regions (including,
> > of course, disabled-irq and NMI-handler regions).  Something like Steve
> > Rostedt's workqueue-based schedule_on_each_cpu(ftrace_sync) implementation
> > might work.
> > 
> > There are of course some plusses and minuses:
> > 
> > +	Works on preempt-disable regions in idle-loop code without
> > 	the need to invoke rcu_idle_exit() and rcu_idle_enter().
> > 
> > +	Straightforward implementation.
> > 
> > -	Does not work on preempt-disable regions on offline CPUs.
> > 	(I have no idea if this really matters.)
> 
> I'd hope not ;-)

Me too, but I have harbored a great many hopes over the decades.  ;-)

> > -	Schedules on idle CPUs, so usage needs to be restricted to
> > 	avoid messing up energy-efficient systems.  (It should be
> > 	just fine to use this for tracing.)
> 
> Unless you're tracing energy usage -- weird some people actually do that
> :-)

Would they be changing tracing of their energy usage often in production?

							Thanx, Paul

* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-08 16:31 ` Mathieu Desnoyers
  2020-02-10  9:46   ` Peter Zijlstra
@ 2020-02-10 16:59   ` Joel Fernandes
  1 sibling, 0 replies; 24+ messages in thread
From: Joel Fernandes @ 2020-02-10 16:59 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Greg Kroah-Hartman, Gustavo A. R. Silva,
	Ingo Molnar, Richard Fontana, rostedt, Thomas Gleixner, paulmck,
	Josh Triplett, Lai Jiangshan, Peter Zijlstra,
	Arnaldo Carvalho de Melo

Hi Mathieu,
Nice to hear from you. I replied below:

On Sat, Feb 08, 2020 at 11:31:25AM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 7, 2020, at 3:56 PM, Joel Fernandes, Google joel@joelfernandes.org wrote:
> 
> > Hi,
> > These patches remove SRCU usage from tracepoints. The reason for proposing the
> > reverts is because the whole point of SRCU was to avoid having to call
> > rcu_irq_enter_irqson(). However this was added back in 865e63b04e9b2 ("tracing:
> > Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints") because perf
> > was breaking..
> 
> I think the original patch re-enabling the rcu_irq_enter/exit_irqson() is a
> tracepoint band-aid over what should actually have been fixed within perf instead.
> 
> Perf should not do rcu_read_lock/unlock()/synchronize_rcu(), but rather use
> tracepoint_synchronize_unregister() to match the read-side provided by
> tracepoints.

It feels like here you are dictating to tracepoint callbacks what they can do.
Why should we limit what tracepoint callbacks want to do? Further, if the
callback indirectly calls some other kernel API that does use rcu_read_lock(), then it
is trouble again. I would rather make callbacks more robust than have us
force down unwritten / undocumented rules onto them. BPF in their callbacks
also use rcu_read_lock from what I remember (but I'll have to double check).

> If perf can then just rely on the underlying synchronization provided by each
> instrumentation provider (tracepoint, kprobe, ...) and not explicitly add its own
> unneeded synchronization on top (e.g. rcu_read_lock/unlock), then it should simplify
> all this.

I kind of got lost here. The SRCU synchronization in current code is for
tracepoint_srcu which is for the tracepoint function table. Perf can't rely
on _that_ "underlying" synchronization. That is for a completely different
SRCU domain.

I think what you're proposing is:
1. Perf use its own SRCU domain and synchronize on that.
2. We remove rcu_irq_enter_irqson() for *rcuidle cases and just use SRCU in
all callbacks.

Is that right?

I think Peter said he does not want / like a separate SRCU domain within Perf,
so that sounds settled ;-)

Further what if a tracepoint callback calls into some code that does
preempt_disable() and expects that to be an RCU read-side section? That will
get hosed too since RCU is not watching.

I would say RCU _has_ to watch callback code to be fair to them. Anything
else is a cop out IMO.

> > Since SRCU is not providing any benefit because of 865e63b04e9b2 anyway, let us
> > revert SRCU tracepoint code to maintain the sanity of potential
> > tracepoint callback registerers.
> 
> Introducing SRCU was done to simplify handling of rcuidle, thus removing some
> significant overhead that has been noticed due to use of rcu_irq_enter/exit_irqson().

But rcu_irq_enter() was added right back thus nulling that benefit.

> There is another longer-term goal in adding SRCU-synchronized tracepoints: adding
> the ability to create tracepoint probes which will be allowed to handle page
> faults properly. This is very relevant for the syscall tracepoints reading the
> system call parameters from user-space. Currently, we are stuck with sub par
> hacks such as filling the missing data with zeroes. Usually nobody notices because
> most syscall arguments are typically hot in the page cache, but it is still fragile.
> 
> I'd very much prefer see commits moving syscall tracepoints to use of SRCU
> (without preempt disable around the tracepoint calls) rather than a commit removing
> tracepoint SRCU use because of something that needs to be fixed within perf.

But such SRCU implementation / usage has to be done within the callback
itself (for syscalls in this case), that has nothing to do with removing SRCU
for tracepoint_srcu (the table). Perhaps for the syscall case, we can add a
new trace_ API specifically for SRCU that does the rcu_irq_enter_irqson() call
but does not do preempt_disable_notrace() before calling callbacks, thus
allowing the callback to handle page faults? And such new trace_ API can call
srcu_read_{,un}lock() on an SRCU domain specific to the tracepoint,
before/after the callback invocation.
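
As a rough illustration only (every identifier below is hypothetical and
nothing like this exists in the kernel; it just sketches the shape of the
suggestion), such a variant might look like:

```c
/* Hypothetical sketch of a "sleepable" tracepoint invocation: an SRCU
 * read-side section with no preempt_disable_notrace(), so a callback
 * may block, e.g. to fault in user-space syscall arguments.
 */
DEFINE_STATIC_SRCU(tracepoint_sleepable_srcu);

#define __DO_TRACE_SLEEPABLE(tp)					\
	do {								\
		struct tracepoint_func *it_func_ptr;			\
		int __idx;						\
									\
		__idx = srcu_read_lock_notrace(&tracepoint_sleepable_srcu); \
		it_func_ptr = srcu_dereference_notrace((tp)->funcs,	\
					&tracepoint_sleepable_srcu);	\
		if (it_func_ptr) {					\
			do {						\
				/* Runs preemptible and may sleep. */	\
				((void (*)(void *))(it_func_ptr)->func)( \
					(it_func_ptr)->data);		\
			} while ((++it_func_ptr)->func);		\
		}							\
		srcu_read_unlock_notrace(&tracepoint_sleepable_srcu, __idx); \
	} while (0)
```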

thanks,

 - Joel


> Thanks,
> 
> Mathieu
> 
> 
> > 
> > Joel Fernandes (Google) (3):
> > Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make
> > it unique"
> > Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle
> > tracepoints"
> > Revert "tracepoint: Make rcuidle tracepoint callers use SRCU"
> > 
> > include/linux/tracepoint.h | 40 ++++++--------------------------------
> > kernel/tracepoint.c        | 10 +---------
> > 2 files changed, 7 insertions(+), 43 deletions(-)
> > 
> > --
> > 2.25.0.341.g760bfbb309-goog
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10  9:46   ` Peter Zijlstra
  2020-02-10 10:19     ` Peter Zijlstra
  2020-02-10 13:36     ` Paul E. McKenney
@ 2020-02-10 17:05     ` Steven Rostedt
  2020-02-10 17:33       ` Mathieu Desnoyers
  2020-02-10 18:07       ` Paul E. McKenney
  2 siblings, 2 replies; 24+ messages in thread
From: Steven Rostedt @ 2020-02-10 17:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, Joel Fernandes, Google, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Thomas Gleixner, paulmck, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

On Mon, 10 Feb 2020 10:46:16 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Furthermore, using srcu would be detrimental, because of how it has
> smp_mb() in the read side primitives.

I didn't realize that there was a full memory barrier in the srcu read
side. Seems to me that by itself is rationale for reverting it. And also a
big NAK for any suggestion to have any of the function tracing to use
it as well (which comes up here and there).

-- Steve

* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10 13:36     ` Paul E. McKenney
  2020-02-10 13:44       ` Peter Zijlstra
@ 2020-02-10 17:17       ` Joel Fernandes
  1 sibling, 0 replies; 24+ messages in thread
From: Joel Fernandes @ 2020-02-10 17:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Mathieu Desnoyers, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, rostedt, Thomas Gleixner, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

On Mon, Feb 10, 2020 at 05:36:52AM -0800, Paul E. McKenney wrote:
[snip]
> > The best we can do is move that rcu_irq_enter/exit_*() crud into the
> > perf tracepoint glue I suppose.
> 
> One approach would be to define a synchronize_preempt_disable() that
> waits only for pre-existing disabled-preemption regions (including,
> of course, disabled-irq and NMI-handler regions).  Something like Steve
> Rostedt's workqueue-based schedule_on_each_cpu(ftrace_sync) implementation
> might work.
> 
> There are of course some plusses and minuses:

Thanks for listing them.

> +	Works on preempt-disable regions in idle-loop code without
> 	the need to invoke rcu_idle_exit() and rcu_idle_enter().
> 
> +	Straightforward implementation.
> 
> -	Does not work on preempt-disable regions on offline CPUs.
> 	(I have no idea if this really matters.)

One more area where it would not work: if in the future we made it
possible to sleep within tracepoint callbacks (something like what Mathieu
said about handling page faults properly in callback code), then such an
implementation would not work there.

thanks,

 - Joel


* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10 17:05     ` Steven Rostedt
@ 2020-02-10 17:33       ` Mathieu Desnoyers
  2020-02-10 18:30         ` Steven Rostedt
  2020-02-10 18:07       ` Paul E. McKenney
  1 sibling, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2020-02-10 17:33 UTC (permalink / raw)
  To: rostedt
  Cc: Peter Zijlstra, Joel Fernandes, Google, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Thomas Gleixner, paulmck, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

----- On Feb 10, 2020, at 12:05 PM, rostedt rostedt@goodmis.org wrote:

> On Mon, 10 Feb 2020 10:46:16 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> Furthermore, using srcu would be detrimental, because of how it has
>> smp_mb() in the read side primitives.
> 
> I didn't realize that there was a full memory barrier in the srcu read
> side. Seems to me that by itself is rationale for reverting it. And also a
> big NAK for any suggestion to have any of the function tracing to use
> it as well (which comes up here and there).

The rcu_irq_enter/exit_irqson() does atomic_add_return(), which is even worse
than a memory barrier.

Let me summarize my understanding of a few use-cases we have with tracepoints
and other instrumentation mechanisms and the guarantees they provide (or not):

* Tracepoints
  - Uses sched-rcu (typically)
  - Uses SRCU for _cpuidle callsites
  - Planned use of SRCU to allow syscall entry/exit instrumentation to
    take page faults. (currently all tracers paper over that issue by filling
    with zeroes rather than handle the fault)
  - Grace period waits for both sched-rcu and SRCU.

* kprobes/kretprobes
  - interrupts off around probe invocation

* Hardware performance counters
  - Probe invoked from NMI context 

* Software performance counters
  - preempt off around probe invocation

Moving _rcuidle instrumentation to SRCU aimed at removing a significant
overhead incurred by having all _rcuidle tracepoints perform the atomic_add_return
on the shared variable (which is frequent enough to impact performance).

There are a couple of approaches that perf could take in order to tackle this
without hurting performance for all other tracers:

- If perf wishes to keep using explicit rcu_read_lock/unlock in its probes:

Use rcu_is_watching() within the perf probe, and only invoke rcu_irq_enter/exit_irqson()
when needed.

As an alternative, perf could implement a "trampoline" which would only be used
when registering a perf probe to a _rcuidle tracepoint. That trampoline would
perform rcu_irq_enter/exit_irqson() around the call to the real perf probe.
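
A sketch of that trampoline idea (the names and the probe signature are
illustrative only; a real trampoline would have to match each
tracepoint's prototype):

```c
/* Hypothetical: registered in place of the real perf probe, but only
 * when attaching to a _rcuidle tracepoint, so the cost is paid only
 * there and not on every perf event.
 */
static void perf_rcuidle_trampoline(void *data, unsigned long arg)
{
	/* RCU may not be watching on the idle path; make it watch
	 * just for the duration of the real probe.
	 */
	rcu_irq_enter_irqson();
	perf_real_probe(data, arg);	/* whatever perf would have registered */
	rcu_irq_exit_irqson();
}
```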

- If perf can remove the redundant RCU read-side lock/unlock and replace this
  by waiting for the relevant RCU/SRCU grace periods instead:

Basically, looking at all the instrumentation sources perf uses, all of them
already provide some kind of RCU guarantee, which makes the explicit rcu read-side
locks within the perf probes redundant. Removing the redundant rcu read-side lock/unlock
from the perf probes should bring a slight performance improvement as well.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10 17:05     ` Steven Rostedt
  2020-02-10 17:33       ` Mathieu Desnoyers
@ 2020-02-10 18:07       ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2020-02-10 18:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Mathieu Desnoyers, Joel Fernandes, Google,
	linux-kernel, Greg Kroah-Hartman, Gustavo A. R. Silva,
	Ingo Molnar, Richard Fontana, Thomas Gleixner, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

On Mon, Feb 10, 2020 at 12:05:52PM -0500, Steven Rostedt wrote:
> On Mon, 10 Feb 2020 10:46:16 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Furthermore, using srcu would be detrimental, because of how it has
> > smp_mb() in the read side primitives.
> 
> I didn't realize that there was a full memory barrier in the srcu read
> side. Seems to me that by itself is rationale for reverting it. And also a
> big NAK for any suggestion to have any of the function tracing to use
> it as well (which comes up here and there).

Yeah, that was added some years back when people were complaining about
three synchronize_sched()'s worth of latency for synchronize_srcu().

I did prepare a patch about a year ago that would allow an srcu_struct
to be set up to not need the read-side smp_mb() calls, but this means
longer-latency grace periods (though nowhere near as long as those of
synchronize_rcu_tasks()) and also that the "fast SRCU" readers cannot
be used from idle or offline CPUs.

But an easy change if it is useful.

							Thanx, Paul

* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10 17:33       ` Mathieu Desnoyers
@ 2020-02-10 18:30         ` Steven Rostedt
  2020-02-10 19:05           ` Mathieu Desnoyers
  2020-02-10 19:53           ` Joel Fernandes
  0 siblings, 2 replies; 24+ messages in thread
From: Steven Rostedt @ 2020-02-10 18:30 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Joel Fernandes, Google, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Thomas Gleixner, paulmck, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

On Mon, 10 Feb 2020 12:33:04 -0500 (EST)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> The rcu_irq_enter/exit_irqson() does atomic_add_return(), which is even worse
> than a memory barrier.

As we discussed on IRC, would something like this work (not even
compiled tested).

-- Steve

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 1fb11daa5c53..a83fd076a312 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -179,10 +179,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		 * For rcuidle callers, use srcu since sched-rcu	\
 		 * doesn't work from the idle path.			\
 		 */							\
-		if (rcuidle) {						\
+		if (rcuidle)						\
 			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
-			rcu_irq_enter_irqson();				\
-		}							\
 									\
 		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
 									\
@@ -194,10 +192,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 			} while ((++it_func_ptr)->func);		\
 		}							\
 									\
-		if (rcuidle) {						\
-			rcu_irq_exit_irqson();				\
+		if (rcuidle)						\
 			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
-		}							\
 									\
 		preempt_enable_notrace();				\
 	} while (0)
diff --git a/include/trace/perf.h b/include/trace/perf.h
index dbc6c74defc3..86d3b2eb00cd 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -39,17 +39,27 @@ perf_trace_##call(void *__data, proto)					\
 	u64 __count = 1;						\
 	struct task_struct *__task = NULL;				\
 	struct hlist_head *head;					\
+	bool rcu_watching;						\
 	int __entry_size;						\
 	int __data_size;						\
 	int rctx;							\
 									\
+	rcu_watching = rcu_is_watching();				\
+									\
+	/* Can not use RCU if rcu is not watching and in NMI */		\
+	if (!rcu_watching && in_nmi())					\
+		return;							\
+									\
 	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
 									\
+	if (!rcu_watching)						\
+		rcu_irq_enter_irqson();					\
+									\
 	head = this_cpu_ptr(event_call->perf_events);			\
 	if (!bpf_prog_array_valid(event_call) &&			\
 	    __builtin_constant_p(!__task) && !__task &&			\
 	    hlist_empty(head))						\
-		return;							\
+		goto out;						\
 									\
 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
 			     sizeof(u64));				\
@@ -57,7 +67,7 @@ perf_trace_##call(void *__data, proto)					\
 									\
 	entry = perf_trace_buf_alloc(__entry_size, &__regs, &rctx);	\
 	if (!entry)							\
-		return;							\
+		goto out;						\
 									\
 	perf_fetch_caller_regs(__regs);					\
 									\
@@ -68,6 +78,9 @@ perf_trace_##call(void *__data, proto)					\
 	perf_trace_run_bpf_submit(entry, __entry_size, rctx,		\
 				  event_call, __count, __regs,		\
 				  head, __task);			\
+out:									\
+	if (!rcu_watching)						\
+		rcu_irq_exit_irqson();					\
 }
 
 /*

* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10 18:30         ` Steven Rostedt
@ 2020-02-10 19:05           ` Mathieu Desnoyers
  2020-02-10 19:53           ` Joel Fernandes
  1 sibling, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2020-02-10 19:05 UTC (permalink / raw)
  To: rostedt
  Cc: Peter Zijlstra, Joel Fernandes, Google, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Thomas Gleixner, paulmck, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

----- On Feb 10, 2020, at 1:30 PM, rostedt rostedt@goodmis.org wrote:

> On Mon, 10 Feb 2020 12:33:04 -0500 (EST)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> The rcu_irq_enter/exit_irqson() does atomic_add_return(), which is even worse
>> than a memory barrier.
> 
> As we discussed on IRC, would something like this work (not even
> compiled tested).

Yes, it's very close to what I have prototyped locally. With one very minor
detail below:

> 
> -- Steve
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 1fb11daa5c53..a83fd076a312 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -179,10 +179,8 @@ static inline struct tracepoint
> *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> 		 * For rcuidle callers, use srcu since sched-rcu	\
> 		 * doesn't work from the idle path.			\
> 		 */							\
> -		if (rcuidle) {						\
> +		if (rcuidle)						\
> 			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
> -			rcu_irq_enter_irqson();				\
> -		}							\
> 									\
> 		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
> 									\
> @@ -194,10 +192,8 @@ static inline struct tracepoint
> *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> 			} while ((++it_func_ptr)->func);		\
> 		}							\
> 									\
> -		if (rcuidle) {						\
> -			rcu_irq_exit_irqson();				\
> +		if (rcuidle)						\
> 			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
> -		}							\
> 									\
> 		preempt_enable_notrace();				\
> 	} while (0)
> diff --git a/include/trace/perf.h b/include/trace/perf.h
> index dbc6c74defc3..86d3b2eb00cd 100644
> --- a/include/trace/perf.h
> +++ b/include/trace/perf.h
> @@ -39,17 +39,27 @@ perf_trace_##call(void *__data, proto)					\
> 	u64 __count = 1;						\
> 	struct task_struct *__task = NULL;				\
> 	struct hlist_head *head;					\
> +	bool rcu_watching;						\
> 	int __entry_size;						\
> 	int __data_size;						\
> 	int rctx;							\
> 									\
> +	rcu_watching = rcu_is_watching();				\
> +									\
> +	/* Can not use RCU if rcu is not watching and in NMI */		\
> +	if (!rcu_watching && in_nmi())					\
> +		return;							\
> +									\
> 	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
> 									\
> +	if (!rcu_watching)						\
> +		rcu_irq_enter_irqson();					\

You might want to fold the line above into the first check like this,
considering that doing the rcu_irq_enter_irqson() earlier should not
matter, and I expect it to remove a branch from the probe:

rcu_watching = rcu_is_watching();

if (!rcu_watching) {
        if (in_nmi())
                return;
        rcu_irq_enter_irqson();
}

Thanks!

Mathieu

> +									\
> 	head = this_cpu_ptr(event_call->perf_events);			\
> 	if (!bpf_prog_array_valid(event_call) &&			\
> 	    __builtin_constant_p(!__task) && !__task &&			\
> 	    hlist_empty(head))						\
> -		return;							\
> +		goto out;						\
> 									\
> 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
> 			     sizeof(u64));				\
> @@ -57,7 +67,7 @@ perf_trace_##call(void *__data, proto)					\
> 									\
> 	entry = perf_trace_buf_alloc(__entry_size, &__regs, &rctx);	\
> 	if (!entry)							\
> -		return;							\
> +		goto out;						\
> 									\
> 	perf_fetch_caller_regs(__regs);					\
> 									\
> @@ -68,6 +78,9 @@ perf_trace_##call(void *__data, proto)					\
> 	perf_trace_run_bpf_submit(entry, __entry_size, rctx,		\
> 				  event_call, __count, __regs,		\
> 				  head, __task);			\
> +out:									\
> +	if (!rcu_watching)						\
> +		rcu_irq_exit_irqson();					\
> }
> 
>  /*

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10 18:30         ` Steven Rostedt
  2020-02-10 19:05           ` Mathieu Desnoyers
@ 2020-02-10 19:53           ` Joel Fernandes
  2020-02-10 20:03             ` Steven Rostedt
  1 sibling, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2020-02-10 19:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Peter Zijlstra, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Thomas Gleixner, paulmck, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

On Mon, Feb 10, 2020 at 01:30:45PM -0500, Steven Rostedt wrote:
> On Mon, 10 Feb 2020 12:33:04 -0500 (EST)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> > The rcu_irq_enter/exit_irqson() does atomic_add_return(), which is even worse
> > than a memory barrier.
> 
> As we discussed on IRC, would something like this work (not even
> compiled tested).
> 
> -- Steve
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 1fb11daa5c53..a83fd076a312 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -179,10 +179,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>  		 * For rcuidle callers, use srcu since sched-rcu	\
>  		 * doesn't work from the idle path.			\
>  		 */							\
> -		if (rcuidle) {						\
> +		if (rcuidle)						\
>  			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
> -			rcu_irq_enter_irqson();				\
> -		}							\

This would still break out-of-tree modules or future code that does
rcu_read_lock() directly in a tracepoint callback, right?

Or are we saying that rcu_read_lock() in a tracepoint callback is not
allowed? I believe this should then at least be documented somewhere.  Also,
what about code in tracepoint callback that calls rcu_read_lock() indirectly
through a path in the kernel, and also code that may expect RCU readers when
doing preempt_disable()?

So basically we are saying with this patch:
1. Don't call in a callback: rcu_read_lock() or preempt_disable() and expect RCU to do
anything for you.
2. Don't call code that does anything that 1. needs.

Is that intended? thanks,

 - Joel


>  									\
>  		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
>  									\
> @@ -194,10 +192,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>  			} while ((++it_func_ptr)->func);		\
>  		}							\
>  									\
> -		if (rcuidle) {						\
> -			rcu_irq_exit_irqson();				\
> +		if (rcuidle)						\
>  			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
> -		}							\
>  									\
>  		preempt_enable_notrace();				\
>  	} while (0)
> diff --git a/include/trace/perf.h b/include/trace/perf.h
> index dbc6c74defc3..86d3b2eb00cd 100644
> --- a/include/trace/perf.h
> +++ b/include/trace/perf.h
> @@ -39,17 +39,27 @@ perf_trace_##call(void *__data, proto)					\
>  	u64 __count = 1;						\
>  	struct task_struct *__task = NULL;				\
>  	struct hlist_head *head;					\
> +	bool rcu_watching;						\
>  	int __entry_size;						\
>  	int __data_size;						\
>  	int rctx;							\
>  									\
> +	rcu_watching = rcu_is_watching();				\
> +									\
> +	/* Can not use RCU if rcu is not watching and in NMI */		\
> +	if (!rcu_watching && in_nmi())					\
> +		return;							\
> +									\
>  	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
>  									\
> +	if (!rcu_watching)						\
> +		rcu_irq_enter_irqson();					\
> +									\
>  	head = this_cpu_ptr(event_call->perf_events);			\
>  	if (!bpf_prog_array_valid(event_call) &&			\
>  	    __builtin_constant_p(!__task) && !__task &&			\
>  	    hlist_empty(head))						\
> -		return;							\
> +		goto out;						\
>  									\
>  	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
>  			     sizeof(u64));				\
> @@ -57,7 +67,7 @@ perf_trace_##call(void *__data, proto)					\
>  									\
>  	entry = perf_trace_buf_alloc(__entry_size, &__regs, &rctx);	\
>  	if (!entry)							\
> -		return;							\
> +		goto out;						\
>  									\
>  	perf_fetch_caller_regs(__regs);					\
>  									\
> @@ -68,6 +78,9 @@ perf_trace_##call(void *__data, proto)					\
>  	perf_trace_run_bpf_submit(entry, __entry_size, rctx,		\
>  				  event_call, __count, __regs,		\
>  				  head, __task);			\
> +out:									\
> +	if (!rcu_watching)						\
> +		rcu_irq_exit_irqson();					\
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10 19:53           ` Joel Fernandes
@ 2020-02-10 20:03             ` Steven Rostedt
  2020-02-10 20:30               ` Joel Fernandes
  0 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2020-02-10 20:03 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Mathieu Desnoyers, Peter Zijlstra, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Thomas Gleixner, paulmck, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

On Mon, 10 Feb 2020 14:53:02 -0500
Joel Fernandes <joel@joelfernandes.org> wrote:

> > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> > index 1fb11daa5c53..a83fd076a312 100644
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -179,10 +179,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> >  		 * For rcuidle callers, use srcu since sched-rcu	\
> >  		 * doesn't work from the idle path.			\
> >  		 */							\
> > -		if (rcuidle) {						\
> > +		if (rcuidle)						\
> >  			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
> > -			rcu_irq_enter_irqson();				\
> > -		}							\  
> 
> This would still break out-of-tree modules or future code that does
> rcu_read_lock() right in a tracepoint callback right?

Yes, and that's fine.

> 
> Or are we saying that rcu_read_lock() in a tracepoint callback is not
> allowed? I believe this should then at least be documented somewhere.  Also,

No, it's only not allowed if you attached to a tracepoint that can
be called without RCU watching. That's up to the caller to figure
out. Tracepoints were never meant to be a generic thing people should
use without knowing what they are really doing.

> what about code in tracepoint callback that calls rcu_read_lock() indirectly
> through a path in the kernel, and also code that may expect RCU readers when
> doing preempt_disable()?

Then they need to know what they are doing.

> 
> So basically we are saying with this patch:
> 1. Don't call in a callback: rcu_read_lock() or preempt_disable() and expect RCU to do
> anything for you.

We can just say, "If you plan on using RCU, be aware that it may not be
watching and you get to deal with the fallout. Use rcu_is_watching() to
figure it out."

> 2. Don't call code that does anything that 1. needs.
> 
> Is that intended? thanks,
> 

No, look what the patch did for perf. Why make *all* callbacks suffer
if only some use RCU? If you use RCU from a callback, then you need to
figure it out. The same goes for attaching to the function tracer.

-- Steve


* Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure
  2020-02-10 20:03             ` Steven Rostedt
@ 2020-02-10 20:30               ` Joel Fernandes
  0 siblings, 0 replies; 24+ messages in thread
From: Joel Fernandes @ 2020-02-10 20:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Peter Zijlstra, linux-kernel,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Ingo Molnar,
	Richard Fontana, Thomas Gleixner, paulmck, Josh Triplett,
	Lai Jiangshan, Arnaldo Carvalho de Melo

Hi Steve,

On Mon, Feb 10, 2020 at 03:03:48PM -0500, Steven Rostedt wrote:
> On Mon, 10 Feb 2020 14:53:02 -0500
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> > > index 1fb11daa5c53..a83fd076a312 100644
> > > --- a/include/linux/tracepoint.h
> > > +++ b/include/linux/tracepoint.h
> > > @@ -179,10 +179,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> > >  		 * For rcuidle callers, use srcu since sched-rcu	\
> > >  		 * doesn't work from the idle path.			\
> > >  		 */							\
> > > -		if (rcuidle) {						\
> > > +		if (rcuidle)						\
> > >  			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
> > > -			rcu_irq_enter_irqson();				\
> > > -		}							\  
> > 
> > This would still break out-of-tree modules or future code that does
> > rcu_read_lock() right in a tracepoint callback right?
> 
> Yes, and that's fine.
> 
> > 
> > Or are we saying that rcu_read_lock() in a tracepoint callback is not
> > allowed? I believe this should then at least be documented somewhere.  Also,
> 
> No, it's only not allowed if you attached to a tracepoint that can
> be called without RCU watching. That's up to the caller to figure
> out. Tracepoints were never meant to be a generic thing people should
> use without knowing what they are really doing.

Ok, right.

> > what about code in tracepoint callback that calls rcu_read_lock() indirectly
> > through a path in the kernel, and also code that may expect RCU readers when
> > doing preempt_disable()?
> 
> Then they need to know what they are doing.

Ok.

> > So basically we are saying with this patch:
> > 1. Don't call in a callback: rcu_read_lock() or preempt_disable() and expect RCU to do
> > anything for you.
> 
> We can just say, "If you plan on using RCU, be aware that it may not be
> watching and you get to deal with the fallout. Use rcu_is_watching() to
> figure it out."

Ok.

> > 2. Don't call code that does anything that 1. needs.
> > 
> > Is that intended? thanks,
> > 
> 
> No, look what the patch did for perf. Why make *all* callbacks suffer
> if only some use RCU? If you use RCU from a callback, then you need to
> figure it out. The same goes for attaching to the function tracer.

Only the callbacks on the rcuidle tracepoints would suffer though, not
all callbacks.

Yes I saw the patch, it looks like a good idea to me and I am Ok with it.

thanks,

 - Joel




end of thread, other threads:[~2020-02-10 20:30 UTC | newest]

Thread overview: 24+ messages
2020-02-07 20:56 [RFC 0/3] Revert SRCU from tracepoint infrastructure Joel Fernandes (Google)
2020-02-07 20:56 ` [RFC 1/3] Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique" Joel Fernandes (Google)
2020-02-07 21:07   ` Steven Rostedt
2020-02-07 20:56 ` [RFC 2/3] Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints" Joel Fernandes (Google)
2020-02-07 20:56 ` [RFC 3/3] Revert "tracepoint: Make rcuidle tracepoint callers use SRCU" Joel Fernandes (Google)
2020-02-07 21:24 ` [RFC 0/3] Revert SRCU from tracepoint infrastructure Paul E. McKenney
2020-02-07 21:43   ` Joel Fernandes
2020-02-08 16:39     ` Mathieu Desnoyers
2020-02-08 16:31 ` Mathieu Desnoyers
2020-02-10  9:46   ` Peter Zijlstra
2020-02-10 10:19     ` Peter Zijlstra
2020-02-10 13:36     ` Paul E. McKenney
2020-02-10 13:44       ` Peter Zijlstra
2020-02-10 13:57         ` Paul E. McKenney
2020-02-10 17:17       ` Joel Fernandes
2020-02-10 17:05     ` Steven Rostedt
2020-02-10 17:33       ` Mathieu Desnoyers
2020-02-10 18:30         ` Steven Rostedt
2020-02-10 19:05           ` Mathieu Desnoyers
2020-02-10 19:53           ` Joel Fernandes
2020-02-10 20:03             ` Steven Rostedt
2020-02-10 20:30               ` Joel Fernandes
2020-02-10 18:07       ` Paul E. McKenney
2020-02-10 16:59   ` Joel Fernandes
