linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v10 0/3] tracing: Centralize preemptirq tracepoints and unify their usage
@ 2018-07-13 21:55 Joel Fernandes
  2018-07-13 21:55 ` [PATCH v10 1/3] lockdep: use this_cpu_ptr instead of get_cpu_var stats Joel Fernandes
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Joel Fernandes @ 2018-07-13 21:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, Masami Hiramatsu, Mathieu Desnoyers,
	Namhyung Kim, Paul McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Glexiner, Todd Kjos, Tom Zanussi, Will Deacon

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

This series contains the last 2 patches of the previous series, with
minor changes suggested by Peter and Steven, and an additional patch for
get_lock_stats cleanup suggested by Peter.

The preempt/irq tracepoints exist but not everything in the kernel is using it
whenever they need to be notified that a preempt disable/enable or an irq
disable/enable has occurred.  This makes things not work simultaneously (for
example, only either lockdep or irqsoff trace-events can be used at a time).

This is particularly painful to deal with, since turning on lockdep breaks
tracers that install probes on IRQ events, such as the BCC atomic critical
section tracer [1]. This constraint also makes it not possible to use synthetic
events to trace irqsoff sections with lockdep simulataneously turned on.

This series solves that, and also results in a nice clean up of relevant parts
of the kernel. Several ifdefs are simpler, and the design is more unified and
better. Also as a result of this, we also speeded performance all rcuidle
tracepoints since their handling is simpler.

[1] https://github.com/iovisor/bcc/pull/1801/

v9->v10:
- Dropped first 3 and last 2 patches that were applied previously
- Folded SPDK license into the main patch introducing trace_preemptirq.c (Steve)
- Dropped lockdep_recursing & use simplify get_cpu_var instead (PeterZ)
- Simplify __DO_TRACE and use rcu_dereference_raw for both RCU and SRCU (PeterZ)

v8->v9:
- Small style changes to tracepoint code (Mathieu)
- Minor style fix to use PTR_ERR_OR_ZERO (0-day bot)
- Minor fix to test_atomic_sections to use unsigned long.
- Added Namhyung's, Mathieu's Reviewed-by to some patches.

v7->v8:
- Refactored irqsoff tracer probe defines (Namhyung)

v6->v7:
- Added a module to simulate an atomic section, a kselftest to load and
  and trigger it which verifies the preempt-tracer and this series.

- Fixed a new warning after I rebased in early boot, this is because
early_boot_irqs_disabled was set too early, I moved it after the lockdep
initialization.

- added back the softirq fix since it appears it wasn't picked up.

- Ran Ingo's locking API selftest suite which are passing with this
  series.

- Mathieu suggested ifdef'ing the tracepoint_synchronize_unregister
  function incase tracepoints aren't enabled, did that.

Joel Fernandes (Google) (3):
  lockdep: use this_cpu_ptr instead of get_cpu_var stats
  tracepoint: Make rcuidle tracepoint callers use SRCU
  tracing: Centralize preemptirq tracepoints and unify their usage

 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/linux/tracepoint.h        |  45 ++++--
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  45 ++----
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  72 ++++++++++
 kernel/tracepoint.c               |  16 ++-
 14 files changed, 248 insertions(+), 247 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c

-- 
2.18.0.203.gfac676dfb9-goog

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v10 1/3] lockdep: use this_cpu_ptr instead of get_cpu_var stats
  2018-07-13 21:55 [PATCH v10 0/3] tracing: Centralize preemptirq tracepoints and unify their usage Joel Fernandes
@ 2018-07-13 21:55 ` Joel Fernandes
  2018-07-13 21:55 ` [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU Joel Fernandes
  2018-07-13 21:55 ` [PATCH v10 3/3] tracing: Centralize preemptirq tracepoints and unify their usage Joel Fernandes
  2 siblings, 0 replies; 12+ messages in thread
From: Joel Fernandes @ 2018-07-13 21:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, Masami Hiramatsu, Mathieu Desnoyers,
	Namhyung Kim, Paul McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Glexiner, Todd Kjos, Tom Zanussi, Will Deacon

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

get_cpu_var disables preemption which has the potential to call into the
preemption disable trace points causing some complications. There's also
no need to disable preemption in uses of get_lock_stats anyway since
preempt is already disabled. So lets simplify the code.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/locking/lockdep.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5fa4d3138bf1..8f5ce0048d15 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -248,12 +248,7 @@ void clear_lock_stats(struct lock_class *class)
 
 static struct lock_class_stats *get_lock_stats(struct lock_class *class)
 {
-	return &get_cpu_var(cpu_lock_stats)[class - lock_classes];
-}
-
-static void put_lock_stats(struct lock_class_stats *stats)
-{
-	put_cpu_var(cpu_lock_stats);
+	return &this_cpu_ptr(&cpu_lock_stats)[class - lock_classes];
 }
 
 static void lock_release_holdtime(struct held_lock *hlock)
@@ -271,7 +266,6 @@ static void lock_release_holdtime(struct held_lock *hlock)
 		lock_time_inc(&stats->read_holdtime, holdtime);
 	else
 		lock_time_inc(&stats->write_holdtime, holdtime);
-	put_lock_stats(stats);
 }
 #else
 static inline void lock_release_holdtime(struct held_lock *hlock)
@@ -4090,7 +4084,6 @@ __lock_contended(struct lockdep_map *lock, unsigned long ip)
 		stats->contending_point[contending_point]++;
 	if (lock->cpu != smp_processor_id())
 		stats->bounces[bounce_contended + !!hlock->read]++;
-	put_lock_stats(stats);
 }
 
 static void
@@ -4138,7 +4131,6 @@ __lock_acquired(struct lockdep_map *lock, unsigned long ip)
 	}
 	if (lock->cpu != cpu)
 		stats->bounces[bounce_acquired + !!hlock->read]++;
-	put_lock_stats(stats);
 
 	lock->cpu = cpu;
 	lock->ip = ip;
-- 
2.18.0.203.gfac676dfb9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-07-13 21:55 [PATCH v10 0/3] tracing: Centralize preemptirq tracepoints and unify their usage Joel Fernandes
  2018-07-13 21:55 ` [PATCH v10 1/3] lockdep: use this_cpu_ptr instead of get_cpu_var stats Joel Fernandes
@ 2018-07-13 21:55 ` Joel Fernandes
  2018-07-14 14:31   ` Mathieu Desnoyers
  2018-07-17 18:25   ` Peter Zijlstra
  2018-07-13 21:55 ` [PATCH v10 3/3] tracing: Centralize preemptirq tracepoints and unify their usage Joel Fernandes
  2 siblings, 2 replies; 12+ messages in thread
From: Joel Fernandes @ 2018-07-13 21:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, Masami Hiramatsu, Mathieu Desnoyers,
	Namhyung Kim, Paul McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Glexiner, Todd Kjos, Tom Zanussi, Will Deacon

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

In recent tests with IRQ on/off tracepoints, a large performance
overhead ~10% is noticed when running hackbench. This is root caused to
calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that srcu is a better alternative for use during rcu idle.
Although it does involve extra barriers, its lighter than the sched-rcu
version which has to do additional RCU calls to notify RCU idle about
entry into RCU sections.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has shown to improve performance
alot for the high frequency irq enable/disable tracepoints.

Test: Tested idle and preempt/irq tracepoints.

Here are some performance numbers:

With a run of the following 30 times on a single core x86 Qemu instance
with 1GB memory:
hackbench -g 4 -f 2 -l 3000

Completion times in seconds. CONFIG_PROVE_LOCKING=y.

No patches (without this series)
Mean: 3.048
Median: 3.025
Std Dev: 0.064

With Lockdep using irq tracepoints with RCU implementation:
Mean: 3.451   (-11.66 %)
Median: 3.447 (-12.22%)
Std Dev: 0.049

With Lockdep using irq tracepoints with SRCU implementation (this series):
Mean: 3.020   (I would consider the improvement against the "without
	       this series" case as just noise).
Median: 3.013
Std Dev: 0.033

[1] https://patchwork.kernel.org/patch/10344297/

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cleaned-up-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/tracepoint.h | 45 +++++++++++++++++++++++++++++++-------
 kernel/tracepoint.c        | 16 +++++++++++++-
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 19a690b559ca..97e1d365a817 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@
  */
 
 #include <linux/smp.h>
+#include <linux/srcu.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -33,6 +34,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO	10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -75,10 +78,16 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
  * probe unregistration and the end of module exit to make sure there is no
  * caller executing a probe when it is freed.
  */
+#ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
+#else
+static inline void tracepoint_synchronize_unregister(void)
+{ }
+#endif
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 extern int syscall_regfunc(void);
@@ -129,18 +138,34 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
+#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
 	do {								\
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
 		void *__data;						\
+		int __maybe_unused idx = 0;				\
 									\
 		if (!(cond))						\
 			return;						\
-		if (rcucheck)						\
-			rcu_irq_enter_irqson();				\
-		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+									\
+		/* srcu can't be used from NMI */			\
+		if (rcuidle && in_nmi())				\
+			WARN_ON_ONCE(1);				\
+									\
+		/* keep srcu and sched-rcu usage consistent */		\
+		preempt_disable_notrace();				\
+									\
+		/*							\
+		 * For rcuidle callers, use srcu since sched-rcu	\
+		 * doesn't work from the idle path.			\
+		 */							\
+		if (rcuidle)						\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
+		else							\
+			rcu_read_lock_sched_notrace();			\
+									\
+		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
+									\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -148,9 +173,13 @@ extern void syscall_unregfunc(void);
 				((void(*)(proto))(it_func))(args);	\
 			} while ((++it_func_ptr)->func);		\
 		}							\
-		rcu_read_unlock_sched_notrace();			\
-		if (rcucheck)						\
-			rcu_irq_exit_irqson();				\
+									\
+		if (rcuidle)						\
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
+		else							\
+			rcu_read_unlock_sched_notrace();		\
+									\
+		preempt_enable_notrace();				\
 	} while (0)
 
 #ifndef MODULE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 6dc6356c3327..955148d91b74 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -31,6 +31,9 @@
 extern struct tracepoint * const __start___tracepoints_ptrs[];
 extern struct tracepoint * const __stop___tracepoints_ptrs[];
 
+DEFINE_SRCU(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
 
@@ -67,16 +70,27 @@ static inline void *allocate_probes(int count)
 	return p == NULL ? NULL : p->probes;
 }
 
-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }
 
+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
 static inline void release_probes(struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);
+		/*
+		 * Tracepoint probes are protected by both sched RCU and SRCU,
+		 * by calling the SRCU callback in the sched RCU callback we
+		 * cover both cases. So let us chain the SRCU and sched RCU
+		 * callbacks to wait for both grace periods.
+		 */
 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
 	}
 }
-- 
2.18.0.203.gfac676dfb9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v10 3/3] tracing: Centralize preemptirq tracepoints and unify their usage
  2018-07-13 21:55 [PATCH v10 0/3] tracing: Centralize preemptirq tracepoints and unify their usage Joel Fernandes
  2018-07-13 21:55 ` [PATCH v10 1/3] lockdep: use this_cpu_ptr instead of get_cpu_var stats Joel Fernandes
  2018-07-13 21:55 ` [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU Joel Fernandes
@ 2018-07-13 21:55 ` Joel Fernandes
  2 siblings, 0 replies; 12+ messages in thread
From: Joel Fernandes @ 2018-07-13 21:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, Masami Hiramatsu, Mathieu Desnoyers,
	Namhyung Kim, Paul McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Glexiner, Todd Kjos, Tom Zanussi, Will Deacon

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

This patch detaches the preemptirq tracepoints from the tracers and
keeps it separate.

Advantages:
* Lockdep and irqsoff event can now run in parallel since they no longer
have their own calls.

* This unifies the usecase of adding hooks to an irqsoff and irqson
event, and a preemptoff and preempton event.
  3 users of the events exist:
  - Lockdep
  - irqsoff and preemptoff tracers
  - irqs and preempt trace events

The unification cleans up several ifdefs and makes the code in preempt
tracer and irqsoff tracers simpler. It gets rid of all the horrific
ifdeferry around PROVE_LOCKING and makes configuration of the different
users of the tracepoints more easy and understandable. It also gets rid
of the time_* function calls from the lockdep hooks used to call into
the preemptirq tracer which is not needed anymore. The negative delta in
lines of code in this patch is quite large too.

In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
as a single point for registering probes onto the tracepoints. With
this,
the web of config options for preempt/irq toggle tracepoints and its
users becomes:

 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS

One note, I have to check for lockdep recursion in the code that calls
the trace events API and bail out if we're in lockdep recursion
protection to prevent something like the following case: a spin_lock is
taken. Then lockdep_acquired is called.  That does a raw_local_irq_save
and then sets lockdep_recursion, and then calls __lockdep_acquired. In
this function, a call to get_lock_stats happens which calls
preempt_disable, which calls trace IRQS off somewhere which enters my
tracepoint code and sets the tracing_irq_cpu flag to prevent recursion.
This flag is then never cleared causing lockdep paths to never be
entered and thus causing splats and other bad things.

Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite. I verified that all tests cases are
passing.

I also injected issues by not registering lockdep probes onto the
tracepoints and I see failures to confirm that the probes are indeed
working.

This series + lockdep probes not registered (just to inject errors):
[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

With this series + lockdep probes registered, all locking tests pass:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

trace: Add SPDX-License-Identifier to trace_preemptirq.c

This patch adds a SPDX-License-Identifier to trace_preemptirq.c.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  35 ++---
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  72 ++++++++++
 12 files changed, 195 insertions(+), 229 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 8154f4920fcb..f32e3c81407e 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
 	return CALLER_ADDR2;
 }
 
-#ifdef CONFIG_IRQSOFF_TRACER
-  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
-  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
-#else
-  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
-  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
-#endif
-
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
   extern void trace_preempt_on(unsigned long a0, unsigned long a1);
   extern void trace_preempt_off(unsigned long a0, unsigned long a1);
 #else
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 9700f00bbc04..50edb9cbbd26 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -15,9 +15,16 @@
 #include <linux/typecheck.h>
 #include <asm/irqflags.h>
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+/* Currently trace_softirqs_on/off is used only by lockdep */
+#ifdef CONFIG_PROVE_LOCKING
   extern void trace_softirqs_on(unsigned long ip);
   extern void trace_softirqs_off(unsigned long ip);
+#else
+# define trace_softirqs_on(ip)	do { } while (0)
+# define trace_softirqs_off(ip)	do { } while (0)
+#endif
+
+#ifdef CONFIG_TRACE_IRQFLAGS
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
 # define trace_hardirq_context(p)	((p)->hardirq_context)
@@ -43,8 +50,6 @@ do {						\
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
-# define trace_softirqs_on(ip)		do { } while (0)
-# define trace_softirqs_off(ip)		do { } while (0)
 # define trace_hardirq_context(p)	0
 # define trace_softirq_context(p)	0
 # define trace_hardirqs_enabled(p)	0
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..a8113357ceeb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -266,7 +266,8 @@ struct held_lock {
 /*
  * Initialization, self-test and debugging-output methods:
  */
-extern void lockdep_info(void);
+extern void lockdep_init(void);
+extern void lockdep_init_early(void);
 extern void lockdep_reset(void);
 extern void lockdep_reset_lock(struct lockdep_map *lock);
 extern void lockdep_free_key_range(void *start, unsigned long size);
@@ -406,7 +407,8 @@ static inline void lockdep_on(void)
 # define lock_downgrade(l, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
-# define lockdep_info()				do { } while (0)
+# define lockdep_init()				do { } while (0)
+# define lockdep_init_early()			do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
@@ -532,7 +534,7 @@ do {								\
 
 #endif /* CONFIG_LOCKDEP */
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_PROVE_LOCKING
 extern void print_irqtrace_events(struct task_struct *curr);
 #else
 static inline void print_irqtrace_events(struct task_struct *curr)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5bd3f151da78..c01813c3fbe9 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -150,7 +150,7 @@
  */
 #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
 
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
 #define preempt_count_dec_and_test() \
diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
index 9c4eb33c5a1d..9a0d4ceeb166 100644
--- a/include/trace/events/preemptirq.h
+++ b/include/trace/events/preemptirq.h
@@ -1,4 +1,4 @@
-#ifdef CONFIG_PREEMPTIRQ_EVENTS
+#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM preemptirq
@@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template,
 		  (void *)((unsigned long)(_stext) + __entry->parent_offs))
 );
 
-#ifndef CONFIG_PROVE_LOCKING
+#ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_EVENT(preemptirq_template, irq_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable,
 DEFINE_EVENT(preemptirq_template, irq_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_irq_enable(...)
+#define trace_irq_disable(...)
+#define trace_irq_enable_rcuidle(...)
+#define trace_irq_disable_rcuidle(...)
 #endif
 
-#ifdef CONFIG_DEBUG_PREEMPT
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
 DEFINE_EVENT(preemptirq_template, preempt_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable,
 DEFINE_EVENT(preemptirq_template, preempt_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_preempt_enable(...)
+#define trace_preempt_disable(...)
+#define trace_preempt_enable_rcuidle(...)
+#define trace_preempt_disable_rcuidle(...)
 #endif
 
 #endif /* _TRACE_PREEMPTIRQ_H */
 
 #include <trace/define_trace.h>
 
-#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
+#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
 #define trace_irq_enable(...)
 #define trace_irq_disable(...)
 #define trace_irq_enable_rcuidle(...)
 #define trace_irq_disable_rcuidle(...)
-#endif
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
 #define trace_preempt_enable(...)
 #define trace_preempt_disable(...)
 #define trace_preempt_enable_rcuidle(...)
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..44fe43be84c1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void)
 	profile_init();
 	call_function_init();
 	WARN(!irqs_disabled(), "Interrupts were enabled early\n");
+
+	lockdep_init_early();
+
 	early_boot_irqs_disabled = false;
 	local_irq_enable();
 
@@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void)
 		panic("Too many boot %s vars at `%s'", panic_later,
 		      panic_param);
 
-	lockdep_info();
+	lockdep_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 8f5ce0048d15..a6a6b7eb4b82 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -55,6 +55,7 @@
 
 #include "lockdep_internals.h"
 
+#include <trace/events/preemptirq.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
@@ -2839,10 +2840,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
 	debug_atomic_inc(hardirqs_on_events);
 }
 
-__visible void trace_hardirqs_on_caller(unsigned long ip)
+static void lockdep_hardirqs_on(void *none, unsigned long ignore,
+				unsigned long ip)
 {
-	time_hardirqs_on(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2881,23 +2881,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
 	__trace_hardirqs_on_caller(ip);
 	current->lockdep_recursion = 0;
 }
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-void trace_hardirqs_on(void)
-{
-	trace_hardirqs_on_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
 
 /*
  * Hardirqs were disabled:
  */
-__visible void trace_hardirqs_off_caller(unsigned long ip)
+static void lockdep_hardirqs_off(void *none, unsigned long ignore,
+				 unsigned long ip)
 {
 	struct task_struct *curr = current;
 
-	time_hardirqs_off(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2919,13 +2911,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
 	} else
 		debug_atomic_inc(redundant_hardirqs_off);
 }
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-void trace_hardirqs_off(void)
-{
-	trace_hardirqs_off_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
 
 /*
  * Softirqs will be enabled:
@@ -4330,7 +4315,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	raw_local_irq_restore(flags);
 }
 
-void __init lockdep_info(void)
+void __init lockdep_init_early(void)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
+	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
+#endif
+}
+
+void __init lockdep_init(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78d8facba456..4c956f6849ec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3192,7 +3192,7 @@ static inline void sched_tick_stop(int cpu) { }
 #endif
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
-				defined(CONFIG_PREEMPT_TRACER))
+				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e15fadbe5dfb..eb5ab6b511e2 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
 	 Allow the use of ring_buffer_swap_cpu.
 	 Adds a very slight overhead to tracing when enabled.
 
+config PREEMPTIRQ_TRACEPOINTS
+	bool
+	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
+	select TRACING
+	default y
+	help
+	  Create preempt/irq toggle tracepoints if needed, so that other parts
+	  of the kernel can use them to generate or add hooks to them.
+
 # All tracer options should select GENERIC_TRACER. For those options that are
 # enabled by all tracers (context switch and event tracer) they select TRACING.
 # This allows those options to appear when no other tracer is selected. But the
@@ -155,18 +164,20 @@ config FUNCTION_GRAPH_TRACER
 	  the return value. This is done by setting the current return
 	  address on the current task structure into a stack of calls.
 
+config TRACE_PREEMPT_TOGGLE
+	bool
+	help
+	  Enables hooks which will be called when preemption is first disabled,
+	  and last enabled.
 
 config PREEMPTIRQ_EVENTS
 	bool "Enable trace events for preempt and irq disable/enable"
 	select TRACE_IRQFLAGS
-	depends on DEBUG_PREEMPT || !PROVE_LOCKING
-	depends on TRACING
+	select TRACE_PREEMPT_TOGGLE if PREEMPT
+	select GENERIC_TRACER
 	default n
 	help
 	  Enable tracing of disable and enable events for preemption and irqs.
-	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
-	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
-	  be disabled.
 
 config IRQSOFF_TRACER
 	bool "Interrupts-off Latency Tracer"
@@ -203,6 +214,7 @@ config PREEMPT_TRACER
 	select RING_BUFFER_ALLOW_SWAP
 	select TRACER_SNAPSHOT
 	select TRACER_SNAPSHOT_PER_CPU_SWAP
+	select TRACE_PREEMPT_TOGGLE
 	help
 	  This option measures the time spent in preemption-off critical
 	  sections, with microsecond accuracy.
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 31c6b524c260..b916db076561 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -36,7 +36,7 @@ obj-$(CONFIG_TRACING_MAP) += tracing_map.o
 obj-$(CONFIG_PREEMPTIRQ_DELAY_TEST) += preemptirq_delay_test.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
-obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
+obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
 obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index f8daa754cce2..770cd30cda40 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -16,7 +16,6 @@
 
 #include "trace.h"
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/preemptirq.h>
 
 #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
@@ -450,66 +449,6 @@ void stop_critical_timings(void)
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
-#ifdef CONFIG_IRQSOFF_TRACER
-#ifdef CONFIG_PROVE_LOCKING
-void time_hardirqs_on(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-void time_hardirqs_off(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(a0, a1);
-}
-
-#else /* !CONFIG_PROVE_LOCKING */
-
-/*
- * We are only interested in hardirq on/off events:
- */
-static inline void tracer_hardirqs_on(void)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_off(void)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-#endif /* CONFIG_PROVE_LOCKING */
-#endif /*  CONFIG_IRQSOFF_TRACER */
-
-#ifdef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		start_critical_timing(a0, a1);
-}
-#endif /* CONFIG_PREEMPT_TRACER */
-
 #ifdef CONFIG_FUNCTION_TRACER
 static bool function_enabled;
 
@@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
 }
 
 #ifdef CONFIG_IRQSOFF_TRACER
+/*
+ * We are only interested in hardirq on/off events:
+ */
+static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int irqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void irqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_irqsoff(trace) register_tracer(&trace)
-#else
-# define register_irqsoff(trace) do { } while (0)
-#endif
+#endif /*  CONFIG_IRQSOFF_TRACER */
 
 #ifdef CONFIG_PREEMPT_TRACER
+static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int preemptoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_PREEMPT_OFF;
 
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_preemptoff(trace) register_tracer(&trace)
-#else
-# define register_preemptoff(trace) do { } while (0)
-#endif
+#endif /* CONFIG_PREEMPT_TRACER */
 
-#if defined(CONFIG_IRQSOFF_TRACER) && \
-	defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
 
 static int preemptirqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptirqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-
-# define register_preemptirqsoff(trace) register_tracer(&trace)
-#else
-# define register_preemptirqsoff(trace) do { } while (0)
 #endif
 
 __init static int init_irqsoff_tracer(void)
 {
-	register_irqsoff(irqsoff_tracer);
-	register_preemptoff(preemptoff_tracer);
-	register_preemptirqsoff(preemptirqsoff_tracer);
-
-	return 0;
-}
-core_initcall(init_irqsoff_tracer);
-#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
-
-#ifndef CONFIG_IRQSOFF_TRACER
-static inline void tracer_hardirqs_on(void) { }
-static inline void tracer_hardirqs_off(void) { }
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
+#ifdef CONFIG_IRQSOFF_TRACER
+	register_tracer(&irqsoff_tracer);
 #endif
-
-#ifndef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
+#ifdef CONFIG_PREEMPT_TRACER
+	register_tracer(&preemptoff_tracer);
 #endif
-
-#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
-/* Per-cpu variable to prevent redundant calls when IRQs already off */
-static DEFINE_PER_CPU(int, tracing_irq_cpu);
-
-void trace_hardirqs_on(void)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_on();
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
-
-void trace_hardirqs_off(void)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_off();
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
-
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_on_caller(caller_addr);
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_off_caller(caller_addr);
-}
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-/*
- * Stubs:
- */
-
-void trace_softirqs_on(unsigned long ip)
-{
-}
-
-void trace_softirqs_off(unsigned long ip)
-{
-}
-
-inline void print_irqtrace_events(struct task_struct *curr)
-{
-}
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
+	register_tracer(&preemptirqsoff_tracer);
 #endif
 
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
-void trace_preempt_on(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_enable_rcuidle(a0, a1);
-	tracer_preempt_on(a0, a1);
-}
-
-void trace_preempt_off(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_disable_rcuidle(a0, a1);
-	tracer_preempt_off(a0, a1);
+	return 0;
 }
-#endif
+core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..e76b78bf258e
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * preemptoff and irqoff tracepoints
+ *
+ * Copyright (C) Joel Fernandes (Google) <joel@joelfernandes.org>
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+	if (!this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+	if (this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	if (!this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	if (this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
+
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_enable_rcuidle(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_disable_rcuidle(a0, a1);
+}
+#endif
-- 
2.18.0.203.gfac676dfb9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-07-13 21:55 ` [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU Joel Fernandes
@ 2018-07-14 14:31   ` Mathieu Desnoyers
  2018-07-17 17:11     ` Joel Fernandes
  2018-07-17 18:25   ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Mathieu Desnoyers @ 2018-07-14 14:31 UTC (permalink / raw)
  To: Joel Fernandes, Google
  Cc: linux-kernel, kernel-team, Boqun Feng, Byungchul Park,
	Erick Reyes, Ingo Molnar, Julia Cartwright, Masami Hiramatsu,
	Namhyung Kim, Paul E. McKenney, Peter Zijlstra, rostedt,
	Thomas Gleixner, Todd Kjos, Tom Zanussi, Will Deacon

----- On Jul 13, 2018, at 5:55 PM, Joel Fernandes, Google joel@joelfernandes.org wrote:

> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> In recent tests with IRQ on/off tracepoints, a large performance
> overhead ~10% is noticed when running hackbench. This is root caused to
> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that srcu is a better alternative for use during rcu idle.
> Although it does involve extra barriers, its lighter than the sched-rcu
> version which has to do additional RCU calls to notify RCU idle about
> entry into RCU sections.
> 
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has shown to improve performance
> alot for the high frequency irq enable/disable tracepoints.
> 
> Test: Tested idle and preempt/irq tracepoints.
> 
> Here are some performance numbers:
> 
> With a run of the following 30 times on a single core x86 Qemu instance
> with 1GB memory:
> hackbench -g 4 -f 2 -l 3000
> 
> Completion times in seconds. CONFIG_PROVE_LOCKING=y.
> 
> No patches (without this series)
> Mean: 3.048
> Median: 3.025
> Std Dev: 0.064
> 
> With Lockdep using irq tracepoints with RCU implementation:
> Mean: 3.451   (-11.66 %)
> Median: 3.447 (-12.22%)
> Std Dev: 0.049
> 
> With Lockdep using irq tracepoints with SRCU implementation (this series):
> Mean: 3.020   (I would consider the improvement against the "without
>	       this series" case as just noise).
> Median: 3.013
> Std Dev: 0.033
> 
> [1] https://patchwork.kernel.org/patch/10344297/
> 
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

I'm fine with the changes done since last iteration.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Thanks!

Mathieu

> Cleaned-up-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
> include/linux/tracepoint.h | 45 +++++++++++++++++++++++++++++++-------
> kernel/tracepoint.c        | 16 +++++++++++++-
> 2 files changed, 52 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 19a690b559ca..97e1d365a817 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
>  */
> 
> #include <linux/smp.h>
> +#include <linux/srcu.h>
> #include <linux/errno.h>
> #include <linux/types.h>
> #include <linux/cpumask.h>
> @@ -33,6 +34,8 @@ struct trace_eval_map {
> 
> #define TRACEPOINT_DEFAULT_PRIO	10
> 
> +extern struct srcu_struct tracepoint_srcu;
> +
> extern int
> tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> extern int
> @@ -75,10 +78,16 @@ int unregister_tracepoint_module_notifier(struct
> notifier_block *nb)
>  * probe unregistration and the end of module exit to make sure there is no
>  * caller executing a probe when it is freed.
>  */
> +#ifdef CONFIG_TRACEPOINTS
> static inline void tracepoint_synchronize_unregister(void)
> {
> +	synchronize_srcu(&tracepoint_srcu);
> 	synchronize_sched();
> }
> +#else
> +static inline void tracepoint_synchronize_unregister(void)
> +{ }
> +#endif
> 
> #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
> extern int syscall_regfunc(void);
> @@ -129,18 +138,34 @@ extern void syscall_unregfunc(void);
>  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
>  */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
> +#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
> 	do {								\
> 		struct tracepoint_func *it_func_ptr;			\
> 		void *it_func;						\
> 		void *__data;						\
> +		int __maybe_unused idx = 0;				\
> 									\
> 		if (!(cond))						\
> 			return;						\
> -		if (rcucheck)						\
> -			rcu_irq_enter_irqson();				\
> -		rcu_read_lock_sched_notrace();				\
> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> +									\
> +		/* srcu can't be used from NMI */			\
> +		if (rcuidle && in_nmi())				\
> +			WARN_ON_ONCE(1);				\
> +									\
> +		/* keep srcu and sched-rcu usage consistent */		\
> +		preempt_disable_notrace();				\
> +									\
> +		/*							\
> +		 * For rcuidle callers, use srcu since sched-rcu	\
> +		 * doesn't work from the idle path.			\
> +		 */							\
> +		if (rcuidle)						\
> +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> +		else							\
> +			rcu_read_lock_sched_notrace();			\
> +									\
> +		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
> +									\
> 		if (it_func_ptr) {					\
> 			do {						\
> 				it_func = (it_func_ptr)->func;		\
> @@ -148,9 +173,13 @@ extern void syscall_unregfunc(void);
> 				((void(*)(proto))(it_func))(args);	\
> 			} while ((++it_func_ptr)->func);		\
> 		}							\
> -		rcu_read_unlock_sched_notrace();			\
> -		if (rcucheck)						\
> -			rcu_irq_exit_irqson();				\
> +									\
> +		if (rcuidle)						\
> +			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
> +		else							\
> +			rcu_read_unlock_sched_notrace();		\
> +									\
> +		preempt_enable_notrace();				\
> 	} while (0)
> 
> #ifndef MODULE
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 6dc6356c3327..955148d91b74 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -31,6 +31,9 @@
> extern struct tracepoint * const __start___tracepoints_ptrs[];
> extern struct tracepoint * const __stop___tracepoints_ptrs[];
> 
> +DEFINE_SRCU(tracepoint_srcu);
> +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> +
> /* Set to 1 to enable tracepoint debug output */
> static const int tracepoint_debug;
> 
> @@ -67,16 +70,27 @@ static inline void *allocate_probes(int count)
> 	return p == NULL ? NULL : p->probes;
> }
> 
> -static void rcu_free_old_probes(struct rcu_head *head)
> +static void srcu_free_old_probes(struct rcu_head *head)
> {
> 	kfree(container_of(head, struct tp_probes, rcu));
> }
> 
> +static void rcu_free_old_probes(struct rcu_head *head)
> +{
> +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> +}
> +
> static inline void release_probes(struct tracepoint_func *old)
> {
> 	if (old) {
> 		struct tp_probes *tp_probes = container_of(old,
> 			struct tp_probes, probes[0]);
> +		/*
> +		 * Tracepoint probes are protected by both sched RCU and SRCU,
> +		 * by calling the SRCU callback in the sched RCU callback we
> +		 * cover both cases. So let us chain the SRCU and sched RCU
> +		 * callbacks to wait for both grace periods.
> +		 */
> 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
> 	}
> }
> --
> 2.18.0.203.gfac676dfb9-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-07-14 14:31   ` Mathieu Desnoyers
@ 2018-07-17 17:11     ` Joel Fernandes
  0 siblings, 0 replies; 12+ messages in thread
From: Joel Fernandes @ 2018-07-17 17:11 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, kernel-team, Boqun Feng, Byungchul Park,
	Erick Reyes, Ingo Molnar, Julia Cartwright, Masami Hiramatsu,
	Namhyung Kim, Paul E. McKenney, Peter Zijlstra, rostedt,
	Thomas Gleixner, Todd Kjos, Tom Zanussi, Will Deacon

On Sat, Jul 14, 2018 at 10:31:57AM -0400, Mathieu Desnoyers wrote:
> ----- On Jul 13, 2018, at 5:55 PM, Joel Fernandes, Google joel@joelfernandes.org wrote:
> 
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > In recent tests with IRQ on/off tracepoints, a large performance
> > overhead ~10% is noticed when running hackbench. This is root caused to
> > calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> > tracepoint code. Following a long discussion on the list [1] about this,
> > we concluded that srcu is a better alternative for use during rcu idle.
> > Although it does involve extra barriers, its lighter than the sched-rcu
> > version which has to do additional RCU calls to notify RCU idle about
> > entry into RCU sections.
> > 
> > In this patch, we change the underlying implementation of the
> > trace_*_rcuidle API to use SRCU. This has shown to improve performance
> > alot for the high frequency irq enable/disable tracepoints.
> > 
> > Test: Tested idle and preempt/irq tracepoints.
> > 
> > Here are some performance numbers:
> > 
> > With a run of the following 30 times on a single core x86 Qemu instance
> > with 1GB memory:
> > hackbench -g 4 -f 2 -l 3000
> > 
> > Completion times in seconds. CONFIG_PROVE_LOCKING=y.
> > 
> > No patches (without this series)
> > Mean: 3.048
> > Median: 3.025
> > Std Dev: 0.064
> > 
> > With Lockdep using irq tracepoints with RCU implementation:
> > Mean: 3.451   (-11.66 %)
> > Median: 3.447 (-12.22%)
> > Std Dev: 0.049
> > 
> > With Lockdep using irq tracepoints with SRCU implementation (this series):
> > Mean: 3.020   (I would consider the improvement against the "without
> >	       this series" case as just noise).
> > Median: 3.013
> > Std Dev: 0.033
> > 
> > [1] https://patchwork.kernel.org/patch/10344297/
> > 
> > Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> 
> I'm fine with the changes done since last iteration.
> 
> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Thanks a lot Mathieu!

Steve, Peter, looks good to you too now?

thanks,

- Joel


[...] 
> > Cleaned-up-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> > include/linux/tracepoint.h | 45 +++++++++++++++++++++++++++++++-------
> > kernel/tracepoint.c        | 16 +++++++++++++-
> > 2 files changed, 52 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> > index 19a690b559ca..97e1d365a817 100644
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -15,6 +15,7 @@
> >  */
> > 
> > #include <linux/smp.h>
> > +#include <linux/srcu.h>
> > #include <linux/errno.h>
> > #include <linux/types.h>
> > #include <linux/cpumask.h>
> > @@ -33,6 +34,8 @@ struct trace_eval_map {
> > 
> > #define TRACEPOINT_DEFAULT_PRIO	10
> > 
> > +extern struct srcu_struct tracepoint_srcu;
> > +
> > extern int
> > tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> > extern int
> > @@ -75,10 +78,16 @@ int unregister_tracepoint_module_notifier(struct
> > notifier_block *nb)
> >  * probe unregistration and the end of module exit to make sure there is no
> >  * caller executing a probe when it is freed.
> >  */
> > +#ifdef CONFIG_TRACEPOINTS
> > static inline void tracepoint_synchronize_unregister(void)
> > {
> > +	synchronize_srcu(&tracepoint_srcu);
> > 	synchronize_sched();
> > }
> > +#else
> > +static inline void tracepoint_synchronize_unregister(void)
> > +{ }
> > +#endif
> > 
> > #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
> > extern int syscall_regfunc(void);
> > @@ -129,18 +138,34 @@ extern void syscall_unregfunc(void);
> >  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
> >  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
> >  */
> > -#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
> > +#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
> > 	do {								\
> > 		struct tracepoint_func *it_func_ptr;			\
> > 		void *it_func;						\
> > 		void *__data;						\
> > +		int __maybe_unused idx = 0;				\
> > 									\
> > 		if (!(cond))						\
> > 			return;						\
> > -		if (rcucheck)						\
> > -			rcu_irq_enter_irqson();				\
> > -		rcu_read_lock_sched_notrace();				\
> > -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> > +									\
> > +		/* srcu can't be used from NMI */			\
> > +		if (rcuidle && in_nmi())				\
> > +			WARN_ON_ONCE(1);				\
> > +									\
> > +		/* keep srcu and sched-rcu usage consistent */		\
> > +		preempt_disable_notrace();				\
> > +									\
> > +		/*							\
> > +		 * For rcuidle callers, use srcu since sched-rcu	\
> > +		 * doesn't work from the idle path.			\
> > +		 */							\
> > +		if (rcuidle)						\
> > +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> > +		else							\
> > +			rcu_read_lock_sched_notrace();			\
> > +									\
> > +		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
> > +									\
> > 		if (it_func_ptr) {					\
> > 			do {						\
> > 				it_func = (it_func_ptr)->func;		\
> > @@ -148,9 +173,13 @@ extern void syscall_unregfunc(void);
> > 				((void(*)(proto))(it_func))(args);	\
> > 			} while ((++it_func_ptr)->func);		\
> > 		}							\
> > -		rcu_read_unlock_sched_notrace();			\
> > -		if (rcucheck)						\
> > -			rcu_irq_exit_irqson();				\
> > +									\
> > +		if (rcuidle)						\
> > +			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
> > +		else							\
> > +			rcu_read_unlock_sched_notrace();		\
> > +									\
> > +		preempt_enable_notrace();				\
> > 	} while (0)
> > 
> > #ifndef MODULE
> > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> > index 6dc6356c3327..955148d91b74 100644
> > --- a/kernel/tracepoint.c
> > +++ b/kernel/tracepoint.c
> > @@ -31,6 +31,9 @@
> > extern struct tracepoint * const __start___tracepoints_ptrs[];
> > extern struct tracepoint * const __stop___tracepoints_ptrs[];
> > 
> > +DEFINE_SRCU(tracepoint_srcu);
> > +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> > +
> > /* Set to 1 to enable tracepoint debug output */
> > static const int tracepoint_debug;
> > 
> > @@ -67,16 +70,27 @@ static inline void *allocate_probes(int count)
> > 	return p == NULL ? NULL : p->probes;
> > }
> > 
> > -static void rcu_free_old_probes(struct rcu_head *head)
> > +static void srcu_free_old_probes(struct rcu_head *head)
> > {
> > 	kfree(container_of(head, struct tp_probes, rcu));
> > }
> > 
> > +static void rcu_free_old_probes(struct rcu_head *head)
> > +{
> > +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> > +}
> > +
> > static inline void release_probes(struct tracepoint_func *old)
> > {
> > 	if (old) {
> > 		struct tp_probes *tp_probes = container_of(old,
> > 			struct tp_probes, probes[0]);
> > +		/*
> > +		 * Tracepoint probes are protected by both sched RCU and SRCU,
> > +		 * by calling the SRCU callback in the sched RCU callback we
> > +		 * cover both cases. So let us chain the SRCU and sched RCU
> > +		 * callbacks to wait for both grace periods.
> > +		 */
> > 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
> > 	}
> > }
> > --
> > 2.18.0.203.gfac676dfb9-goog
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-07-13 21:55 ` [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU Joel Fernandes
  2018-07-14 14:31   ` Mathieu Desnoyers
@ 2018-07-17 18:25   ` Peter Zijlstra
  2018-07-17 20:15     ` Joel Fernandes
  2018-07-18  0:16     ` Joel Fernandes
  1 sibling, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2018-07-17 18:25 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, kernel-team, Boqun Feng, Byungchul Park,
	Erick Reyes, Ingo Molnar, Julia Cartwright, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Steven Rostedt,
	Thomas Glexiner, Todd Kjos, Tom Zanussi, Will Deacon

On Fri, Jul 13, 2018 at 02:55:46PM -0700, Joel Fernandes wrote:
> +		/* keep srcu and sched-rcu usage consistent */		\
> +		preempt_disable_notrace();				\
> +									\
> +		/*							\
> +		 * For rcuidle callers, use srcu since sched-rcu	\
> +		 * doesn't work from the idle path.			\
> +		 */							\
> +		if (rcuidle)						\
> +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> +		else							\
> +			rcu_read_lock_sched_notrace();			\

That else is completely superfluous. rcu_read_lock_sched_notrace() :=
prempt_disable_notrace().

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-07-17 18:25   ` Peter Zijlstra
@ 2018-07-17 20:15     ` Joel Fernandes
  2018-07-18  0:16     ` Joel Fernandes
  1 sibling, 0 replies; 12+ messages in thread
From: Joel Fernandes @ 2018-07-17 20:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, kernel-team, Boqun Feng, Byungchul Park,
	Erick Reyes, Ingo Molnar, Julia Cartwright, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Steven Rostedt,
	Thomas Glexiner, Todd Kjos, Tom Zanussi, Will Deacon

On Tue, Jul 17, 2018 at 08:25:28PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 13, 2018 at 02:55:46PM -0700, Joel Fernandes wrote:
> > +		/* keep srcu and sched-rcu usage consistent */		\
> > +		preempt_disable_notrace();				\
> > +									\
> > +		/*							\
> > +		 * For rcuidle callers, use srcu since sched-rcu	\
> > +		 * doesn't work from the idle path.			\
> > +		 */							\
> > +		if (rcuidle)						\
> > +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> > +		else							\
> > +			rcu_read_lock_sched_notrace();			\
> 
> That else is completely superfluous. rcu_read_lock_sched_notrace() :=
> prempt_disable_notrace().

Yes, sorry I missed this in your original idea. Will do it this way and
repost. Thanks Peter!

- Joel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-07-17 18:25   ` Peter Zijlstra
  2018-07-17 20:15     ` Joel Fernandes
@ 2018-07-18  0:16     ` Joel Fernandes
  2018-07-18  8:21       ` Peter Zijlstra
  2018-07-26 23:18       ` Steven Rostedt
  1 sibling, 2 replies; 12+ messages in thread
From: Joel Fernandes @ 2018-07-18  0:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, kernel-team, Boqun Feng, Byungchul Park,
	Erick Reyes, Ingo Molnar, Julia Cartwright, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Steven Rostedt,
	Thomas Glexiner, Todd Kjos, Tom Zanussi, Will Deacon

On Tue, Jul 17, 2018 at 08:25:28PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 13, 2018 at 02:55:46PM -0700, Joel Fernandes wrote:
> > +		/* keep srcu and sched-rcu usage consistent */		\
> > +		preempt_disable_notrace();				\
> > +									\
> > +		/*							\
> > +		 * For rcuidle callers, use srcu since sched-rcu	\
> > +		 * doesn't work from the idle path.			\
> > +		 */							\
> > +		if (rcuidle)						\
> > +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> > +		else							\
> > +			rcu_read_lock_sched_notrace();			\
> 
> That else is completely superfluous. rcu_read_lock_sched_notrace() :=
> prempt_disable_notrace().

I have a patch as below on top of this series. Thanks for the suggestion.

Steve, Peter,

Is it Ok with you to take the below patch ontop of this series? I avoided
resending the 3-patch series since this is the only change, but let me know
what you prefer or makes it easy for you.

------->8--------

From: Joel Fernandes <joelaf@google.com>
Subject: [PATCH] tracepoint: Remove extra unneeded rcu_read_lock_sched_notrace

A great idea was provided that rcu_read_lock_sched_notrace being the
equivalent of preempt_disable_notrace is unnecessary to call in
tracepoint code, since we already call preempt_disable_notrace in
advance. So lets remove the extra call.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 include/linux/tracepoint.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 97e1d365a817..6e7bc6ebfcd8 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -161,8 +161,6 @@ extern void syscall_unregfunc(void);
 		 */							\
 		if (rcuidle)						\
 			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
-		else							\
-			rcu_read_lock_sched_notrace();			\
 									\
 		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
 									\
@@ -176,8 +174,6 @@ extern void syscall_unregfunc(void);
 									\
 		if (rcuidle)						\
 			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
-		else							\
-			rcu_read_unlock_sched_notrace();		\
 									\
 		preempt_enable_notrace();				\
 	} while (0)
-- 
2.18.0.203.gfac676dfb9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-07-18  0:16     ` Joel Fernandes
@ 2018-07-18  8:21       ` Peter Zijlstra
  2018-07-26 23:18       ` Steven Rostedt
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2018-07-18  8:21 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, kernel-team, Boqun Feng, Byungchul Park,
	Erick Reyes, Ingo Molnar, Julia Cartwright, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Steven Rostedt,
	Thomas Glexiner, Todd Kjos, Tom Zanussi, Will Deacon

On Tue, Jul 17, 2018 at 05:16:25PM -0700, Joel Fernandes wrote:

> Steve, Peter,
> 
> Is it Ok with you to take the below patch ontop of this series? I avoided
> resending the 3-patch series since this is the only change, but let me know
> what you prefer or makes it easy for you.

Yeah, whatever Steve wants really. But I think we'll have to wait for
that a bit, I got one of those humorous vacation notices from him
again.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-07-18  0:16     ` Joel Fernandes
  2018-07-18  8:21       ` Peter Zijlstra
@ 2018-07-26 23:18       ` Steven Rostedt
  2018-07-26 23:35         ` Joel Fernandes
  1 sibling, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2018-07-26 23:18 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Peter Zijlstra, linux-kernel, kernel-team, Boqun Feng,
	Byungchul Park, Erick Reyes, Ingo Molnar, Julia Cartwright,
	Masami Hiramatsu, Mathieu Desnoyers, Namhyung Kim, Paul McKenney,
	Thomas Glexiner, Todd Kjos, Tom Zanussi, Will Deacon

On Tue, 17 Jul 2018 17:16:25 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> Is it Ok with you to take the below patch ontop of this series? I avoided
> resending the 3-patch series since this is the only change, but let me know
> what you prefer or makes it easy for you.

Can you send a v11, even though this is the only change. My INBOX is
currently a mess, and I want to make sure I pull in the right patches.

Thanks!

-- Steve

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-07-26 23:18       ` Steven Rostedt
@ 2018-07-26 23:35         ` Joel Fernandes
  0 siblings, 0 replies; 12+ messages in thread
From: Joel Fernandes @ 2018-07-26 23:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, kernel-team, Boqun Feng,
	Byungchul Park, Erick Reyes, Ingo Molnar, Julia Cartwright,
	Masami Hiramatsu, Mathieu Desnoyers, Namhyung Kim, Paul McKenney,
	Thomas Glexiner, Todd Kjos, Tom Zanussi, Will Deacon

On Thu, Jul 26, 2018 at 07:18:00PM -0400, Steven Rostedt wrote:
> On Tue, 17 Jul 2018 17:16:25 -0700
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > Is it Ok with you to take the below patch ontop of this series? I avoided
> > resending the 3-patch series since this is the only change, but let me know
> > what you prefer or makes it easy for you.
> 
> Can you send a v11, even though this is the only change. My INBOX is
> currently a mess, and I want to make sure I pull in the right patches.

Sure, I'll squash the extra patch and send it out as v11 now. Thanks!

- Joel


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-07-26 23:35 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-13 21:55 [PATCH v10 0/3] tracing: Centralize preemptirq tracepoints and unify their usage Joel Fernandes
2018-07-13 21:55 ` [PATCH v10 1/3] lockdep: use this_cpu_ptr instead of get_cpu_var stats Joel Fernandes
2018-07-13 21:55 ` [PATCH v10 2/3] tracepoint: Make rcuidle tracepoint callers use SRCU Joel Fernandes
2018-07-14 14:31   ` Mathieu Desnoyers
2018-07-17 17:11     ` Joel Fernandes
2018-07-17 18:25   ` Peter Zijlstra
2018-07-17 20:15     ` Joel Fernandes
2018-07-18  0:16     ` Joel Fernandes
2018-07-18  8:21       ` Peter Zijlstra
2018-07-26 23:18       ` Steven Rostedt
2018-07-26 23:35         ` Joel Fernandes
2018-07-13 21:55 ` [PATCH v10 3/3] tracing: Centralize preemptirq tracepoints and unify their usage Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).