* [PATCH v8 0/8] Centralize and unify usage of preempt/irq tracepoints
@ 2018-05-30  0:04 ` joelaf
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Gleixner, Todd Kjos,
	Tom Zanussi

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

The preempt/irq tracepoints exist, but not everything in the kernel uses
them during preempt/irq on/off events. As a result, some features cannot
be used simultaneously (for example, only one of lockdep or the irqsoff
events can be used at a time). This series is an attempt to solve that,
and it also results in a nice cleanup of the kernel in general. Several
ifdefs are simpler, and the design is more unified. As a side effect,
all rcuidle tracepoints also become faster, since their handling is
simpler.

v7->v8:
- Refactored irqsoff tracer probe defines based on Namhyung's
  suggestions. No functional change.

v6->v7:
- Added a module to simulate an atomic section, and a kselftest to load
  and trigger it, which verifies the preempt-tracer and this series.

- Fixed a new warning in early boot that appeared after I rebased. It
  was caused by early_boot_irqs_disabled being set too early; I moved
  it after the lockdep initialization.

- Added back the softirq fix since it appears it wasn't picked up.

- Ran Ingo's locking API selftest suite, which passes with this
  series.

- Mathieu suggested ifdef'ing the tracepoint_synchronize_unregister
  function in case tracepoints aren't enabled; did that.

Joel Fernandes (Google) (7):
  softirq: reorder trace_softirqs_on to prevent lockdep splat
  srcu: Add notrace variant of srcu_dereference
  trace/irqsoff: Split reset into separate functions
  tracepoint: Make rcuidle tracepoint callers use SRCU
  tracing: Centralize preemptirq tracepoints and unify their usage
  lib: Add module to simulate atomic sections for testing preemptoff
    tracers
  kselftests: Add tests for the preemptoff and irqsoff tracers

Paul McKenney (1):
  srcu: Add notrace variants of srcu_read_{lock,unlock}

 include/linux/ftrace.h                        |  11 +-
 include/linux/irqflags.h                      |  11 +-
 include/linux/lockdep.h                       |   8 +-
 include/linux/preempt.h                       |   2 +-
 include/linux/srcu.h                          |  22 ++
 include/linux/tracepoint.h                    |  48 +++-
 include/trace/events/preemptirq.h             |  23 +-
 init/main.c                                   |   5 +-
 kernel/locking/lockdep.c                      |  35 +--
 kernel/sched/core.c                           |   2 +-
 kernel/softirq.c                              |   6 +-
 kernel/trace/Kconfig                          |  22 +-
 kernel/trace/Makefile                         |   2 +-
 kernel/trace/trace_irqsoff.c                  | 253 ++++++------------
 kernel/trace/trace_preemptirq.c               |  71 +++++
 kernel/tracepoint.c                           |  15 +-
 lib/Kconfig.debug                             |   8 +
 lib/Makefile                                  |   1 +
 lib/test_atomic_sections.c                    |  79 ++++++
 tools/testing/selftests/ftrace/config         |   3 +
 .../test.d/preemptirq/irqsoff_tracer.tc       |  74 +++++
 21 files changed, 459 insertions(+), 242 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c
 create mode 100644 lib/test_atomic_sections.c
 create mode 100644 tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc

-- 
2.17.0.921.gf22659ad46-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 1/8] softirq: reorder trace_softirqs_on to prevent lockdep splat
  2018-05-30  0:04 ` joelaf
  (?)
@ 2018-05-30  0:04   ` joelaf
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	stable, Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Gleixner, Todd Kjos,
	Tom Zanussi

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

I'm able to reproduce a lockdep splat with config options:
CONFIG_PROVE_LOCKING=y,
CONFIG_DEBUG_LOCK_ALLOC=y and
CONFIG_PREEMPTIRQ_EVENTS=y

$ echo 1 > /d/tracing/events/preemptirq/preempt_enable/enable

[   26.112609] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
[   26.112636] WARNING: CPU: 0 PID: 118 at kernel/locking/lockdep.c:3854
[...]
[   26.144229] Call Trace:
[   26.144926]  <IRQ>
[   26.145506]  lock_acquire+0x55/0x1b0
[   26.146499]  ? __do_softirq+0x46f/0x4d9
[   26.147571]  ? __do_softirq+0x46f/0x4d9
[   26.148646]  trace_preempt_on+0x8f/0x240
[   26.149744]  ? trace_preempt_on+0x4d/0x240
[   26.150862]  ? __do_softirq+0x46f/0x4d9
[   26.151930]  preempt_count_sub+0x18a/0x1a0
[   26.152985]  __do_softirq+0x46f/0x4d9
[   26.153937]  irq_exit+0x68/0xe0
[   26.154755]  smp_apic_timer_interrupt+0x271/0x280
[   26.156056]  apic_timer_interrupt+0xf/0x20
[   26.157105]  </IRQ>

The issue was this:

preempt_count = 1 << SOFTIRQ_SHIFT

	__local_bh_enable(cnt = 1 << SOFTIRQ_SHIFT) {
		if (softirq_count() == (cnt & SOFTIRQ_MASK)) {
			trace_softirqs_on() {
				current->softirqs_enabled = 1;
			}
		}
		preempt_count_sub(cnt) {
			trace_preempt_on() {
				tracepoint() {
					rcu_read_lock_sched() {
						// jumps into lockdep

At this point preempt_count still shows softirqs as disabled, but
current->softirqs_enabled is already true, so lockdep's consistency
check fires and we get a splat.
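
The ordering problem described above can be modelled in user space. The
following is only a sketch under stated assumptions: lockdep_check(),
model_trace_preempt_on(), model_trace_softirqs_on(), bh_enable_old(),
bh_enable_new() and run() are hypothetical stand-ins, and the lockdep
check is reduced to the single condition that fired in the splat.

```c
#include <assert.h>
#include <stdbool.h>

#define SOFTIRQ_SHIFT 8
#define SOFTIRQ_MASK  (0xffu << SOFTIRQ_SHIFT)

/* Toy per-task state; the names mirror the commit message only. */
static unsigned int preempt_count;
static bool softirqs_enabled;
static int splats;

/* Stand-in for the lockdep check reached from the tracepoint. */
static void lockdep_check(void)
{
	if ((preempt_count & SOFTIRQ_MASK) && softirqs_enabled)
		splats++;
}

/* Stand-in for trace_preempt_on(): the tracepoint ends up in lockdep. */
static void model_trace_preempt_on(void)
{
	lockdep_check();
}

/* Stand-in for trace_softirqs_on(). */
static void model_trace_softirqs_on(void)
{
	softirqs_enabled = true;
}

/* Old ordering: softirqs_on first, then a traced count decrement. */
static void bh_enable_old(unsigned int cnt)
{
	if ((preempt_count & SOFTIRQ_MASK) == (cnt & SOFTIRQ_MASK))
		model_trace_softirqs_on();
	/* Models preempt_count_sub(): tracepoint first, then subtract. */
	if (preempt_count == cnt)
		model_trace_preempt_on();
	preempt_count -= cnt;
}

/* New ordering: preempt tracepoint first, untraced decrement last. */
static void bh_enable_new(unsigned int cnt)
{
	if (preempt_count == cnt)
		model_trace_preempt_on();
	if ((preempt_count & SOFTIRQ_MASK) == (cnt & SOFTIRQ_MASK))
		model_trace_softirqs_on();
	/* Models __preempt_count_sub(): no tracepoint is fired. */
	preempt_count -= cnt;
}

/* Run one variant from the state in the commit message; return splats. */
static int run(void (*bh_enable)(unsigned int))
{
	preempt_count = 1u << SOFTIRQ_SHIFT;
	softirqs_enabled = false;
	splats = 0;
	bh_enable(1u << SOFTIRQ_SHIFT);
	assert(preempt_count == 0);	/* the count must end up balanced */
	return splats;
}
```

In this model run(bh_enable_old) reports one splat and
run(bh_enable_new) reports none, because the new ordering fires the
preempt tracepoint before current->softirqs_enabled is set.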

Cc: stable@vger.kernel.org
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: d59158162e032 ("tracing: Add support for preempt and irq enable/disable events")
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/softirq.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 177de3640c78..8a040bcaa033 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -139,9 +139,13 @@ static void __local_bh_enable(unsigned int cnt)
 {
 	lockdep_assert_irqs_disabled();
 
+	if (preempt_count() == cnt)
+		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
+
 	if (softirq_count() == (cnt & SOFTIRQ_MASK))
 		trace_softirqs_on(_RET_IP_);
-	preempt_count_sub(cnt);
+
+	__preempt_count_sub(cnt);
 }
 
 /*
-- 
2.17.0.921.gf22659ad46-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 2/8] srcu: Add notrace variants of srcu_read_{lock,unlock}
  2018-05-30  0:04 ` joelaf
  (?)
@ 2018-05-30  0:04   ` joelaf
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Paul McKenney, Joel Fernandes, Boqun Feng,
	Byungchul Park, Erick Reyes, Ingo Molnar, Julia Cartwright,
	linux-kselftest, Masami Hiramatsu, Mathieu Desnoyers,
	Namhyung Kim, Peter Zijlstra, Shuah Khan, Steven Rostedt,
	Thomas Gleixner, Todd Kjos, Tom Zanussi

From: Paul McKenney <paulmck@linux.vnet.ibm.com>

This is needed for a future tracepoint patch that uses SRCU, and to make
sure it doesn't call into lockdep.

The tracepoint code already calls the notrace variants of
rcu_read_lock_sched, so this patch does the same for SRCU, which will be
used in a later patch. This keeps it consistent with rcu-sched.
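
Why the notrace variants matter can be illustrated with a toy user-space
model. This is a sketch, not the kernel implementation: struct toy_srcu,
toy_lockdep_hook() and the toy_read_* helpers are hypothetical names, and
the hook merely counts calls where the real code would annotate lockdep.

```c
#include <assert.h>

/* Toy SRCU domain: two reader counters selected by an index. */
struct toy_srcu {
	int idx;		/* index handed out to new readers */
	int readers[2];		/* active readers per index */
};

static int lockdep_calls;	/* how often the toy "lockdep" hook ran */

/* Stand-in for the lockdep annotation done by the regular variants. */
static void toy_lockdep_hook(void)
{
	lockdep_calls++;
}

/* Core primitive, analogous in spirit to __srcu_read_lock(). */
static int toy_read_lock_core(struct toy_srcu *sp)
{
	int idx = sp->idx & 1;

	sp->readers[idx]++;
	return idx;
}

/* Core primitive, analogous in spirit to __srcu_read_unlock(). */
static void toy_read_unlock_core(struct toy_srcu *sp, int idx)
{
	sp->readers[idx]--;
}

/* Regular variant: take the lock and annotate it. */
static int toy_read_lock(struct toy_srcu *sp)
{
	int idx = toy_read_lock_core(sp);

	toy_lockdep_hook();
	return idx;
}

/* Regular variant: annotate the release, then drop the lock. */
static void toy_read_unlock(struct toy_srcu *sp, int idx)
{
	toy_lockdep_hook();
	toy_read_unlock_core(sp, idx);
}

/* notrace variant: only the core primitive, no annotation. */
static int toy_read_lock_notrace(struct toy_srcu *sp)
{
	return toy_read_lock_core(sp);
}

/* notrace variant: only the core primitive, no annotation. */
static void toy_read_unlock_notrace(struct toy_srcu *sp, int idx)
{
	toy_read_unlock_core(sp, idx);
}

/* Return how many hook calls one notrace lock/unlock pair causes. */
static int toy_notrace_hook_calls(struct toy_srcu *sp)
{
	int before = lockdep_calls;
	int idx = toy_read_lock_notrace(sp);

	assert(sp->readers[idx] > 0);	/* reader is registered */
	toy_read_unlock_notrace(sp, idx);
	return lockdep_calls - before;
}
```

A tracepoint that is itself invoked from lockdep-sensitive paths can use
the notrace pair without re-entering the annotation hook, while the
regular pair would call the hook on both lock and unlock.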

[Joel: Added commit message]
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/srcu.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 33c1c698df09..2ec618979b20 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -161,6 +161,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 	return retval;
 }
 
+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+	int retval;
+
+	retval = __srcu_read_lock(sp);
+	return retval;
+}
+
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
  * @sp: srcu_struct in which to unregister the old reader.
@@ -175,6 +185,13 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
 	__srcu_read_unlock(sp, idx);
 }
 
+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
+	__srcu_read_unlock(sp, idx);
+}
+
 /**
  * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
  *
-- 
2.17.0.921.gf22659ad46-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 3/8] srcu: Add notrace variant of srcu_dereference
  2018-05-30  0:04 ` joelaf
  (?)
@ 2018-05-30  0:04   ` joelaf
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Gleixner, Todd Kjos,
	Tom Zanussi

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

In the last patch in this series, we are making lockdep register hooks
onto the irq_{disable,enable} tracepoints. These tracepoints use the
_rcuidle tracepoint variant. In this series we switch the _rcuidle
tracepoint callers to use SRCU instead of sched-RCU. In order to
dereference the pointer to the probe functions, we could call
srcu_dereference; however, this API will call back into lockdep to check
if the lock is held *before* the lockdep probe hooks have a chance to
run and annotate the IRQ enabled/disabled state.

For this reason we need a notrace variant of srcu_dereference since
otherwise we get lockdep splats. This patch adds the needed
srcu_dereference_notrace variant.
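
The effect of passing a constant 1 as the check condition can be
sketched in user space. This is a model under the assumption that the
condition argument is OR-ed with the lock-held test, so a true condition
short-circuits it; toy_read_lock_held(), the toy_dereference* macros and
toy_demo() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

static int held_checks;	/* how often the lock-held test ran */
static int warnings;	/* how often the check would have warned */

/* Stand-in for srcu_read_lock_held(), which consults lockdep. */
static bool toy_read_lock_held(void)
{
	held_checks++;
	return false;	/* pretend lockdep does not see the reader yet */
}

/* Toy check: warn unless (c) or the lock-held test is true. */
#define toy_dereference_check(p, c) \
	(((c) || toy_read_lock_held()) ? (p) : (warnings++, (p)))

/* Regular variant: condition 0, so the lock-held test always runs. */
#define toy_dereference(p)		toy_dereference_check((p), 0)

/* notrace variant: condition 1, so the lock-held test is skipped. */
#define toy_dereference_notrace(p)	toy_dereference_check((p), 1)

/* Dereference the same pointer both ways; both yield the pointer. */
static bool toy_demo(int *probe)
{
	int *a = toy_dereference_notrace(probe);	/* no held test */
	int *b = toy_dereference(probe);		/* held test, warns */

	assert(a == probe);
	return a == b;
}
```

Both macros return the same pointer; the only difference is whether the
lock-held test, and with it the call into the checker, is evaluated.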

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/srcu.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 2ec618979b20..a1c4947be877 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -135,6 +135,11 @@ static inline int srcu_read_lock_held(const struct srcu_struct *sp)
  */
 #define srcu_dereference(p, sp) srcu_dereference_check((p), (sp), 0)
 
+/**
+ * srcu_dereference_notrace - no tracing and no lockdep calls from here
+ */
+#define srcu_dereference_notrace(p, sp) srcu_dereference_check((p), (sp), 1)
+
 /**
  * srcu_read_lock - register a new reader for an SRCU-protected structure.
  * @sp: srcu_struct in which to register the new reader.
-- 
2.17.0.921.gf22659ad46-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 4/8] trace/irqsoff: Split reset into separate functions
  2018-05-30  0:04 ` joelaf
  (?)
@ 2018-05-30  0:04   ` joelaf
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Glexiner, Todd Kjos,
	Tom Zanussi

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Split reset functions into separate functions, one per tracer, in
preparation for future patches that need tracer-specific reset.
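
As a rough illustration of the pattern (a userspace sketch, not the
kernel code: struct trace_array and struct tracer below are simplified
stand-ins, and common_reset, run_reset and the *_demo names are
hypothetical), each tracer gets its own thin reset wrapper around one
shared helper, so a later change can add tracer-specific steps in one
wrapper without touching the others:

```c
#include <assert.h>

/* Simplified stand-ins for illustration; not the kernel's definitions. */
struct trace_array {
	int resets;	/* number of times the shared teardown ran */
};

struct tracer {
	const char *name;
	void (*reset)(struct trace_array *tr);
};

/* Shared teardown, playing the role of __irqsoff_tracer_reset(). */
static void common_reset(struct trace_array *tr)
{
	tr->resets++;
}

/* One wrapper per tracer; tracer-specific steps could be added here later. */
static void irqsoff_reset(struct trace_array *tr)
{
	common_reset(tr);
}

static void preemptoff_reset(struct trace_array *tr)
{
	common_reset(tr);
}

static struct tracer irqsoff_demo = {
	.name	= "irqsoff",
	.reset	= irqsoff_reset,
};

static struct tracer preemptoff_demo = {
	.name	= "preemptoff",
	.reset	= preemptoff_reset,
};

/* Call a tracer's reset hook and return the running reset count. */
static int run_reset(struct tracer *t, struct trace_array *tr)
{
	assert(t && t->reset);
	t->reset(tr);
	return tr->resets;
}
```

Both wrappers behave identically at this point, which matches the "no
functional change" nature of the split.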

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/trace/trace_irqsoff.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 03ecb4465ee4..f8daa754cce2 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -634,7 +634,7 @@ static int __irqsoff_tracer_init(struct trace_array *tr)
 	return 0;
 }
 
-static void irqsoff_tracer_reset(struct trace_array *tr)
+static void __irqsoff_tracer_reset(struct trace_array *tr)
 {
 	int lat_flag = save_flags & TRACE_ITER_LATENCY_FMT;
 	int overwrite_flag = save_flags & TRACE_ITER_OVERWRITE;
@@ -665,6 +665,12 @@ static int irqsoff_tracer_init(struct trace_array *tr)
 
 	return __irqsoff_tracer_init(tr);
 }
+
+static void irqsoff_tracer_reset(struct trace_array *tr)
+{
+	__irqsoff_tracer_reset(tr);
+}
+
 static struct tracer irqsoff_tracer __read_mostly =
 {
 	.name		= "irqsoff",
@@ -697,11 +703,16 @@ static int preemptoff_tracer_init(struct trace_array *tr)
 	return __irqsoff_tracer_init(tr);
 }
 
+static void preemptoff_tracer_reset(struct trace_array *tr)
+{
+	__irqsoff_tracer_reset(tr);
+}
+
 static struct tracer preemptoff_tracer __read_mostly =
 {
 	.name		= "preemptoff",
 	.init		= preemptoff_tracer_init,
-	.reset		= irqsoff_tracer_reset,
+	.reset		= preemptoff_tracer_reset,
 	.start		= irqsoff_tracer_start,
 	.stop		= irqsoff_tracer_stop,
 	.print_max	= true,
@@ -731,11 +742,16 @@ static int preemptirqsoff_tracer_init(struct trace_array *tr)
 	return __irqsoff_tracer_init(tr);
 }
 
+static void preemptirqsoff_tracer_reset(struct trace_array *tr)
+{
+	__irqsoff_tracer_reset(tr);
+}
+
 static struct tracer preemptirqsoff_tracer __read_mostly =
 {
 	.name		= "preemptirqsoff",
 	.init		= preemptirqsoff_tracer_init,
-	.reset		= irqsoff_tracer_reset,
+	.reset		= preemptirqsoff_tracer_reset,
 	.start		= irqsoff_tracer_start,
 	.stop		= irqsoff_tracer_stop,
 	.print_max	= true,
-- 
2.17.0.921.gf22659ad46-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 5/8] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-05-30  0:04 ` joelaf
  (?)
@ 2018-05-30  0:04   ` joelaf
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Glexiner, Todd Kjos,
	Tom Zanussi

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

In recent tests with the IRQ on/off tracepoints, a large performance
overhead (~10%) was noticed when running hackbench. The root cause is
the calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that SRCU is a better alternative for use during RCU idle.
Although it does involve extra barriers, it's lighter than the sched-RCU
version, which has to make additional RCU calls to notify RCU idle about
entry into RCU sections.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has been shown to improve
performance a lot for the high-frequency irq enable/disable tracepoints.
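
To make the shape of the change easier to picture, here is a small
userspace sketch (an assumption-laden model, not the kernel macro): the
fake_* functions are hypothetical stand-ins for the SRCU and sched-RCU
read-side primitives, and do_trace walks a NULL-terminated probe array
the way the tracepoint call site does, choosing the read-side
protection from the rcuidle argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real primitives live in the kernel. */
struct probe {
	void (*func)(int *hits);
};

static int srcu_sections;	/* read-side sections taken on the rcuidle path */
static int sched_sections;	/* read-side sections taken on the normal path */

static int fake_srcu_read_lock(void)
{
	srcu_sections++;
	return 0;		/* this model always hands out index 0 */
}

static void fake_srcu_read_unlock(int idx)
{
	assert(idx == 0);	/* the index from lock must be handed back */
}

static void fake_sched_read_lock(void)
{
	sched_sections++;
}

static void fake_sched_read_unlock(void)
{
}

static void count_hit(int *hits)
{
	(*hits)++;
}

/* Run every probe under the protection selected by rcuidle. */
static int do_trace(const struct probe *probes, int rcuidle)
{
	int hits = 0;
	int idx = 0;

	if (rcuidle)
		idx = fake_srcu_read_lock();
	else
		fake_sched_read_lock();

	if (probes) {
		do {
			probes->func(&hits);
		} while ((++probes)->func);
	}

	if (rcuidle)
		fake_srcu_read_unlock(idx);
	else
		fake_sched_read_unlock();

	return hits;
}

/* Two probes followed by the NULL terminator. */
static const struct probe demo_probes[] = {
	{ count_hit }, { count_hit }, { NULL },
};
```

Calling do_trace(demo_probes, 1) runs both probes inside the SRCU-style
section only, while do_trace(demo_probes, 0) takes the sched-style
section instead.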

Test: Tested idle and preempt/irq tracepoints.

Here are some performance numbers:

The following was run 30 times on a single-core x86 QEMU instance with
1GB memory:
hackbench -g 4 -f 2 -l 3000

Completion times in seconds. CONFIG_PROVE_LOCKING=y.

No patches (without this series):
Mean: 3.048
Median: 3.025
Std Dev: 0.064

With Lockdep using irq tracepoints with RCU implementation:
Mean: 3.451   (-11.66%)
Median: 3.447 (-12.22%)
Std Dev: 0.049
Std Dev: 0.049

With Lockdep using irq tracepoints with SRCU implementation (this series):
Mean: 3.020   (I would consider the improvement against the "without
	       this series" case as just noise).
Median: 3.013
Std Dev: 0.033

[1] https://patchwork.kernel.org/patch/10344297/

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/tracepoint.h | 48 +++++++++++++++++++++++++++++++-------
 kernel/tracepoint.c        | 15 +++++++++++-
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c94f466d57ef..880794207921 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@
  */
 
 #include <linux/smp.h>
+#include <linux/srcu.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -33,6 +34,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO	10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -75,10 +78,15 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
  * probe unregistration and the end of module exit to make sure there is no
  * caller executing a probe when it is freed.
  */
+#ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
+#else
+static inline void tracepoint_synchronize_unregister(void) { }
+#endif
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 extern int syscall_regfunc(void);
@@ -129,18 +137,38 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
+#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
 	do {								\
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
 		void *__data;						\
+		int __maybe_unused idx = 0;				\
 									\
 		if (!(cond))						\
 			return;						\
-		if (rcucheck)						\
-			rcu_irq_enter_irqson();				\
-		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+									\
+		/*							\
+		 * For rcuidle callers, use srcu since sched-rcu	\
+		 * doesn't work from the idle path.			\
+		 */							\
+		if (rcuidle) {						\
+			if (in_nmi()) {					\
+				WARN_ON_ONCE(1);			\
+				return; /* no srcu from nmi */		\
+			}						\
+									\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
+			it_func_ptr =					\
+				srcu_dereference_notrace((tp)->funcs,	\
+						&tracepoint_srcu);	\
+			/* To keep it consistent with !rcuidle path */	\
+			preempt_disable_notrace();			\
+		} else {						\
+			rcu_read_lock_sched_notrace();			\
+			it_func_ptr =					\
+				rcu_dereference_sched((tp)->funcs);	\
+		}							\
+									\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -148,9 +176,13 @@ extern void syscall_unregfunc(void);
 				((void(*)(proto))(it_func))(args);	\
 			} while ((++it_func_ptr)->func);		\
 		}							\
-		rcu_read_unlock_sched_notrace();			\
-		if (rcucheck)						\
-			rcu_irq_exit_irqson();				\
+									\
+		if (rcuidle) {						\
+			preempt_enable_notrace();			\
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
+		} else {						\
+			rcu_read_unlock_sched_notrace();		\
+		}							\
 	} while (0)
 
 #ifndef MODULE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 1e37da2e0c25..54157792f5ab 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -31,6 +31,9 @@
 extern struct tracepoint * const __start___tracepoints_ptrs[];
 extern struct tracepoint * const __stop___tracepoints_ptrs[];
 
+DEFINE_SRCU(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
 
@@ -67,16 +70,26 @@ static inline void *allocate_probes(int count)
 	return p == NULL ? NULL : p->probes;
 }
 
-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }
 
+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
 static inline void release_probes(struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);
+		/*
+		 * Tracepoint probes are protected by both sched RCU and SRCU,
+		 * by calling the SRCU callback in the sched RCU callback we
+		 * cover both cases. So lets chain the SRCU and RCU callbacks.
+		 */
 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
 	}
 }
-- 
2.17.0.921.gf22659ad46-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 6/8] tracing: Centralize preemptirq tracepoints and unify their usage
  2018-05-30  0:04 ` joelaf
  (?)
@ 2018-05-30  0:04   ` joelaf
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Glexiner, Todd Kjos,
	Tom Zanussi

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

This patch detaches the preemptirq tracepoints from the tracers and
keeps them separate.

Advantages:
* Lockdep and the irqsoff events can now run in parallel since they no
longer have their own calls.

* This unifies the use case of adding hooks to an irqsoff and irqson
event, and a preemptoff and preempton event.
  3 users of the events exist:
  - Lockdep
  - irqsoff and preemptoff tracers
  - irqs and preempt trace events

The unification cleans up several ifdefs and makes the code in the
preempt tracer and irqsoff tracers simpler. It gets rid of all the
horrific ifdeferry around PROVE_LOCKING and makes the configuration of
the different users of the tracepoints easier to understand. It also
gets rid of the time_* function calls from the lockdep hooks, which were
used to call into the preemptirq tracer and are not needed anymore. The
negative delta in lines of code in this patch is quite large too.

In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
as a single point for registering probes onto the tracepoints. With
this, the web of config options for preempt/irq toggle tracepoints and
their users becomes:

 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS

One note: I have to check for lockdep recursion in the code that calls
the trace events API and bail out if we're in lockdep recursion
protection, to prevent something like the following case: a spin_lock is
taken. Then lockdep_acquired is called. That does a raw_local_irq_save,
then sets lockdep_recursion, and then calls __lockdep_acquired. In
this function, a call to get_lock_stats happens which calls
preempt_disable, which calls trace IRQS off somewhere, which enters my
tracepoint code and sets the tracing_irq_cpu flag to prevent recursion.
This flag is then never cleared, causing lockdep paths to never be
entered and thus causing splats and other bad things.
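
The bail-out can be pictured with a small single-CPU userspace model
(a sketch under stated assumptions, not the code in this patch:
in_lockdep_recursion, hook_irqs_off, hook_irqs_on, events_emitted and
check_flag_clear are hypothetical names, and tracing_irq_cpu is a plain
variable here rather than a per-CPU one). The point is only that the
hooks return before touching the flag while lockdep is running its own
code, so the flag cannot be left set from inside that window:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; names follow the prose, not the kernel API. */
static bool in_lockdep_recursion;	/* set while lockdep runs its own code */
static bool tracing_irq_cpu;		/* set between irqs-off and irqs-on */
static int events_emitted;		/* events that reached the probes */

static void hook_irqs_off(void)
{
	/* Bail out under lockdep recursion, or if already marked off. */
	if (in_lockdep_recursion || tracing_irq_cpu)
		return;

	tracing_irq_cpu = true;
	events_emitted++;
}

static void hook_irqs_on(void)
{
	/* Bail out under lockdep recursion, or if no irqs-off was recorded. */
	if (in_lockdep_recursion || !tracing_irq_cpu)
		return;

	events_emitted++;
	tracing_irq_cpu = false;
}

/* Sanity check: outside any irqs-off section the flag must be clear. */
static void check_flag_clear(void)
{
	assert(!tracing_irq_cpu);
}
```

With in_lockdep_recursion set, hook_irqs_off leaves tracing_irq_cpu
clear and emits nothing. Once it is cleared again, an off/on pair sets
and clears the flag and emits two events.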

Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite. I verified that all test cases pass.

I also injected issues by not registering the lockdep probes onto the
tracepoints; the resulting failures confirm that the probes are indeed
working.

With probes:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Without probes:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  35 ++---
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  71 +++++++++
 12 files changed, 194 insertions(+), 229 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9c3c9a319e48..5191030af0c0 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
 	return CALLER_ADDR2;
 }
 
-#ifdef CONFIG_IRQSOFF_TRACER
-  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
-  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
-#else
-  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
-  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
-#endif
-
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
   extern void trace_preempt_on(unsigned long a0, unsigned long a1);
   extern void trace_preempt_off(unsigned long a0, unsigned long a1);
 #else
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 9700f00bbc04..50edb9cbbd26 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -15,9 +15,16 @@
 #include <linux/typecheck.h>
 #include <asm/irqflags.h>
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+/* Currently trace_softirqs_on/off is used only by lockdep */
+#ifdef CONFIG_PROVE_LOCKING
   extern void trace_softirqs_on(unsigned long ip);
   extern void trace_softirqs_off(unsigned long ip);
+#else
+# define trace_softirqs_on(ip)	do { } while (0)
+# define trace_softirqs_off(ip)	do { } while (0)
+#endif
+
+#ifdef CONFIG_TRACE_IRQFLAGS
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
 # define trace_hardirq_context(p)	((p)->hardirq_context)
@@ -43,8 +50,6 @@ do {						\
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
-# define trace_softirqs_on(ip)		do { } while (0)
-# define trace_softirqs_off(ip)		do { } while (0)
 # define trace_hardirq_context(p)	0
 # define trace_softirq_context(p)	0
 # define trace_hardirqs_enabled(p)	0
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..a8113357ceeb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -266,7 +266,8 @@ struct held_lock {
 /*
  * Initialization, self-test and debugging-output methods:
  */
-extern void lockdep_info(void);
+extern void lockdep_init(void);
+extern void lockdep_init_early(void);
 extern void lockdep_reset(void);
 extern void lockdep_reset_lock(struct lockdep_map *lock);
 extern void lockdep_free_key_range(void *start, unsigned long size);
@@ -406,7 +407,8 @@ static inline void lockdep_on(void)
 # define lock_downgrade(l, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
-# define lockdep_info()				do { } while (0)
+# define lockdep_init()				do { } while (0)
+# define lockdep_init_early()			do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
@@ -532,7 +534,7 @@ do {								\
 
 #endif /* CONFIG_LOCKDEP */
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_PROVE_LOCKING
 extern void print_irqtrace_events(struct task_struct *curr);
 #else
 static inline void print_irqtrace_events(struct task_struct *curr)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5bd3f151da78..c01813c3fbe9 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -150,7 +150,7 @@
  */
 #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
 
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
 #define preempt_count_dec_and_test() \
diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
index 9c4eb33c5a1d..9a0d4ceeb166 100644
--- a/include/trace/events/preemptirq.h
+++ b/include/trace/events/preemptirq.h
@@ -1,4 +1,4 @@
-#ifdef CONFIG_PREEMPTIRQ_EVENTS
+#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM preemptirq
@@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template,
 		  (void *)((unsigned long)(_stext) + __entry->parent_offs))
 );
 
-#ifndef CONFIG_PROVE_LOCKING
+#ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_EVENT(preemptirq_template, irq_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable,
 DEFINE_EVENT(preemptirq_template, irq_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_irq_enable(...)
+#define trace_irq_disable(...)
+#define trace_irq_enable_rcuidle(...)
+#define trace_irq_disable_rcuidle(...)
 #endif
 
-#ifdef CONFIG_DEBUG_PREEMPT
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
 DEFINE_EVENT(preemptirq_template, preempt_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable,
 DEFINE_EVENT(preemptirq_template, preempt_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_preempt_enable(...)
+#define trace_preempt_disable(...)
+#define trace_preempt_enable_rcuidle(...)
+#define trace_preempt_disable_rcuidle(...)
 #endif
 
 #endif /* _TRACE_PREEMPTIRQ_H */
 
 #include <trace/define_trace.h>
 
-#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
+#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
 #define trace_irq_enable(...)
 #define trace_irq_disable(...)
 #define trace_irq_enable_rcuidle(...)
 #define trace_irq_disable_rcuidle(...)
-#endif
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
 #define trace_preempt_enable(...)
 #define trace_preempt_disable(...)
 #define trace_preempt_enable_rcuidle(...)
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..44fe43be84c1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void)
 	profile_init();
 	call_function_init();
 	WARN(!irqs_disabled(), "Interrupts were enabled early\n");
+
+	lockdep_init_early();
+
 	early_boot_irqs_disabled = false;
 	local_irq_enable();
 
@@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void)
 		panic("Too many boot %s vars at `%s'", panic_later,
 		      panic_param);
 
-	lockdep_info();
+	lockdep_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 023386338269..871a42232858 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -55,6 +55,7 @@
 
 #include "lockdep_internals.h"
 
+#include <trace/events/preemptirq.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
@@ -2841,10 +2842,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
 	debug_atomic_inc(hardirqs_on_events);
 }
 
-__visible void trace_hardirqs_on_caller(unsigned long ip)
+static void lockdep_hardirqs_on(void *none, unsigned long ignore,
+				unsigned long ip)
 {
-	time_hardirqs_on(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2883,23 +2883,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
 	__trace_hardirqs_on_caller(ip);
 	current->lockdep_recursion = 0;
 }
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-void trace_hardirqs_on(void)
-{
-	trace_hardirqs_on_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
 
 /*
  * Hardirqs were disabled:
  */
-__visible void trace_hardirqs_off_caller(unsigned long ip)
+static void lockdep_hardirqs_off(void *none, unsigned long ignore,
+				 unsigned long ip)
 {
 	struct task_struct *curr = current;
 
-	time_hardirqs_off(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2921,13 +2913,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
 	} else
 		debug_atomic_inc(redundant_hardirqs_off);
 }
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-void trace_hardirqs_off(void)
-{
-	trace_hardirqs_off_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
 
 /*
  * Softirqs will be enabled:
@@ -4334,7 +4319,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	raw_local_irq_restore(flags);
 }
 
-void __init lockdep_info(void)
+void __init lockdep_init_early(void)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
+	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
+#endif
+}
+
+void __init lockdep_init(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092f7c4de903..4e9c2d254fba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3189,7 +3189,7 @@ static inline void sched_tick_stop(int cpu) { }
 #endif
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
-				defined(CONFIG_PREEMPT_TRACER))
+				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index c4f0f2e4126e..0bcba2a76ad9 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
 	 Allow the use of ring_buffer_swap_cpu.
 	 Adds a very slight overhead to tracing when enabled.
 
+config PREEMPTIRQ_TRACEPOINTS
+	bool
+	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
+	select TRACING
+	default y
+	help
+	  Create preempt/irq toggle tracepoints if needed, so that other parts
+	  of the kernel can use them to generate or add hooks to them.
+
 # All tracer options should select GENERIC_TRACER. For those options that are
 # enabled by all tracers (context switch and event tracer) they select TRACING.
 # This allows those options to appear when no other tracer is selected. But the
@@ -159,18 +168,20 @@ config FUNCTION_GRAPH_TRACER
 	  the return value. This is done by setting the current return
 	  address on the current task structure into a stack of calls.
 
+config TRACE_PREEMPT_TOGGLE
+	bool
+	help
+	  Enables hooks which will be called when preemption is first disabled,
+	  and last enabled.
 
 config PREEMPTIRQ_EVENTS
 	bool "Enable trace events for preempt and irq disable/enable"
 	select TRACE_IRQFLAGS
-	depends on DEBUG_PREEMPT || !PROVE_LOCKING
-	depends on TRACING
+	select TRACE_PREEMPT_TOGGLE if PREEMPT
+	select GENERIC_TRACER
 	default n
 	help
 	  Enable tracing of disable and enable events for preemption and irqs.
-	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
-	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
-	  be disabled.
 
 config IRQSOFF_TRACER
 	bool "Interrupts-off Latency Tracer"
@@ -207,6 +218,7 @@ config PREEMPT_TRACER
 	select RING_BUFFER_ALLOW_SWAP
 	select TRACER_SNAPSHOT
 	select TRACER_SNAPSHOT_PER_CPU_SWAP
+	select TRACE_PREEMPT_TOGGLE
 	help
 	  This option measures the time spent in preemption-off critical
 	  sections, with microsecond accuracy.
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..84a0cb222f20 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o
 obj-$(CONFIG_TRACING_MAP) += tracing_map.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
-obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
+obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
 obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index f8daa754cce2..770cd30cda40 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -16,7 +16,6 @@
 
 #include "trace.h"
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/preemptirq.h>
 
 #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
@@ -450,66 +449,6 @@ void stop_critical_timings(void)
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
-#ifdef CONFIG_IRQSOFF_TRACER
-#ifdef CONFIG_PROVE_LOCKING
-void time_hardirqs_on(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-void time_hardirqs_off(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(a0, a1);
-}
-
-#else /* !CONFIG_PROVE_LOCKING */
-
-/*
- * We are only interested in hardirq on/off events:
- */
-static inline void tracer_hardirqs_on(void)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_off(void)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-#endif /* CONFIG_PROVE_LOCKING */
-#endif /*  CONFIG_IRQSOFF_TRACER */
-
-#ifdef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		start_critical_timing(a0, a1);
-}
-#endif /* CONFIG_PREEMPT_TRACER */
-
 #ifdef CONFIG_FUNCTION_TRACER
 static bool function_enabled;
 
@@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
 }
 
 #ifdef CONFIG_IRQSOFF_TRACER
+/*
+ * We are only interested in hardirq on/off events:
+ */
+static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int irqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void irqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_irqsoff(trace) register_tracer(&trace)
-#else
-# define register_irqsoff(trace) do { } while (0)
-#endif
+#endif /*  CONFIG_IRQSOFF_TRACER */
 
 #ifdef CONFIG_PREEMPT_TRACER
+static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int preemptoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_PREEMPT_OFF;
 
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_preemptoff(trace) register_tracer(&trace)
-#else
-# define register_preemptoff(trace) do { } while (0)
-#endif
+#endif /* CONFIG_PREEMPT_TRACER */
 
-#if defined(CONFIG_IRQSOFF_TRACER) && \
-	defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
 
 static int preemptirqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptirqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-
-# define register_preemptirqsoff(trace) register_tracer(&trace)
-#else
-# define register_preemptirqsoff(trace) do { } while (0)
 #endif
 
 __init static int init_irqsoff_tracer(void)
 {
-	register_irqsoff(irqsoff_tracer);
-	register_preemptoff(preemptoff_tracer);
-	register_preemptirqsoff(preemptirqsoff_tracer);
-
-	return 0;
-}
-core_initcall(init_irqsoff_tracer);
-#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
-
-#ifndef CONFIG_IRQSOFF_TRACER
-static inline void tracer_hardirqs_on(void) { }
-static inline void tracer_hardirqs_off(void) { }
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
+#ifdef CONFIG_IRQSOFF_TRACER
+	register_tracer(&irqsoff_tracer);
 #endif
-
-#ifndef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
+#ifdef CONFIG_PREEMPT_TRACER
+	register_tracer(&preemptoff_tracer);
 #endif
-
-#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
-/* Per-cpu variable to prevent redundant calls when IRQs already off */
-static DEFINE_PER_CPU(int, tracing_irq_cpu);
-
-void trace_hardirqs_on(void)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_on();
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
-
-void trace_hardirqs_off(void)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_off();
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
-
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_on_caller(caller_addr);
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_off_caller(caller_addr);
-}
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-/*
- * Stubs:
- */
-
-void trace_softirqs_on(unsigned long ip)
-{
-}
-
-void trace_softirqs_off(unsigned long ip)
-{
-}
-
-inline void print_irqtrace_events(struct task_struct *curr)
-{
-}
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
+	register_tracer(&preemptirqsoff_tracer);
 #endif
 
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
-void trace_preempt_on(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_enable_rcuidle(a0, a1);
-	tracer_preempt_on(a0, a1);
-}
-
-void trace_preempt_off(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_disable_rcuidle(a0, a1);
-	tracer_preempt_off(a0, a1);
+	return 0;
 }
-#endif
+core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
+/*
+ * preemptoff and irqoff tracepoints
+ *
+ * Copyright (C) Joel Fernandes (Google) <joel@joelfernandes.org>
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
+
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_enable_rcuidle(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_disable_rcuidle(a0, a1);
+}
+#endif
-- 
2.17.0.921.gf22659ad46-goog


* [PATCH v8 6/8] tracing: Centralize preemptirq tracepoints and unify their usage
@ 2018-05-30  0:04   ` joelaf
  0 siblings, 0 replies; 60+ messages in thread
From: joelaf @ 2018-05-30  0:04 UTC (permalink / raw)


From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

This patch detaches the preemptirq tracepoints from the tracers and
keeps them separate.

Advantages:
* Lockdep and the irqsoff events can now run in parallel since they no
longer have their own calls.

* This unifies the use case of adding hooks to an irqsoff and irqson
event, and a preemptoff and preempton event.
  Three users of the events exist:
  - Lockdep
  - irqsoff and preemptoff tracers
  - irqs and preempt trace events
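
As a rough illustration of several users hooking one event, the sketch
below models a tracepoint as a list of probes sorted by priority. It is
a userspace toy, not the kernel tracepoint implementation: the names
tracepoint_model, tp_register, tp_fire, record and demo are hypothetical,
and the model assumes that probes with a higher priority value run
first. The probe signature mirrors the (data, ip, parent_ip) shape of
the probes in this patch, and INT_MAX is the priority the patch passes
when registering the lockdep irq_disable probe.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROBES 8

typedef void (*probe_fn)(void *data, unsigned long ip,
			 unsigned long parent_ip);

struct probe {
	probe_fn fn;
	void *data;
	int prio;
};

struct tracepoint_model {
	struct probe probes[MAX_PROBES];
	size_t nr;
};

/* Insert a probe, keeping the array sorted by descending priority. */
int tp_register(struct tracepoint_model *tp, probe_fn fn, void *data,
		int prio)
{
	size_t i;

	if (tp->nr == MAX_PROBES)
		return -1;

	/* Shift lower-priority probes one slot to the right. */
	for (i = tp->nr; i > 0 && tp->probes[i - 1].prio < prio; i--)
		tp->probes[i] = tp->probes[i - 1];

	tp->probes[i].fn = fn;
	tp->probes[i].data = data;
	tp->probes[i].prio = prio;
	tp->nr++;
	return 0;
}

/* Call every registered probe in priority order. */
void tp_fire(const struct tracepoint_model *tp, unsigned long ip,
	     unsigned long parent_ip)
{
	size_t i;

	for (i = 0; i < tp->nr; i++)
		tp->probes[i].fn(tp->probes[i].data, ip, parent_ip);
}

static char order[MAX_PROBES + 1];	/* stays NUL-terminated */
static size_t order_len;

/* Probe that records a one-letter tag naming the user that ran. */
static void record(void *data, unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	if (order_len < MAX_PROBES)
		order[order_len++] = *(const char *)data;
}

/* Hypothetical usage: three users attached to one irq_disable event. */
void demo(void)
{
	static char lockdep = 'L', tracer = 'T', event = 'E';
	struct tracepoint_model irq_disable = { .nr = 0 };

	tp_register(&irq_disable, record, &tracer, 10);
	tp_register(&irq_disable, record, &event, 10);
	tp_register(&irq_disable, record, &lockdep, INT_MAX);

	tp_fire(&irq_disable, 0x1000, 0x2000);
	assert(strcmp(order, "LTE") == 0);
}
```

All three users see the same event from a single call site; the order
in which they were registered matters only among probes of equal
priority.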

The unification cleans up several ifdefs and makes the code in the
preempt and irqsoff tracers simpler. It gets rid of all the horrific
ifdeferry around PROVE_LOCKING and makes configuration of the different
users of the tracepoints easier to understand. It also removes the
time_* function calls that the lockdep hooks used in order to call into
the preemptirq tracer; they are no longer needed. This patch also
removes more lines of code than it adds.

In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
as a single point for registering probes onto the tracepoints. With
this, the web of config options for preempt/irq toggle tracepoints and
their users becomes:

 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS

One note: the code that calls the trace events API has to check for
lockdep recursion and bail out if we are inside lockdep's recursion
protection. This prevents the following case: a spin_lock is taken and
lockdep_acquired is called. That does a raw_local_irq_save, sets
lockdep_recursion, and then calls __lockdep_acquired. In that function,
a call to get_lock_stats happens which calls preempt_disable, which
calls trace IRQS off somewhere, which enters my tracepoint code and sets
the tracing_irq_cpu flag to prevent recursion. This flag is then never
cleared, causing lockdep paths to never be entered and thus causing
splats and other bad things.

Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite and verified that all test cases pass.

I also injected issues by not registering the lockdep probes onto the
tracepoints; the resulting failures confirm that the probes are indeed
working.

With probes:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Without probes:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  35 ++---
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  71 +++++++++
 12 files changed, 194 insertions(+), 229 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9c3c9a319e48..5191030af0c0 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
 	return CALLER_ADDR2;
 }
 
-#ifdef CONFIG_IRQSOFF_TRACER
-  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
-  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
-#else
-  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
-  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
-#endif
-
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
   extern void trace_preempt_on(unsigned long a0, unsigned long a1);
   extern void trace_preempt_off(unsigned long a0, unsigned long a1);
 #else
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 9700f00bbc04..50edb9cbbd26 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -15,9 +15,16 @@
 #include <linux/typecheck.h>
 #include <asm/irqflags.h>
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+/* Currently trace_softirqs_on/off is used only by lockdep */
+#ifdef CONFIG_PROVE_LOCKING
   extern void trace_softirqs_on(unsigned long ip);
   extern void trace_softirqs_off(unsigned long ip);
+#else
+# define trace_softirqs_on(ip)	do { } while (0)
+# define trace_softirqs_off(ip)	do { } while (0)
+#endif
+
+#ifdef CONFIG_TRACE_IRQFLAGS
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
 # define trace_hardirq_context(p)	((p)->hardirq_context)
@@ -43,8 +50,6 @@ do {						\
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
-# define trace_softirqs_on(ip)		do { } while (0)
-# define trace_softirqs_off(ip)		do { } while (0)
 # define trace_hardirq_context(p)	0
 # define trace_softirq_context(p)	0
 # define trace_hardirqs_enabled(p)	0
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..a8113357ceeb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -266,7 +266,8 @@ struct held_lock {
 /*
  * Initialization, self-test and debugging-output methods:
  */
-extern void lockdep_info(void);
+extern void lockdep_init(void);
+extern void lockdep_init_early(void);
 extern void lockdep_reset(void);
 extern void lockdep_reset_lock(struct lockdep_map *lock);
 extern void lockdep_free_key_range(void *start, unsigned long size);
@@ -406,7 +407,8 @@ static inline void lockdep_on(void)
 # define lock_downgrade(l, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
-# define lockdep_info()				do { } while (0)
+# define lockdep_init()				do { } while (0)
+# define lockdep_init_early()			do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
@@ -532,7 +534,7 @@ do {								\
 
 #endif /* CONFIG_LOCKDEP */
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_PROVE_LOCKING
 extern void print_irqtrace_events(struct task_struct *curr);
 #else
 static inline void print_irqtrace_events(struct task_struct *curr)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5bd3f151da78..c01813c3fbe9 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -150,7 +150,7 @@
  */
 #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
 
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
 #define preempt_count_dec_and_test() \
diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
index 9c4eb33c5a1d..9a0d4ceeb166 100644
--- a/include/trace/events/preemptirq.h
+++ b/include/trace/events/preemptirq.h
@@ -1,4 +1,4 @@
-#ifdef CONFIG_PREEMPTIRQ_EVENTS
+#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM preemptirq
@@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template,
 		  (void *)((unsigned long)(_stext) + __entry->parent_offs))
 );
 
-#ifndef CONFIG_PROVE_LOCKING
+#ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_EVENT(preemptirq_template, irq_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable,
 DEFINE_EVENT(preemptirq_template, irq_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_irq_enable(...)
+#define trace_irq_disable(...)
+#define trace_irq_enable_rcuidle(...)
+#define trace_irq_disable_rcuidle(...)
 #endif
 
-#ifdef CONFIG_DEBUG_PREEMPT
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
 DEFINE_EVENT(preemptirq_template, preempt_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable,
 DEFINE_EVENT(preemptirq_template, preempt_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_preempt_enable(...)
+#define trace_preempt_disable(...)
+#define trace_preempt_enable_rcuidle(...)
+#define trace_preempt_disable_rcuidle(...)
 #endif
 
 #endif /* _TRACE_PREEMPTIRQ_H */
 
 #include <trace/define_trace.h>
 
-#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
+#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
 #define trace_irq_enable(...)
 #define trace_irq_disable(...)
 #define trace_irq_enable_rcuidle(...)
 #define trace_irq_disable_rcuidle(...)
-#endif
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
 #define trace_preempt_enable(...)
 #define trace_preempt_disable(...)
 #define trace_preempt_enable_rcuidle(...)
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..44fe43be84c1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void)
 	profile_init();
 	call_function_init();
 	WARN(!irqs_disabled(), "Interrupts were enabled early\n");
+
+	lockdep_init_early();
+
 	early_boot_irqs_disabled = false;
 	local_irq_enable();
 
@@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void)
 		panic("Too many boot %s vars at `%s'", panic_later,
 		      panic_param);
 
-	lockdep_info();
+	lockdep_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 023386338269..871a42232858 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -55,6 +55,7 @@
 
 #include "lockdep_internals.h"
 
+#include <trace/events/preemptirq.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
@@ -2841,10 +2842,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
 	debug_atomic_inc(hardirqs_on_events);
 }
 
-__visible void trace_hardirqs_on_caller(unsigned long ip)
+static void lockdep_hardirqs_on(void *none, unsigned long ignore,
+				unsigned long ip)
 {
-	time_hardirqs_on(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2883,23 +2883,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
 	__trace_hardirqs_on_caller(ip);
 	current->lockdep_recursion = 0;
 }
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-void trace_hardirqs_on(void)
-{
-	trace_hardirqs_on_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
 
 /*
  * Hardirqs were disabled:
  */
-__visible void trace_hardirqs_off_caller(unsigned long ip)
+static void lockdep_hardirqs_off(void *none, unsigned long ignore,
+				 unsigned long ip)
 {
 	struct task_struct *curr = current;
 
-	time_hardirqs_off(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2921,13 +2913,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
 	} else
 		debug_atomic_inc(redundant_hardirqs_off);
 }
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-void trace_hardirqs_off(void)
-{
-	trace_hardirqs_off_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
 
 /*
  * Softirqs will be enabled:
@@ -4334,7 +4319,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	raw_local_irq_restore(flags);
 }
 
-void __init lockdep_info(void)
+void __init lockdep_init_early(void)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
+	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
+#endif
+}
+
+void __init lockdep_init(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092f7c4de903..4e9c2d254fba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3189,7 +3189,7 @@ static inline void sched_tick_stop(int cpu) { }
 #endif
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
-				defined(CONFIG_PREEMPT_TRACER))
+				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index c4f0f2e4126e..0bcba2a76ad9 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
 	 Allow the use of ring_buffer_swap_cpu.
 	 Adds a very slight overhead to tracing when enabled.
 
+config PREEMPTIRQ_TRACEPOINTS
+	bool
+	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
+	select TRACING
+	default y
+	help
+	  Create preempt/irq toggle tracepoints if needed, so that other parts
+	  of the kernel can use them to generate or add hooks to them.
+
 # All tracer options should select GENERIC_TRACER. For those options that are
 # enabled by all tracers (context switch and event tracer) they select TRACING.
 # This allows those options to appear when no other tracer is selected. But the
@@ -159,18 +168,20 @@ config FUNCTION_GRAPH_TRACER
 	  the return value. This is done by setting the current return
 	  address on the current task structure into a stack of calls.
 
+config TRACE_PREEMPT_TOGGLE
+	bool
+	help
+	  Enables hooks which will be called when preemption is first disabled,
+	  and last enabled.
 
 config PREEMPTIRQ_EVENTS
 	bool "Enable trace events for preempt and irq disable/enable"
 	select TRACE_IRQFLAGS
-	depends on DEBUG_PREEMPT || !PROVE_LOCKING
-	depends on TRACING
+	select TRACE_PREEMPT_TOGGLE if PREEMPT
+	select GENERIC_TRACER
 	default n
 	help
 	  Enable tracing of disable and enable events for preemption and irqs.
-	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
-	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
-	  be disabled.
 
 config IRQSOFF_TRACER
 	bool "Interrupts-off Latency Tracer"
@@ -207,6 +218,7 @@ config PREEMPT_TRACER
 	select RING_BUFFER_ALLOW_SWAP
 	select TRACER_SNAPSHOT
 	select TRACER_SNAPSHOT_PER_CPU_SWAP
+	select TRACE_PREEMPT_TOGGLE
 	help
 	  This option measures the time spent in preemption-off critical
 	  sections, with microsecond accuracy.
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..84a0cb222f20 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o
 obj-$(CONFIG_TRACING_MAP) += tracing_map.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
-obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
+obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
 obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index f8daa754cce2..770cd30cda40 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -16,7 +16,6 @@
 
 #include "trace.h"
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/preemptirq.h>
 
 #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
@@ -450,66 +449,6 @@ void stop_critical_timings(void)
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
-#ifdef CONFIG_IRQSOFF_TRACER
-#ifdef CONFIG_PROVE_LOCKING
-void time_hardirqs_on(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-void time_hardirqs_off(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(a0, a1);
-}
-
-#else /* !CONFIG_PROVE_LOCKING */
-
-/*
- * We are only interested in hardirq on/off events:
- */
-static inline void tracer_hardirqs_on(void)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_off(void)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-#endif /* CONFIG_PROVE_LOCKING */
-#endif /*  CONFIG_IRQSOFF_TRACER */
-
-#ifdef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		start_critical_timing(a0, a1);
-}
-#endif /* CONFIG_PREEMPT_TRACER */
-
 #ifdef CONFIG_FUNCTION_TRACER
 static bool function_enabled;
 
@@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
 }
 
 #ifdef CONFIG_IRQSOFF_TRACER
+/*
+ * We are only interested in hardirq on/off events:
+ */
+static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int irqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void irqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_irqsoff(trace) register_tracer(&trace)
-#else
-# define register_irqsoff(trace) do { } while (0)
-#endif
+#endif /*  CONFIG_IRQSOFF_TRACER */
 
 #ifdef CONFIG_PREEMPT_TRACER
+static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int preemptoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_PREEMPT_OFF;
 
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_preemptoff(trace) register_tracer(&trace)
-#else
-# define register_preemptoff(trace) do { } while (0)
-#endif
+#endif /* CONFIG_PREEMPT_TRACER */
 
-#if defined(CONFIG_IRQSOFF_TRACER) && \
-	defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
 
 static int preemptirqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptirqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-
-# define register_preemptirqsoff(trace) register_tracer(&trace)
-#else
-# define register_preemptirqsoff(trace) do { } while (0)
 #endif
 
 __init static int init_irqsoff_tracer(void)
 {
-	register_irqsoff(irqsoff_tracer);
-	register_preemptoff(preemptoff_tracer);
-	register_preemptirqsoff(preemptirqsoff_tracer);
-
-	return 0;
-}
-core_initcall(init_irqsoff_tracer);
-#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
-
-#ifndef CONFIG_IRQSOFF_TRACER
-static inline void tracer_hardirqs_on(void) { }
-static inline void tracer_hardirqs_off(void) { }
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
+#ifdef CONFIG_IRQSOFF_TRACER
+	register_tracer(&irqsoff_tracer);
 #endif
-
-#ifndef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
+#ifdef CONFIG_PREEMPT_TRACER
+	register_tracer(&preemptoff_tracer);
 #endif
-
-#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
-/* Per-cpu variable to prevent redundant calls when IRQs already off */
-static DEFINE_PER_CPU(int, tracing_irq_cpu);
-
-void trace_hardirqs_on(void)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_on();
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
-
-void trace_hardirqs_off(void)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_off();
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
-
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_on_caller(caller_addr);
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_off_caller(caller_addr);
-}
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-/*
- * Stubs:
- */
-
-void trace_softirqs_on(unsigned long ip)
-{
-}
-
-void trace_softirqs_off(unsigned long ip)
-{
-}
-
-inline void print_irqtrace_events(struct task_struct *curr)
-{
-}
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
+	register_tracer(&preemptirqsoff_tracer);
 #endif
 
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
-void trace_preempt_on(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_enable_rcuidle(a0, a1);
-	tracer_preempt_on(a0, a1);
-}
-
-void trace_preempt_off(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_disable_rcuidle(a0, a1);
-	tracer_preempt_off(a0, a1);
+	return 0;
 }
-#endif
+core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
+/*
+ * preemptoff and irqoff tracepoints
+ *
+ * Copyright (C) Joel Fernandes (Google) <joel at joelfernandes.org>
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
+
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_enable_rcuidle(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_disable_rcuidle(a0, a1);
+}
+#endif
-- 
2.17.0.921.gf22659ad46-goog


* [PATCH v8 6/8] tracing: Centralize preemptirq tracepoints and unify their usage
@ 2018-05-30  0:04   ` joelaf
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:04 UTC (permalink / raw)


From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

This patch detaches the preemptirq tracepoints from the tracers and
keeps them separate.

Advantages:
* Lockdep and the irqsoff events can now be used at the same time, since
they no longer each provide their own trace_hardirqs_* calls.

* This unifies the use case of adding hooks to the irqsoff and irqson
events, and to the preemptoff and preempton events.
  Three users of the events exist:
  - Lockdep
  - irqsoff and preemptoff tracers
  - irqs and preempt trace events
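
As a rough illustration of the unification (a userspace sketch, not kernel
code): a single tracepoint holds a list of probes, and each user attaches
its own probe instead of being called directly. The names struct probe,
tp_register_prio, tp_fire and demo_irq_disable_order are hypothetical. The
ordering rule, where higher-priority probes run first, is an assumption of
this model, chosen to resemble what register_trace_prio_irq_disable()
requests for lockdep in this patch.

```c
#include <limits.h>
#include <stddef.h>

#define MAX_PROBES 8

typedef void (*probe_fn)(void *data, unsigned long ip,
			 unsigned long parent_ip);

struct probe {
	probe_fn fn;
	void *data;
	int prio;
};

struct tracepoint_model {
	struct probe probes[MAX_PROBES];
	size_t nr;
};

/* Insert a probe, keeping the array sorted by descending priority. */
static int tp_register_prio(struct tracepoint_model *tp, probe_fn fn,
			    void *data, int prio)
{
	size_t pos, i;

	if (tp->nr == MAX_PROBES)
		return -1;

	/* Find the first existing probe with a lower priority. */
	for (pos = 0; pos < tp->nr; pos++)
		if (tp->probes[pos].prio < prio)
			break;

	/* Shift lower-priority probes down by one slot. */
	for (i = tp->nr; i > pos; i--)
		tp->probes[i] = tp->probes[i - 1];

	tp->probes[pos] = (struct probe){ .fn = fn, .data = data, .prio = prio };
	tp->nr++;
	return 0;
}

/* Fire the event once; every registered probe sees it. */
static void tp_fire(const struct tracepoint_model *tp, unsigned long ip,
		    unsigned long parent_ip)
{
	size_t i;

	for (i = 0; i < tp->nr; i++)
		tp->probes[i].fn(tp->probes[i].data, ip, parent_ip);
}

/* Records the order in which the stand-in users were called. */
static char call_log[MAX_PROBES + 1];
static size_t call_len;

static void log_call(char tag)
{
	if (call_len < MAX_PROBES) {
		call_log[call_len++] = tag;
		call_log[call_len] = '\0';
	}
}

static void lockdep_probe(void *d, unsigned long ip, unsigned long pip)
{
	(void)d; (void)ip; (void)pip;
	log_call('L');
}

static void tracer_probe(void *d, unsigned long ip, unsigned long pip)
{
	(void)d; (void)ip; (void)pip;
	log_call('T');
}

static void event_probe(void *d, unsigned long ip, unsigned long pip)
{
	(void)d; (void)ip; (void)pip;
	log_call('E');
}

/* Register the three users on one "irq_disable" event and fire it. */
const char *demo_irq_disable_order(void)
{
	struct tracepoint_model tp = { .nr = 0 };

	call_len = 0;
	call_log[0] = '\0';

	tp_register_prio(&tp, tracer_probe, NULL, 0);
	tp_register_prio(&tp, event_probe, NULL, 0);
	tp_register_prio(&tp, lockdep_probe, NULL, INT_MAX);

	tp_fire(&tp, 0x1000, 0x2000);
	return call_log;
}
```

In this model one event reaches all three users, and the probe registered
with INT_MAX runs first even though it was registered last.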

The unification cleans up several ifdefs and makes the code in the
preempt tracer and the irqsoff tracer simpler. It gets rid of all the
horrific ifdeferry around PROVE_LOCKING and makes configuring the
different users of the tracepoints easier and more understandable. It
also gets rid of the time_* function calls that the lockdep hooks used
to call into the preemptirq tracer, which are not needed anymore. The
negative delta in lines of code in this patch is quite large too.

In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
as a single point for registering probes onto the tracepoints. With
this, the web of config options for the preempt/irq toggle tracepoints
and their users becomes:

 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS
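
The diagram can also be read as plain boolean logic. The following is only
a sketch of that reading, based on the select/depends lines in this patch
and on the arrows above; struct user_config, struct derived_config and
resolve_config are hypothetical names, and other Kconfig dependencies (for
example what PREEMPT_TRACER itself depends on) are deliberately left out.

```c
#include <stdbool.h>

/* Options a user (or another option) can turn on. */
struct user_config {
	bool preempt;			/* CONFIG_PREEMPT */
	bool preempt_tracer;		/* CONFIG_PREEMPT_TRACER */
	bool preemptirq_events;		/* CONFIG_PREEMPTIRQ_EVENTS */
	bool irqsoff_tracer;		/* CONFIG_IRQSOFF_TRACER */
	bool prove_locking;		/* CONFIG_PROVE_LOCKING */
};

/* Options that end up enabled as a consequence. */
struct derived_config {
	bool trace_preempt_toggle;	/* CONFIG_TRACE_PREEMPT_TOGGLE */
	bool trace_irqflags;		/* CONFIG_TRACE_IRQFLAGS */
	bool preemptirq_tracepoints;	/* CONFIG_PREEMPTIRQ_TRACEPOINTS */
};

struct derived_config resolve_config(struct user_config u)
{
	struct derived_config d;

	/* PREEMPT_TRACER selects it; PREEMPTIRQ_EVENTS selects it if PREEMPT. */
	d.trace_preempt_toggle = u.preempt_tracer ||
				 (u.preemptirq_events && u.preempt);

	/* Selected by PREEMPTIRQ_EVENTS, IRQSOFF_TRACER and PROVE_LOCKING. */
	d.trace_irqflags = u.preemptirq_events || u.irqsoff_tracer ||
			   u.prove_locking;

	/* "default y" bool that depends on either of the two above. */
	d.preemptirq_tracepoints = d.trace_preempt_toggle || d.trace_irqflags;

	return d;
}
```

For example, a config with only PROVE_LOCKING set still ends up with
PREEMPTIRQ_TRACEPOINTS in this model, which is what lets lockdep attach
its probes without any tracer being enabled.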

One note: I have to check for lockdep recursion in the code that calls
the trace events API and bail out if we're under lockdep recursion
protection. This prevents something like the following case: a spin_lock
is taken, then lockdep_acquired is called. That does a
raw_local_irq_save, sets lockdep_recursion, and then calls
__lockdep_acquired. In this function, a call to get_lock_stats happens
which calls preempt_disable, which calls trace IRQS off somewhere, which
enters my tracepoint code and sets the tracing_irq_cpu flag to prevent
recursion. This flag is then never cleared, causing lockdep paths to
never be entered and thus causing splats and other bad things.
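
The stuck-flag case can be modelled in a few lines of userspace C. This is
a simplified sketch under stated assumptions, not the kernel code: struct
cpu_model, model_hardirqs_off and lockdep_events_seen are hypothetical
names, one struct stands in for both the per-cpu flag and
current->lockdep_recursion, and the raw irq save/restore is represented
only by the absence of a matching "on" call.

```c
#include <stdbool.h>

struct cpu_model {
	int tracing_irq_cpu;	/* models the per-cpu tracing_irq_cpu flag */
	int lockdep_recursion;	/* models current->lockdep_recursion */
	int lockdep_off_events;	/* irq-off events lockdep really processed */
};

/* Stand-in for the lockdep probe: it ignores events while recursing. */
static void lockdep_probe_off(struct cpu_model *c)
{
	if (c->lockdep_recursion)
		return;
	c->lockdep_off_events++;
}

/*
 * Stand-in for trace_hardirqs_off(). With check_recursion set it bails
 * out early under lockdep recursion, as the patch does; without it, the
 * flag is set even though the probe will ignore the event.
 */
static void model_hardirqs_off(struct cpu_model *c, bool check_recursion)
{
	if ((check_recursion && c->lockdep_recursion) || c->tracing_irq_cpu)
		return;

	c->tracing_irq_cpu = 1;
	lockdep_probe_off(c);
}

/* Returns how many irq-off events lockdep saw in the scenario. */
int lockdep_events_seen(bool check_recursion)
{
	struct cpu_model c = { 0 };

	/* Inside lockdep: irqs were saved with the raw, untraced primitive. */
	c.lockdep_recursion = 1;
	model_hardirqs_off(&c, check_recursion);	/* nested call */
	c.lockdep_recursion = 0;
	/* Raw irq restore: nothing clears tracing_irq_cpu here. */

	/* Later, ordinary code disables interrupts. */
	model_hardirqs_off(&c, check_recursion);

	return c.lockdep_off_events;
}
```

In this model, lockdep_events_seen(true) reports one event, while
lockdep_events_seen(false) reports none: the flag set during recursion
hides the later, real irq-off event from lockdep.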

Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite. I verified that all test cases are
passing.

I also injected issues by not registering the lockdep probes onto the
tracepoints, and I see failures, which confirms that the probes are
indeed working.

With probes:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Without probes:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>
---
 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  35 ++---
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  71 +++++++++
 12 files changed, 194 insertions(+), 229 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9c3c9a319e48..5191030af0c0 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
 	return CALLER_ADDR2;
 }
 
-#ifdef CONFIG_IRQSOFF_TRACER
-  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
-  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
-#else
-  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
-  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
-#endif
-
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
   extern void trace_preempt_on(unsigned long a0, unsigned long a1);
   extern void trace_preempt_off(unsigned long a0, unsigned long a1);
 #else
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 9700f00bbc04..50edb9cbbd26 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -15,9 +15,16 @@
 #include <linux/typecheck.h>
 #include <asm/irqflags.h>
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+/* Currently trace_softirqs_on/off is used only by lockdep */
+#ifdef CONFIG_PROVE_LOCKING
   extern void trace_softirqs_on(unsigned long ip);
   extern void trace_softirqs_off(unsigned long ip);
+#else
+# define trace_softirqs_on(ip)	do { } while (0)
+# define trace_softirqs_off(ip)	do { } while (0)
+#endif
+
+#ifdef CONFIG_TRACE_IRQFLAGS
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
 # define trace_hardirq_context(p)	((p)->hardirq_context)
@@ -43,8 +50,6 @@ do {						\
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
-# define trace_softirqs_on(ip)		do { } while (0)
-# define trace_softirqs_off(ip)		do { } while (0)
 # define trace_hardirq_context(p)	0
 # define trace_softirq_context(p)	0
 # define trace_hardirqs_enabled(p)	0
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..a8113357ceeb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -266,7 +266,8 @@ struct held_lock {
 /*
  * Initialization, self-test and debugging-output methods:
  */
-extern void lockdep_info(void);
+extern void lockdep_init(void);
+extern void lockdep_init_early(void);
 extern void lockdep_reset(void);
 extern void lockdep_reset_lock(struct lockdep_map *lock);
 extern void lockdep_free_key_range(void *start, unsigned long size);
@@ -406,7 +407,8 @@ static inline void lockdep_on(void)
 # define lock_downgrade(l, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
-# define lockdep_info()				do { } while (0)
+# define lockdep_init()				do { } while (0)
+# define lockdep_init_early()			do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
@@ -532,7 +534,7 @@ do {								\
 
 #endif /* CONFIG_LOCKDEP */
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_PROVE_LOCKING
 extern void print_irqtrace_events(struct task_struct *curr);
 #else
 static inline void print_irqtrace_events(struct task_struct *curr)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5bd3f151da78..c01813c3fbe9 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -150,7 +150,7 @@
  */
 #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
 
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
 #define preempt_count_dec_and_test() \
diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
index 9c4eb33c5a1d..9a0d4ceeb166 100644
--- a/include/trace/events/preemptirq.h
+++ b/include/trace/events/preemptirq.h
@@ -1,4 +1,4 @@
-#ifdef CONFIG_PREEMPTIRQ_EVENTS
+#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM preemptirq
@@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template,
 		  (void *)((unsigned long)(_stext) + __entry->parent_offs))
 );
 
-#ifndef CONFIG_PROVE_LOCKING
+#ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_EVENT(preemptirq_template, irq_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable,
 DEFINE_EVENT(preemptirq_template, irq_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_irq_enable(...)
+#define trace_irq_disable(...)
+#define trace_irq_enable_rcuidle(...)
+#define trace_irq_disable_rcuidle(...)
 #endif
 
-#ifdef CONFIG_DEBUG_PREEMPT
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
 DEFINE_EVENT(preemptirq_template, preempt_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable,
 DEFINE_EVENT(preemptirq_template, preempt_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_preempt_enable(...)
+#define trace_preempt_disable(...)
+#define trace_preempt_enable_rcuidle(...)
+#define trace_preempt_disable_rcuidle(...)
 #endif
 
 #endif /* _TRACE_PREEMPTIRQ_H */
 
 #include <trace/define_trace.h>
 
-#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
+#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
 #define trace_irq_enable(...)
 #define trace_irq_disable(...)
 #define trace_irq_enable_rcuidle(...)
 #define trace_irq_disable_rcuidle(...)
-#endif
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
 #define trace_preempt_enable(...)
 #define trace_preempt_disable(...)
 #define trace_preempt_enable_rcuidle(...)
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..44fe43be84c1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void)
 	profile_init();
 	call_function_init();
 	WARN(!irqs_disabled(), "Interrupts were enabled early\n");
+
+	lockdep_init_early();
+
 	early_boot_irqs_disabled = false;
 	local_irq_enable();
 
@@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void)
 		panic("Too many boot %s vars at `%s'", panic_later,
 		      panic_param);
 
-	lockdep_info();
+	lockdep_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 023386338269..871a42232858 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -55,6 +55,7 @@
 
 #include "lockdep_internals.h"
 
+#include <trace/events/preemptirq.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
@@ -2841,10 +2842,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
 	debug_atomic_inc(hardirqs_on_events);
 }
 
-__visible void trace_hardirqs_on_caller(unsigned long ip)
+static void lockdep_hardirqs_on(void *none, unsigned long ignore,
+				unsigned long ip)
 {
-	time_hardirqs_on(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2883,23 +2883,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
 	__trace_hardirqs_on_caller(ip);
 	current->lockdep_recursion = 0;
 }
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-void trace_hardirqs_on(void)
-{
-	trace_hardirqs_on_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
 
 /*
  * Hardirqs were disabled:
  */
-__visible void trace_hardirqs_off_caller(unsigned long ip)
+static void lockdep_hardirqs_off(void *none, unsigned long ignore,
+				 unsigned long ip)
 {
 	struct task_struct *curr = current;
 
-	time_hardirqs_off(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2921,13 +2913,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
 	} else
 		debug_atomic_inc(redundant_hardirqs_off);
 }
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-void trace_hardirqs_off(void)
-{
-	trace_hardirqs_off_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
 
 /*
  * Softirqs will be enabled:
@@ -4334,7 +4319,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	raw_local_irq_restore(flags);
 }
 
-void __init lockdep_info(void)
+void __init lockdep_init_early(void)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
+	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
+#endif
+}
+
+void __init lockdep_init(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092f7c4de903..4e9c2d254fba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3189,7 +3189,7 @@ static inline void sched_tick_stop(int cpu) { }
 #endif
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
-				defined(CONFIG_PREEMPT_TRACER))
+				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index c4f0f2e4126e..0bcba2a76ad9 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
 	 Allow the use of ring_buffer_swap_cpu.
 	 Adds a very slight overhead to tracing when enabled.
 
+config PREEMPTIRQ_TRACEPOINTS
+	bool
+	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
+	select TRACING
+	default y
+	help
+	  Create preempt/irq toggle tracepoints if needed, so that other parts
+	  of the kernel can use them to generate or add hooks to them.
+
 # All tracer options should select GENERIC_TRACER. For those options that are
 # enabled by all tracers (context switch and event tracer) they select TRACING.
 # This allows those options to appear when no other tracer is selected. But the
@@ -159,18 +168,20 @@ config FUNCTION_GRAPH_TRACER
 	  the return value. This is done by setting the current return
 	  address on the current task structure into a stack of calls.
 
+config TRACE_PREEMPT_TOGGLE
+	bool
+	help
+	  Enables hooks which will be called when preemption is first disabled,
+	  and last enabled.
 
 config PREEMPTIRQ_EVENTS
 	bool "Enable trace events for preempt and irq disable/enable"
 	select TRACE_IRQFLAGS
-	depends on DEBUG_PREEMPT || !PROVE_LOCKING
-	depends on TRACING
+	select TRACE_PREEMPT_TOGGLE if PREEMPT
+	select GENERIC_TRACER
 	default n
 	help
 	  Enable tracing of disable and enable events for preemption and irqs.
-	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
-	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
-	  be disabled.
 
 config IRQSOFF_TRACER
 	bool "Interrupts-off Latency Tracer"
@@ -207,6 +218,7 @@ config PREEMPT_TRACER
 	select RING_BUFFER_ALLOW_SWAP
 	select TRACER_SNAPSHOT
 	select TRACER_SNAPSHOT_PER_CPU_SWAP
+	select TRACE_PREEMPT_TOGGLE
 	help
 	  This option measures the time spent in preemption-off critical
 	  sections, with microsecond accuracy.
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..84a0cb222f20 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o
 obj-$(CONFIG_TRACING_MAP) += tracing_map.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
-obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
+obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
 obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index f8daa754cce2..770cd30cda40 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -16,7 +16,6 @@
 
 #include "trace.h"
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/preemptirq.h>
 
 #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
@@ -450,66 +449,6 @@ void stop_critical_timings(void)
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
-#ifdef CONFIG_IRQSOFF_TRACER
-#ifdef CONFIG_PROVE_LOCKING
-void time_hardirqs_on(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-void time_hardirqs_off(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(a0, a1);
-}
-
-#else /* !CONFIG_PROVE_LOCKING */
-
-/*
- * We are only interested in hardirq on/off events:
- */
-static inline void tracer_hardirqs_on(void)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_off(void)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-#endif /* CONFIG_PROVE_LOCKING */
-#endif /*  CONFIG_IRQSOFF_TRACER */
-
-#ifdef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		start_critical_timing(a0, a1);
-}
-#endif /* CONFIG_PREEMPT_TRACER */
-
 #ifdef CONFIG_FUNCTION_TRACER
 static bool function_enabled;
 
@@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
 }
 
 #ifdef CONFIG_IRQSOFF_TRACER
+/*
+ * We are only interested in hardirq on/off events:
+ */
+static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int irqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void irqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_irqsoff(trace) register_tracer(&trace)
-#else
-# define register_irqsoff(trace) do { } while (0)
-#endif
+#endif /*  CONFIG_IRQSOFF_TRACER */
 
 #ifdef CONFIG_PREEMPT_TRACER
+static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int preemptoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_PREEMPT_OFF;
 
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_preemptoff(trace) register_tracer(&trace)
-#else
-# define register_preemptoff(trace) do { } while (0)
-#endif
+#endif /* CONFIG_PREEMPT_TRACER */
 
-#if defined(CONFIG_IRQSOFF_TRACER) && \
-	defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
 
 static int preemptirqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptirqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-
-# define register_preemptirqsoff(trace) register_tracer(&trace)
-#else
-# define register_preemptirqsoff(trace) do { } while (0)
 #endif
 
 __init static int init_irqsoff_tracer(void)
 {
-	register_irqsoff(irqsoff_tracer);
-	register_preemptoff(preemptoff_tracer);
-	register_preemptirqsoff(preemptirqsoff_tracer);
-
-	return 0;
-}
-core_initcall(init_irqsoff_tracer);
-#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
-
-#ifndef CONFIG_IRQSOFF_TRACER
-static inline void tracer_hardirqs_on(void) { }
-static inline void tracer_hardirqs_off(void) { }
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
+#ifdef CONFIG_IRQSOFF_TRACER
+	register_tracer(&irqsoff_tracer);
 #endif
-
-#ifndef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
+#ifdef CONFIG_PREEMPT_TRACER
+	register_tracer(&preemptoff_tracer);
 #endif
-
-#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
-/* Per-cpu variable to prevent redundant calls when IRQs already off */
-static DEFINE_PER_CPU(int, tracing_irq_cpu);
-
-void trace_hardirqs_on(void)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_on();
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
-
-void trace_hardirqs_off(void)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_off();
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
-
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_on_caller(caller_addr);
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_off_caller(caller_addr);
-}
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-/*
- * Stubs:
- */
-
-void trace_softirqs_on(unsigned long ip)
-{
-}
-
-void trace_softirqs_off(unsigned long ip)
-{
-}
-
-inline void print_irqtrace_events(struct task_struct *curr)
-{
-}
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
+	register_tracer(&preemptirqsoff_tracer);
 #endif
 
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
-void trace_preempt_on(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_enable_rcuidle(a0, a1);
-	tracer_preempt_on(a0, a1);
-}
-
-void trace_preempt_off(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_disable_rcuidle(a0, a1);
-	tracer_preempt_off(a0, a1);
+	return 0;
 }
-#endif
+core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
+/*
+ * preemptoff and irqoff tracepoints
+ *
+ * Copyright (C) Joel Fernandes (Google) <joel at joelfernandes.org>
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
+
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_enable_rcuidle(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_disable_rcuidle(a0, a1);
+}
+#endif
-- 
2.17.0.921.gf22659ad46-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers
  2018-05-30  0:04 ` joelaf
  (?)
@ 2018-05-30  0:04   ` joelaf
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Gleixner, Todd Kjos,
	Tom Zanussi

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

This patch introduces a test module that simulates a long atomic
section in the kernel, which the preemptoff or irqsoff tracers can
detect. The module is intended for testing only and is disabled by
default.

The following is the expected output (abbreviated), which can be parsed
to verify that the tracers are working correctly. The kselftests in
later patches use it.

For the preemptoff tracer:

echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066    2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork

For the irqsoff tracer:

echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069    1d..1    0us@: atomic_sect_run
irq dis -1069    1d..1 500001us : atomic_sect_run
irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069    1d..1 500005us : <stack trace>
 => ret_from_fork

Co-authored-by: Erick Reyes <erickreyes@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
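Not part of the patch: a minimal userspace sketch of how the output
above could be parsed. It assumes each event line carries a timestamp
of the form "<digits>us" preceded by whitespace, as in the samples in
the commit message. The helper names trace_line_usecs() and
trace_max_usecs() are hypothetical; they exist neither in the kernel
tree nor in the kselftest added later in this series.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the "<digits>us" timestamp found on one latency-tracer line,
 * or -1 if the line has none (for example a " => kthread" stack entry).
 */
long trace_line_usecs(const char *line)
{
	const char *p = line;

	assert(line != NULL);
	while ((p = strstr(p, "us")) != NULL) {
		const char *start = p;

		/* Walk back over the digits directly in front of "us". */
		while (start > line && isdigit((unsigned char)start[-1]))
			start--;
		/* Accept only a whole token: digits preceded by whitespace. */
		if (start < p &&
		    (start == line || isspace((unsigned char)start[-1])))
			return strtol(start, NULL, 10);
		p += 2;
	}
	return -1;
}

/* Return the largest timestamp in a multi-line trace dump, or -1. */
long trace_max_usecs(const char *trace)
{
	long max = -1;

	assert(trace != NULL);
	while (*trace) {
		char line[256];
		size_t len = strcspn(trace, "\n");
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
		long us;

		/* Copy one line so it can be scanned as its own string. */
		memcpy(line, trace, n);
		line[n] = '\0';
		us = trace_line_usecs(line);
		if (us > max)
			max = us;
		trace += len;
		if (*trace == '\n')
			trace++;
	}
	return max;
}
```

A check could then compare trace_max_usecs() against the atomic_time
passed to insmod (500000 in the examples). For the two samples above
it would report 500012 and 500005.
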
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 79 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+)
 create mode 100644 lib/test_atomic_sections.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..faebf0fe3bcf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1933,6 +1933,14 @@ config TEST_KMOD
 
 	  If unsure, say N.
 
+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"
+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.
+
 config TEST_DEBUG_VIRTUAL
 	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
 	depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..e82cf5445b7b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,6 +46,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index 000000000000..2c8c6419d183
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static int atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, int, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
+
+static void busy_wait(int time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+
+void atomic_sect_run(void *data)
+{
+	unsigned long flags;
+
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
+	do_exit(0);
+}
+
+static int __init atomic_sect_init(void)
+{
+	char task_name[50];
+	struct task_struct *test_task;
+
+	snprintf(task_name, 50, "%s dis test", atomic_mode);
+
+	test_task = kthread_run((void*)atomic_sect_run, NULL, task_name);
+	if (IS_ERR(test_task))
+		return PTR_ERR(test_task);
+
+	return 0;
+}
+
+static void __exit atomic_sect_exit(void)
+{
+	return;
+}
+
+module_init(atomic_sect_init)
+module_exit(atomic_sect_exit)
+MODULE_LICENSE("GPL");
-- 
2.17.0.921.gf22659ad46-goog
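
Not part of the patch: a userspace sketch of the busy_wait() loop
above, for trying the timing logic outside the kernel. It assumes
clock_gettime(CLOCK_MONOTONIC) as a stand-in for ktime_get() and drops
the kthread_should_stop() check. monotonic_ns() and busy_wait_usecs()
are hypothetical names, not kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds (stand-in for ktime_get()). */
static int64_t monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin for at least usecs microseconds; return the elapsed time in ns. */
int64_t busy_wait_usecs(int usecs)
{
	int64_t limit, start, elapsed;

	assert(usecs >= 0);
	/* Convert in 64-bit so large periods cannot overflow an int. */
	limit = (int64_t)usecs * 1000;
	start = monotonic_ns();
	do {
		elapsed = monotonic_ns() - start;
	} while (elapsed < limit);

	return elapsed;
}
```

The sketch does the microsecond-to-nanosecond conversion in 64-bit
arithmetic. The module's "time * 1000" is evaluated as a 32-bit int,
so an atomic_time above roughly 2147483 microseconds would overflow
there; the 500000 used in the examples is well within range.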

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers
@ 2018-05-30  0:04   ` joelaf
  0 siblings, 0 replies; 60+ messages in thread
From: joelaf @ 2018-05-30  0:04 UTC (permalink / raw)


From: "Joel Fernandes (Google)" <joel at joelfernandes.org>

In this patch we introduce a test module for simulating a long atomic
section in the kernel which the preemptoff or irqsoff tracers can
detect. This module is to be used only for test purposes and is default
disabled.

Following is the expected output (only briefly shown) that can be parsed
to verify that the tracers are working correctly. We will use this from
the kselftests in future patches.

For the preemptoff tracer:

echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066    2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork

For the irqsoff tracer:

echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069    1d..1    0us@: atomic_sect_run
irq dis -1069    1d..1 500001us : atomic_sect_run
irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069    1d..1 500005us : <stack trace>
 => ret_from_fork

Co-authored-by: Erick Reyes <erickreyes at google.com>
Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>
---
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 79 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+)
 create mode 100644 lib/test_atomic_sections.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..faebf0fe3bcf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1933,6 +1933,14 @@ config TEST_KMOD
 
 	  If unsure, say N.
 
+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"
+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.
+
 config TEST_DEBUG_VIRTUAL
 	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
 	depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..e82cf5445b7b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,6 +46,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index 000000000000..2c8c6419d183
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static int atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, int, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
+
+static void busy_wait(int time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+
+void atomic_sect_run(void *data)
+{
+	unsigned long flags;
+
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
+	do_exit(0);
+}
+
+static int __init atomic_sect_init(void)
+{
+	char task_name[50];
+	struct task_struct *test_task;
+
+	snprintf(task_name, 50, "%s dis test", atomic_mode);
+
+	test_task = kthread_run((void*)atomic_sect_run, NULL, task_name);
+	if (IS_ERR(test_task))
+		return PTR_ERR(test_task);
+
+	return 0;
+}
+
+static void __exit atomic_sect_exit(void)
+{
+	return;
+}
+
+module_init(atomic_sect_init)
+module_exit(atomic_sect_exit)
+MODULE_LICENSE("GPL");
-- 
2.17.0.921.gf22659ad46-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers
@ 2018-05-30  0:04   ` joelaf
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:04 UTC (permalink / raw)


From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

In this patch we introduce a test module for simulating a long atomic
section in the kernel which the preemptoff or irqsoff tracers can
detect. This module is to be used only for test purposes and is default
disabled.

Following is the expected output (only briefly shown) that can be parsed
to verify that the tracers are working correctly. We will use this from
the kselftests in future patches.

For the preemptoff tracer:

echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066    2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork

For the irqsoff tracer:

echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069    1d..1    0us@: atomic_sect_run
irq dis -1069    1d..1 500001us : atomic_sect_run
irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069    1d..1 500005us : <stack trace>
 => ret_from_fork

Co-authored-by: Erick Reyes <erickreyes at google.com>
Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>
---
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 79 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+)
 create mode 100644 lib/test_atomic_sections.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..faebf0fe3bcf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1933,6 +1933,14 @@ config TEST_KMOD
 
 	  If unsure, say N.
 
+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"
+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.
+
 config TEST_DEBUG_VIRTUAL
 	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
 	depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..e82cf5445b7b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,6 +46,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index 000000000000..2c8c6419d183
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static int atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, int, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
+
+static void busy_wait(int time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+
+static int atomic_sect_run(void *data)
+{
+	unsigned long flags;
+
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
+	return 0;
+}
+
+static int __init atomic_sect_init(void)
+{
+	char task_name[50];
+	struct task_struct *test_task;
+
+	snprintf(task_name, 50, "%s dis test", atomic_mode);
+
+	test_task = kthread_run(atomic_sect_run, NULL, "%s", task_name);
+	if (IS_ERR(test_task))
+		return PTR_ERR(test_task);
+
+	return 0;
+}
+
+static void __exit atomic_sect_exit(void)
+{
+	return;
+}
+
+module_init(atomic_sect_init)
+module_exit(atomic_sect_exit)
+MODULE_LICENSE("GPL");
-- 
2.17.0.921.gf22659ad46-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 8/8] kselftests: Add tests for the preemptoff and irqsoff tracers
  2018-05-30  0:04 ` joelaf
  (?)
@ 2018-05-30  0:05   ` joelaf
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-30  0:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Glexiner, Todd Kjos,
	Tom Zanussi

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Add selftests for the preemptoff and irqsoff tracers. They use the kernel
module introduced in the previous patch to trigger atomic sections in the
kernel.
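
As a rough illustration of what the latency checks in the test look for
(a sketch only: the helper name latency_matches and the sample header
lines are made up for this example, not taken from a real trace), the
pattern accepts any six-character value starting with 5 before " us",
so roughly 500000 to 599999 microseconds:

```shell
#!/bin/sh
# Sketch: mirrors the 'latency: 5..... us' pattern used by the test.
# The helper name and the sample lines are hypothetical.

latency_matches() {
    # $1: a line of text resembling the tracer's latency header
    printf '%s\n' "$1" | grep -Eq "latency: 5..... us"
}

# Six digits starting with 5: accepted.
latency_matches "# latency: 500123 us, #4/4, CPU#1" && echo "500123 us: match"

# Five digits (about 50 ms): rejected.
latency_matches "# latency: 49876 us, #4/4, CPU#1" || echo "49876 us: no match"
```

Because "." matches any character, the pattern is looser than a strict
digit check; it is only meant to confirm a latency of about half a
second.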

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/ftrace/config         |  3 +
 .../test.d/preemptirq/irqsoff_tracer.tc       | 74 +++++++++++++++++++
 2 files changed, 77 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc

diff --git a/tools/testing/selftests/ftrace/config b/tools/testing/selftests/ftrace/config
index b01924c71c09..29588b328345 100644
--- a/tools/testing/selftests/ftrace/config
+++ b/tools/testing/selftests/ftrace/config
@@ -4,3 +4,6 @@ CONFIG_FUNCTION_PROFILER=y
 CONFIG_TRACER_SNAPSHOT=y
 CONFIG_STACK_TRACER=y
 CONFIG_HIST_TRIGGERS=y
+CONFIG_PREEMPT_TRACER=y
+CONFIG_IRQSOFF_TRACER=y
+CONFIG_TEST_ATOMIC_SECTIONS=m
diff --git a/tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc b/tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc
new file mode 100644
index 000000000000..1764ff22c02b
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc
@@ -0,0 +1,74 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: test for the preemptirqsoff tracer
+
+MOD=test_atomic_sections
+
+fail() {
+    reset_tracer
+    rmmod $MOD || true
+    exit_fail
+}
+
+unsup() { #msg
+    reset_tracer
+    rmmod $MOD || true
+    echo $1
+    exit_unsupported
+}
+
+modprobe $MOD || unsup "$MOD module not available"
+rmmod $MOD
+
+grep -q "preemptoff" available_tracers || unsup "preemptoff tracer not enabled"
+grep -q "irqsoff" available_tracers || unsup "irqsoff tracer not enabled"
+
+reset_tracer
+
+# Simulate a half-second preemptoff section a few times
+echo preemptoff > current_tracer
+sleep 1
+modprobe test_atomic_sections atomic_mode=preempt atomic_time=500000 || fail
+rmmod test_atomic_sections || fail
+modprobe test_atomic_sections atomic_mode=preempt atomic_time=500000 || fail
+rmmod test_atomic_sections || fail
+modprobe test_atomic_sections atomic_mode=preempt atomic_time=500000 || fail
+rmmod test_atomic_sections || fail
+
+cat trace
+
+# Confirm which tracer
+grep -q "tracer: preemptoff" trace || fail
+
+# Check the end of the section
+egrep -q "5.....us : <stack trace>" trace || fail
+
+# Check for 500ms of latency
+egrep -q "latency: 5..... us" trace || fail
+
+reset_tracer
+
+# Simulate a half-second irqsoff section a few times
+echo irqsoff > current_tracer
+sleep 1
+modprobe test_atomic_sections atomic_mode=irq atomic_time=500000 || fail
+rmmod test_atomic_sections || fail
+modprobe test_atomic_sections atomic_mode=irq atomic_time=500000 || fail
+rmmod test_atomic_sections || fail
+modprobe test_atomic_sections atomic_mode=irq atomic_time=500000 || fail
+rmmod test_atomic_sections || fail
+
+cat trace
+
+# Confirm which tracer
+grep -q "tracer: irqsoff" trace || fail
+
+# Check the end of the section
+egrep -q "5.....us : <stack trace>" trace || fail
+
+# Check for 500ms of latency
+egrep -q "latency: 5..... us" trace || fail
+
+reset_tracer
+exit 0
+
-- 
2.17.0.921.gf22659ad46-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 6/8] tracing: Centralize preemptirq tracepoints and unify their usage
  2018-05-30  0:04   ` joelaf
  (?)
@ 2018-05-31  1:56     ` namhyung
  -1 siblings, 0 replies; 60+ messages in thread
From: Namhyung Kim @ 2018-05-31  1:56 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Paul McKenney, Peter Zijlstra, Shuah Khan,
	Steven Rostedt, Thomas Glexiner, Todd Kjos, Tom Zanussi,
	kernel-team

On Tue, May 29, 2018 at 05:04:58PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> This patch detaches the preemptirq tracepoints from the tracers and
> keeps them separate.
> 
> Advantages:
> * Lockdep and the irqsoff events can now run in parallel, since they no
> longer have their own calls.
> 
> * This unifies the use case of adding hooks to an irqsoff and irqson
> event, and a preemptoff and preempton event.
>   3 users of the events exist:
>   - Lockdep
>   - irqsoff and preemptoff tracers
>   - irqs and preempt trace events
> 
> The unification cleans up several ifdefs and makes the code in the preempt
> tracer and irqsoff tracers simpler. It gets rid of all the horrific
> ifdeferry around PROVE_LOCKING and makes configuration of the different
> users of the tracepoints easier to understand. It also gets rid
> of the time_* function calls from the lockdep hooks used to call into
> the preemptirq tracer, which are not needed anymore. The negative delta in
> lines of code in this patch is quite large too.
> 
> In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
> as a single point for registering probes onto the tracepoints. With
> this, the web of config options for preempt/irq toggle tracepoints and
> their users becomes:
> 
>  PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
>        |                 |     \         |           |
>        \    (selects)    /      \        \ (selects) /
>       TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
>                       \                  /
>                        \ (depends on)   /
>                      PREEMPTIRQ_TRACEPOINTS
> 
> One note, I have to check for lockdep recursion in the code that calls
> the trace events API and bail out if we're in lockdep recursion
> protection to prevent something like the following case: a spin_lock is
> taken. Then lockdep_acquired is called.  That does a raw_local_irq_save
> and then sets lockdep_recursion, and then calls __lockdep_acquired. In
> this function, a call to get_lock_stats happens which calls
> preempt_disable, which calls trace IRQS off somewhere which enters my
> tracepoint code and sets the tracing_irq_cpu flag to prevent recursion.
> This flag is then never cleared causing lockdep paths to never be
> entered and thus causing splats and other bad things.
> 
> Other than the performance tests mentioned in the previous patch, I also
> ran the locking API test suite. I verified that all test cases are
> passing.
> 
> I also injected issues by not registering lockdep probes onto the
> tracepoints and I see failures to confirm that the probes are indeed
> working.
> 
> With probes:
> 
> [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
> [    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> [    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> [    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> [    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> 
> Without probes:
> 
> [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
> [    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> [    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> [    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> [    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


> ---
>  include/linux/ftrace.h            |  11 +-
>  include/linux/irqflags.h          |  11 +-
>  include/linux/lockdep.h           |   8 +-
>  include/linux/preempt.h           |   2 +-
>  include/trace/events/preemptirq.h |  23 +--
>  init/main.c                       |   5 +-
>  kernel/locking/lockdep.c          |  35 ++---
>  kernel/sched/core.c               |   2 +-
>  kernel/trace/Kconfig              |  22 ++-
>  kernel/trace/Makefile             |   2 +-
>  kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
>  kernel/trace/trace_preemptirq.c   |  71 +++++++++
>  12 files changed, 194 insertions(+), 229 deletions(-)
>  create mode 100644 kernel/trace/trace_preemptirq.c
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 9c3c9a319e48..5191030af0c0 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
>  	return CALLER_ADDR2;
>  }
>  
> -#ifdef CONFIG_IRQSOFF_TRACER
> -  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
> -  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
> -#else
> -  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
> -  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
> -#endif
> -
> -#if defined(CONFIG_PREEMPT_TRACER) || \
> -	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
> +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
>    extern void trace_preempt_on(unsigned long a0, unsigned long a1);
>    extern void trace_preempt_off(unsigned long a0, unsigned long a1);
>  #else
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 9700f00bbc04..50edb9cbbd26 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -15,9 +15,16 @@
>  #include <linux/typecheck.h>
>  #include <asm/irqflags.h>
>  
> -#ifdef CONFIG_TRACE_IRQFLAGS
> +/* Currently trace_softirqs_on/off is used only by lockdep */
> +#ifdef CONFIG_PROVE_LOCKING
>    extern void trace_softirqs_on(unsigned long ip);
>    extern void trace_softirqs_off(unsigned long ip);
> +#else
> +# define trace_softirqs_on(ip)	do { } while (0)
> +# define trace_softirqs_off(ip)	do { } while (0)
> +#endif
> +
> +#ifdef CONFIG_TRACE_IRQFLAGS
>    extern void trace_hardirqs_on(void);
>    extern void trace_hardirqs_off(void);
>  # define trace_hardirq_context(p)	((p)->hardirq_context)
> @@ -43,8 +50,6 @@ do {						\
>  #else
>  # define trace_hardirqs_on()		do { } while (0)
>  # define trace_hardirqs_off()		do { } while (0)
> -# define trace_softirqs_on(ip)		do { } while (0)
> -# define trace_softirqs_off(ip)		do { } while (0)
>  # define trace_hardirq_context(p)	0
>  # define trace_softirq_context(p)	0
>  # define trace_hardirqs_enabled(p)	0
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 6fc77d4dbdcd..a8113357ceeb 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -266,7 +266,8 @@ struct held_lock {
>  /*
>   * Initialization, self-test and debugging-output methods:
>   */
> -extern void lockdep_info(void);
> +extern void lockdep_init(void);
> +extern void lockdep_init_early(void);
>  extern void lockdep_reset(void);
>  extern void lockdep_reset_lock(struct lockdep_map *lock);
>  extern void lockdep_free_key_range(void *start, unsigned long size);
> @@ -406,7 +407,8 @@ static inline void lockdep_on(void)
>  # define lock_downgrade(l, i)			do { } while (0)
>  # define lock_set_class(l, n, k, s, i)		do { } while (0)
>  # define lock_set_subclass(l, s, i)		do { } while (0)
> -# define lockdep_info()				do { } while (0)
> +# define lockdep_init()				do { } while (0)
> +# define lockdep_init_early()			do { } while (0)
>  # define lockdep_init_map(lock, name, key, sub) \
>  		do { (void)(name); (void)(key); } while (0)
>  # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
> @@ -532,7 +534,7 @@ do {								\
>  
>  #endif /* CONFIG_LOCKDEP */
>  
> -#ifdef CONFIG_TRACE_IRQFLAGS
> +#ifdef CONFIG_PROVE_LOCKING
>  extern void print_irqtrace_events(struct task_struct *curr);
>  #else
>  static inline void print_irqtrace_events(struct task_struct *curr)
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index 5bd3f151da78..c01813c3fbe9 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -150,7 +150,7 @@
>   */
>  #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
>  
> -#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
> +#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
>  extern void preempt_count_add(int val);
>  extern void preempt_count_sub(int val);
>  #define preempt_count_dec_and_test() \
> diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
> index 9c4eb33c5a1d..9a0d4ceeb166 100644
> --- a/include/trace/events/preemptirq.h
> +++ b/include/trace/events/preemptirq.h
> @@ -1,4 +1,4 @@
> -#ifdef CONFIG_PREEMPTIRQ_EVENTS
> +#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
>  
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM preemptirq
> @@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template,
>  		  (void *)((unsigned long)(_stext) + __entry->parent_offs))
>  );
>  
> -#ifndef CONFIG_PROVE_LOCKING
> +#ifdef CONFIG_TRACE_IRQFLAGS
>  DEFINE_EVENT(preemptirq_template, irq_disable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> @@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable,
>  DEFINE_EVENT(preemptirq_template, irq_enable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> +#else
> +#define trace_irq_enable(...)
> +#define trace_irq_disable(...)
> +#define trace_irq_enable_rcuidle(...)
> +#define trace_irq_disable_rcuidle(...)
>  #endif
>  
> -#ifdef CONFIG_DEBUG_PREEMPT
> +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
>  DEFINE_EVENT(preemptirq_template, preempt_disable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> @@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable,
>  DEFINE_EVENT(preemptirq_template, preempt_enable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> +#else
> +#define trace_preempt_enable(...)
> +#define trace_preempt_disable(...)
> +#define trace_preempt_enable_rcuidle(...)
> +#define trace_preempt_disable_rcuidle(...)
>  #endif
>  
>  #endif /* _TRACE_PREEMPTIRQ_H */
>  
>  #include <trace/define_trace.h>
>  
> -#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
> -
> -#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
> +#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
>  #define trace_irq_enable(...)
>  #define trace_irq_disable(...)
>  #define trace_irq_enable_rcuidle(...)
>  #define trace_irq_disable_rcuidle(...)
> -#endif
> -
> -#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
>  #define trace_preempt_enable(...)
>  #define trace_preempt_disable(...)
>  #define trace_preempt_enable_rcuidle(...)
> diff --git a/init/main.c b/init/main.c
> index 3b4ada11ed52..44fe43be84c1 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void)
>  	profile_init();
>  	call_function_init();
>  	WARN(!irqs_disabled(), "Interrupts were enabled early\n");
> +
> +	lockdep_init_early();
> +
>  	early_boot_irqs_disabled = false;
>  	local_irq_enable();
>  
> @@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void)
>  		panic("Too many boot %s vars at `%s'", panic_later,
>  		      panic_param);
>  
> -	lockdep_info();
> +	lockdep_init();
>  
>  	/*
>  	 * Need to run this when irqs are enabled, because it wants
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 023386338269..871a42232858 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -55,6 +55,7 @@
>  
>  #include "lockdep_internals.h"
>  
> +#include <trace/events/preemptirq.h>
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/lock.h>
>  
> @@ -2841,10 +2842,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
>  	debug_atomic_inc(hardirqs_on_events);
>  }
>  
> -__visible void trace_hardirqs_on_caller(unsigned long ip)
> +static void lockdep_hardirqs_on(void *none, unsigned long ignore,
> +				unsigned long ip)
>  {
> -	time_hardirqs_on(CALLER_ADDR0, ip);
> -
>  	if (unlikely(!debug_locks || current->lockdep_recursion))
>  		return;
>  
> @@ -2883,23 +2883,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
>  	__trace_hardirqs_on_caller(ip);
>  	current->lockdep_recursion = 0;
>  }
> -EXPORT_SYMBOL(trace_hardirqs_on_caller);
> -
> -void trace_hardirqs_on(void)
> -{
> -	trace_hardirqs_on_caller(CALLER_ADDR0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_on);
>  
>  /*
>   * Hardirqs were disabled:
>   */
> -__visible void trace_hardirqs_off_caller(unsigned long ip)
> +static void lockdep_hardirqs_off(void *none, unsigned long ignore,
> +				 unsigned long ip)
>  {
>  	struct task_struct *curr = current;
>  
> -	time_hardirqs_off(CALLER_ADDR0, ip);
> -
>  	if (unlikely(!debug_locks || current->lockdep_recursion))
>  		return;
>  
> @@ -2921,13 +2913,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
>  	} else
>  		debug_atomic_inc(redundant_hardirqs_off);
>  }
> -EXPORT_SYMBOL(trace_hardirqs_off_caller);
> -
> -void trace_hardirqs_off(void)
> -{
> -	trace_hardirqs_off_caller(CALLER_ADDR0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_off);
>  
>  /*
>   * Softirqs will be enabled:
> @@ -4334,7 +4319,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
>  	raw_local_irq_restore(flags);
>  }
>  
> -void __init lockdep_info(void)
> +void __init lockdep_init_early(void)
> +{
> +#ifdef CONFIG_PROVE_LOCKING
> +	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
> +	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
> +#endif
> +}
> +
> +void __init lockdep_init(void)
>  {
>  	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
>  
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 092f7c4de903..4e9c2d254fba 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3189,7 +3189,7 @@ static inline void sched_tick_stop(int cpu) { }
>  #endif
>  
>  #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
> -				defined(CONFIG_PREEMPT_TRACER))
> +				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
>  /*
>   * If the value passed in is equal to the current preempt count
>   * then we just disabled preemption. Start timing the latency.
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index c4f0f2e4126e..0bcba2a76ad9 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
>  	 Allow the use of ring_buffer_swap_cpu.
>  	 Adds a very slight overhead to tracing when enabled.
>  
> +config PREEMPTIRQ_TRACEPOINTS
> +	bool
> +	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
> +	select TRACING
> +	default y
> +	help
> +	  Create preempt/irq toggle tracepoints if needed, so that other parts
> +	  of the kernel can use them to generate or add hooks to them.
> +
>  # All tracer options should select GENERIC_TRACER. For those options that are
>  # enabled by all tracers (context switch and event tracer) they select TRACING.
>  # This allows those options to appear when no other tracer is selected. But the
> @@ -159,18 +168,20 @@ config FUNCTION_GRAPH_TRACER
>  	  the return value. This is done by setting the current return
>  	  address on the current task structure into a stack of calls.
>  
> +config TRACE_PREEMPT_TOGGLE
> +	bool
> +	help
> +	  Enables hooks which will be called when preemption is first disabled,
> +	  and last enabled.
>  
>  config PREEMPTIRQ_EVENTS
>  	bool "Enable trace events for preempt and irq disable/enable"
>  	select TRACE_IRQFLAGS
> -	depends on DEBUG_PREEMPT || !PROVE_LOCKING
> -	depends on TRACING
> +	select TRACE_PREEMPT_TOGGLE if PREEMPT
> +	select GENERIC_TRACER
>  	default n
>  	help
>  	  Enable tracing of disable and enable events for preemption and irqs.
> -	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
> -	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
> -	  be disabled.
>  
>  config IRQSOFF_TRACER
>  	bool "Interrupts-off Latency Tracer"
> @@ -207,6 +218,7 @@ config PREEMPT_TRACER
>  	select RING_BUFFER_ALLOW_SWAP
>  	select TRACER_SNAPSHOT
>  	select TRACER_SNAPSHOT_PER_CPU_SWAP
> +	select TRACE_PREEMPT_TOGGLE
>  	help
>  	  This option measures the time spent in preemption-off critical
>  	  sections, with microsecond accuracy.
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index e2538c7638d4..84a0cb222f20 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o
>  obj-$(CONFIG_TRACING_MAP) += tracing_map.o
>  obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
>  obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
> -obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
> +obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
>  obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
>  obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
>  obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index f8daa754cce2..770cd30cda40 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -16,7 +16,6 @@
>  
>  #include "trace.h"
>  
> -#define CREATE_TRACE_POINTS
>  #include <trace/events/preemptirq.h>
>  
>  #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
> @@ -450,66 +449,6 @@ void stop_critical_timings(void)
>  }
>  EXPORT_SYMBOL_GPL(stop_critical_timings);
>  
> -#ifdef CONFIG_IRQSOFF_TRACER
> -#ifdef CONFIG_PROVE_LOCKING
> -void time_hardirqs_on(unsigned long a0, unsigned long a1)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		stop_critical_timing(a0, a1);
> -}
> -
> -void time_hardirqs_off(unsigned long a0, unsigned long a1)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		start_critical_timing(a0, a1);
> -}
> -
> -#else /* !CONFIG_PROVE_LOCKING */
> -
> -/*
> - * We are only interested in hardirq on/off events:
> - */
> -static inline void tracer_hardirqs_on(void)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> -}
> -
> -static inline void tracer_hardirqs_off(void)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> -}
> -
> -static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		stop_critical_timing(CALLER_ADDR0, caller_addr);
> -}
> -
> -static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		start_critical_timing(CALLER_ADDR0, caller_addr);
> -}
> -
> -#endif /* CONFIG_PROVE_LOCKING */
> -#endif /*  CONFIG_IRQSOFF_TRACER */
> -
> -#ifdef CONFIG_PREEMPT_TRACER
> -static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
> -{
> -	if (preempt_trace() && !irq_trace())
> -		stop_critical_timing(a0, a1);
> -}
> -
> -static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
> -{
> -	if (preempt_trace() && !irq_trace())
> -		start_critical_timing(a0, a1);
> -}
> -#endif /* CONFIG_PREEMPT_TRACER */
> -
>  #ifdef CONFIG_FUNCTION_TRACER
>  static bool function_enabled;
>  
> @@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
>  }
>  
>  #ifdef CONFIG_IRQSOFF_TRACER
> +/*
> + * We are only interested in hardirq on/off events:
> + */
> +static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (!preempt_trace() && irq_trace())
> +		stop_critical_timing(a0, a1);
> +}
> +
> +static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (!preempt_trace() && irq_trace())
> +		start_critical_timing(a0, a1);
> +}
> +
>  static int irqsoff_tracer_init(struct trace_array *tr)
>  {
>  	trace_type = TRACER_IRQS_OFF;
>  
> +	register_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	register_trace_irq_enable(tracer_hardirqs_on, NULL);
>  	return __irqsoff_tracer_init(tr);
>  }
>  
>  static void irqsoff_tracer_reset(struct trace_array *tr)
>  {
> +	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
>  	__irqsoff_tracer_reset(tr);
>  }
>  
> @@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
>  	.allow_instances = true,
>  	.use_max_tr	= true,
>  };
> -# define register_irqsoff(trace) register_tracer(&trace)
> -#else
> -# define register_irqsoff(trace) do { } while (0)
> -#endif
> +#endif /*  CONFIG_IRQSOFF_TRACER */
>  
>  #ifdef CONFIG_PREEMPT_TRACER
> +static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (preempt_trace() && !irq_trace())
> +		stop_critical_timing(a0, a1);
> +}
> +
> +static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (preempt_trace() && !irq_trace())
> +		start_critical_timing(a0, a1);
> +}
> +
>  static int preemptoff_tracer_init(struct trace_array *tr)
>  {
>  	trace_type = TRACER_PREEMPT_OFF;
>  
> +	register_trace_preempt_disable(tracer_preempt_off, NULL);
> +	register_trace_preempt_enable(tracer_preempt_on, NULL);
>  	return __irqsoff_tracer_init(tr);
>  }
>  
>  static void preemptoff_tracer_reset(struct trace_array *tr)
>  {
> +	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
> +	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
>  	__irqsoff_tracer_reset(tr);
>  }
>  
> @@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
>  	.allow_instances = true,
>  	.use_max_tr	= true,
>  };
> -# define register_preemptoff(trace) register_tracer(&trace)
> -#else
> -# define register_preemptoff(trace) do { } while (0)
> -#endif
> +#endif /* CONFIG_PREEMPT_TRACER */
>  
> -#if defined(CONFIG_IRQSOFF_TRACER) && \
> -	defined(CONFIG_PREEMPT_TRACER)
> +#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
>  
>  static int preemptirqsoff_tracer_init(struct trace_array *tr)
>  {
>  	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
>  
> +	register_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	register_trace_irq_enable(tracer_hardirqs_on, NULL);
> +	register_trace_preempt_disable(tracer_preempt_off, NULL);
> +	register_trace_preempt_enable(tracer_preempt_on, NULL);
> +
>  	return __irqsoff_tracer_init(tr);
>  }
>  
>  static void preemptirqsoff_tracer_reset(struct trace_array *tr)
>  {
> +	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
> +	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
> +	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
> +
>  	__irqsoff_tracer_reset(tr);
>  }
>  
> @@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
>  	.allow_instances = true,
>  	.use_max_tr	= true,
>  };
> -
> -# define register_preemptirqsoff(trace) register_tracer(&trace)
> -#else
> -# define register_preemptirqsoff(trace) do { } while (0)
>  #endif
>  
>  __init static int init_irqsoff_tracer(void)
>  {
> -	register_irqsoff(irqsoff_tracer);
> -	register_preemptoff(preemptoff_tracer);
> -	register_preemptirqsoff(preemptirqsoff_tracer);
> -
> -	return 0;
> -}
> -core_initcall(init_irqsoff_tracer);
> -#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
> -
> -#ifndef CONFIG_IRQSOFF_TRACER
> -static inline void tracer_hardirqs_on(void) { }
> -static inline void tracer_hardirqs_off(void) { }
> -static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
> -static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
> +#ifdef CONFIG_IRQSOFF_TRACER
> +	register_tracer(&irqsoff_tracer);
>  #endif
> -
> -#ifndef CONFIG_PREEMPT_TRACER
> -static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
> -static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
> +#ifdef CONFIG_PREEMPT_TRACER
> +	register_tracer(&preemptoff_tracer);
>  #endif
> -
> -#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
> -/* Per-cpu variable to prevent redundant calls when IRQs already off */
> -static DEFINE_PER_CPU(int, tracing_irq_cpu);
> -
> -void trace_hardirqs_on(void)
> -{
> -	if (!this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> -	tracer_hardirqs_on();
> -
> -	this_cpu_write(tracing_irq_cpu, 0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_on);
> -
> -void trace_hardirqs_off(void)
> -{
> -	if (this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	this_cpu_write(tracing_irq_cpu, 1);
> -
> -	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> -	tracer_hardirqs_off();
> -}
> -EXPORT_SYMBOL(trace_hardirqs_off);
> -
> -__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
> -{
> -	if (!this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> -	tracer_hardirqs_on_caller(caller_addr);
> -
> -	this_cpu_write(tracing_irq_cpu, 0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_on_caller);
> -
> -__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
> -{
> -	if (this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	this_cpu_write(tracing_irq_cpu, 1);
> -
> -	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> -	tracer_hardirqs_off_caller(caller_addr);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_off_caller);
> -
> -/*
> - * Stubs:
> - */
> -
> -void trace_softirqs_on(unsigned long ip)
> -{
> -}
> -
> -void trace_softirqs_off(unsigned long ip)
> -{
> -}
> -
> -inline void print_irqtrace_events(struct task_struct *curr)
> -{
> -}
> +#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
> +	register_tracer(&preemptirqsoff_tracer);
>  #endif
>  
> -#if defined(CONFIG_PREEMPT_TRACER) || \
> -	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
> -void trace_preempt_on(unsigned long a0, unsigned long a1)
> -{
> -	trace_preempt_enable_rcuidle(a0, a1);
> -	tracer_preempt_on(a0, a1);
> -}
> -
> -void trace_preempt_off(unsigned long a0, unsigned long a1)
> -{
> -	trace_preempt_disable_rcuidle(a0, a1);
> -	tracer_preempt_off(a0, a1);
> +	return 0;
>  }
> -#endif
> +core_initcall(init_irqsoff_tracer);
> +#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
> diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
> new file mode 100644
> index 000000000000..dc01c7f4d326
> --- /dev/null
> +++ b/kernel/trace/trace_preemptirq.c
> @@ -0,0 +1,71 @@
> +/*
> + * preemptoff and irqoff tracepoints
> + *
> + * Copyright (C) Joel Fernandes (Google) <joel@joelfernandes.org>
> + */
> +
> +#include <linux/kallsyms.h>
> +#include <linux/uaccess.h>
> +#include <linux/module.h>
> +#include <linux/ftrace.h>
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/preemptirq.h>
> +
> +#ifdef CONFIG_TRACE_IRQFLAGS
> +/* Per-cpu variable to prevent redundant calls when IRQs already off */
> +static DEFINE_PER_CPU(int, tracing_irq_cpu);
> +
> +void trace_hardirqs_on(void)
> +{
> +	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> +	this_cpu_write(tracing_irq_cpu, 0);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_on);
> +
> +void trace_hardirqs_off(void)
> +{
> +	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	this_cpu_write(tracing_irq_cpu, 1);
> +	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_off);
> +
> +__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
> +{
> +	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> +	this_cpu_write(tracing_irq_cpu, 0);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_on_caller);
> +
> +__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
> +{
> +	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	this_cpu_write(tracing_irq_cpu, 1);
> +	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_off_caller);
> +#endif /* CONFIG_TRACE_IRQFLAGS */
> +
> +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
> +
> +void trace_preempt_on(unsigned long a0, unsigned long a1)
> +{
> +	trace_preempt_enable_rcuidle(a0, a1);
> +}
> +
> +void trace_preempt_off(unsigned long a0, unsigned long a1)
> +{
> +	trace_preempt_disable_rcuidle(a0, a1);
> +}
> +#endif
> -- 
> 2.17.0.921.gf22659ad46-goog
> 


* [PATCH v8 6/8] tracing: Centralize preemptirq tracepoints and unify their usage
@ 2018-05-31  1:56     ` namhyung
From: namhyung @ 2018-05-31  1:56 UTC (permalink / raw)


On Tue, May 29, 2018 at 05:04:58PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
> 
> This patch detaches the preemptirq tracepoints from the tracers and
> keeps them separate.
> 
> Advantages:
> * Lockdep and irqsoff events can now run in parallel since they no longer
> have their own calls.
> 
> * This unifies the use case of adding hooks to an irqsoff and irqson
> event, and a preemptoff and preempton event.
>   3 users of the events exist:
>   - Lockdep
>   - irqsoff and preemptoff tracers
>   - irqs and preempt trace events
> 
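
To illustrate the point above about several users hooking the same event, here is a minimal user-space sketch in C. It is not kernel code and not the tracepoint implementation: `sim_tracepoint`, `sim_register`, `sim_unregister`, `sim_fire` and `count_probe` are hypothetical names. Only the probe signature (a data pointer plus two addresses) is modeled on the probes in this patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROBES 4

/* Hypothetical probe signature, shaped like (data, ip, parent_ip). */
typedef void (*probe_fn)(void *data, unsigned long a0, unsigned long a1);

/* Hypothetical stand-in for one tracepoint and its attached probes. */
struct sim_tracepoint {
	probe_fn probes[MAX_PROBES];
	void *data[MAX_PROBES];
	size_t nr;
};

/* Attach a probe; returns 0 on success, -1 if the table is full. */
static int sim_register(struct sim_tracepoint *tp, probe_fn fn, void *data)
{
	if (tp->nr == MAX_PROBES)
		return -1;
	tp->probes[tp->nr] = fn;
	tp->data[tp->nr] = data;
	tp->nr++;
	return 0;
}

/* Detach the first matching (fn, data) pair; returns -1 if not found. */
static int sim_unregister(struct sim_tracepoint *tp, probe_fn fn, void *data)
{
	for (size_t i = 0; i < tp->nr; i++) {
		if (tp->probes[i] != fn || tp->data[i] != data)
			continue;
		/* Close the gap while keeping the remaining order. */
		for (size_t j = i + 1; j < tp->nr; j++) {
			tp->probes[j - 1] = tp->probes[j];
			tp->data[j - 1] = tp->data[j];
		}
		tp->nr--;
		return 0;
	}
	return -1;
}

/* Fire the event: every attached probe is called once. */
static void sim_fire(const struct sim_tracepoint *tp,
		     unsigned long a0, unsigned long a1)
{
	for (size_t i = 0; i < tp->nr; i++)
		tp->probes[i](tp->data[i], a0, a1);
}

/* A probe that only counts how often it was called. */
static void count_probe(void *data, unsigned long a0, unsigned long a1)
{
	(void)a0;
	(void)a1;
	(*(int *)data)++;
}

/* Two independent users share one event; one of them detaches later. */
static int sim_demo(void)
{
	struct sim_tracepoint irq_disable = { .nr = 0 };
	int lockdep_hits = 0, tracer_hits = 0;
	int rc;

	rc = sim_register(&irq_disable, count_probe, &lockdep_hits);
	assert(rc == 0);
	rc = sim_register(&irq_disable, count_probe, &tracer_hits);
	assert(rc == 0);
	sim_fire(&irq_disable, 0x1000, 0x2000);	/* both users see it */

	rc = sim_unregister(&irq_disable, count_probe, &tracer_hits);
	assert(rc == 0);
	sim_fire(&irq_disable, 0x1000, 0x2000);	/* one user remains */

	return lockdep_hits * 10 + tracer_hits;
}
```

In this sketch `sim_demo()` returns 21: the first counter saw both events and the second saw only the first one, so neither user needs its own call site.
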
> The unification cleans up several ifdefs and makes the code in the
> preempt tracer and irqsoff tracers simpler. It gets rid of all the
> horrific ifdeferry around PROVE_LOCKING and makes configuration of the
> different users of the tracepoints easier to understand. It also gets
> rid of the time_* function calls that the lockdep hooks used to call
> into the preemptirq tracer, which are no longer needed. The negative
> delta in lines of code in this patch is quite large too.
> 
> In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
> as a single point for registering probes onto the tracepoints. With
> this, the web of config options for preempt/irq toggle tracepoints and
> its users becomes:
> 
>  PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
>        |                 |     \         |           |
>        \    (selects)    /      \        \ (selects) /
>       TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
>                       \                  /
>                        \ (depends on)   /
>                      PREEMPTIRQ_TRACEPOINTS
> 
> One note: I have to check for lockdep recursion in the code that calls
> the trace events API and bail out if we're in lockdep recursion
> protection, to prevent something like the following case. A spin_lock is
> taken, then lockdep_acquired is called. That does a raw_local_irq_save,
> sets lockdep_recursion, and then calls __lockdep_acquired. In this
> function, a call to get_lock_stats happens which calls preempt_disable,
> which calls trace IRQS off somewhere, which enters my tracepoint code
> and sets the tracing_irq_cpu flag to prevent recursion. This flag is
> then never cleared, causing lockdep paths to never be entered and thus
> causing splats and other bad things.
> 
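
The guard described above can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel code: `lockdep_recursion` and `tracing_irq_cpu` are plain variables standing in for `current->lockdep_recursion` and the per-CPU flag, and `sim_hardirqs_off`, `sim_hardirqs_on` and `sim_run` are hypothetical names that only mirror the shape of trace_hardirqs_off() and trace_hardirqs_on() in trace_preemptirq.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for kernel state. */
static bool lockdep_recursion;	/* models current->lockdep_recursion */
static int tracing_irq_cpu;	/* models the per-CPU tracing_irq_cpu flag */
static int events_emitted;	/* counts simulated tracepoint hits */

/* Mirrors the shape of trace_hardirqs_off(). */
static void sim_hardirqs_off(void)
{
	if (lockdep_recursion || tracing_irq_cpu)
		return;		/* recursing, or IRQs-off already recorded */

	tracing_irq_cpu = 1;
	events_emitted++;	/* stands in for trace_irq_disable_rcuidle() */
}

/* Mirrors the shape of trace_hardirqs_on(). */
static void sim_hardirqs_on(void)
{
	if (lockdep_recursion || !tracing_irq_cpu)
		return;		/* recursing, or no IRQs-off recorded */

	events_emitted++;	/* stands in for trace_irq_enable_rcuidle() */
	tracing_irq_cpu = 0;
}

/* Walks one scenario and returns the number of events emitted. */
static int sim_run(void)
{
	lockdep_recursion = false;
	tracing_irq_cpu = 0;
	events_emitted = 0;

	lockdep_recursion = true;	/* lockdep is inside its own code */
	sim_hardirqs_off();		/* ignored, so the flag stays clear */
	assert(tracing_irq_cpu == 0);
	lockdep_recursion = false;

	sim_hardirqs_off();		/* first real disable: recorded */
	sim_hardirqs_off();		/* redundant disable: ignored */
	sim_hardirqs_on();		/* matching enable: recorded */
	sim_hardirqs_on();		/* redundant enable: ignored */

	return events_emitted;
}
```

In this model `sim_run()` returns 2. The call made while `lockdep_recursion` is set leaves `tracing_irq_cpu` untouched, so the later disable/enable pair is still seen.
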
> Other than the performance tests mentioned in the previous patch, I also
> ran the locking API test suite. I verified that all test cases are
> passing.
> 
> I also injected issues by not registering lockdep probes onto the
> tracepoints, and I see failures, which confirms that the probes are
> indeed working.
> 
> With probes:
> 
> [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
> [    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> [    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> [    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> [    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> 
> Without probes:
> 
> [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
> [    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> [    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> [    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> [    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> 
> Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>

Reviewed-by: Namhyung Kim <namhyung at kernel.org>

Thanks,
Namhyung


> ---
>  include/linux/ftrace.h            |  11 +-
>  include/linux/irqflags.h          |  11 +-
>  include/linux/lockdep.h           |   8 +-
>  include/linux/preempt.h           |   2 +-
>  include/trace/events/preemptirq.h |  23 +--
>  init/main.c                       |   5 +-
>  kernel/locking/lockdep.c          |  35 ++---
>  kernel/sched/core.c               |   2 +-
>  kernel/trace/Kconfig              |  22 ++-
>  kernel/trace/Makefile             |   2 +-
>  kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
>  kernel/trace/trace_preemptirq.c   |  71 +++++++++
>  12 files changed, 194 insertions(+), 229 deletions(-)
>  create mode 100644 kernel/trace/trace_preemptirq.c
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 9c3c9a319e48..5191030af0c0 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
>  	return CALLER_ADDR2;
>  }
>  
> -#ifdef CONFIG_IRQSOFF_TRACER
> -  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
> -  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
> -#else
> -  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
> -  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
> -#endif
> -
> -#if defined(CONFIG_PREEMPT_TRACER) || \
> -	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
> +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
>    extern void trace_preempt_on(unsigned long a0, unsigned long a1);
>    extern void trace_preempt_off(unsigned long a0, unsigned long a1);
>  #else
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 9700f00bbc04..50edb9cbbd26 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -15,9 +15,16 @@
>  #include <linux/typecheck.h>
>  #include <asm/irqflags.h>
>  
> -#ifdef CONFIG_TRACE_IRQFLAGS
> +/* Currently trace_softirqs_on/off is used only by lockdep */
> +#ifdef CONFIG_PROVE_LOCKING
>    extern void trace_softirqs_on(unsigned long ip);
>    extern void trace_softirqs_off(unsigned long ip);
> +#else
> +# define trace_softirqs_on(ip)	do { } while (0)
> +# define trace_softirqs_off(ip)	do { } while (0)
> +#endif
> +
> +#ifdef CONFIG_TRACE_IRQFLAGS
>    extern void trace_hardirqs_on(void);
>    extern void trace_hardirqs_off(void);
>  # define trace_hardirq_context(p)	((p)->hardirq_context)
> @@ -43,8 +50,6 @@ do {						\
>  #else
>  # define trace_hardirqs_on()		do { } while (0)
>  # define trace_hardirqs_off()		do { } while (0)
> -# define trace_softirqs_on(ip)		do { } while (0)
> -# define trace_softirqs_off(ip)		do { } while (0)
>  # define trace_hardirq_context(p)	0
>  # define trace_softirq_context(p)	0
>  # define trace_hardirqs_enabled(p)	0
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 6fc77d4dbdcd..a8113357ceeb 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -266,7 +266,8 @@ struct held_lock {
>  /*
>   * Initialization, self-test and debugging-output methods:
>   */
> -extern void lockdep_info(void);
> +extern void lockdep_init(void);
> +extern void lockdep_init_early(void);
>  extern void lockdep_reset(void);
>  extern void lockdep_reset_lock(struct lockdep_map *lock);
>  extern void lockdep_free_key_range(void *start, unsigned long size);
> @@ -406,7 +407,8 @@ static inline void lockdep_on(void)
>  # define lock_downgrade(l, i)			do { } while (0)
>  # define lock_set_class(l, n, k, s, i)		do { } while (0)
>  # define lock_set_subclass(l, s, i)		do { } while (0)
> -# define lockdep_info()				do { } while (0)
> +# define lockdep_init()				do { } while (0)
> +# define lockdep_init_early()			do { } while (0)
>  # define lockdep_init_map(lock, name, key, sub) \
>  		do { (void)(name); (void)(key); } while (0)
>  # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
> @@ -532,7 +534,7 @@ do {								\
>  
>  #endif /* CONFIG_LOCKDEP */
>  
> -#ifdef CONFIG_TRACE_IRQFLAGS
> +#ifdef CONFIG_PROVE_LOCKING
>  extern void print_irqtrace_events(struct task_struct *curr);
>  #else
>  static inline void print_irqtrace_events(struct task_struct *curr)
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index 5bd3f151da78..c01813c3fbe9 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -150,7 +150,7 @@
>   */
>  #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
>  
> -#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
> +#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
>  extern void preempt_count_add(int val);
>  extern void preempt_count_sub(int val);
>  #define preempt_count_dec_and_test() \
> diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
> index 9c4eb33c5a1d..9a0d4ceeb166 100644
> --- a/include/trace/events/preemptirq.h
> +++ b/include/trace/events/preemptirq.h
> @@ -1,4 +1,4 @@
> -#ifdef CONFIG_PREEMPTIRQ_EVENTS
> +#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
>  
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM preemptirq
> @@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template,
>  		  (void *)((unsigned long)(_stext) + __entry->parent_offs))
>  );
>  
> -#ifndef CONFIG_PROVE_LOCKING
> +#ifdef CONFIG_TRACE_IRQFLAGS
>  DEFINE_EVENT(preemptirq_template, irq_disable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> @@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable,
>  DEFINE_EVENT(preemptirq_template, irq_enable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> +#else
> +#define trace_irq_enable(...)
> +#define trace_irq_disable(...)
> +#define trace_irq_enable_rcuidle(...)
> +#define trace_irq_disable_rcuidle(...)
>  #endif
>  
> -#ifdef CONFIG_DEBUG_PREEMPT
> +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
>  DEFINE_EVENT(preemptirq_template, preempt_disable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> @@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable,
>  DEFINE_EVENT(preemptirq_template, preempt_enable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> +#else
> +#define trace_preempt_enable(...)
> +#define trace_preempt_disable(...)
> +#define trace_preempt_enable_rcuidle(...)
> +#define trace_preempt_disable_rcuidle(...)
>  #endif
>  
>  #endif /* _TRACE_PREEMPTIRQ_H */
>  
>  #include <trace/define_trace.h>
>  
> -#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
> -
> -#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
> +#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
>  #define trace_irq_enable(...)
>  #define trace_irq_disable(...)
>  #define trace_irq_enable_rcuidle(...)
>  #define trace_irq_disable_rcuidle(...)
> -#endif
> -
> -#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
>  #define trace_preempt_enable(...)
>  #define trace_preempt_disable(...)
>  #define trace_preempt_enable_rcuidle(...)
> diff --git a/init/main.c b/init/main.c
> index 3b4ada11ed52..44fe43be84c1 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void)
>  	profile_init();
>  	call_function_init();
>  	WARN(!irqs_disabled(), "Interrupts were enabled early\n");
> +
> +	lockdep_init_early();
> +
>  	early_boot_irqs_disabled = false;
>  	local_irq_enable();
>  
> @@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void)
>  		panic("Too many boot %s vars at `%s'", panic_later,
>  		      panic_param);
>  
> -	lockdep_info();
> +	lockdep_init();
>  
>  	/*
>  	 * Need to run this when irqs are enabled, because it wants
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 023386338269..871a42232858 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -55,6 +55,7 @@
>  
>  #include "lockdep_internals.h"
>  
> +#include <trace/events/preemptirq.h>
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/lock.h>
>  
> @@ -2841,10 +2842,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
>  	debug_atomic_inc(hardirqs_on_events);
>  }
>  
> -__visible void trace_hardirqs_on_caller(unsigned long ip)
> +static void lockdep_hardirqs_on(void *none, unsigned long ignore,
> +				unsigned long ip)
>  {
> -	time_hardirqs_on(CALLER_ADDR0, ip);
> -
>  	if (unlikely(!debug_locks || current->lockdep_recursion))
>  		return;
>  
> @@ -2883,23 +2883,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
>  	__trace_hardirqs_on_caller(ip);
>  	current->lockdep_recursion = 0;
>  }
> -EXPORT_SYMBOL(trace_hardirqs_on_caller);
> -
> -void trace_hardirqs_on(void)
> -{
> -	trace_hardirqs_on_caller(CALLER_ADDR0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_on);
>  
>  /*
>   * Hardirqs were disabled:
>   */
> -__visible void trace_hardirqs_off_caller(unsigned long ip)
> +static void lockdep_hardirqs_off(void *none, unsigned long ignore,
> +				 unsigned long ip)
>  {
>  	struct task_struct *curr = current;
>  
> -	time_hardirqs_off(CALLER_ADDR0, ip);
> -
>  	if (unlikely(!debug_locks || current->lockdep_recursion))
>  		return;
>  
> @@ -2921,13 +2913,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
>  	} else
>  		debug_atomic_inc(redundant_hardirqs_off);
>  }
> -EXPORT_SYMBOL(trace_hardirqs_off_caller);
> -
> -void trace_hardirqs_off(void)
> -{
> -	trace_hardirqs_off_caller(CALLER_ADDR0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_off);
>  
>  /*
>   * Softirqs will be enabled:
> @@ -4334,7 +4319,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
>  	raw_local_irq_restore(flags);
>  }
>  
> -void __init lockdep_info(void)
> +void __init lockdep_init_early(void)
> +{
> +#ifdef CONFIG_PROVE_LOCKING
> +	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
> +	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
> +#endif
> +}
> +
> +void __init lockdep_init(void)
>  {
>  	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
>  
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 092f7c4de903..4e9c2d254fba 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3189,7 +3189,7 @@ static inline void sched_tick_stop(int cpu) { }
>  #endif
>  
>  #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
> -				defined(CONFIG_PREEMPT_TRACER))
> +				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
>  /*
>   * If the value passed in is equal to the current preempt count
>   * then we just disabled preemption. Start timing the latency.
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index c4f0f2e4126e..0bcba2a76ad9 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
>  	 Allow the use of ring_buffer_swap_cpu.
>  	 Adds a very slight overhead to tracing when enabled.
>  
> +config PREEMPTIRQ_TRACEPOINTS
> +	bool
> +	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
> +	select TRACING
> +	default y
> +	help
> +	  Create preempt/irq toggle tracepoints if needed, so that other parts
> +	  of the kernel can use them to generate or add hooks to them.
> +
>  # All tracer options should select GENERIC_TRACER. For those options that are
>  # enabled by all tracers (context switch and event tracer) they select TRACING.
>  # This allows those options to appear when no other tracer is selected. But the
> @@ -159,18 +168,20 @@ config FUNCTION_GRAPH_TRACER
>  	  the return value. This is done by setting the current return
>  	  address on the current task structure into a stack of calls.
>  
> +config TRACE_PREEMPT_TOGGLE
> +	bool
> +	help
> +	  Enables hooks which will be called when preemption is first disabled,
> +	  and last enabled.
>  
>  config PREEMPTIRQ_EVENTS
>  	bool "Enable trace events for preempt and irq disable/enable"
>  	select TRACE_IRQFLAGS
> -	depends on DEBUG_PREEMPT || !PROVE_LOCKING
> -	depends on TRACING
> +	select TRACE_PREEMPT_TOGGLE if PREEMPT
> +	select GENERIC_TRACER
>  	default n
>  	help
>  	  Enable tracing of disable and enable events for preemption and irqs.
> -	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
> -	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
> -	  be disabled.
>  
>  config IRQSOFF_TRACER
>  	bool "Interrupts-off Latency Tracer"
> @@ -207,6 +218,7 @@ config PREEMPT_TRACER
>  	select RING_BUFFER_ALLOW_SWAP
>  	select TRACER_SNAPSHOT
>  	select TRACER_SNAPSHOT_PER_CPU_SWAP
> +	select TRACE_PREEMPT_TOGGLE
>  	help
>  	  This option measures the time spent in preemption-off critical
>  	  sections, with microsecond accuracy.
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index e2538c7638d4..84a0cb222f20 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o
>  obj-$(CONFIG_TRACING_MAP) += tracing_map.o
>  obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
>  obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
> -obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
> +obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
>  obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
>  obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
>  obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index f8daa754cce2..770cd30cda40 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -16,7 +16,6 @@
>  
>  #include "trace.h"
>  
> -#define CREATE_TRACE_POINTS
>  #include <trace/events/preemptirq.h>
>  
>  #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
> @@ -450,66 +449,6 @@ void stop_critical_timings(void)
>  }
>  EXPORT_SYMBOL_GPL(stop_critical_timings);
>  
> -#ifdef CONFIG_IRQSOFF_TRACER
> -#ifdef CONFIG_PROVE_LOCKING
> -void time_hardirqs_on(unsigned long a0, unsigned long a1)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		stop_critical_timing(a0, a1);
> -}
> -
> -void time_hardirqs_off(unsigned long a0, unsigned long a1)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		start_critical_timing(a0, a1);
> -}
> -
> -#else /* !CONFIG_PROVE_LOCKING */
> -
> -/*
> - * We are only interested in hardirq on/off events:
> - */
> -static inline void tracer_hardirqs_on(void)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> -}
> -
> -static inline void tracer_hardirqs_off(void)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> -}
> -
> -static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		stop_critical_timing(CALLER_ADDR0, caller_addr);
> -}
> -
> -static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		start_critical_timing(CALLER_ADDR0, caller_addr);
> -}
> -
> -#endif /* CONFIG_PROVE_LOCKING */
> -#endif /*  CONFIG_IRQSOFF_TRACER */
> -
> -#ifdef CONFIG_PREEMPT_TRACER
> -static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
> -{
> -	if (preempt_trace() && !irq_trace())
> -		stop_critical_timing(a0, a1);
> -}
> -
> -static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
> -{
> -	if (preempt_trace() && !irq_trace())
> -		start_critical_timing(a0, a1);
> -}
> -#endif /* CONFIG_PREEMPT_TRACER */
> -
>  #ifdef CONFIG_FUNCTION_TRACER
>  static bool function_enabled;
>  
> @@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
>  }
>  
>  #ifdef CONFIG_IRQSOFF_TRACER
> +/*
> + * We are only interested in hardirq on/off events:
> + */
> +static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (!preempt_trace() && irq_trace())
> +		stop_critical_timing(a0, a1);
> +}
> +
> +static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (!preempt_trace() && irq_trace())
> +		start_critical_timing(a0, a1);
> +}
> +
>  static int irqsoff_tracer_init(struct trace_array *tr)
>  {
>  	trace_type = TRACER_IRQS_OFF;
>  
> +	register_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	register_trace_irq_enable(tracer_hardirqs_on, NULL);
>  	return __irqsoff_tracer_init(tr);
>  }
>  
>  static void irqsoff_tracer_reset(struct trace_array *tr)
>  {
> +	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
>  	__irqsoff_tracer_reset(tr);
>  }
>  
> @@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
>  	.allow_instances = true,
>  	.use_max_tr	= true,
>  };
> -# define register_irqsoff(trace) register_tracer(&trace)
> -#else
> -# define register_irqsoff(trace) do { } while (0)
> -#endif
> +#endif /*  CONFIG_IRQSOFF_TRACER */
>  
>  #ifdef CONFIG_PREEMPT_TRACER
> +static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (preempt_trace() && !irq_trace())
> +		stop_critical_timing(a0, a1);
> +}
> +
> +static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (preempt_trace() && !irq_trace())
> +		start_critical_timing(a0, a1);
> +}
> +
>  static int preemptoff_tracer_init(struct trace_array *tr)
>  {
>  	trace_type = TRACER_PREEMPT_OFF;
>  
> +	register_trace_preempt_disable(tracer_preempt_off, NULL);
> +	register_trace_preempt_enable(tracer_preempt_on, NULL);
>  	return __irqsoff_tracer_init(tr);
>  }
>  
>  static void preemptoff_tracer_reset(struct trace_array *tr)
>  {
> +	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
> +	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
>  	__irqsoff_tracer_reset(tr);
>  }
>  
> @@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
>  	.allow_instances = true,
>  	.use_max_tr	= true,
>  };
> -# define register_preemptoff(trace) register_tracer(&trace)
> -#else
> -# define register_preemptoff(trace) do { } while (0)
> -#endif
> +#endif /* CONFIG_PREEMPT_TRACER */
>  
> -#if defined(CONFIG_IRQSOFF_TRACER) && \
> -	defined(CONFIG_PREEMPT_TRACER)
> +#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
>  
>  static int preemptirqsoff_tracer_init(struct trace_array *tr)
>  {
>  	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
>  
> +	register_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	register_trace_irq_enable(tracer_hardirqs_on, NULL);
> +	register_trace_preempt_disable(tracer_preempt_off, NULL);
> +	register_trace_preempt_enable(tracer_preempt_on, NULL);
> +
>  	return __irqsoff_tracer_init(tr);
>  }
>  
>  static void preemptirqsoff_tracer_reset(struct trace_array *tr)
>  {
> +	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
> +	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
> +	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
> +
>  	__irqsoff_tracer_reset(tr);
>  }
>  
> @@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
>  	.allow_instances = true,
>  	.use_max_tr	= true,
>  };
> -
> -# define register_preemptirqsoff(trace) register_tracer(&trace)
> -#else
> -# define register_preemptirqsoff(trace) do { } while (0)
>  #endif
>  
>  __init static int init_irqsoff_tracer(void)
>  {
> -	register_irqsoff(irqsoff_tracer);
> -	register_preemptoff(preemptoff_tracer);
> -	register_preemptirqsoff(preemptirqsoff_tracer);
> -
> -	return 0;
> -}
> -core_initcall(init_irqsoff_tracer);
> -#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
> -
> -#ifndef CONFIG_IRQSOFF_TRACER
> -static inline void tracer_hardirqs_on(void) { }
> -static inline void tracer_hardirqs_off(void) { }
> -static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
> -static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
> +#ifdef CONFIG_IRQSOFF_TRACER
> +	register_tracer(&irqsoff_tracer);
>  #endif
> -
> -#ifndef CONFIG_PREEMPT_TRACER
> -static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
> -static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
> +#ifdef CONFIG_PREEMPT_TRACER
> +	register_tracer(&preemptoff_tracer);
>  #endif
> -
> -#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
> -/* Per-cpu variable to prevent redundant calls when IRQs already off */
> -static DEFINE_PER_CPU(int, tracing_irq_cpu);
> -
> -void trace_hardirqs_on(void)
> -{
> -	if (!this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> -	tracer_hardirqs_on();
> -
> -	this_cpu_write(tracing_irq_cpu, 0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_on);
> -
> -void trace_hardirqs_off(void)
> -{
> -	if (this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	this_cpu_write(tracing_irq_cpu, 1);
> -
> -	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> -	tracer_hardirqs_off();
> -}
> -EXPORT_SYMBOL(trace_hardirqs_off);
> -
> -__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
> -{
> -	if (!this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> -	tracer_hardirqs_on_caller(caller_addr);
> -
> -	this_cpu_write(tracing_irq_cpu, 0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_on_caller);
> -
> -__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
> -{
> -	if (this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	this_cpu_write(tracing_irq_cpu, 1);
> -
> -	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> -	tracer_hardirqs_off_caller(caller_addr);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_off_caller);
> -
> -/*
> - * Stubs:
> - */
> -
> -void trace_softirqs_on(unsigned long ip)
> -{
> -}
> -
> -void trace_softirqs_off(unsigned long ip)
> -{
> -}
> -
> -inline void print_irqtrace_events(struct task_struct *curr)
> -{
> -}
> +#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
> +	register_tracer(&preemptirqsoff_tracer);
>  #endif
>  
> -#if defined(CONFIG_PREEMPT_TRACER) || \
> -	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
> -void trace_preempt_on(unsigned long a0, unsigned long a1)
> -{
> -	trace_preempt_enable_rcuidle(a0, a1);
> -	tracer_preempt_on(a0, a1);
> -}
> -
> -void trace_preempt_off(unsigned long a0, unsigned long a1)
> -{
> -	trace_preempt_disable_rcuidle(a0, a1);
> -	tracer_preempt_off(a0, a1);
> +	return 0;
>  }
> -#endif
> +core_initcall(init_irqsoff_tracer);
> +#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
> diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
> new file mode 100644
> index 000000000000..dc01c7f4d326
> --- /dev/null
> +++ b/kernel/trace/trace_preemptirq.c
> @@ -0,0 +1,71 @@
> +/*
> + * preemptoff and irqoff tracepoints
> + *
> + * Copyright (C) Joel Fernandes (Google) <joel at joelfernandes.org>
> + */
> +
> +#include <linux/kallsyms.h>
> +#include <linux/uaccess.h>
> +#include <linux/module.h>
> +#include <linux/ftrace.h>
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/preemptirq.h>
> +
> +#ifdef CONFIG_TRACE_IRQFLAGS
> +/* Per-cpu variable to prevent redundant calls when IRQs already off */
> +static DEFINE_PER_CPU(int, tracing_irq_cpu);
> +
> +void trace_hardirqs_on(void)
> +{
> +	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> +	this_cpu_write(tracing_irq_cpu, 0);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_on);
> +
> +void trace_hardirqs_off(void)
> +{
> +	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	this_cpu_write(tracing_irq_cpu, 1);
> +	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_off);
> +
> +__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
> +{
> +	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> +	this_cpu_write(tracing_irq_cpu, 0);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_on_caller);
> +
> +__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
> +{
> +	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	this_cpu_write(tracing_irq_cpu, 1);
> +	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_off_caller);
> +#endif /* CONFIG_TRACE_IRQFLAGS */
> +
> +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
> +
> +void trace_preempt_on(unsigned long a0, unsigned long a1)
> +{
> +	trace_preempt_enable_rcuidle(a0, a1);
> +}
> +
> +void trace_preempt_off(unsigned long a0, unsigned long a1)
> +{
> +	trace_preempt_disable_rcuidle(a0, a1);
> +}
> +#endif
> -- 
> 2.17.0.921.gf22659ad46-goog
> 
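The tracing_irq_cpu guard in the new trace_preemptirq.c can be modelled in user space. What follows is a minimal sketch, assuming a single CPU (a plain int stands in for the per-cpu variable) and a counter in place of the real tracepoint. The names model_hardirqs_off(), model_hardirqs_on(), emit_irq_event() and events_emitted are hypothetical stand-ins, not kernel symbols, and the lockdep_recursing() check is left out on purpose.

```c
#include <assert.h>

/* Stand-in for the per-cpu tracing_irq_cpu flag (single CPU assumed). */
int tracing_irq_cpu;
/* Counts how often the hypothetical tracepoint fired. */
int events_emitted;

void emit_irq_event(void)
{
	events_emitted++;
}

/* Mirrors trace_hardirqs_off(): fire only on the first "off". */
void model_hardirqs_off(void)
{
	if (tracing_irq_cpu)
		return;
	tracing_irq_cpu = 1;
	emit_irq_event();
}

/* Mirrors trace_hardirqs_on(): fire only if an "off" was recorded. */
void model_hardirqs_on(void)
{
	if (!tracing_irq_cpu)
		return;
	emit_irq_event();
	tracing_irq_cpu = 0;
}

/* Usage: redundant calls are suppressed, so two events are emitted. */
void demo_nested_toggle(void)
{
	tracing_irq_cpu = 0;
	events_emitted = 0;

	model_hardirqs_off();
	model_hardirqs_off();	/* redundant, suppressed */
	model_hardirqs_on();
	model_hardirqs_on();	/* redundant, suppressed */

	assert(events_emitted == 2);
}
```

The flag is written before the "off" event and cleared after the "on" event, matching the order in the patch, so a probe always runs with the flag set.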

* [PATCH v8 6/8] tracing: Centralize preemptirq tracepoints and unify their usage
@ 2018-05-31  1:56     ` namhyung
  0 siblings, 0 replies; 60+ messages in thread
From: Namhyung Kim @ 2018-05-31  1:56 UTC (permalink / raw)


On Tue, May 29, 2018 at 05:04:58PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
> 
> This patch detaches the preemptirq tracepoints from the tracers and
> keeps them separate.
> 
> Advantages:
> * Lockdep and irqsoff events can now run in parallel since they no longer
> have their own calls.
> 
> * This unifies the use case of adding hooks to an irqsoff and irqson
> event, and a preemptoff and preempton event.
>   3 users of the events exist:
>   - Lockdep
>   - irqsoff and preemptoff tracers
>   - irqs and preempt trace events
> 
> The unification cleans up several ifdefs and makes the code in the
> preempt and irqsoff tracers simpler. It gets rid of all the horrific
> ifdeferry around PROVE_LOCKING and makes configuring the different
> users of the tracepoints easier to understand. It also gets rid of the
> time_* function calls that the lockdep hooks used to make into the
> preemptirq tracer, which are no longer needed. The negative delta in
> lines of code in this patch is quite large too.
> 
> In the patch we introduce a new CONFIG option, PREEMPTIRQ_TRACEPOINTS,
> as a single point for registering probes onto the tracepoints. With
> this, the web of config options for preempt/irq toggle tracepoints and
> their users becomes:
> 
>  PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
>        |                 |     \         |           |
>        \    (selects)    /      \        \ (selects) /
>       TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
>                       \                  /
>                        \ (depends on)   /
>                      PREEMPTIRQ_TRACEPOINTS
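The diagram above can be read as a small set of boolean rules. The following is only a sketch of that reading, assuming the four options in the top row are the only selectors (the real Kconfig has more) and folding in the "if PREEMPT" condition from the Kconfig hunk of this patch. The struct cfg type and the resolve() helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct cfg {
	/* user-visible options from the top row of the diagram */
	bool preempt_tracer, preemptirq_events, irqsoff_tracer, prove_locking;
	bool preempt;	/* CONFIG_PREEMPT, gates one of the selects */
	/* derived symbols */
	bool trace_preempt_toggle, trace_irqflags, preemptirq_tracepoints;
};

/* Apply the "selects" edges, then the "depends on" plus "default y" rule. */
void resolve(struct cfg *c)
{
	c->trace_preempt_toggle = c->preempt_tracer ||
				  (c->preemptirq_events && c->preempt);
	c->trace_irqflags = c->preemptirq_events || c->irqsoff_tracer ||
			    c->prove_locking;
	c->preemptirq_tracepoints = c->trace_preempt_toggle ||
				    c->trace_irqflags;
}

/* Usage: PROVE_LOCKING alone is enough to get the tracepoints built. */
void demo_prove_locking_only(void)
{
	struct cfg c = { .prove_locking = true };

	resolve(&c);
	assert(c.trace_irqflags && !c.trace_preempt_toggle);
	assert(c.preemptirq_tracepoints);
}
```

With every top-row option off, both intermediate symbols stay off and PREEMPTIRQ_TRACEPOINTS is not built.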
> 
> One note: I have to check for lockdep recursion in the code that calls
> the trace events API and bail out if we're under lockdep recursion
> protection, to prevent the following case: a spin_lock is taken, then
> lockdep_acquired is called. That does a raw_local_irq_save, sets
> lockdep_recursion, and then calls __lockdep_acquired. In this function,
> a call to get_lock_stats happens, which calls preempt_disable, which
> calls the IRQs-off trace hook somewhere; that enters my tracepoint code
> and sets the tracing_irq_cpu flag to prevent recursion. This flag is
> then never cleared, causing lockdep paths to never be entered and thus
> causing splats and other bad things.
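The recursion case described above can be replayed in a small user-space model. This is a sketch under simplifying assumptions: a single CPU, one nested call made from inside lockdep, and one genuine irqs-off event afterwards. struct model, lockdep_probe_off(), model_hardirqs_off() and replay() are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

struct model {
	bool lockdep_recursion;	/* models current->lockdep_recursion */
	bool tracing_irq_cpu;	/* models the per-cpu dedup flag */
	int lockdep_saw_off;	/* times the lockdep probe did real work */
};

/* Models the lockdep probe: it ignores events raised inside lockdep. */
void lockdep_probe_off(struct model *m)
{
	if (m->lockdep_recursion)
		return;
	m->lockdep_saw_off++;
}

/* Models trace_hardirqs_off(); 'guard' toggles the recursion check. */
void model_hardirqs_off(struct model *m, bool guard)
{
	if ((guard && m->lockdep_recursion) || m->tracing_irq_cpu)
		return;
	m->tracing_irq_cpu = true;
	lockdep_probe_off(m);
}

/* Replays the scenario; returns how many "off" events lockdep processed. */
int replay(bool guard)
{
	struct model m = { 0 };

	m.lockdep_recursion = true;	/* inside lockdep_acquired() */
	model_hardirqs_off(&m, guard);	/* nested call from lockdep internals */
	m.lockdep_recursion = false;	/* lockdep returns */

	model_hardirqs_off(&m, guard);	/* a genuine irqs-off event */
	return m.lockdep_saw_off;
}

/* Usage: without the guard the genuine event is swallowed. */
void demo_recursion(void)
{
	assert(replay(false) == 0);
	assert(replay(true) == 1);
}
```

Without the guard, the nested call sets the flag while the lockdep probe bails out, so the later genuine event is treated as redundant and lockdep never sees it.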
> 
> Other than the performance tests mentioned in the previous patch, I also
> ran the locking API test suite. I verified that all test cases are
> passing.
> 
> I also injected issues by not registering the lockdep probes onto the
> tracepoints, and I see failures, which confirms that the probes are
> indeed working.
> 
> With probes:
> 
> [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
> [    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> [    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> [    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> [    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> 
> Without probes (lockdep probes deliberately not registered):
> 
> [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
> [    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
> [    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> [    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> [    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> [    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> 
> Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>

Reviewed-by: Namhyung Kim <namhyung at kernel.org>

Thanks,
Namhyung


> ---
>  include/linux/ftrace.h            |  11 +-
>  include/linux/irqflags.h          |  11 +-
>  include/linux/lockdep.h           |   8 +-
>  include/linux/preempt.h           |   2 +-
>  include/trace/events/preemptirq.h |  23 +--
>  init/main.c                       |   5 +-
>  kernel/locking/lockdep.c          |  35 ++---
>  kernel/sched/core.c               |   2 +-
>  kernel/trace/Kconfig              |  22 ++-
>  kernel/trace/Makefile             |   2 +-
>  kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
>  kernel/trace/trace_preemptirq.c   |  71 +++++++++
>  12 files changed, 194 insertions(+), 229 deletions(-)
>  create mode 100644 kernel/trace/trace_preemptirq.c
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 9c3c9a319e48..5191030af0c0 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
>  	return CALLER_ADDR2;
>  }
>  
> -#ifdef CONFIG_IRQSOFF_TRACER
> -  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
> -  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
> -#else
> -  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
> -  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
> -#endif
> -
> -#if defined(CONFIG_PREEMPT_TRACER) || \
> -	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
> +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
>    extern void trace_preempt_on(unsigned long a0, unsigned long a1);
>    extern void trace_preempt_off(unsigned long a0, unsigned long a1);
>  #else
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 9700f00bbc04..50edb9cbbd26 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -15,9 +15,16 @@
>  #include <linux/typecheck.h>
>  #include <asm/irqflags.h>
>  
> -#ifdef CONFIG_TRACE_IRQFLAGS
> +/* Currently trace_softirqs_on/off is used only by lockdep */
> +#ifdef CONFIG_PROVE_LOCKING
>    extern void trace_softirqs_on(unsigned long ip);
>    extern void trace_softirqs_off(unsigned long ip);
> +#else
> +# define trace_softirqs_on(ip)	do { } while (0)
> +# define trace_softirqs_off(ip)	do { } while (0)
> +#endif
> +
> +#ifdef CONFIG_TRACE_IRQFLAGS
>    extern void trace_hardirqs_on(void);
>    extern void trace_hardirqs_off(void);
>  # define trace_hardirq_context(p)	((p)->hardirq_context)
> @@ -43,8 +50,6 @@ do {						\
>  #else
>  # define trace_hardirqs_on()		do { } while (0)
>  # define trace_hardirqs_off()		do { } while (0)
> -# define trace_softirqs_on(ip)		do { } while (0)
> -# define trace_softirqs_off(ip)		do { } while (0)
>  # define trace_hardirq_context(p)	0
>  # define trace_softirq_context(p)	0
>  # define trace_hardirqs_enabled(p)	0
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 6fc77d4dbdcd..a8113357ceeb 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -266,7 +266,8 @@ struct held_lock {
>  /*
>   * Initialization, self-test and debugging-output methods:
>   */
> -extern void lockdep_info(void);
> +extern void lockdep_init(void);
> +extern void lockdep_init_early(void);
>  extern void lockdep_reset(void);
>  extern void lockdep_reset_lock(struct lockdep_map *lock);
>  extern void lockdep_free_key_range(void *start, unsigned long size);
> @@ -406,7 +407,8 @@ static inline void lockdep_on(void)
>  # define lock_downgrade(l, i)			do { } while (0)
>  # define lock_set_class(l, n, k, s, i)		do { } while (0)
>  # define lock_set_subclass(l, s, i)		do { } while (0)
> -# define lockdep_info()				do { } while (0)
> +# define lockdep_init()				do { } while (0)
> +# define lockdep_init_early()			do { } while (0)
>  # define lockdep_init_map(lock, name, key, sub) \
>  		do { (void)(name); (void)(key); } while (0)
>  # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
> @@ -532,7 +534,7 @@ do {								\
>  
>  #endif /* CONFIG_LOCKDEP */
>  
> -#ifdef CONFIG_TRACE_IRQFLAGS
> +#ifdef CONFIG_PROVE_LOCKING
>  extern void print_irqtrace_events(struct task_struct *curr);
>  #else
>  static inline void print_irqtrace_events(struct task_struct *curr)
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index 5bd3f151da78..c01813c3fbe9 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -150,7 +150,7 @@
>   */
>  #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
>  
> -#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
> +#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
>  extern void preempt_count_add(int val);
>  extern void preempt_count_sub(int val);
>  #define preempt_count_dec_and_test() \
> diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
> index 9c4eb33c5a1d..9a0d4ceeb166 100644
> --- a/include/trace/events/preemptirq.h
> +++ b/include/trace/events/preemptirq.h
> @@ -1,4 +1,4 @@
> -#ifdef CONFIG_PREEMPTIRQ_EVENTS
> +#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
>  
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM preemptirq
> @@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template,
>  		  (void *)((unsigned long)(_stext) + __entry->parent_offs))
>  );
>  
> -#ifndef CONFIG_PROVE_LOCKING
> +#ifdef CONFIG_TRACE_IRQFLAGS
>  DEFINE_EVENT(preemptirq_template, irq_disable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> @@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable,
>  DEFINE_EVENT(preemptirq_template, irq_enable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> +#else
> +#define trace_irq_enable(...)
> +#define trace_irq_disable(...)
> +#define trace_irq_enable_rcuidle(...)
> +#define trace_irq_disable_rcuidle(...)
>  #endif
>  
> -#ifdef CONFIG_DEBUG_PREEMPT
> +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
>  DEFINE_EVENT(preemptirq_template, preempt_disable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> @@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable,
>  DEFINE_EVENT(preemptirq_template, preempt_enable,
>  	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
>  	     TP_ARGS(ip, parent_ip));
> +#else
> +#define trace_preempt_enable(...)
> +#define trace_preempt_disable(...)
> +#define trace_preempt_enable_rcuidle(...)
> +#define trace_preempt_disable_rcuidle(...)
>  #endif
>  
>  #endif /* _TRACE_PREEMPTIRQ_H */
>  
>  #include <trace/define_trace.h>
>  
> -#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
> -
> -#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
> +#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
>  #define trace_irq_enable(...)
>  #define trace_irq_disable(...)
>  #define trace_irq_enable_rcuidle(...)
>  #define trace_irq_disable_rcuidle(...)
> -#endif
> -
> -#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
>  #define trace_preempt_enable(...)
>  #define trace_preempt_disable(...)
>  #define trace_preempt_enable_rcuidle(...)
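The empty variadic stubs in this header compile the calls away entirely when the tracepoints are not configured. A minimal sketch of that pattern follows; MODEL_TRACEPOINTS is a hypothetical switch standing in for CONFIG_PREEMPTIRQ_TRACEPOINTS, and expensive_ip() is a made-up helper used only to show that the arguments are never evaluated.

```c
#include <assert.h>

/* Set to 1 to model the tracepoints being configured in (hypothetical). */
#ifndef MODEL_TRACEPOINTS
#define MODEL_TRACEPOINTS 0
#endif

int events;		/* times the modelled tracepoint fired */
int side_effects;	/* times an argument expression was evaluated */

#if MODEL_TRACEPOINTS
void trace_irq_disable(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	events++;
}
#else
/* Expands to nothing, so the argument expressions disappear as well. */
#define trace_irq_disable(...)
#endif

unsigned long expensive_ip(void)
{
	side_effects++;
	return 0x1234UL;
}

void caller(void)
{
	trace_irq_disable(expensive_ip(), 0UL);
}

/* Usage: with the stub, neither the event nor its arguments cost anything. */
void demo_stub(void)
{
	events = 0;
	side_effects = 0;
	caller();
	assert(events == MODEL_TRACEPOINTS);
	assert(side_effects == MODEL_TRACEPOINTS);
}
```

Callers therefore need no ifdefs of their own, which is what lets trace_preemptirq.c call the *_rcuidle variants unconditionally.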
> diff --git a/init/main.c b/init/main.c
> index 3b4ada11ed52..44fe43be84c1 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void)
>  	profile_init();
>  	call_function_init();
>  	WARN(!irqs_disabled(), "Interrupts were enabled early\n");
> +
> +	lockdep_init_early();
> +
>  	early_boot_irqs_disabled = false;
>  	local_irq_enable();
>  
> @@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void)
>  		panic("Too many boot %s vars at `%s'", panic_later,
>  		      panic_param);
>  
> -	lockdep_info();
> +	lockdep_init();
>  
>  	/*
>  	 * Need to run this when irqs are enabled, because it wants
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 023386338269..871a42232858 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -55,6 +55,7 @@
>  
>  #include "lockdep_internals.h"
>  
> +#include <trace/events/preemptirq.h>
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/lock.h>
>  
> @@ -2841,10 +2842,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
>  	debug_atomic_inc(hardirqs_on_events);
>  }
>  
> -__visible void trace_hardirqs_on_caller(unsigned long ip)
> +static void lockdep_hardirqs_on(void *none, unsigned long ignore,
> +				unsigned long ip)
>  {
> -	time_hardirqs_on(CALLER_ADDR0, ip);
> -
>  	if (unlikely(!debug_locks || current->lockdep_recursion))
>  		return;
>  
> @@ -2883,23 +2883,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
>  	__trace_hardirqs_on_caller(ip);
>  	current->lockdep_recursion = 0;
>  }
> -EXPORT_SYMBOL(trace_hardirqs_on_caller);
> -
> -void trace_hardirqs_on(void)
> -{
> -	trace_hardirqs_on_caller(CALLER_ADDR0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_on);
>  
>  /*
>   * Hardirqs were disabled:
>   */
> -__visible void trace_hardirqs_off_caller(unsigned long ip)
> +static void lockdep_hardirqs_off(void *none, unsigned long ignore,
> +				 unsigned long ip)
>  {
>  	struct task_struct *curr = current;
>  
> -	time_hardirqs_off(CALLER_ADDR0, ip);
> -
>  	if (unlikely(!debug_locks || current->lockdep_recursion))
>  		return;
>  
> @@ -2921,13 +2913,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
>  	} else
>  		debug_atomic_inc(redundant_hardirqs_off);
>  }
> -EXPORT_SYMBOL(trace_hardirqs_off_caller);
> -
> -void trace_hardirqs_off(void)
> -{
> -	trace_hardirqs_off_caller(CALLER_ADDR0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_off);
>  
>  /*
>   * Softirqs will be enabled:
> @@ -4334,7 +4319,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
>  	raw_local_irq_restore(flags);
>  }
>  
> -void __init lockdep_info(void)
> +void __init lockdep_init_early(void)
> +{
> +#ifdef CONFIG_PROVE_LOCKING
> +	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
> +	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
> +#endif
> +}
> +
> +void __init lockdep_init(void)
>  {
>  	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
>  
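The INT_MAX and INT_MIN priorities in lockdep_init_early() are meant to put lockdep first on irq_disable and last on irq_enable. Below is a sketch of that ordering, assuming the tracepoint core runs higher-priority probes first (my reading of the register_trace_prio_* helpers, worth checking against kernel/tracepoint.c). struct tracepoint_model, register_prio() and fire() are hypothetical stand-ins.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROBES 8

typedef void (*probe_fn)(void);

struct probe {
	probe_fn fn;
	int prio;
};

/* Hypothetical stand-in for one tracepoint's probe list. */
struct tracepoint_model {
	struct probe probes[MAX_PROBES];
	size_t nr;
};

char call_log[16];	/* records the order in which probes ran */
size_t call_len;

void log_call(char c)
{
	if (call_len < sizeof(call_log) - 1)
		call_log[call_len++] = c;
}

void lockdep_probe(void) { log_call('L'); }
void tracer_probe(void)  { log_call('T'); }

/* Insert before the first probe of strictly lower priority. */
int register_prio(struct tracepoint_model *tp, probe_fn fn, int prio)
{
	size_t pos, i;

	if (tp->nr == MAX_PROBES)
		return -1;
	for (pos = 0; pos < tp->nr; pos++)
		if (tp->probes[pos].prio < prio)
			break;
	for (i = tp->nr; i > pos; i--)
		tp->probes[i] = tp->probes[i - 1];
	tp->probes[pos].fn = fn;
	tp->probes[pos].prio = prio;
	tp->nr++;
	return 0;
}

/* Run every probe in list order. */
void fire(const struct tracepoint_model *tp)
{
	for (size_t i = 0; i < tp->nr; i++)
		tp->probes[i].fn();
}

/* Usage: lockdep runs first on disable and last on enable. */
void demo_ordering(void)
{
	struct tracepoint_model irq_disable = { .nr = 0 };
	struct tracepoint_model irq_enable = { .nr = 0 };

	memset(call_log, 0, sizeof(call_log));
	call_len = 0;

	register_prio(&irq_disable, tracer_probe, 0);
	register_prio(&irq_disable, lockdep_probe, INT_MAX);
	register_prio(&irq_enable, lockdep_probe, INT_MIN);
	register_prio(&irq_enable, tracer_probe, 0);

	fire(&irq_disable);
	fire(&irq_enable);
	assert(strcmp(call_log, "LTTL") == 0);
}
```

The registration order does not matter here; only the priorities decide where lockdep lands relative to the tracer probe.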
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 092f7c4de903..4e9c2d254fba 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3189,7 +3189,7 @@ static inline void sched_tick_stop(int cpu) { }
>  #endif
>  
>  #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
> -				defined(CONFIG_PREEMPT_TRACER))
> +				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
>  /*
>   * If the value passed in is equal to the current preempt count
>   * then we just disabled preemption. Start timing the latency.
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index c4f0f2e4126e..0bcba2a76ad9 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
>  	 Allow the use of ring_buffer_swap_cpu.
>  	 Adds a very slight overhead to tracing when enabled.
>  
> +config PREEMPTIRQ_TRACEPOINTS
> +	bool
> +	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
> +	select TRACING
> +	default y
> +	help
> +	  Create preempt/irq toggle tracepoints if needed, so that other parts
> +	  of the kernel can use them to generate or add hooks to them.
> +
>  # All tracer options should select GENERIC_TRACER. For those options that are
>  # enabled by all tracers (context switch and event tracer) they select TRACING.
>  # This allows those options to appear when no other tracer is selected. But the
> @@ -159,18 +168,20 @@ config FUNCTION_GRAPH_TRACER
>  	  the return value. This is done by setting the current return
>  	  address on the current task structure into a stack of calls.
>  
> +config TRACE_PREEMPT_TOGGLE
> +	bool
> +	help
> +	  Enables hooks which will be called when preemption is first disabled,
> +	  and last enabled.
>  
>  config PREEMPTIRQ_EVENTS
>  	bool "Enable trace events for preempt and irq disable/enable"
>  	select TRACE_IRQFLAGS
> -	depends on DEBUG_PREEMPT || !PROVE_LOCKING
> -	depends on TRACING
> +	select TRACE_PREEMPT_TOGGLE if PREEMPT
> +	select GENERIC_TRACER
>  	default n
>  	help
>  	  Enable tracing of disable and enable events for preemption and irqs.
> -	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
> -	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
> -	  be disabled.
>  
>  config IRQSOFF_TRACER
>  	bool "Interrupts-off Latency Tracer"
> @@ -207,6 +218,7 @@ config PREEMPT_TRACER
>  	select RING_BUFFER_ALLOW_SWAP
>  	select TRACER_SNAPSHOT
>  	select TRACER_SNAPSHOT_PER_CPU_SWAP
> +	select TRACE_PREEMPT_TOGGLE
>  	help
>  	  This option measures the time spent in preemption-off critical
>  	  sections, with microsecond accuracy.
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index e2538c7638d4..84a0cb222f20 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o
>  obj-$(CONFIG_TRACING_MAP) += tracing_map.o
>  obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
>  obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
> -obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
> +obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
>  obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
>  obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
>  obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index f8daa754cce2..770cd30cda40 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -16,7 +16,6 @@
>  
>  #include "trace.h"
>  
> -#define CREATE_TRACE_POINTS
>  #include <trace/events/preemptirq.h>
>  
>  #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
> @@ -450,66 +449,6 @@ void stop_critical_timings(void)
>  }
>  EXPORT_SYMBOL_GPL(stop_critical_timings);
>  
> -#ifdef CONFIG_IRQSOFF_TRACER
> -#ifdef CONFIG_PROVE_LOCKING
> -void time_hardirqs_on(unsigned long a0, unsigned long a1)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		stop_critical_timing(a0, a1);
> -}
> -
> -void time_hardirqs_off(unsigned long a0, unsigned long a1)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		start_critical_timing(a0, a1);
> -}
> -
> -#else /* !CONFIG_PROVE_LOCKING */
> -
> -/*
> - * We are only interested in hardirq on/off events:
> - */
> -static inline void tracer_hardirqs_on(void)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> -}
> -
> -static inline void tracer_hardirqs_off(void)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> -}
> -
> -static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		stop_critical_timing(CALLER_ADDR0, caller_addr);
> -}
> -
> -static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
> -{
> -	if (!preempt_trace() && irq_trace())
> -		start_critical_timing(CALLER_ADDR0, caller_addr);
> -}
> -
> -#endif /* CONFIG_PROVE_LOCKING */
> -#endif /*  CONFIG_IRQSOFF_TRACER */
> -
> -#ifdef CONFIG_PREEMPT_TRACER
> -static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
> -{
> -	if (preempt_trace() && !irq_trace())
> -		stop_critical_timing(a0, a1);
> -}
> -
> -static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
> -{
> -	if (preempt_trace() && !irq_trace())
> -		start_critical_timing(a0, a1);
> -}
> -#endif /* CONFIG_PREEMPT_TRACER */
> -
>  #ifdef CONFIG_FUNCTION_TRACER
>  static bool function_enabled;
>  
> @@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
>  }
>  
>  #ifdef CONFIG_IRQSOFF_TRACER
> +/*
> + * We are only interested in hardirq on/off events:
> + */
> +static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (!preempt_trace() && irq_trace())
> +		stop_critical_timing(a0, a1);
> +}
> +
> +static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (!preempt_trace() && irq_trace())
> +		start_critical_timing(a0, a1);
> +}
> +
>  static int irqsoff_tracer_init(struct trace_array *tr)
>  {
>  	trace_type = TRACER_IRQS_OFF;
>  
> +	register_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	register_trace_irq_enable(tracer_hardirqs_on, NULL);
>  	return __irqsoff_tracer_init(tr);
>  }
>  
>  static void irqsoff_tracer_reset(struct trace_array *tr)
>  {
> +	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
>  	__irqsoff_tracer_reset(tr);
>  }
>  
> @@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
>  	.allow_instances = true,
>  	.use_max_tr	= true,
>  };
> -# define register_irqsoff(trace) register_tracer(&trace)
> -#else
> -# define register_irqsoff(trace) do { } while (0)
> -#endif
> +#endif /*  CONFIG_IRQSOFF_TRACER */
>  
>  #ifdef CONFIG_PREEMPT_TRACER
> +static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (preempt_trace() && !irq_trace())
> +		stop_critical_timing(a0, a1);
> +}
> +
> +static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
> +{
> +	if (preempt_trace() && !irq_trace())
> +		start_critical_timing(a0, a1);
> +}
> +
>  static int preemptoff_tracer_init(struct trace_array *tr)
>  {
>  	trace_type = TRACER_PREEMPT_OFF;
>  
> +	register_trace_preempt_disable(tracer_preempt_off, NULL);
> +	register_trace_preempt_enable(tracer_preempt_on, NULL);
>  	return __irqsoff_tracer_init(tr);
>  }
>  
>  static void preemptoff_tracer_reset(struct trace_array *tr)
>  {
> +	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
> +	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
>  	__irqsoff_tracer_reset(tr);
>  }
>  
> @@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
>  	.allow_instances = true,
>  	.use_max_tr	= true,
>  };
> -# define register_preemptoff(trace) register_tracer(&trace)
> -#else
> -# define register_preemptoff(trace) do { } while (0)
> -#endif
> +#endif /* CONFIG_PREEMPT_TRACER */
>  
> -#if defined(CONFIG_IRQSOFF_TRACER) && \
> -	defined(CONFIG_PREEMPT_TRACER)
> +#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
>  
>  static int preemptirqsoff_tracer_init(struct trace_array *tr)
>  {
>  	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
>  
> +	register_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	register_trace_irq_enable(tracer_hardirqs_on, NULL);
> +	register_trace_preempt_disable(tracer_preempt_off, NULL);
> +	register_trace_preempt_enable(tracer_preempt_on, NULL);
> +
>  	return __irqsoff_tracer_init(tr);
>  }
>  
>  static void preemptirqsoff_tracer_reset(struct trace_array *tr)
>  {
> +	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
> +	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
> +	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
> +	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
> +
>  	__irqsoff_tracer_reset(tr);
>  }
>  
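For the combined tracer, the preempt_trace()/irq_trace() checks in the four probes should make one timed section span the union of the irqs-off and preempt-off regions. The sketch below models that. The bodies of preempt_trace() and irq_trace() are not part of this patch, so they are assumptions here (trace_type bit set and the matching CPU state active), as is the point at which each probe runs relative to the state change. All model_* names and struct cpu_model are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { TRACER_IRQS_OFF = 1 << 1, TRACER_PREEMPT_OFF = 1 << 2 };

struct cpu_model {
	int trace_type;
	int preempt_count;
	bool irqs_disabled;
	int starts, stops;	/* calls to start/stop_critical_timing() */
};

/* Assumed definitions; the real ones live outside the quoted hunks. */
bool preempt_trace(const struct cpu_model *c)
{
	return (c->trace_type & TRACER_PREEMPT_OFF) && c->preempt_count;
}

bool irq_trace(const struct cpu_model *c)
{
	return (c->trace_type & TRACER_IRQS_OFF) && c->irqs_disabled;
}

/* The four probes, with the same conditions as in the patch. */
void probe_hardirqs_off(struct cpu_model *c)
{
	if (!preempt_trace(c) && irq_trace(c))
		c->starts++;
}

void probe_hardirqs_on(struct cpu_model *c)
{
	if (!preempt_trace(c) && irq_trace(c))
		c->stops++;
}

void probe_preempt_off(struct cpu_model *c)
{
	if (preempt_trace(c) && !irq_trace(c))
		c->starts++;
}

void probe_preempt_on(struct cpu_model *c)
{
	if (preempt_trace(c) && !irq_trace(c))
		c->stops++;
}

/* Assumed ordering: "off" probes run after the state change, "on" before. */
void model_preempt_disable(struct cpu_model *c)
{
	if (++c->preempt_count == 1)
		probe_preempt_off(c);
}

void model_preempt_enable(struct cpu_model *c)
{
	if (c->preempt_count == 1)
		probe_preempt_on(c);
	c->preempt_count--;
}

void model_irq_disable(struct cpu_model *c)
{
	c->irqs_disabled = true;
	probe_hardirqs_off(c);
}

void model_irq_enable(struct cpu_model *c)
{
	probe_hardirqs_on(c);
	c->irqs_disabled = false;
}

/* Usage: overlapping regions yield exactly one start and one stop. */
void demo_overlap(void)
{
	struct cpu_model c = {
		.trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF,
	};

	model_preempt_disable(&c);	/* section starts here */
	model_irq_disable(&c);		/* already timing, no second start */
	model_preempt_enable(&c);	/* irqs still off, no stop */
	model_irq_enable(&c);		/* section stops here */

	assert(c.starts == 1 && c.stops == 1);
}
```

If the model holds, registering all four probes in preemptirqsoff_tracer_init() does not double-count: whichever region opens first starts the timing and whichever closes last stops it.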
> @@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
>  	.allow_instances = true,
>  	.use_max_tr	= true,
>  };
> -
> -# define register_preemptirqsoff(trace) register_tracer(&trace)
> -#else
> -# define register_preemptirqsoff(trace) do { } while (0)
>  #endif
>  
>  __init static int init_irqsoff_tracer(void)
>  {
> -	register_irqsoff(irqsoff_tracer);
> -	register_preemptoff(preemptoff_tracer);
> -	register_preemptirqsoff(preemptirqsoff_tracer);
> -
> -	return 0;
> -}
> -core_initcall(init_irqsoff_tracer);
> -#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
> -
> -#ifndef CONFIG_IRQSOFF_TRACER
> -static inline void tracer_hardirqs_on(void) { }
> -static inline void tracer_hardirqs_off(void) { }
> -static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
> -static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
> +#ifdef CONFIG_IRQSOFF_TRACER
> +	register_tracer(&irqsoff_tracer);
>  #endif
> -
> -#ifndef CONFIG_PREEMPT_TRACER
> -static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
> -static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
> +#ifdef CONFIG_PREEMPT_TRACER
> +	register_tracer(&preemptoff_tracer);
>  #endif
> -
> -#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
> -/* Per-cpu variable to prevent redundant calls when IRQs already off */
> -static DEFINE_PER_CPU(int, tracing_irq_cpu);
> -
> -void trace_hardirqs_on(void)
> -{
> -	if (!this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> -	tracer_hardirqs_on();
> -
> -	this_cpu_write(tracing_irq_cpu, 0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_on);
> -
> -void trace_hardirqs_off(void)
> -{
> -	if (this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	this_cpu_write(tracing_irq_cpu, 1);
> -
> -	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> -	tracer_hardirqs_off();
> -}
> -EXPORT_SYMBOL(trace_hardirqs_off);
> -
> -__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
> -{
> -	if (!this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> -	tracer_hardirqs_on_caller(caller_addr);
> -
> -	this_cpu_write(tracing_irq_cpu, 0);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_on_caller);
> -
> -__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
> -{
> -	if (this_cpu_read(tracing_irq_cpu))
> -		return;
> -
> -	this_cpu_write(tracing_irq_cpu, 1);
> -
> -	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> -	tracer_hardirqs_off_caller(caller_addr);
> -}
> -EXPORT_SYMBOL(trace_hardirqs_off_caller);
> -
> -/*
> - * Stubs:
> - */
> -
> -void trace_softirqs_on(unsigned long ip)
> -{
> -}
> -
> -void trace_softirqs_off(unsigned long ip)
> -{
> -}
> -
> -inline void print_irqtrace_events(struct task_struct *curr)
> -{
> -}
> +#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
> +	register_tracer(&preemptirqsoff_tracer);
>  #endif
>  
> -#if defined(CONFIG_PREEMPT_TRACER) || \
> -	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
> -void trace_preempt_on(unsigned long a0, unsigned long a1)
> -{
> -	trace_preempt_enable_rcuidle(a0, a1);
> -	tracer_preempt_on(a0, a1);
> -}
> -
> -void trace_preempt_off(unsigned long a0, unsigned long a1)
> -{
> -	trace_preempt_disable_rcuidle(a0, a1);
> -	tracer_preempt_off(a0, a1);
> +	return 0;
>  }
> -#endif
> +core_initcall(init_irqsoff_tracer);
> +#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
> diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
> new file mode 100644
> index 000000000000..dc01c7f4d326
> --- /dev/null
> +++ b/kernel/trace/trace_preemptirq.c
> @@ -0,0 +1,71 @@
> +/*
> + * preemptoff and irqoff tracepoints
> + *
> + * Copyright (C) Joel Fernandes (Google) <joel at joelfernandes.org>
> + */
> +
> +#include <linux/kallsyms.h>
> +#include <linux/uaccess.h>
> +#include <linux/module.h>
> +#include <linux/ftrace.h>
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/preemptirq.h>
> +
> +#ifdef CONFIG_TRACE_IRQFLAGS
> +/* Per-cpu variable to prevent redundant calls when IRQs already off */
> +static DEFINE_PER_CPU(int, tracing_irq_cpu);
> +
> +void trace_hardirqs_on(void)
> +{
> +	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> +	this_cpu_write(tracing_irq_cpu, 0);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_on);
> +
> +void trace_hardirqs_off(void)
> +{
> +	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	this_cpu_write(tracing_irq_cpu, 1);
> +	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_off);
> +
> +__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
> +{
> +	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> +	this_cpu_write(tracing_irq_cpu, 0);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_on_caller);
> +
> +__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
> +{
> +	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
> +		return;
> +
> +	this_cpu_write(tracing_irq_cpu, 1);
> +	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> +}
> +EXPORT_SYMBOL(trace_hardirqs_off_caller);
> +#endif /* CONFIG_TRACE_IRQFLAGS */
> +
> +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
> +
> +void trace_preempt_on(unsigned long a0, unsigned long a1)
> +{
> +	trace_preempt_enable_rcuidle(a0, a1);
> +}
> +
> +void trace_preempt_off(unsigned long a0, unsigned long a1)
> +{
> +	trace_preempt_disable_rcuidle(a0, a1);
> +}
> +#endif
> -- 
> 2.17.0.921.gf22659ad46-goog
> 


* Re: [PATCH v8 6/8] tracing: Centralize preemptirq tracepoints and unify their usage
From: Joel Fernandes @ 2018-05-31  6:26 UTC
  To: Namhyung Kim
  Cc: Joel Fernandes, linux-kernel, Boqun Feng, Byungchul Park,
	Erick Reyes, Ingo Molnar, Julia Cartwright, linux-kselftest,
	Masami Hiramatsu, Mathieu Desnoyers, Paul McKenney,
	Peter Zijlstra, Shuah Khan, Steven Rostedt, Thomas Gleixner,
	Todd Kjos, Tom Zanussi, kernel-team

On Thu, May 31, 2018 at 10:56:39AM +0900, Namhyung Kim wrote:
> On Tue, May 29, 2018 at 05:04:58PM -0700, Joel Fernandes wrote:
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > This patch detaches the preemptirq tracepoints from the tracers and
> > keeps it separate.
> > 
> > Advantages:
> > * Lockdep and irqsoff event can now run in parallel since they no longer
> > have their own calls.
> > 
> > * This unifies the usecase of adding hooks to an irqsoff and irqson
> > event, and a preemptoff and preempton event.
> >   3 users of the events exist:
> >   - Lockdep
> >   - irqsoff and preemptoff tracers
> >   - irqs and preempt trace events
> > 
> > The unification cleans up several ifdefs and makes the code in preempt
> > tracer and irqsoff tracers simpler. It gets rid of all the horrific
> > ifdeferry around PROVE_LOCKING and makes configuration of the different
> > users of the tracepoints more easy and understandable. It also gets rid
> > of the time_* function calls from the lockdep hooks used to call into
> > the preemptirq tracer which is not needed anymore. The negative delta in
> > lines of code in this patch is quite large too.
> > 
> > In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
> > as a single point for registering probes onto the tracepoints. With
> > this,
> > the web of config options for preempt/irq toggle tracepoints and its
> > users becomes:
> > 
> >  PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
> >        |                 |     \         |           |
> >        \    (selects)    /      \        \ (selects) /
> >       TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
> >                       \                  /
> >                        \ (depends on)   /
> >                      PREEMPTIRQ_TRACEPOINTS
> > 
> > One note, I have to check for lockdep recursion in the code that calls
> > the trace events API and bail out if we're in lockdep recursion
> > protection to prevent something like the following case: a spin_lock is
> > taken. Then lockdep_acquired is called.  That does a raw_local_irq_save
> > and then sets lockdep_recursion, and then calls __lockdep_acquired. In
> > this function, a call to get_lock_stats happens which calls
> > preempt_disable, which calls trace IRQS off somewhere which enters my
> > tracepoint code and sets the tracing_irq_cpu flag to prevent recursion.
> > This flag is then never cleared causing lockdep paths to never be
> > entered and thus causing splats and other bad things.
> > 
> > Other than the performance tests mentioned in the previous patch, I also
> > ran the locking API test suite. I verified that all tests cases are
> > passing.
> > 
> > I also injected issues by not registering lockdep probes onto the
> > tracepoints and I see failures to confirm that the probes are indeed
> > working.
> > 
> > Without probes:
> > 
> > [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> > [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > 
> > With probes:
> > 
> > [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> > [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks a lot for the review, Namhyung. I had also swapped the 'With probes'
and 'Without probes' labels in the commit message above. Below is the updated
patch with the commit message fixed up and your Reviewed-by added.

Steve, Ingo, does this patch series look OK to you?

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Mon, 16 Apr 2018 20:18:00 -0700
Subject: [PATCH v8.1 6/8] tracing: Centralize preemptirq tracepoints and unify their
 usage

This patch detaches the preemptirq tracepoints from the tracers and
keeps them separate.

Advantages:
* Lockdep and irqsoff events can now run in parallel since they no longer
have their own calls.

* This unifies the use case of adding hooks to an irqsoff and irqson
event, and a preemptoff and preempton event.
  3 users of the events exist:
  - Lockdep
  - irqsoff and preemptoff tracers
  - irqs and preempt trace events
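
As a rough illustration of the "one event, several users" idea, here is
a minimal userspace sketch in C. It is not kernel code: struct hook,
hook_register(), hook_fire() and demo_three_users() are hypothetical
names invented for this example, standing in for a tracepoint, its
register_trace_*() call and the trace_*() call site.

```c
#include <stddef.h>

#define MAX_PROBES 4

/* Same shape as the probes in this patch: private data, ip, parent ip. */
typedef void (*probe_fn)(void *data, unsigned long ip, unsigned long parent_ip);

/* Hypothetical stand-in for one tracepoint (e.g. irq_disable). */
struct hook {
	probe_fn fn[MAX_PROBES];
	void *data[MAX_PROBES];
	int nr;
};

/* Hypothetical stand-in for register_trace_irq_disable(). */
static int hook_register(struct hook *h, probe_fn fn, void *data)
{
	if (h->nr >= MAX_PROBES)
		return -1;
	h->fn[h->nr] = fn;
	h->data[h->nr] = data;
	h->nr++;
	return 0;
}

/* Hypothetical stand-in for trace_irq_disable(): every user sees the event. */
static void hook_fire(const struct hook *h, unsigned long ip,
		      unsigned long parent_ip)
{
	int i;

	for (i = 0; i < h->nr; i++)
		h->fn[i](h->data[i], ip, parent_ip);
}

/* A probe that just counts how often it was called. */
static void count_probe(void *data, unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	++*(int *)data;
}

/* Attach three independent users to one event and fire it once. */
static int demo_three_users(void)
{
	struct hook irq_disable = { .nr = 0 };
	int lockdep_hits = 0, tracer_hits = 0, event_hits = 0;

	hook_register(&irq_disable, count_probe, &lockdep_hits);
	hook_register(&irq_disable, count_probe, &tracer_hits);
	hook_register(&irq_disable, count_probe, &event_hits);

	hook_fire(&irq_disable, 0x1000, 0x2000);

	return lockdep_hits + tracer_hits + event_hits;
}
```

The call site only fires the event; it does not need to know which
users are attached, which is what lets the three users coexist.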

The unification cleans up several ifdefs and makes the code in the
preempt tracer and irqsoff tracer simpler. It gets rid of all the
horrific ifdeferry around PROVE_LOCKING and makes configuration of the
different users of the tracepoints easier to understand. It also gets
rid of the time_* function calls from the lockdep hooks used to call
into the preemptirq tracer, which are no longer needed. The patch also
removes more lines than it adds (194 insertions, 229 deletions).

In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
as a single point for registering probes onto the tracepoints. With
this, the web of config options for preempt/irq toggle tracepoints and
their users becomes:

 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS

One note: I have to check for lockdep recursion in the code that calls
the trace events API and bail out if we're in lockdep recursion
protection, to prevent something like the following case. A spin_lock is
taken, then lock_acquired is called. That does a raw_local_irq_save,
sets lockdep_recursion, and then calls __lock_acquired. In this
function, a call to get_lock_stats happens which calls preempt_disable,
which somewhere calls the IRQs-off trace hook; this enters my
tracepoint code and sets the tracing_irq_cpu flag to prevent recursion.
This flag is then never cleared, causing lockdep paths to never be
entered and thus causing splats and other bad things.
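
The effect of the recursion check can be sketched with a small
userspace model in C. This is a simplification under stated
assumptions, not the kernel code: plain globals stand in for the
per-cpu tracing_irq_cpu flag and for current->lockdep_recursion, and
model_lock_acquired() and model_run() are hypothetical helpers that
only mimic the call chain described above.

```c
#include <stdbool.h>

static int tracing_irq_cpu;	/* stands in for the per-cpu flag */
static int lockdep_recursion;	/* stands in for current->lockdep_recursion */
static int disable_events;	/* irq-disable events actually emitted */
static bool check_recursion;	/* model with or without the bail-out */

static void model_trace_hardirqs_off(void)
{
	if ((check_recursion && lockdep_recursion) || tracing_irq_cpu)
		return;

	tracing_irq_cpu = 1;
	disable_events++;
}

static void model_trace_hardirqs_on(void)
{
	if ((check_recursion && lockdep_recursion) || !tracing_irq_cpu)
		return;

	tracing_irq_cpu = 0;
}

/*
 * Assumed shape of the problem path: IRQs are saved and restored with
 * the raw (untraced) primitives, but code inside the recursion-protected
 * region still reaches the traced irqs-off hook.
 */
static void model_lock_acquired(void)
{
	lockdep_recursion = 1;
	model_trace_hardirqs_off();	/* reached indirectly inside lockdep */
	lockdep_recursion = 0;
	/* raw restore: no matching trace_hardirqs_on() call happens here */
}

/* Returns how many events a later, genuine irq disable produces. */
static int model_run(bool with_check)
{
	int before;

	tracing_irq_cpu = 0;
	lockdep_recursion = 0;
	disable_events = 0;
	check_recursion = with_check;

	model_lock_acquired();
	before = disable_events;

	model_trace_hardirqs_off();	/* genuine irq disable */
	model_trace_hardirqs_on();

	return disable_events - before;
}
```

In this model, model_run(true) reports the genuine disable, while
model_run(false) swallows it because the flag was left set from inside
the lockdep-protected region.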

Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite. I verified that all test cases are
passing.

I also injected issues by not registering the lockdep probes onto the
tracepoints, and the resulting failures confirm that the probes are
indeed working.

This series + lockdep probes not registered (just to inject errors):
[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

With this series + lockdep probes registered, all locking tests pass:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  35 ++---
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  71 +++++++++
 12 files changed, 194 insertions(+), 229 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9c3c9a319e48..5191030af0c0 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
 	return CALLER_ADDR2;
 }
 
-#ifdef CONFIG_IRQSOFF_TRACER
-  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
-  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
-#else
-  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
-  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
-#endif
-
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
   extern void trace_preempt_on(unsigned long a0, unsigned long a1);
   extern void trace_preempt_off(unsigned long a0, unsigned long a1);
 #else
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 9700f00bbc04..50edb9cbbd26 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -15,9 +15,16 @@
 #include <linux/typecheck.h>
 #include <asm/irqflags.h>
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+/* Currently trace_softirqs_on/off is used only by lockdep */
+#ifdef CONFIG_PROVE_LOCKING
   extern void trace_softirqs_on(unsigned long ip);
   extern void trace_softirqs_off(unsigned long ip);
+#else
+# define trace_softirqs_on(ip)	do { } while (0)
+# define trace_softirqs_off(ip)	do { } while (0)
+#endif
+
+#ifdef CONFIG_TRACE_IRQFLAGS
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
 # define trace_hardirq_context(p)	((p)->hardirq_context)
@@ -43,8 +50,6 @@ do {						\
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
-# define trace_softirqs_on(ip)		do { } while (0)
-# define trace_softirqs_off(ip)		do { } while (0)
 # define trace_hardirq_context(p)	0
 # define trace_softirq_context(p)	0
 # define trace_hardirqs_enabled(p)	0
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..a8113357ceeb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -266,7 +266,8 @@ struct held_lock {
 /*
  * Initialization, self-test and debugging-output methods:
  */
-extern void lockdep_info(void);
+extern void lockdep_init(void);
+extern void lockdep_init_early(void);
 extern void lockdep_reset(void);
 extern void lockdep_reset_lock(struct lockdep_map *lock);
 extern void lockdep_free_key_range(void *start, unsigned long size);
@@ -406,7 +407,8 @@ static inline void lockdep_on(void)
 # define lock_downgrade(l, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
-# define lockdep_info()				do { } while (0)
+# define lockdep_init()				do { } while (0)
+# define lockdep_init_early()			do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
@@ -532,7 +534,7 @@ do {								\
 
 #endif /* CONFIG_LOCKDEP */
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_PROVE_LOCKING
 extern void print_irqtrace_events(struct task_struct *curr);
 #else
 static inline void print_irqtrace_events(struct task_struct *curr)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5bd3f151da78..c01813c3fbe9 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -150,7 +150,7 @@
  */
 #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
 
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
 #define preempt_count_dec_and_test() \
diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
index 9c4eb33c5a1d..9a0d4ceeb166 100644
--- a/include/trace/events/preemptirq.h
+++ b/include/trace/events/preemptirq.h
@@ -1,4 +1,4 @@
-#ifdef CONFIG_PREEMPTIRQ_EVENTS
+#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM preemptirq
@@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template,
 		  (void *)((unsigned long)(_stext) + __entry->parent_offs))
 );
 
-#ifndef CONFIG_PROVE_LOCKING
+#ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_EVENT(preemptirq_template, irq_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable,
 DEFINE_EVENT(preemptirq_template, irq_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_irq_enable(...)
+#define trace_irq_disable(...)
+#define trace_irq_enable_rcuidle(...)
+#define trace_irq_disable_rcuidle(...)
 #endif
 
-#ifdef CONFIG_DEBUG_PREEMPT
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
 DEFINE_EVENT(preemptirq_template, preempt_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable,
 DEFINE_EVENT(preemptirq_template, preempt_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_preempt_enable(...)
+#define trace_preempt_disable(...)
+#define trace_preempt_enable_rcuidle(...)
+#define trace_preempt_disable_rcuidle(...)
 #endif
 
 #endif /* _TRACE_PREEMPTIRQ_H */
 
 #include <trace/define_trace.h>
 
-#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
+#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
 #define trace_irq_enable(...)
 #define trace_irq_disable(...)
 #define trace_irq_enable_rcuidle(...)
 #define trace_irq_disable_rcuidle(...)
-#endif
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
 #define trace_preempt_enable(...)
 #define trace_preempt_disable(...)
 #define trace_preempt_enable_rcuidle(...)
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..44fe43be84c1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void)
 	profile_init();
 	call_function_init();
 	WARN(!irqs_disabled(), "Interrupts were enabled early\n");
+
+	lockdep_init_early();
+
 	early_boot_irqs_disabled = false;
 	local_irq_enable();
 
@@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void)
 		panic("Too many boot %s vars at `%s'", panic_later,
 		      panic_param);
 
-	lockdep_info();
+	lockdep_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 023386338269..871a42232858 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -55,6 +55,7 @@
 
 #include "lockdep_internals.h"
 
+#include <trace/events/preemptirq.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
@@ -2841,10 +2842,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
 	debug_atomic_inc(hardirqs_on_events);
 }
 
-__visible void trace_hardirqs_on_caller(unsigned long ip)
+static void lockdep_hardirqs_on(void *none, unsigned long ignore,
+				unsigned long ip)
 {
-	time_hardirqs_on(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2883,23 +2883,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
 	__trace_hardirqs_on_caller(ip);
 	current->lockdep_recursion = 0;
 }
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-void trace_hardirqs_on(void)
-{
-	trace_hardirqs_on_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
 
 /*
  * Hardirqs were disabled:
  */
-__visible void trace_hardirqs_off_caller(unsigned long ip)
+static void lockdep_hardirqs_off(void *none, unsigned long ignore,
+				 unsigned long ip)
 {
 	struct task_struct *curr = current;
 
-	time_hardirqs_off(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2921,13 +2913,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
 	} else
 		debug_atomic_inc(redundant_hardirqs_off);
 }
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-void trace_hardirqs_off(void)
-{
-	trace_hardirqs_off_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
 
 /*
  * Softirqs will be enabled:
@@ -4334,7 +4319,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	raw_local_irq_restore(flags);
 }
 
-void __init lockdep_info(void)
+void __init lockdep_init_early(void)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
+	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
+#endif
+}
+
+void __init lockdep_init(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092f7c4de903..4e9c2d254fba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3189,7 +3189,7 @@ static inline void sched_tick_stop(int cpu) { }
 #endif
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
-				defined(CONFIG_PREEMPT_TRACER))
+				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index c4f0f2e4126e..0bcba2a76ad9 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
 	 Allow the use of ring_buffer_swap_cpu.
 	 Adds a very slight overhead to tracing when enabled.
 
+config PREEMPTIRQ_TRACEPOINTS
+	bool
+	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
+	select TRACING
+	default y
+	help
+	  Create preempt/irq toggle tracepoints if needed, so that other parts
+	  of the kernel can use them to generate or add hooks to them.
+
 # All tracer options should select GENERIC_TRACER. For those options that are
 # enabled by all tracers (context switch and event tracer) they select TRACING.
 # This allows those options to appear when no other tracer is selected. But the
@@ -159,18 +168,20 @@ config FUNCTION_GRAPH_TRACER
 	  the return value. This is done by setting the current return
 	  address on the current task structure into a stack of calls.
 
+config TRACE_PREEMPT_TOGGLE
+	bool
+	help
+	  Enables hooks which will be called when preemption is first disabled,
+	  and last enabled.
 
 config PREEMPTIRQ_EVENTS
 	bool "Enable trace events for preempt and irq disable/enable"
 	select TRACE_IRQFLAGS
-	depends on DEBUG_PREEMPT || !PROVE_LOCKING
-	depends on TRACING
+	select TRACE_PREEMPT_TOGGLE if PREEMPT
+	select GENERIC_TRACER
 	default n
 	help
 	  Enable tracing of disable and enable events for preemption and irqs.
-	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
-	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
-	  be disabled.
 
 config IRQSOFF_TRACER
 	bool "Interrupts-off Latency Tracer"
@@ -207,6 +218,7 @@ config PREEMPT_TRACER
 	select RING_BUFFER_ALLOW_SWAP
 	select TRACER_SNAPSHOT
 	select TRACER_SNAPSHOT_PER_CPU_SWAP
+	select TRACE_PREEMPT_TOGGLE
 	help
 	  This option measures the time spent in preemption-off critical
 	  sections, with microsecond accuracy.
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..84a0cb222f20 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o
 obj-$(CONFIG_TRACING_MAP) += tracing_map.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
-obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
+obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
 obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index f8daa754cce2..770cd30cda40 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -16,7 +16,6 @@
 
 #include "trace.h"
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/preemptirq.h>
 
 #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
@@ -450,66 +449,6 @@ void stop_critical_timings(void)
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
-#ifdef CONFIG_IRQSOFF_TRACER
-#ifdef CONFIG_PROVE_LOCKING
-void time_hardirqs_on(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-void time_hardirqs_off(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(a0, a1);
-}
-
-#else /* !CONFIG_PROVE_LOCKING */
-
-/*
- * We are only interested in hardirq on/off events:
- */
-static inline void tracer_hardirqs_on(void)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_off(void)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-#endif /* CONFIG_PROVE_LOCKING */
-#endif /*  CONFIG_IRQSOFF_TRACER */
-
-#ifdef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		start_critical_timing(a0, a1);
-}
-#endif /* CONFIG_PREEMPT_TRACER */
-
 #ifdef CONFIG_FUNCTION_TRACER
 static bool function_enabled;
 
@@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
 }
 
 #ifdef CONFIG_IRQSOFF_TRACER
+/*
+ * We are only interested in hardirq on/off events:
+ */
+static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int irqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void irqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_irqsoff(trace) register_tracer(&trace)
-#else
-# define register_irqsoff(trace) do { } while (0)
-#endif
+#endif /*  CONFIG_IRQSOFF_TRACER */
 
 #ifdef CONFIG_PREEMPT_TRACER
+static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int preemptoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_PREEMPT_OFF;
 
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_preemptoff(trace) register_tracer(&trace)
-#else
-# define register_preemptoff(trace) do { } while (0)
-#endif
+#endif /* CONFIG_PREEMPT_TRACER */
 
-#if defined(CONFIG_IRQSOFF_TRACER) && \
-	defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
 
 static int preemptirqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptirqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-
-# define register_preemptirqsoff(trace) register_tracer(&trace)
-#else
-# define register_preemptirqsoff(trace) do { } while (0)
 #endif
 
 __init static int init_irqsoff_tracer(void)
 {
-	register_irqsoff(irqsoff_tracer);
-	register_preemptoff(preemptoff_tracer);
-	register_preemptirqsoff(preemptirqsoff_tracer);
-
-	return 0;
-}
-core_initcall(init_irqsoff_tracer);
-#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
-
-#ifndef CONFIG_IRQSOFF_TRACER
-static inline void tracer_hardirqs_on(void) { }
-static inline void tracer_hardirqs_off(void) { }
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
+#ifdef CONFIG_IRQSOFF_TRACER
+	register_tracer(&irqsoff_tracer);
 #endif
-
-#ifndef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
+#ifdef CONFIG_PREEMPT_TRACER
+	register_tracer(&preemptoff_tracer);
 #endif
-
-#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
-/* Per-cpu variable to prevent redundant calls when IRQs already off */
-static DEFINE_PER_CPU(int, tracing_irq_cpu);
-
-void trace_hardirqs_on(void)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_on();
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
-
-void trace_hardirqs_off(void)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_off();
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
-
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_on_caller(caller_addr);
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_off_caller(caller_addr);
-}
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-/*
- * Stubs:
- */
-
-void trace_softirqs_on(unsigned long ip)
-{
-}
-
-void trace_softirqs_off(unsigned long ip)
-{
-}
-
-inline void print_irqtrace_events(struct task_struct *curr)
-{
-}
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
+	register_tracer(&preemptirqsoff_tracer);
 #endif
 
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
-void trace_preempt_on(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_enable_rcuidle(a0, a1);
-	tracer_preempt_on(a0, a1);
-}
-
-void trace_preempt_off(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_disable_rcuidle(a0, a1);
-	tracer_preempt_off(a0, a1);
+	return 0;
 }
-#endif
+core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
+/*
+ * preemptoff and irqoff tracepoints
+ *
+ * Copyright (C) Joel Fernandes (Google) <joel@joelfernandes.org>
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
+
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_enable_rcuidle(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_disable_rcuidle(a0, a1);
+}
+#endif
-- 
2.17.0.921.gf22659ad46-goog

* [PATCH v8 6/8] tracing: Centralize preemptirq tracepoints and unify their usage
@ 2018-05-31  6:26       ` joel
  0 siblings, 0 replies; 60+ messages in thread
From: joel @ 2018-05-31  6:26 UTC (permalink / raw)


On Thu, May 31, 2018 at 10:56:39AM +0900, Namhyung Kim wrote:
> On Tue, May 29, 2018 at 05:04:58PM -0700, Joel Fernandes wrote:
> > From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
> > 
> > This patch detaches the preemptirq tracepoints from the tracers and
> > keeps it separate.
> > 
> > Advantages:
> > * Lockdep and irqsoff event can now run in parallel since they no longer
> > have their own calls.
> > 
> > * This unifies the usecase of adding hooks to an irqsoff and irqson
> > event, and a preemptoff and preempton event.
> >   3 users of the events exist:
> >   - Lockdep
> >   - irqsoff and preemptoff tracers
> >   - irqs and preempt trace events
> > 
> > The unification cleans up several ifdefs and makes the code in preempt
> > tracer and irqsoff tracers simpler. It gets rid of all the horrific
> > ifdeferry around PROVE_LOCKING and makes configuration of the different
> > users of the tracepoints more easy and understandable. It also gets rid
> > of the time_* function calls from the lockdep hooks used to call into
> > the preemptirq tracer which is not needed anymore. The negative delta in
> > lines of code in this patch is quite large too.
> > 
> > In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
> > as a single point for registering probes onto the tracepoints. With
> > this,
> > the web of config options for preempt/irq toggle tracepoints and its
> > users becomes:
> > 
> >  PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
> >        |                 |     \         |           |
> >        \    (selects)    /      \        \ (selects) /
> >       TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
> >                       \                  /
> >                        \ (depends on)   /
> >                      PREEMPTIRQ_TRACEPOINTS
> > 
> > One note, I have to check for lockdep recursion in the code that calls
> > the trace events API and bail out if we're in lockdep recursion
> > protection to prevent something like the following case: a spin_lock is
> > taken. Then lockdep_acquired is called.  That does a raw_local_irq_save
> > and then sets lockdep_recursion, and then calls __lockdep_acquired. In
> > this function, a call to get_lock_stats happens which calls
> > preempt_disable, which calls trace IRQS off somewhere which enters my
> > tracepoint code and sets the tracing_irq_cpu flag to prevent recursion.
> > This flag is then never cleared causing lockdep paths to never be
> > entered and thus causing splats and other bad things.
> > 
> > Other than the performance tests mentioned in the previous patch, I also
> > ran the locking API test suite. I verified that all tests cases are
> > passing.
> > 
> > I also injected issues by not registering lockdep probes onto the
> > tracepoints and I see failures to confirm that the probes are indeed
> > working.
> > 
> > Without probes:
> > 
> > [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> > [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > 
> > With probes:
> > 
> > [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> > [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>
> 
> Reviewed-by: Namhyung Kim <namhyung at kernel.org>

Thanks a lot for the review, Namhyung. Also, I had accidentally swapped the
'With probes' and 'Without probes' outputs in the commit message above. Below
is the updated patch with the commit message fixed up and your Reviewed-by added.

Steve, Ingo, does this patch series look Ok to you?

---8<-----------------------

From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
Date: Mon, 16 Apr 2018 20:18:00 -0700
Subject: [PATCH v8.1 6/8] tracing: Centralize preemptirq tracepoints and unify their
 usage

This patch detaches the preemptirq tracepoints from the tracers and
keeps them separate.

Advantages:
* Lockdep and the irqsoff events can now run in parallel, since they no
  longer have their own separate calls.

* This unifies the use case of adding hooks to the irqsoff/irqson events
  and to the preemptoff/preempton events.
  Three users of these events exist:
  - Lockdep
  - irqsoff and preemptoff tracers
  - irqs and preempt trace events
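
To illustrate the idea of several users hooking one event, here is a
minimal userspace sketch. It is not kernel code: tracepoint_model,
tp_register(), tp_unregister(), tp_fire() and the two probes are
hypothetical names made up for this example, and the real tracepoint
infrastructure is considerably more involved. It only shows that a single
event can feed any number of independent probes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of one tracepoint; not the kernel API. */
typedef void (*probe_fn)(void *data, unsigned long ip, unsigned long parent_ip);

#define MAX_PROBES 8

struct tracepoint_model {
	probe_fn probes[MAX_PROBES];
	void *data[MAX_PROBES];
	size_t nr;
};

/* Attach a probe; any number of independent users may do this. */
static void tp_register(struct tracepoint_model *tp, probe_fn fn, void *data)
{
	assert(tp->nr < MAX_PROBES);
	tp->probes[tp->nr] = fn;
	tp->data[tp->nr] = data;
	tp->nr++;
}

/* Detach a probe, keeping the remaining probes in their original order. */
static void tp_unregister(struct tracepoint_model *tp, probe_fn fn, void *data)
{
	for (size_t i = 0; i < tp->nr; i++) {
		if (tp->probes[i] != fn || tp->data[i] != data)
			continue;
		for (size_t j = i + 1; j < tp->nr; j++) {
			tp->probes[j - 1] = tp->probes[j];
			tp->data[j - 1] = tp->data[j];
		}
		tp->nr--;
		return;
	}
}

/* Fire the event once: every registered probe sees it. */
static void tp_fire(struct tracepoint_model *tp, unsigned long ip,
		    unsigned long parent_ip)
{
	for (size_t i = 0; i < tp->nr; i++)
		tp->probes[i](tp->data[i], ip, parent_ip);
}

/* Stand-in for a lockdep hook: counts the events it receives. */
static void lockdep_probe(void *data, unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	(*(int *)data)++;
}

/* Stand-in for a latency tracer hook: counts the events it receives. */
static void tracer_probe(void *data, unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	(*(int *)data)++;
}
```

In this model, registering both probes and firing the event once bumps
both counters; unregistering the tracer probe leaves the lockdep probe
attached and still receiving events.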

The unification cleans up several ifdefs and makes the code in the preempt
tracer and the irqsoff tracer simpler. It gets rid of all the horrific
ifdeffery around PROVE_LOCKING and makes configuring the different
users of the tracepoints easier and more understandable. It also gets rid
of the time_* function calls that the lockdep hooks used to call into
the preemptirq tracer, which are not needed anymore. The negative delta in
lines of code in this patch is quite large too.

In this patch we introduce a new CONFIG option, PREEMPTIRQ_TRACEPOINTS,
as a single point for registering probes onto the tracepoints. With this,
the web of config options for the preempt/irq toggle tracepoints and their
users becomes:

 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS
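
One practical effect of gating the tracepoints on config options is that
call sites can invoke a hook unconditionally and have it compile away when
the option is off. The following is a small userspace sketch of that
pattern, under stated assumptions: MODEL_TRACE_IRQFLAGS is a hypothetical
stand-in for a config symbol, and model_trace_irq_disable() and
model_local_irq_disable() are invented names, not functions from this
patch.

```c
#include <assert.h>

/*
 * Hypothetical switch standing in for a config option such as
 * TRACE_IRQFLAGS. Leave it undefined to compile the hook out.
 */
/* #define MODEL_TRACE_IRQFLAGS */

static int irq_disable_events;

#ifdef MODEL_TRACE_IRQFLAGS
/* Option on: the hook is a real function that records the event. */
static void model_trace_irq_disable(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	irq_disable_events++;
}
#else
/* Option off: variadic no-op, so call sites compile but do nothing. */
#define model_trace_irq_disable(...) do { } while (0)
#endif

/* A call site that never needs its own #ifdef. */
static int model_local_irq_disable(void)
{
	model_trace_irq_disable(0x1000UL, 0x2000UL);
	return irq_disable_events;
}

/* With the switch left undefined, no event is ever recorded. */
static void demo_hook_is_compiled_out(void)
{
	assert(model_local_irq_disable() == 0);
}
```

The call site stays identical in both configurations; only the definition
behind the hook changes, which is what keeps ifdefs out of the callers.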

One note: the code that calls the trace events API has to check for
lockdep recursion and bail out if we're inside lockdep's recursion
protection. This prevents something like the following case: a spin_lock
is taken, then lockdep_acquired is called. That does a raw_local_irq_save,
sets lockdep_recursion, and then calls __lockdep_acquired. In
this function, a call to get_lock_stats happens which calls
preempt_disable, which calls trace IRQS off somewhere, which enters my
tracepoint code and sets the tracing_irq_cpu flag to prevent recursion.
This flag is then never cleared, causing lockdep paths to never be
entered and thus causing splats and other bad things.
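
The stuck-flag problem can be reproduced in a small single-threaded
userspace model. This is only a sketch under stated assumptions: the
variable names mirror the ones in this patch, but model_hardirqs_off() and
genuine_events_delivered() are hypothetical helpers, one CPU is modelled,
and the model assumes (as the description above implies) that no matching
"irqs on" event is traced for the report made from inside lockdep.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-CPU model; names mirror the patch but this is not kernel code. */
static int tracing_irq_cpu;	/* 1 while an "irqs off" event is outstanding */
static int lockdep_recursion;	/* nonzero while lockdep runs its internals */
static int delivered;		/* "irqs off" events that reached the probes */

/* Model of the irq-off hook, with the recursion check made optional. */
static void model_hardirqs_off(bool check_recursion)
{
	if (check_recursion && lockdep_recursion)
		return;		/* inside lockdep: leave the flag alone */
	if (tracing_irq_cpu)
		return;		/* redundant: an off event is outstanding */

	tracing_irq_cpu = 1;
	delivered++;		/* stands in for firing the tracepoint */
}

/*
 * Replay the scenario: a report arrives from inside lockdep with no
 * matching "on" event, followed by a genuine irq-off. Returns how many
 * events the genuine irq-off delivered to the probes.
 */
static int genuine_events_delivered(bool check_recursion)
{
	int before;

	assert(!lockdep_recursion);
	tracing_irq_cpu = 0;
	delivered = 0;

	lockdep_recursion = 1;			/* lockdep enters its internals */
	model_hardirqs_off(check_recursion);	/* nested report from inside */
	lockdep_recursion = 0;			/* lockdep leaves again */

	before = delivered;
	model_hardirqs_off(check_recursion);	/* genuine irq-off afterwards */
	return delivered - before;
}
```

Without the check, the nested report leaves tracing_irq_cpu set, so the
genuine event is treated as redundant and dropped; with the check, the
nested report is ignored and the genuine event is delivered.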

Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite. I verified that all test cases are
passing.

I also injected issues by not registering the lockdep probes onto the
tracepoints, and I see failures, which confirms that the probes are indeed
working.

This series + lockdep probes not registered (just to inject errors):
[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

With this series + lockdep probes registered, all locking tests pass:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Reviewed-by: Namhyung Kim <namhyung at kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>
---
 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  35 ++---
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  71 +++++++++
 12 files changed, 194 insertions(+), 229 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9c3c9a319e48..5191030af0c0 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
 	return CALLER_ADDR2;
 }
 
-#ifdef CONFIG_IRQSOFF_TRACER
-  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
-  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
-#else
-  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
-  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
-#endif
-
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
   extern void trace_preempt_on(unsigned long a0, unsigned long a1);
   extern void trace_preempt_off(unsigned long a0, unsigned long a1);
 #else
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 9700f00bbc04..50edb9cbbd26 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -15,9 +15,16 @@
 #include <linux/typecheck.h>
 #include <asm/irqflags.h>
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+/* Currently trace_softirqs_on/off is used only by lockdep */
+#ifdef CONFIG_PROVE_LOCKING
   extern void trace_softirqs_on(unsigned long ip);
   extern void trace_softirqs_off(unsigned long ip);
+#else
+# define trace_softirqs_on(ip)	do { } while (0)
+# define trace_softirqs_off(ip)	do { } while (0)
+#endif
+
+#ifdef CONFIG_TRACE_IRQFLAGS
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
 # define trace_hardirq_context(p)	((p)->hardirq_context)
@@ -43,8 +50,6 @@ do {						\
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
-# define trace_softirqs_on(ip)		do { } while (0)
-# define trace_softirqs_off(ip)		do { } while (0)
 # define trace_hardirq_context(p)	0
 # define trace_softirq_context(p)	0
 # define trace_hardirqs_enabled(p)	0
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..a8113357ceeb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -266,7 +266,8 @@ struct held_lock {
 /*
  * Initialization, self-test and debugging-output methods:
  */
-extern void lockdep_info(void);
+extern void lockdep_init(void);
+extern void lockdep_init_early(void);
 extern void lockdep_reset(void);
 extern void lockdep_reset_lock(struct lockdep_map *lock);
 extern void lockdep_free_key_range(void *start, unsigned long size);
@@ -406,7 +407,8 @@ static inline void lockdep_on(void)
 # define lock_downgrade(l, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
-# define lockdep_info()				do { } while (0)
+# define lockdep_init()				do { } while (0)
+# define lockdep_init_early()			do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
@@ -532,7 +534,7 @@ do {								\
 
 #endif /* CONFIG_LOCKDEP */
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_PROVE_LOCKING
 extern void print_irqtrace_events(struct task_struct *curr);
 #else
 static inline void print_irqtrace_events(struct task_struct *curr)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5bd3f151da78..c01813c3fbe9 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -150,7 +150,7 @@
  */
 #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
 
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
 #define preempt_count_dec_and_test() \
diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
index 9c4eb33c5a1d..9a0d4ceeb166 100644
--- a/include/trace/events/preemptirq.h
+++ b/include/trace/events/preemptirq.h
@@ -1,4 +1,4 @@
-#ifdef CONFIG_PREEMPTIRQ_EVENTS
+#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM preemptirq
@@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template,
 		  (void *)((unsigned long)(_stext) + __entry->parent_offs))
 );
 
-#ifndef CONFIG_PROVE_LOCKING
+#ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_EVENT(preemptirq_template, irq_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable,
 DEFINE_EVENT(preemptirq_template, irq_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_irq_enable(...)
+#define trace_irq_disable(...)
+#define trace_irq_enable_rcuidle(...)
+#define trace_irq_disable_rcuidle(...)
 #endif
 
-#ifdef CONFIG_DEBUG_PREEMPT
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
 DEFINE_EVENT(preemptirq_template, preempt_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable,
 DEFINE_EVENT(preemptirq_template, preempt_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_preempt_enable(...)
+#define trace_preempt_disable(...)
+#define trace_preempt_enable_rcuidle(...)
+#define trace_preempt_disable_rcuidle(...)
 #endif
 
 #endif /* _TRACE_PREEMPTIRQ_H */
 
 #include <trace/define_trace.h>
 
-#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
+#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
 #define trace_irq_enable(...)
 #define trace_irq_disable(...)
 #define trace_irq_enable_rcuidle(...)
 #define trace_irq_disable_rcuidle(...)
-#endif
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
 #define trace_preempt_enable(...)
 #define trace_preempt_disable(...)
 #define trace_preempt_enable_rcuidle(...)
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..44fe43be84c1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void)
 	profile_init();
 	call_function_init();
 	WARN(!irqs_disabled(), "Interrupts were enabled early\n");
+
+	lockdep_init_early();
+
 	early_boot_irqs_disabled = false;
 	local_irq_enable();
 
@@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void)
 		panic("Too many boot %s vars at `%s'", panic_later,
 		      panic_param);
 
-	lockdep_info();
+	lockdep_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 023386338269..871a42232858 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -55,6 +55,7 @@
 
 #include "lockdep_internals.h"
 
+#include <trace/events/preemptirq.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
@@ -2841,10 +2842,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
 	debug_atomic_inc(hardirqs_on_events);
 }
 
-__visible void trace_hardirqs_on_caller(unsigned long ip)
+static void lockdep_hardirqs_on(void *none, unsigned long ignore,
+				unsigned long ip)
 {
-	time_hardirqs_on(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2883,23 +2883,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
 	__trace_hardirqs_on_caller(ip);
 	current->lockdep_recursion = 0;
 }
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-void trace_hardirqs_on(void)
-{
-	trace_hardirqs_on_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
 
 /*
  * Hardirqs were disabled:
  */
-__visible void trace_hardirqs_off_caller(unsigned long ip)
+static void lockdep_hardirqs_off(void *none, unsigned long ignore,
+				 unsigned long ip)
 {
 	struct task_struct *curr = current;
 
-	time_hardirqs_off(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2921,13 +2913,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
 	} else
 		debug_atomic_inc(redundant_hardirqs_off);
 }
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-void trace_hardirqs_off(void)
-{
-	trace_hardirqs_off_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
 
 /*
  * Softirqs will be enabled:
@@ -4334,7 +4319,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	raw_local_irq_restore(flags);
 }
 
-void __init lockdep_info(void)
+void __init lockdep_init_early(void)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
+	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
+#endif
+}
+
+void __init lockdep_init(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092f7c4de903..4e9c2d254fba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3189,7 +3189,7 @@ static inline void sched_tick_stop(int cpu) { }
 #endif
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
-				defined(CONFIG_PREEMPT_TRACER))
+				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index c4f0f2e4126e..0bcba2a76ad9 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
 	 Allow the use of ring_buffer_swap_cpu.
 	 Adds a very slight overhead to tracing when enabled.
 
+config PREEMPTIRQ_TRACEPOINTS
+	bool
+	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
+	select TRACING
+	default y
+	help
+	  Create preempt/irq toggle tracepoints if needed, so that other parts
+	  of the kernel can use them to generate or add hooks to them.
+
 # All tracer options should select GENERIC_TRACER. For those options that are
 # enabled by all tracers (context switch and event tracer) they select TRACING.
 # This allows those options to appear when no other tracer is selected. But the
@@ -159,18 +168,20 @@ config FUNCTION_GRAPH_TRACER
 	  the return value. This is done by setting the current return
 	  address on the current task structure into a stack of calls.
 
+config TRACE_PREEMPT_TOGGLE
+	bool
+	help
+	  Enables hooks which will be called when preemption is first disabled,
+	  and last enabled.
 
 config PREEMPTIRQ_EVENTS
 	bool "Enable trace events for preempt and irq disable/enable"
 	select TRACE_IRQFLAGS
-	depends on DEBUG_PREEMPT || !PROVE_LOCKING
-	depends on TRACING
+	select TRACE_PREEMPT_TOGGLE if PREEMPT
+	select GENERIC_TRACER
 	default n
 	help
 	  Enable tracing of disable and enable events for preemption and irqs.
-	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
-	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
-	  be disabled.
 
 config IRQSOFF_TRACER
 	bool "Interrupts-off Latency Tracer"
@@ -207,6 +218,7 @@ config PREEMPT_TRACER
 	select RING_BUFFER_ALLOW_SWAP
 	select TRACER_SNAPSHOT
 	select TRACER_SNAPSHOT_PER_CPU_SWAP
+	select TRACE_PREEMPT_TOGGLE
 	help
 	  This option measures the time spent in preemption-off critical
 	  sections, with microsecond accuracy.
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..84a0cb222f20 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o
 obj-$(CONFIG_TRACING_MAP) += tracing_map.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
-obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
+obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
 obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index f8daa754cce2..770cd30cda40 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -16,7 +16,6 @@
 
 #include "trace.h"
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/preemptirq.h>
 
 #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
@@ -450,66 +449,6 @@ void stop_critical_timings(void)
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
-#ifdef CONFIG_IRQSOFF_TRACER
-#ifdef CONFIG_PROVE_LOCKING
-void time_hardirqs_on(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-void time_hardirqs_off(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(a0, a1);
-}
-
-#else /* !CONFIG_PROVE_LOCKING */
-
-/*
- * We are only interested in hardirq on/off events:
- */
-static inline void tracer_hardirqs_on(void)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_off(void)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-#endif /* CONFIG_PROVE_LOCKING */
-#endif /*  CONFIG_IRQSOFF_TRACER */
-
-#ifdef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		start_critical_timing(a0, a1);
-}
-#endif /* CONFIG_PREEMPT_TRACER */
-
 #ifdef CONFIG_FUNCTION_TRACER
 static bool function_enabled;
 
@@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
 }
 
 #ifdef CONFIG_IRQSOFF_TRACER
+/*
+ * We are only interested in hardirq on/off events:
+ */
+static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int irqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void irqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_irqsoff(trace) register_tracer(&trace)
-#else
-# define register_irqsoff(trace) do { } while (0)
-#endif
+#endif /*  CONFIG_IRQSOFF_TRACER */
 
 #ifdef CONFIG_PREEMPT_TRACER
+static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int preemptoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_PREEMPT_OFF;
 
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_preemptoff(trace) register_tracer(&trace)
-#else
-# define register_preemptoff(trace) do { } while (0)
-#endif
+#endif /* CONFIG_PREEMPT_TRACER */
 
-#if defined(CONFIG_IRQSOFF_TRACER) && \
-	defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
 
 static int preemptirqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptirqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-
-# define register_preemptirqsoff(trace) register_tracer(&trace)
-#else
-# define register_preemptirqsoff(trace) do { } while (0)
 #endif
 
 __init static int init_irqsoff_tracer(void)
 {
-	register_irqsoff(irqsoff_tracer);
-	register_preemptoff(preemptoff_tracer);
-	register_preemptirqsoff(preemptirqsoff_tracer);
-
-	return 0;
-}
-core_initcall(init_irqsoff_tracer);
-#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
-
-#ifndef CONFIG_IRQSOFF_TRACER
-static inline void tracer_hardirqs_on(void) { }
-static inline void tracer_hardirqs_off(void) { }
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
+#ifdef CONFIG_IRQSOFF_TRACER
+	register_tracer(&irqsoff_tracer);
 #endif
-
-#ifndef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
+#ifdef CONFIG_PREEMPT_TRACER
+	register_tracer(&preemptoff_tracer);
 #endif
-
-#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
-/* Per-cpu variable to prevent redundant calls when IRQs already off */
-static DEFINE_PER_CPU(int, tracing_irq_cpu);
-
-void trace_hardirqs_on(void)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_on();
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
-
-void trace_hardirqs_off(void)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_off();
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
-
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_on_caller(caller_addr);
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_off_caller(caller_addr);
-}
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-/*
- * Stubs:
- */
-
-void trace_softirqs_on(unsigned long ip)
-{
-}
-
-void trace_softirqs_off(unsigned long ip)
-{
-}
-
-inline void print_irqtrace_events(struct task_struct *curr)
-{
-}
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
+	register_tracer(&preemptirqsoff_tracer);
 #endif
 
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
-void trace_preempt_on(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_enable_rcuidle(a0, a1);
-	tracer_preempt_on(a0, a1);
-}
-
-void trace_preempt_off(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_disable_rcuidle(a0, a1);
-	tracer_preempt_off(a0, a1);
+	return 0;
 }
-#endif
+core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
+/*
+ * preemptoff and irqoff tracepoints
+ *
+ * Copyright (C) Joel Fernandes (Google) <joel@joelfernandes.org>
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
+
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_enable_rcuidle(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_disable_rcuidle(a0, a1);
+}
+#endif
-- 
2.17.0.921.gf22659ad46-goog

* [PATCH v8 6/8] tracing: Centralize preemptirq tracepoints and unify their usage
@ 2018-05-31  6:26       ` joel
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-31  6:26 UTC (permalink / raw)


On Thu, May 31, 2018 at 10:56:39AM +0900, Namhyung Kim wrote:
> On Tue, May 29, 2018 at 05:04:58PM -0700, Joel Fernandes wrote:
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > This patch detaches the preemptirq tracepoints from the tracers and
> > keeps it separate.
> > 
> > Advantages:
> > * Lockdep and irqsoff event can now run in parallel since they no longer
> > have their own calls.
> > 
> > * This unifies the usecase of adding hooks to an irqsoff and irqson
> > event, and a preemptoff and preempton event.
> >   3 users of the events exist:
> >   - Lockdep
> >   - irqsoff and preemptoff tracers
> >   - irqs and preempt trace events
> > 
> > The unification cleans up several ifdefs and makes the code in preempt
> > tracer and irqsoff tracers simpler. It gets rid of all the horrific
> > ifdeferry around PROVE_LOCKING and makes configuration of the different
> > users of the tracepoints more easy and understandable. It also gets rid
> > of the time_* function calls from the lockdep hooks used to call into
> > the preemptirq tracer which is not needed anymore. The negative delta in
> > lines of code in this patch is quite large too.
> > 
> > In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
> > as a single point for registering probes onto the tracepoints. With
> > this, the web of config options for preempt/irq toggle tracepoints and
> > its users becomes:
> > 
> >  PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
> >        |                 |     \         |           |
> >        \    (selects)    /      \        \ (selects) /
> >       TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
> >                       \                  /
> >                        \ (depends on)   /
> >                      PREEMPTIRQ_TRACEPOINTS
> > 
> > One note, I have to check for lockdep recursion in the code that calls
> > the trace events API and bail out if we're in lockdep recursion
> > protection to prevent something like the following case: a spin_lock is
> > taken. Then lockdep_acquired is called.  That does a raw_local_irq_save
> > and then sets lockdep_recursion, and then calls __lockdep_acquired. In
> > this function, a call to get_lock_stats happens which calls
> > preempt_disable, which calls trace IRQS off somewhere which enters my
> > tracepoint code and sets the tracing_irq_cpu flag to prevent recursion.
> > This flag is then never cleared causing lockdep paths to never be
> > entered and thus causing splats and other bad things.
> > 
> > Other than the performance tests mentioned in the previous patch, I also
> > ran the locking API test suite. I verified that all tests cases are
> > passing.
> > 
> > I also injected issues by not registering lockdep probes onto the
> > tracepoints and I see failures to confirm that the probes are indeed
> > working.
> > 
> > Without probes:
> > 
> > [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
> > [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > 
> > With probes:
> > 
> > [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
> > [    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
> > [    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> > [    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
> > [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks a lot, Namhyung, for the review. I also swapped the 'With probes' and
'Without probes' labels in the commit message above, which were reversed. Below
is the updated patch with the commit message fixed up and your Reviewed-by added.

Steve, Ingo, does this patch series look OK to you?

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Mon, 16 Apr 2018 20:18:00 -0700
Subject: [PATCH v8.1 6/8] tracing: Centralize preemptirq tracepoints and unify their
 usage

This patch detaches the preemptirq tracepoints from the tracers and
keeps them separate.

Advantages:
* Lockdep and the irqsoff events can now run in parallel since they no
  longer have their own calls.

* This unifies the use case of adding hooks to an irqsoff and irqson
  event, and a preemptoff and preempton event.
  Three users of the events exist:
  - Lockdep
  - irqsoff and preemptoff tracers
  - irqs and preempt trace events
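
To illustrate the point above (this sketch is not part of the patch): a
minimal userspace model of one tracepoint with several independent probes
attached. All names here (tracepoint_model, tp_register, tp_fire and so
on) are hypothetical stand-ins, not the kernel tracepoint API, and the
"higher priority runs first" ordering is an assumption of the model.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace model of a tracepoint; NOT the kernel API. */
typedef void (*probe_fn)(void *data, unsigned long ip, unsigned long parent_ip);

struct probe {
	probe_fn fn;
	void *data;
	int prio;
};

#define MAX_PROBES 8

struct tracepoint_model {
	struct probe probes[MAX_PROBES];
	size_t nr;
};

/* Insert a probe, keeping the array sorted by descending priority. */
static int tp_register(struct tracepoint_model *tp, probe_fn fn, void *data,
		       int prio)
{
	size_t i;

	if (tp->nr == MAX_PROBES)
		return -1;

	/* Shift lower-priority probes up by one slot to make room. */
	i = tp->nr;
	while (i > 0 && tp->probes[i - 1].prio < prio) {
		tp->probes[i] = tp->probes[i - 1];
		i--;
	}
	tp->probes[i] = (struct probe){ fn, data, prio };
	tp->nr++;
	return 0;
}

/* Remove the first probe matching both fn and data. */
static int tp_unregister(struct tracepoint_model *tp, probe_fn fn, void *data)
{
	for (size_t i = 0; i < tp->nr; i++) {
		if (tp->probes[i].fn == fn && tp->probes[i].data == data) {
			for (size_t j = i + 1; j < tp->nr; j++)
				tp->probes[j - 1] = tp->probes[j];
			tp->nr--;
			return 0;
		}
	}
	return -1;
}

/* One call site notifies every registered user, in priority order. */
static void tp_fire(const struct tracepoint_model *tp, unsigned long ip,
		    unsigned long parent_ip)
{
	for (size_t i = 0; i < tp->nr; i++)
		tp->probes[i].fn(tp->probes[i].data, ip, parent_ip);
}

/* Two illustrative users; each only records that it was called. */
static int call_log[MAX_PROBES];
static size_t call_count;

static void lockdep_probe(void *data, unsigned long ip, unsigned long parent_ip)
{
	(void)data; (void)ip; (void)parent_ip;
	assert(call_count < MAX_PROBES);
	call_log[call_count++] = 1;
}

static void tracer_probe(void *data, unsigned long ip, unsigned long parent_ip)
{
	(void)data; (void)ip; (void)parent_ip;
	assert(call_count < MAX_PROBES);
	call_log[call_count++] = 2;
}
```

Registering tracer_probe at priority 0 and lockdep_probe at INT_MAX, then
calling tp_fire() once, runs both probes from the same call site, with
the lockdep stand-in first. Neither user needs its own hook.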

The unification cleans up several ifdefs and makes the code in the
preempt and irqsoff tracers simpler. It gets rid of all the horrific
ifdeferry around PROVE_LOCKING and makes configuring the different
users of the tracepoints easier to understand. It also gets rid of the
time_* function calls that the lockdep hooks used to call into the
preemptirq tracer, which are not needed anymore. The negative delta in
lines of code in this patch is quite large too.

In the patch we introduce a new CONFIG option, PREEMPTIRQ_TRACEPOINTS,
as a single point for registering probes onto the tracepoints. With
this, the web of config options for preempt/irq toggle tracepoints and
their users becomes:

 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS
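
As a rough way to read the diagram (again, not part of the patch), the
select/depends relations can be written out as plain boolean logic. The
struct and function names below are hypothetical, and the model is
simplified: it only covers the edges drawn above plus the "select
TRACE_PREEMPT_TOGGLE if PREEMPT" condition from the Kconfig hunk, and it
ignores every other dependency of these options.

```c
#include <assert.h>
#include <stdbool.h>

/* User-visible options (top row of the diagram), plus CONFIG_PREEMPT. */
struct user_config {
	bool preempt;		/* gates one of the selects below */
	bool preempt_tracer;
	bool preemptirq_events;
	bool irqsoff_tracer;
	bool prove_locking;
};

/* Hidden options that end up enabled as a consequence. */
struct derived_config {
	bool trace_preempt_toggle;
	bool trace_irqflags;
	bool preemptirq_tracepoints;
};

/* Simplified model of the select/depends edges; not a Kconfig solver. */
static struct derived_config resolve(const struct user_config *u)
{
	struct derived_config d;

	/* PREEMPT_TRACER selects it; PREEMPTIRQ_EVENTS only if PREEMPT. */
	d.trace_preempt_toggle = u->preempt_tracer ||
				 (u->preemptirq_events && u->preempt);

	/* PREEMPTIRQ_EVENTS, IRQSOFF_TRACER and PROVE_LOCKING select it. */
	d.trace_irqflags = u->preemptirq_events || u->irqsoff_tracer ||
			   u->prove_locking;

	/* "default y" once either dependency is satisfied. */
	d.preemptirq_tracepoints = d.trace_preempt_toggle || d.trace_irqflags;
	return d;
}
```

In this model, enabling PROVE_LOCKING and IRQSOFF_TRACER together yields
TRACE_IRQFLAGS and PREEMPTIRQ_TRACEPOINTS, so both users can share the
same irq tracepoints instead of excluding each other.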

One note: I have to check for lockdep recursion in the code that calls
the trace events API and bail out if we're in lockdep recursion
protection, to prevent something like the following case: a spin_lock is
taken. Then lockdep_acquired is called. That does a raw_local_irq_save,
sets lockdep_recursion, and then calls __lockdep_acquired. In this
function, a call to get_lock_stats happens, which calls preempt_disable,
which calls trace IRQS off somewhere, which enters my tracepoint code and
sets the tracing_irq_cpu flag to prevent recursion. This flag is then
never cleared, causing lockdep paths to never be entered and thus causing
splats and other bad things.
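
The scenario above can be sketched as a toy single-CPU model (not part of
the patch). The variables and the model_hardirqs_off/on and simulate
helpers are hypothetical stand-ins for the per-CPU flag,
current->lockdep_recursion and the trace_hardirqs_* entry points; the
check_recursion switch exists only to compare behaviour with and without
the bail-out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; illustrative only. */
static int tracing_irq_cpu;	/* models the per-CPU "IRQs off seen" flag */
static int lockdep_recursion;	/* models current->lockdep_recursion */
static int lockdep_probe_runs;	/* times the lockdep probe did real work */

/* Models the lockdep probe: it is a no-op while recursion is set. */
static void lockdep_probe(void)
{
	if (lockdep_recursion)
		return;
	lockdep_probe_runs++;
}

static void model_hardirqs_off(bool check_recursion)
{
	if ((check_recursion && lockdep_recursion) || tracing_irq_cpu)
		return;

	tracing_irq_cpu = 1;
	lockdep_probe();
}

static void model_hardirqs_on(bool check_recursion)
{
	if ((check_recursion && lockdep_recursion) || !tracing_irq_cpu)
		return;

	lockdep_probe();
	tracing_irq_cpu = 0;
}

/*
 * Returns how many times the lockdep probe ran for a genuine IRQ-disable
 * that follows a nested "IRQs off" event raised from inside lockdep.
 */
static int simulate(bool check_recursion)
{
	tracing_irq_cpu = 0;
	lockdep_recursion = 0;
	lockdep_probe_runs = 0;

	/* Inside lockdep: IRQs were raw-disabled and recursion is set. */
	lockdep_recursion = 1;
	model_hardirqs_off(check_recursion);	/* nested "IRQs off" event */
	lockdep_recursion = 0;			/* raw restore: no "on" event */

	/* A later, genuine IRQ-disable that lockdep should see. */
	model_hardirqs_off(check_recursion);
	return lockdep_probe_runs;
}
```

In this model, simulate(false) returns 0: the nested event leaves the
flag set, so the genuine disable is treated as redundant and lockdep
never sees it. simulate(true) returns 1, because the flag is left alone
while recursion is set.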

Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite. I verified that all test cases are
passing.

I also injected issues by not registering lockdep probes onto the
tracepoints, and I saw failures, which confirms that the probes are
indeed working.

This series + lockdep probes not registered (just to inject errors):
[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

With this series + lockdep probes registered, all locking tests pass:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  35 ++---
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  71 +++++++++
 12 files changed, 194 insertions(+), 229 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9c3c9a319e48..5191030af0c0 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
 	return CALLER_ADDR2;
 }
 
-#ifdef CONFIG_IRQSOFF_TRACER
-  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
-  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
-#else
-  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
-  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
-#endif
-
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
   extern void trace_preempt_on(unsigned long a0, unsigned long a1);
   extern void trace_preempt_off(unsigned long a0, unsigned long a1);
 #else
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 9700f00bbc04..50edb9cbbd26 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -15,9 +15,16 @@
 #include <linux/typecheck.h>
 #include <asm/irqflags.h>
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+/* Currently trace_softirqs_on/off is used only by lockdep */
+#ifdef CONFIG_PROVE_LOCKING
   extern void trace_softirqs_on(unsigned long ip);
   extern void trace_softirqs_off(unsigned long ip);
+#else
+# define trace_softirqs_on(ip)	do { } while (0)
+# define trace_softirqs_off(ip)	do { } while (0)
+#endif
+
+#ifdef CONFIG_TRACE_IRQFLAGS
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
 # define trace_hardirq_context(p)	((p)->hardirq_context)
@@ -43,8 +50,6 @@ do {						\
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
-# define trace_softirqs_on(ip)		do { } while (0)
-# define trace_softirqs_off(ip)		do { } while (0)
 # define trace_hardirq_context(p)	0
 # define trace_softirq_context(p)	0
 # define trace_hardirqs_enabled(p)	0
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..a8113357ceeb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -266,7 +266,8 @@ struct held_lock {
 /*
  * Initialization, self-test and debugging-output methods:
  */
-extern void lockdep_info(void);
+extern void lockdep_init(void);
+extern void lockdep_init_early(void);
 extern void lockdep_reset(void);
 extern void lockdep_reset_lock(struct lockdep_map *lock);
 extern void lockdep_free_key_range(void *start, unsigned long size);
@@ -406,7 +407,8 @@ static inline void lockdep_on(void)
 # define lock_downgrade(l, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
-# define lockdep_info()				do { } while (0)
+# define lockdep_init()				do { } while (0)
+# define lockdep_init_early()			do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
@@ -532,7 +534,7 @@ do {								\
 
 #endif /* CONFIG_LOCKDEP */
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_PROVE_LOCKING
 extern void print_irqtrace_events(struct task_struct *curr);
 #else
 static inline void print_irqtrace_events(struct task_struct *curr)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5bd3f151da78..c01813c3fbe9 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -150,7 +150,7 @@
  */
 #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
 
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
 #define preempt_count_dec_and_test() \
diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
index 9c4eb33c5a1d..9a0d4ceeb166 100644
--- a/include/trace/events/preemptirq.h
+++ b/include/trace/events/preemptirq.h
@@ -1,4 +1,4 @@
-#ifdef CONFIG_PREEMPTIRQ_EVENTS
+#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM preemptirq
@@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template,
 		  (void *)((unsigned long)(_stext) + __entry->parent_offs))
 );
 
-#ifndef CONFIG_PROVE_LOCKING
+#ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_EVENT(preemptirq_template, irq_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable,
 DEFINE_EVENT(preemptirq_template, irq_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_irq_enable(...)
+#define trace_irq_disable(...)
+#define trace_irq_enable_rcuidle(...)
+#define trace_irq_disable_rcuidle(...)
 #endif
 
-#ifdef CONFIG_DEBUG_PREEMPT
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
 DEFINE_EVENT(preemptirq_template, preempt_disable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
@@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable,
 DEFINE_EVENT(preemptirq_template, preempt_enable,
 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
 	     TP_ARGS(ip, parent_ip));
+#else
+#define trace_preempt_enable(...)
+#define trace_preempt_disable(...)
+#define trace_preempt_enable_rcuidle(...)
+#define trace_preempt_disable_rcuidle(...)
 #endif
 
 #endif /* _TRACE_PREEMPTIRQ_H */
 
 #include <trace/define_trace.h>
 
-#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
+#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
 #define trace_irq_enable(...)
 #define trace_irq_disable(...)
 #define trace_irq_enable_rcuidle(...)
 #define trace_irq_disable_rcuidle(...)
-#endif
-
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
 #define trace_preempt_enable(...)
 #define trace_preempt_disable(...)
 #define trace_preempt_enable_rcuidle(...)
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..44fe43be84c1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void)
 	profile_init();
 	call_function_init();
 	WARN(!irqs_disabled(), "Interrupts were enabled early\n");
+
+	lockdep_init_early();
+
 	early_boot_irqs_disabled = false;
 	local_irq_enable();
 
@@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void)
 		panic("Too many boot %s vars at `%s'", panic_later,
 		      panic_param);
 
-	lockdep_info();
+	lockdep_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 023386338269..871a42232858 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -55,6 +55,7 @@
 
 #include "lockdep_internals.h"
 
+#include <trace/events/preemptirq.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
@@ -2841,10 +2842,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
 	debug_atomic_inc(hardirqs_on_events);
 }
 
-__visible void trace_hardirqs_on_caller(unsigned long ip)
+static void lockdep_hardirqs_on(void *none, unsigned long ignore,
+				unsigned long ip)
 {
-	time_hardirqs_on(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2883,23 +2883,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
 	__trace_hardirqs_on_caller(ip);
 	current->lockdep_recursion = 0;
 }
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-void trace_hardirqs_on(void)
-{
-	trace_hardirqs_on_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
 
 /*
  * Hardirqs were disabled:
  */
-__visible void trace_hardirqs_off_caller(unsigned long ip)
+static void lockdep_hardirqs_off(void *none, unsigned long ignore,
+				 unsigned long ip)
 {
 	struct task_struct *curr = current;
 
-	time_hardirqs_off(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2921,13 +2913,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
 	} else
 		debug_atomic_inc(redundant_hardirqs_off);
 }
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-void trace_hardirqs_off(void)
-{
-	trace_hardirqs_off_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
 
 /*
  * Softirqs will be enabled:
@@ -4334,7 +4319,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	raw_local_irq_restore(flags);
 }
 
-void __init lockdep_info(void)
+void __init lockdep_init_early(void)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
+	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
+#endif
+}
+
+void __init lockdep_init(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092f7c4de903..4e9c2d254fba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3189,7 +3189,7 @@ static inline void sched_tick_stop(int cpu) { }
 #endif
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
-				defined(CONFIG_PREEMPT_TRACER))
+				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index c4f0f2e4126e..0bcba2a76ad9 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
 	 Allow the use of ring_buffer_swap_cpu.
 	 Adds a very slight overhead to tracing when enabled.
 
+config PREEMPTIRQ_TRACEPOINTS
+	bool
+	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
+	select TRACING
+	default y
+	help
+	  Create preempt/irq toggle tracepoints if needed, so that other parts
+	  of the kernel can use them to generate or add hooks to them.
+
 # All tracer options should select GENERIC_TRACER. For those options that are
 # enabled by all tracers (context switch and event tracer) they select TRACING.
 # This allows those options to appear when no other tracer is selected. But the
@@ -159,18 +168,20 @@ config FUNCTION_GRAPH_TRACER
 	  the return value. This is done by setting the current return
 	  address on the current task structure into a stack of calls.
 
+config TRACE_PREEMPT_TOGGLE
+	bool
+	help
+	  Enables hooks which will be called when preemption is first disabled,
+	  and last enabled.
 
 config PREEMPTIRQ_EVENTS
 	bool "Enable trace events for preempt and irq disable/enable"
 	select TRACE_IRQFLAGS
-	depends on DEBUG_PREEMPT || !PROVE_LOCKING
-	depends on TRACING
+	select TRACE_PREEMPT_TOGGLE if PREEMPT
+	select GENERIC_TRACER
 	default n
 	help
 	  Enable tracing of disable and enable events for preemption and irqs.
-	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
-	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
-	  be disabled.
 
 config IRQSOFF_TRACER
 	bool "Interrupts-off Latency Tracer"
@@ -207,6 +218,7 @@ config PREEMPT_TRACER
 	select RING_BUFFER_ALLOW_SWAP
 	select TRACER_SNAPSHOT
 	select TRACER_SNAPSHOT_PER_CPU_SWAP
+	select TRACE_PREEMPT_TOGGLE
 	help
 	  This option measures the time spent in preemption-off critical
 	  sections, with microsecond accuracy.
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..84a0cb222f20 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o
 obj-$(CONFIG_TRACING_MAP) += tracing_map.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
-obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
+obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
 obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index f8daa754cce2..770cd30cda40 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -16,7 +16,6 @@
 
 #include "trace.h"
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/preemptirq.h>
 
 #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
@@ -450,66 +449,6 @@ void stop_critical_timings(void)
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
-#ifdef CONFIG_IRQSOFF_TRACER
-#ifdef CONFIG_PROVE_LOCKING
-void time_hardirqs_on(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-void time_hardirqs_off(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(a0, a1);
-}
-
-#else /* !CONFIG_PROVE_LOCKING */
-
-/*
- * We are only interested in hardirq on/off events:
- */
-static inline void tracer_hardirqs_on(void)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_off(void)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-#endif /* CONFIG_PROVE_LOCKING */
-#endif /*  CONFIG_IRQSOFF_TRACER */
-
-#ifdef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		start_critical_timing(a0, a1);
-}
-#endif /* CONFIG_PREEMPT_TRACER */
-
 #ifdef CONFIG_FUNCTION_TRACER
 static bool function_enabled;
 
@@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
 }
 
 #ifdef CONFIG_IRQSOFF_TRACER
+/*
+ * We are only interested in hardirq on/off events:
+ */
+static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int irqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void irqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_irqsoff(trace) register_tracer(&trace)
-#else
-# define register_irqsoff(trace) do { } while (0)
-#endif
+#endif /*  CONFIG_IRQSOFF_TRACER */
 
 #ifdef CONFIG_PREEMPT_TRACER
+static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int preemptoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_PREEMPT_OFF;
 
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-# define register_preemptoff(trace) register_tracer(&trace)
-#else
-# define register_preemptoff(trace) do { } while (0)
-#endif
+#endif /* CONFIG_PREEMPT_TRACER */
 
-#if defined(CONFIG_IRQSOFF_TRACER) && \
-	defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
 
 static int preemptirqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
 
+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	return __irqsoff_tracer_init(tr);
 }
 
 static void preemptirqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	__irqsoff_tracer_reset(tr);
 }
 
@@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr	= true,
 };
-
-# define register_preemptirqsoff(trace) register_tracer(&trace)
-#else
-# define register_preemptirqsoff(trace) do { } while (0)
 #endif
 
 __init static int init_irqsoff_tracer(void)
 {
-	register_irqsoff(irqsoff_tracer);
-	register_preemptoff(preemptoff_tracer);
-	register_preemptirqsoff(preemptirqsoff_tracer);
-
-	return 0;
-}
-core_initcall(init_irqsoff_tracer);
-#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
-
-#ifndef CONFIG_IRQSOFF_TRACER
-static inline void tracer_hardirqs_on(void) { }
-static inline void tracer_hardirqs_off(void) { }
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
+#ifdef CONFIG_IRQSOFF_TRACER
+	register_tracer(&irqsoff_tracer);
 #endif
-
-#ifndef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
+#ifdef CONFIG_PREEMPT_TRACER
+	register_tracer(&preemptoff_tracer);
 #endif
-
-#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
-/* Per-cpu variable to prevent redundant calls when IRQs already off */
-static DEFINE_PER_CPU(int, tracing_irq_cpu);
-
-void trace_hardirqs_on(void)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_on();
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
-
-void trace_hardirqs_off(void)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_off();
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
-
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_on_caller(caller_addr);
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_off_caller(caller_addr);
-}
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-/*
- * Stubs:
- */
-
-void trace_softirqs_on(unsigned long ip)
-{
-}
-
-void trace_softirqs_off(unsigned long ip)
-{
-}
-
-inline void print_irqtrace_events(struct task_struct *curr)
-{
-}
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
+	register_tracer(&preemptirqsoff_tracer);
 #endif
 
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
-void trace_preempt_on(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_enable_rcuidle(a0, a1);
-	tracer_preempt_on(a0, a1);
-}
-
-void trace_preempt_off(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_disable_rcuidle(a0, a1);
-	tracer_preempt_off(a0, a1);
+	return 0;
 }
-#endif
+core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
+/*
+ * preemptoff and irqoff tracepoints
+ *
+ * Copyright (C) Joel Fernandes (Google) <joel at joelfernandes.org>
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
+
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_enable_rcuidle(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_disable_rcuidle(a0, a1);
+}
+#endif
-- 
2.17.0.921.gf22659ad46-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 8/8] kselftests: Add tests for the preemptoff and irqsoff tracers
  2018-05-30  0:05   ` joelaf
  (?)
@ 2018-05-31  6:45     ` lkp
  -1 siblings, 0 replies; 60+ messages in thread
From: kbuild test robot @ 2018-05-31  6:45 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: kbuild-all, linux-kernel, kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Gleixner, Todd Kjos,
	Tom Zanussi

Hi Joel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.17-rc7]
[cannot apply to next-20180530]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Joel-Fernandes/Centralize-and-unify-usage-of-preempt-irq-tracepoints/20180531-093357


coccinelle warnings: (new ones prefixed by >>)

>> lib/test_atomic_sections.c:66:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH] kselftests: fix ptr_ret.cocci warnings
  2018-05-30  0:05   ` joelaf
  (?)
@ 2018-05-31  6:45     ` fengguang.wu
  -1 siblings, 0 replies; 60+ messages in thread
From: kbuild test robot @ 2018-05-31  6:45 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: kbuild-all, linux-kernel, kernel-team, Joel Fernandes (Google),
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Gleixner, Todd Kjos,
	Tom Zanussi

From: kbuild test robot <fengguang.wu@intel.com>

lib/test_atomic_sections.c:66:1-3: WARNING: PTR_ERR_OR_ZERO can be used


 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Fixes: 3b0e47f0ade1 ("kselftests: Add tests for the preemptoff and irqsoff tracers")
CC: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
---

 test_atomic_sections.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/lib/test_atomic_sections.c
+++ b/lib/test_atomic_sections.c
@@ -63,10 +63,7 @@ static int __init atomic_sect_init(void)
 	snprintf(task_name, 50, "%s dis test", atomic_mode);
 
 	test_task = kthread_run((void*)atomic_sect_run, NULL, task_name);
-	if (IS_ERR(test_task))
-		return PTR_ERR(test_task);
-
-	return 0;
+	return PTR_ERR_OR_ZERO(test_task);
 }
 
 static void __exit atomic_sect_exit(void)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 5/8] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-05-30  0:04   ` joelaf
  (?)
@ 2018-05-31  6:50     ` mathieu.desnoyers
  -1 siblings, 0 replies; 60+ messages in thread
From: Mathieu Desnoyers @ 2018-05-31  6:50 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, kernel-team, Joel Fernandes (Google), Boqun Feng,
	Byungchul Park, Erick Reyes, Ingo Molnar, Julia Cartwright,
	linux-kselftest, Masami Hiramatsu, Namhyung Kim,
	Paul E. McKenney, Peter Zijlstra, shuah, rostedt,
	Thomas Gleixner, Todd Kjos, Tom Zanussi

----- On May 30, 2018, at 2:04 AM, Joel Fernandes joelaf@google.com wrote:

> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> In recent tests with IRQ on/off tracepoints, a large performance
> overhead (~10%) was observed when running hackbench. The root cause is
> the calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that SRCU is a better alternative for use during RCU idle.
> Although it does involve extra barriers, it's lighter than the sched-rcu
> version, which has to make additional RCU calls to notify RCU idle about
> entry into RCU sections.
> 
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has been shown to improve performance
> a lot for the high-frequency irq enable/disable tracepoints.
> 
> Test: Tested idle and preempt/irq tracepoints.
> 
> Here are some performance numbers:
> 
> With a run of the following 30 times on a single core x86 Qemu instance
> with 1GB memory:
> hackbench -g 4 -f 2 -l 3000
> 
> Completion times in seconds. CONFIG_PROVE_LOCKING=y.
> 
> No patches (without this series)
> Mean: 3.048
> Median: 3.025
> Std Dev: 0.064
> 
> With Lockdep using irq tracepoints with RCU implementation:
> Mean: 3.451   (-11.66 %)
> Median: 3.447 (-12.22%)
> Std Dev: 0.049
> 
> With Lockdep using irq tracepoints with SRCU implementation (this series):
> Mean: 3.020   (I would consider the improvement against the "without
>	       this series" case as just noise).
> Median: 3.013
> Std Dev: 0.033
> 
> [1] https://patchwork.kernel.org/patch/10344297/
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
> include/linux/tracepoint.h | 48 +++++++++++++++++++++++++++++++-------
> kernel/tracepoint.c        | 15 +++++++++++-
> 2 files changed, 54 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..880794207921 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
>  */
> 
> #include <linux/smp.h>
> +#include <linux/srcu.h>
> #include <linux/errno.h>
> #include <linux/types.h>
> #include <linux/cpumask.h>
> @@ -33,6 +34,8 @@ struct trace_eval_map {
> 
> #define TRACEPOINT_DEFAULT_PRIO	10
> 
> +extern struct srcu_struct tracepoint_srcu;
> +
> extern int
> tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> extern int
> @@ -75,10 +78,15 @@ int unregister_tracepoint_module_notifier(struct
> notifier_block *nb)
>  * probe unregistration and the end of module exit to make sure there is no
>  * caller executing a probe when it is freed.
>  */
> +#ifdef CONFIG_TRACEPOINTS
> static inline void tracepoint_synchronize_unregister(void)
> {
> +	synchronize_srcu(&tracepoint_srcu);
> 	synchronize_sched();
> }
> +#else
> +static inline void tracepoint_synchronize_unregister(void) { }

In order to add some consistency to the style in this header file,
this empty function should look like:

static inline void tracepoint_synchronize_unregister(void)
{ }

(with newline before "{")

> +#endif
> 
> #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
> extern int syscall_regfunc(void);
> @@ -129,18 +137,38 @@ extern void syscall_unregfunc(void);
>  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
>  */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
> +#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
> 	do {								\
> 		struct tracepoint_func *it_func_ptr;			\
> 		void *it_func;						\
> 		void *__data;						\
> +		int __maybe_unused idx = 0;				\
> 									\
> 		if (!(cond))						\
> 			return;						\
> -		if (rcucheck)						\
> -			rcu_irq_enter_irqson();				\
> -		rcu_read_lock_sched_notrace();				\
> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> +									\
> +		/*							\
> +		 * For rcuidle callers, use srcu since sched-rcu	\
> +		 * doesn't work from the idle path.			\
> +		 */							\
> +		if (rcuidle) {						\
> +			if (in_nmi()) {					\
> +				WARN_ON_ONCE(1);			\
> +				return; /* no srcu from nmi */		\

I find it odd to have a "return" in a macro that consists of a
do { } while (0). I'm tempted to replace "return" by "break" here,
to break the macro do/while (0) loop.

> +			}						\
> +									\
> +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> +			it_func_ptr =					\
> +				srcu_dereference_notrace((tp)->funcs,	\
> +						&tracepoint_srcu);	\
> +			/* To keep it consistent with !rcuidle path */	\
> +			preempt_disable_notrace();			\
> +		} else {						\
> +			rcu_read_lock_sched_notrace();			\
> +			it_func_ptr =					\
> +				rcu_dereference_sched((tp)->funcs);	\
> +		}							\
> +									\
> 		if (it_func_ptr) {					\
> 			do {						\
> 				it_func = (it_func_ptr)->func;		\
> @@ -148,9 +176,13 @@ extern void syscall_unregfunc(void);
> 				((void(*)(proto))(it_func))(args);	\
> 			} while ((++it_func_ptr)->func);		\
> 		}							\
> -		rcu_read_unlock_sched_notrace();			\
> -		if (rcucheck)						\
> -			rcu_irq_exit_irqson();				\
> +									\
> +		if (rcuidle) {						\
> +			preempt_enable_notrace();			\
> +			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
> +		} else {						\
> +			rcu_read_unlock_sched_notrace();		\
> +		}							\
> 	} while (0)
> 
> #ifndef MODULE
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 1e37da2e0c25..54157792f5ab 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -31,6 +31,9 @@
> extern struct tracepoint * const __start___tracepoints_ptrs[];
> extern struct tracepoint * const __stop___tracepoints_ptrs[];
> 
> +DEFINE_SRCU(tracepoint_srcu);
> +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> +
> /* Set to 1 to enable tracepoint debug output */
> static const int tracepoint_debug;
> 
> @@ -67,16 +70,26 @@ static inline void *allocate_probes(int count)
> 	return p == NULL ? NULL : p->probes;
> }
> 
> -static void rcu_free_old_probes(struct rcu_head *head)
> +static void srcu_free_old_probes(struct rcu_head *head)
> {
> 	kfree(container_of(head, struct tp_probes, rcu));
> }
> 
> +static void rcu_free_old_probes(struct rcu_head *head)
> +{
> +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> +}
> +
> static inline void release_probes(struct tracepoint_func *old)
> {
> 	if (old) {
> 		struct tp_probes *tp_probes = container_of(old,
> 			struct tp_probes, probes[0]);
> +		/*
> +		 * Tracepoint probes are protected by both sched RCU and SRCU,
> +		 * by calling the SRCU callback in the sched RCU callback we
> +		 * cover both cases. So lets chain the SRCU and RCU callbacks.

lets -> let's

But I would reword that sentence so that it does not include "let's"
(which is short for "let us"), e.g.

"Chain the SRCU and sched RCU callbacks to wait for both grace periods."

With the comments above taken care of, please add my:

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Thanks,

Mathieu

> +		 */
> 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
> 	}
> }
> --
> 2.17.0.921.gf22659ad46-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 5/8] tracepoint: Make rcuidle tracepoint callers use SRCU
@ 2018-05-31  6:50     ` mathieu.desnoyers
  0 siblings, 0 replies; 60+ messages in thread
From: mathieu.desnoyers @ 2018-05-31  6:50 UTC (permalink / raw)


----- On May 30, 2018, at 2:04 AM, Joel Fernandes joelaf at google.com wrote:

> From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
> 
> In recent tests with IRQ on/off tracepoints, a large performance
> overhead ~10% is noticed when running hackbench. This is root caused to
> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that srcu is a better alternative for use during rcu idle.
> Although it does involve extra barriers, its lighter than the sched-rcu
> version which has to do additional RCU calls to notify RCU idle about
> entry into RCU sections.
> 
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has shown to improve performance
> alot for the high frequency irq enable/disable tracepoints.
> 
> Test: Tested idle and preempt/irq tracepoints.
> 
> Here are some performance numbers:
> 
> With a run of the following 30 times on a single core x86 Qemu instance
> with 1GB memory:
> hackbench -g 4 -f 2 -l 3000
> 
> Completion times in seconds. CONFIG_PROVE_LOCKING=y.
> 
> No patches (without this series)
> Mean: 3.048
> Median: 3.025
> Std Dev: 0.064
> 
> With Lockdep using irq tracepoints with RCU implementation:
> Mean: 3.451   (-11.66 %)
> Median: 3.447 (-12.22%)
> Std Dev: 0.049
> 
> With Lockdep using irq tracepoints with SRCU implementation (this series):
> Mean: 3.020   (I would consider the improvement against the "without
>	       this series" case as just noise).
> Median: 3.013
> Std Dev: 0.033
> 
> [1] https://patchwork.kernel.org/patch/10344297/
> 
> Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>
> ---
> include/linux/tracepoint.h | 48 +++++++++++++++++++++++++++++++-------
> kernel/tracepoint.c        | 15 +++++++++++-
> 2 files changed, 54 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..880794207921 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
>  */
> 
> #include <linux/smp.h>
> +#include <linux/srcu.h>
> #include <linux/errno.h>
> #include <linux/types.h>
> #include <linux/cpumask.h>
> @@ -33,6 +34,8 @@ struct trace_eval_map {
> 
> #define TRACEPOINT_DEFAULT_PRIO	10
> 
> +extern struct srcu_struct tracepoint_srcu;
> +
> extern int
> tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> extern int
> @@ -75,10 +78,15 @@ int unregister_tracepoint_module_notifier(struct
> notifier_block *nb)
>  * probe unregistration and the end of module exit to make sure there is no
>  * caller executing a probe when it is freed.
>  */
> +#ifdef CONFIG_TRACEPOINTS
> static inline void tracepoint_synchronize_unregister(void)
> {
> +	synchronize_srcu(&tracepoint_srcu);
> 	synchronize_sched();
> }
> +#else
> +static inline void tracepoint_synchronize_unregister(void) { }

In order to add some consistency to the style in this header file,
this empty function should look like:

static inline void tracepoint_synchronize_unregister(void)
{ }

(with newline before "{")

> +#endif
> 
> #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
> extern int syscall_regfunc(void);
> @@ -129,18 +137,38 @@ extern void syscall_unregfunc(void);
>  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
>  */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
> +#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
> 	do {								\
> 		struct tracepoint_func *it_func_ptr;			\
> 		void *it_func;						\
> 		void *__data;						\
> +		int __maybe_unused idx = 0;				\
> 									\
> 		if (!(cond))						\
> 			return;						\
> -		if (rcucheck)						\
> -			rcu_irq_enter_irqson();				\
> -		rcu_read_lock_sched_notrace();				\
> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> +									\
> +		/*							\
> +		 * For rcuidle callers, use srcu since sched-rcu	\
> +		 * doesn't work from the idle path.			\
> +		 */							\
> +		if (rcuidle) {						\
> +			if (in_nmi()) {					\
> +				WARN_ON_ONCE(1);			\
> +				return; /* no srcu from nmi */		\

I find it odd to have a "return" in a macro that consists of a
do { } while (0). I'm tempted to replace "return" by "break" here,
to break the macro do/while (0) loop.

> +			}						\
> +									\
> +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> +			it_func_ptr =					\
> +				srcu_dereference_notrace((tp)->funcs,	\
> +						&tracepoint_srcu);	\
> +			/* To keep it consistent with !rcuidle path */	\
> +			preempt_disable_notrace();			\
> +		} else {						\
> +			rcu_read_lock_sched_notrace();			\
> +			it_func_ptr =					\
> +				rcu_dereference_sched((tp)->funcs);	\
> +		}							\
> +									\
> 		if (it_func_ptr) {					\
> 			do {						\
> 				it_func = (it_func_ptr)->func;		\
> @@ -148,9 +176,13 @@ extern void syscall_unregfunc(void);
> 				((void(*)(proto))(it_func))(args);	\
> 			} while ((++it_func_ptr)->func);		\
> 		}							\
> -		rcu_read_unlock_sched_notrace();			\
> -		if (rcucheck)						\
> -			rcu_irq_exit_irqson();				\
> +									\
> +		if (rcuidle) {						\
> +			preempt_enable_notrace();			\
> +			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
> +		} else {						\
> +			rcu_read_unlock_sched_notrace();		\
> +		}							\
> 	} while (0)
> 
> #ifndef MODULE
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 1e37da2e0c25..54157792f5ab 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -31,6 +31,9 @@
> extern struct tracepoint * const __start___tracepoints_ptrs[];
> extern struct tracepoint * const __stop___tracepoints_ptrs[];
> 
> +DEFINE_SRCU(tracepoint_srcu);
> +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> +
> /* Set to 1 to enable tracepoint debug output */
> static const int tracepoint_debug;
> 
> @@ -67,16 +70,26 @@ static inline void *allocate_probes(int count)
> 	return p == NULL ? NULL : p->probes;
> }
> 
> -static void rcu_free_old_probes(struct rcu_head *head)
> +static void srcu_free_old_probes(struct rcu_head *head)
> {
> 	kfree(container_of(head, struct tp_probes, rcu));
> }
> 
> +static void rcu_free_old_probes(struct rcu_head *head)
> +{
> +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> +}
> +
> static inline void release_probes(struct tracepoint_func *old)
> {
> 	if (old) {
> 		struct tp_probes *tp_probes = container_of(old,
> 			struct tp_probes, probes[0]);
> +		/*
> +		 * Tracepoint probes are protected by both sched RCU and SRCU,
> +		 * by calling the SRCU callback in the sched RCU callback we
> +		 * cover both cases. So lets chain the SRCU and RCU callbacks.

lets -> let's

But I would use a different wording for that sentence that does not include
"let's" which is short for "let us", e.g.

"Chain the SRCU and sched RCU callbacks to wait for both grace periods."

With the comments above taken care of, please add my:

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>

Thanks,

Mathieu

> +		 */
> 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
> 	}
> }
> --
> 2.17.0.921.gf22659ad46-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] kselftests: fix ptr_ret.cocci warnings
  2018-05-31  6:45     ` fengguang.wu
  (?)
@ 2018-05-31  7:14       ` joel
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-31  7:14 UTC (permalink / raw)
  To: kbuild test robot
  Cc: Joel Fernandes, kbuild-all, linux-kernel, kernel-team,
	Boqun Feng, Byungchul Park, Erick Reyes, Ingo Molnar,
	Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Glexiner, Todd Kjos,
	Tom Zanussi

On Thu, May 31, 2018 at 02:45:35PM +0800, kbuild test robot wrote:
> From: kbuild test robot <fengguang.wu@intel.com>
> 
> lib/test_atomic_sections.c:66:1-3: WARNING: PTR_ERR_OR_ZERO can be used
> 
> 
>  Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
> 
> Generated by: scripts/coccinelle/api/ptr_ret.cocci
> 
> Fixes: 3b0e47f0ade1 ("kselftests: Add tests for the preemptoff and irqsoff tracers")

This seems to be a simple cosmetic/style fix; I can fold it into this patch.
Thanks,

 - Joel
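
For readers unfamiliar with the helper the robot is suggesting, here is a
minimal userspace sketch of the two equivalent forms. The ERR_PTR family
below is re-implemented locally for illustration and only approximates
the kernel's <linux/err.h>; start_thread(), init_long() and init_short()
are hypothetical names and are not taken from lib/test_atomic_sections.c.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers live in the last MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Hypothetical helper: returns a valid pointer or an error pointer. */
static void *start_thread(int fail)
{
	static int dummy;

	return fail ? ERR_PTR(-ENOMEM) : &dummy;
}

/* Long form, the shape that ptr_ret.cocci warns about. */
static int init_long(int fail)
{
	void *t = start_thread(fail);

	if (IS_ERR(t))
		return PTR_ERR(t);
	return 0;
}

/* Short form with the same behaviour. */
static int init_short(int fail)
{
	return PTR_ERR_OR_ZERO(start_thread(fail));
}
```

Both functions return 0 on success and the negative errno on failure, so
the change is purely a matter of style.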

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 5/8] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-05-31  6:50     ` mathieu.desnoyers
  (?)
@ 2018-05-31 17:51       ` joel
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-05-31 17:51 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Joel Fernandes, linux-kernel, kernel-team, Boqun Feng,
	Byungchul Park, Erick Reyes, Ingo Molnar, Julia Cartwright,
	linux-kselftest, Masami Hiramatsu, Namhyung Kim,
	Paul E. McKenney, Peter Zijlstra, shuah, rostedt,
	Thomas Gleixner, Todd Kjos, Tom Zanussi

On Thu, May 31, 2018 at 02:50:41AM -0400, Mathieu Desnoyers wrote:
> ----- On May 30, 2018, at 2:04 AM, Joel Fernandes joelaf@google.com wrote:
> 
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > In recent tests with IRQ on/off tracepoints, a large performance
> > overhead ~10% is noticed when running hackbench. This is root caused to
> > calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> > tracepoint code. Following a long discussion on the list [1] about this,
> > we concluded that srcu is a better alternative for use during rcu idle.
> > Although it does involve extra barriers, its lighter than the sched-rcu
> > version which has to do additional RCU calls to notify RCU idle about
> > entry into RCU sections.
> > 
> > In this patch, we change the underlying implementation of the
> > trace_*_rcuidle API to use SRCU. This has shown to improve performance
> > alot for the high frequency irq enable/disable tracepoints.
> > 
> > Test: Tested idle and preempt/irq tracepoints.
> > 
> > Here are some performance numbers:
> > 
> > With a run of the following 30 times on a single core x86 Qemu instance
> > with 1GB memory:
> > hackbench -g 4 -f 2 -l 3000
> > 
> > Completion times in seconds. CONFIG_PROVE_LOCKING=y.
> > 
> > No patches (without this series)
> > Mean: 3.048
> > Median: 3.025
> > Std Dev: 0.064
> > 
> > With Lockdep using irq tracepoints with RCU implementation:
> > Mean: 3.451   (-11.66 %)
> > Median: 3.447 (-12.22%)
> > Std Dev: 0.049
> > 
> > With Lockdep using irq tracepoints with SRCU implementation (this series):
> > Mean: 3.020   (I would consider the improvement against the "without
> >	       this series" case as just noise).
> > Median: 3.013
> > Std Dev: 0.033
> > 
> > [1] https://patchwork.kernel.org/patch/10344297/
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> > include/linux/tracepoint.h | 48 +++++++++++++++++++++++++++++++-------
> > kernel/tracepoint.c        | 15 +++++++++++-
> > 2 files changed, 54 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> > index c94f466d57ef..880794207921 100644
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -15,6 +15,7 @@
> >  */
> > 
> > #include <linux/smp.h>
> > +#include <linux/srcu.h>
> > #include <linux/errno.h>
> > #include <linux/types.h>
> > #include <linux/cpumask.h>
> > @@ -33,6 +34,8 @@ struct trace_eval_map {
> > 
> > #define TRACEPOINT_DEFAULT_PRIO	10
> > 
> > +extern struct srcu_struct tracepoint_srcu;
> > +
> > extern int
> > tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> > extern int
> > @@ -75,10 +78,15 @@ int unregister_tracepoint_module_notifier(struct
> > notifier_block *nb)
> >  * probe unregistration and the end of module exit to make sure there is no
> >  * caller executing a probe when it is freed.
> >  */
> > +#ifdef CONFIG_TRACEPOINTS
> > static inline void tracepoint_synchronize_unregister(void)
> > {
> > +	synchronize_srcu(&tracepoint_srcu);
> > 	synchronize_sched();
> > }
> > +#else
> > +static inline void tracepoint_synchronize_unregister(void) { }
> 
> In order to add some consistency to the style in this header file,
> this empty function should look like:
> 
> static inline void tracepoint_synchronize_unregister(void)
> { }
> 
> (with newline before "{")

Ok, I changed it to the style you're proposing.

> > +#endif
> > 
> > #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
> > extern int syscall_regfunc(void);
> > @@ -129,18 +137,38 @@ extern void syscall_unregfunc(void);
> >  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
> >  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
> >  */
> > -#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
> > +#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
> > 	do {								\
> > 		struct tracepoint_func *it_func_ptr;			\
> > 		void *it_func;						\
> > 		void *__data;						\
> > +		int __maybe_unused idx = 0;				\
> > 									\
> > 		if (!(cond))						\
> > 			return;						\
> > -		if (rcucheck)						\
> > -			rcu_irq_enter_irqson();				\
> > -		rcu_read_lock_sched_notrace();				\
> > -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> > +									\
> > +		/*							\
> > +		 * For rcuidle callers, use srcu since sched-rcu	\
> > +		 * doesn't work from the idle path.			\
> > +		 */							\
> > +		if (rcuidle) {						\
> > +			if (in_nmi()) {					\
> > +				WARN_ON_ONCE(1);			\
> > +				return; /* no srcu from nmi */		\
> 
> I find it odd to have a "return" in a macro that consists of a
> do { } while (0). I'm tempted to replace "return" by "break" here,
> to break the macro do/while (0) loop.

The "if (!(cond))" check above also uses "return;", so I would prefer to
stay consistent and use return rather than break here. Please let me know
if you still object.
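
To make the difference concrete, here is a small userspace sketch (not
kernel code). TRACE_RETURN, TRACE_BREAK and the two callers are
hypothetical names invented for this example. A "return" inside the
macro leaves the enclosing function, while a "break" only leaves the
do/while (0) block and execution continues after the macro.

```c
#include <assert.h>

static int probes_run;	/* how often the "probe" part of the macro ran */
static int after_macro;	/* how often code after the macro was reached */

/* 'return' leaves the function the macro is expanded in. */
#define TRACE_RETURN(cond)			\
	do {					\
		if (!(cond))			\
			return;			\
		probes_run++;			\
	} while (0)

/* 'break' only leaves the do/while (0) block. */
#define TRACE_BREAK(cond)			\
	do {					\
		if (!(cond))			\
			break;			\
		probes_run++;			\
	} while (0)

static void caller_return(int cond)
{
	TRACE_RETURN(cond);
	after_macro++;		/* skipped when cond is false */
}

static void caller_break(int cond)
{
	TRACE_BREAK(cond);
	after_macro++;		/* always reached */
}
```

The two forms behave the same only when nothing follows the macro in the
calling function, so the choice matters wherever the macro is not the
last statement of its caller.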

> > +			}						\
> > +									\
> > +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> > +			it_func_ptr =					\
> > +				srcu_dereference_notrace((tp)->funcs,	\
> > +						&tracepoint_srcu);	\
> > +			/* To keep it consistent with !rcuidle path */	\
> > +			preempt_disable_notrace();			\
> > +		} else {						\
> > +			rcu_read_lock_sched_notrace();			\
> > +			it_func_ptr =					\
> > +				rcu_dereference_sched((tp)->funcs);	\
> > +		}							\
> > +									\
> > 		if (it_func_ptr) {					\
> > 			do {						\
> > 				it_func = (it_func_ptr)->func;		\
> > @@ -148,9 +176,13 @@ extern void syscall_unregfunc(void);
> > 				((void(*)(proto))(it_func))(args);	\
> > 			} while ((++it_func_ptr)->func);		\
> > 		}							\
> > -		rcu_read_unlock_sched_notrace();			\
> > -		if (rcucheck)						\
> > -			rcu_irq_exit_irqson();				\
> > +									\
> > +		if (rcuidle) {						\
> > +			preempt_enable_notrace();			\
> > +			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
> > +		} else {						\
> > +			rcu_read_unlock_sched_notrace();		\
> > +		}							\
> > 	} while (0)
> > 
> > #ifndef MODULE
> > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> > index 1e37da2e0c25..54157792f5ab 100644
> > --- a/kernel/tracepoint.c
> > +++ b/kernel/tracepoint.c
> > @@ -31,6 +31,9 @@
> > extern struct tracepoint * const __start___tracepoints_ptrs[];
> > extern struct tracepoint * const __stop___tracepoints_ptrs[];
> > 
> > +DEFINE_SRCU(tracepoint_srcu);
> > +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> > +
> > /* Set to 1 to enable tracepoint debug output */
> > static const int tracepoint_debug;
> > 
> > @@ -67,16 +70,26 @@ static inline void *allocate_probes(int count)
> > 	return p == NULL ? NULL : p->probes;
> > }
> > 
> > -static void rcu_free_old_probes(struct rcu_head *head)
> > +static void srcu_free_old_probes(struct rcu_head *head)
> > {
> > 	kfree(container_of(head, struct tp_probes, rcu));
> > }
> > 
> > +static void rcu_free_old_probes(struct rcu_head *head)
> > +{
> > +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> > +}
> > +
> > static inline void release_probes(struct tracepoint_func *old)
> > {
> > 	if (old) {
> > 		struct tp_probes *tp_probes = container_of(old,
> > 			struct tp_probes, probes[0]);
> > +		/*
> > +		 * Tracepoint probes are protected by both sched RCU and SRCU,
> > +		 * by calling the SRCU callback in the sched RCU callback we
> > +		 * cover both cases. So lets chain the SRCU and RCU callbacks.
> 
> lets -> let's
> 
> But I would use a different wording for that sentence that does not include
> "let's" which is short for "let us", e.g.
> 
> "Chain the SRCU and sched RCU callbacks to wait for both grace periods."

Fixed, thanks.

> With the comments above taken care of, please add my:
> 
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Will do, and thanks for the review!

Below is the updated patch.

 - Joel

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Thu, 26 Apr 2018 20:44:07 -0700
Subject: [PATCH v8.1] tracepoint: Make rcuidle tracepoint callers use SRCU

In recent tests with IRQ on/off tracepoints, a large performance
overhead (~10%) was noticed when running hackbench. The root cause is the
calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that SRCU is a better alternative for use during RCU idle.
Although it does involve extra barriers, it's lighter than the sched-rcu
version, which has to make additional RCU calls to notify RCU idle about
entry into RCU sections.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has been shown to improve
performance a lot for the high-frequency irq enable/disable tracepoints.

Test: Tested idle and preempt/irq tracepoints.

Here are some performance numbers:

With a run of the following 30 times on a single core x86 Qemu instance
with 1GB memory:
hackbench -g 4 -f 2 -l 3000

Completion times in seconds. CONFIG_PROVE_LOCKING=y.

No patches (without this series)
Mean: 3.048
Median: 3.025
Std Dev: 0.064

With Lockdep using irq tracepoints with RCU implementation:
Mean: 3.451   (-11.66 %)
Median: 3.447 (-12.22%)
Std Dev: 0.049

With Lockdep using irq tracepoints with SRCU implementation (this series):
Mean: 3.020   (I would consider the improvement against the "without
	       this series" case as just noise).
Median: 3.013
Std Dev: 0.033

[1] https://patchwork.kernel.org/patch/10344297/

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/tracepoint.h | 49 +++++++++++++++++++++++++++++++-------
 kernel/tracepoint.c        | 16 ++++++++++++-
 2 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c94f466d57ef..9cf2882d2ef4 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@
  */
 
 #include <linux/smp.h>
+#include <linux/srcu.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -33,6 +34,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO	10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -75,10 +78,16 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
  * probe unregistration and the end of module exit to make sure there is no
  * caller executing a probe when it is freed.
  */
+#ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
+#else
+static inline void tracepoint_synchronize_unregister(void)
+{ }
+#endif
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 extern int syscall_regfunc(void);
@@ -129,18 +138,38 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
+#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
 	do {								\
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
 		void *__data;						\
+		int __maybe_unused idx = 0;				\
 									\
 		if (!(cond))						\
 			return;						\
-		if (rcucheck)						\
-			rcu_irq_enter_irqson();				\
-		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+									\
+		/*							\
+		 * For rcuidle callers, use srcu since sched-rcu	\
+		 * doesn't work from the idle path.			\
+		 */							\
+		if (rcuidle) {						\
+			if (in_nmi()) {					\
+				WARN_ON_ONCE(1);			\
+				return; /* no srcu from nmi */		\
+			}						\
+									\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
+			it_func_ptr =					\
+				srcu_dereference_notrace((tp)->funcs,	\
+						&tracepoint_srcu);	\
+			/* To keep it consistent with !rcuidle path */	\
+			preempt_disable_notrace();			\
+		} else {						\
+			rcu_read_lock_sched_notrace();			\
+			it_func_ptr =					\
+				rcu_dereference_sched((tp)->funcs);	\
+		}							\
+									\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -148,9 +177,13 @@ extern void syscall_unregfunc(void);
 				((void(*)(proto))(it_func))(args);	\
 			} while ((++it_func_ptr)->func);		\
 		}							\
-		rcu_read_unlock_sched_notrace();			\
-		if (rcucheck)						\
-			rcu_irq_exit_irqson();				\
+									\
+		if (rcuidle) {						\
+			preempt_enable_notrace();			\
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
+		} else {						\
+			rcu_read_unlock_sched_notrace();		\
+		}							\
 	} while (0)
 
 #ifndef MODULE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 1e37da2e0c25..b2e3be589aff 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -31,6 +31,9 @@
 extern struct tracepoint * const __start___tracepoints_ptrs[];
 extern struct tracepoint * const __stop___tracepoints_ptrs[];
 
+DEFINE_SRCU(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
 
@@ -67,16 +70,27 @@ static inline void *allocate_probes(int count)
 	return p == NULL ? NULL : p->probes;
 }
 
-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }
 
+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
 static inline void release_probes(struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);
+		/*
+		 * Tracepoint probes are protected by both sched RCU and SRCU.
+		 * By calling the SRCU callback in the sched RCU callback we
+		 * cover both cases. Chain the SRCU and sched RCU callbacks to
+		 * wait for both grace periods.
+		 */
 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
 	}
 }
-- 
2.17.0.921.gf22659ad46-goog
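
The callback chaining in release_probes() can be modelled in userspace
as follows. This is only a toy sketch under simplifying assumptions:
struct gp_domain, call_after_gp() and end_grace_period() are made-up
stand-ins for the real sched RCU and SRCU machinery, and a "grace
period" here is just an explicit function call. It shows that the
object is freed only after a grace period has elapsed in both domains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy stand-in for struct rcu_head: a singly linked callback node. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Toy stand-in for one grace-period domain (sched RCU or SRCU). */
struct gp_domain {
	struct cb_head *pending;
};

static struct gp_domain sched_domain;
static struct gp_domain srcu_domain;
static int freed;	/* number of objects actually freed */

/* Queue a callback to run once the domain's grace period has ended. */
static void call_after_gp(struct gp_domain *d, struct cb_head *head,
			  void (*func)(struct cb_head *head))
{
	head->func = func;
	head->next = d->pending;
	d->pending = head;
}

/* Pretend a grace period elapsed: run everything queued before it. */
static void end_grace_period(struct gp_domain *d)
{
	struct cb_head *list = d->pending;

	d->pending = NULL;
	while (list) {
		struct cb_head *next = list->next;

		list->func(list);
		list = next;
	}
}

struct probes {
	int data;
	struct cb_head head;
};

/* Second stage: both grace periods are over, so freeing is safe. */
static void srcu_stage(struct cb_head *head)
{
	free(container_of(head, struct probes, head));
	freed++;
}

/* First stage: the sched grace period is over, now wait for SRCU. */
static void sched_stage(struct cb_head *head)
{
	call_after_gp(&srcu_domain, head, srcu_stage);
}

/* Model of release_probes(): start the chain in the sched domain. */
static void release(struct probes *p)
{
	call_after_gp(&sched_domain, &p->head, sched_stage);
}
```

An SRCU grace period that ends before the sched one does not free
anything in this model, because the second-stage callback is only
queued once the first stage has run.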

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 5/8] tracepoint: Make rcuidle tracepoint callers use SRCU
@ 2018-05-31 17:51       ` joel
  0 siblings, 0 replies; 60+ messages in thread
From: joel @ 2018-05-31 17:51 UTC (permalink / raw)


On Thu, May 31, 2018 at 02:50:41AM -0400, Mathieu Desnoyers wrote:
> ----- On May 30, 2018, at 2:04 AM, Joel Fernandes joelaf at google.com wrote:
> 
> > From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
> > 
> > In recent tests with IRQ on/off tracepoints, a large performance
> > overhead ~10% is noticed when running hackbench. This is root caused to
> > calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> > tracepoint code. Following a long discussion on the list [1] about this,
> > we concluded that srcu is a better alternative for use during rcu idle.
> > Although it does involve extra barriers, its lighter than the sched-rcu
> > version which has to do additional RCU calls to notify RCU idle about
> > entry into RCU sections.
> > 
> > In this patch, we change the underlying implementation of the
> > trace_*_rcuidle API to use SRCU. This has shown to improve performance
> > alot for the high frequency irq enable/disable tracepoints.
> > 
> > Test: Tested idle and preempt/irq tracepoints.
> > 
> > Here are some performance numbers:
> > 
> > With a run of the following 30 times on a single core x86 Qemu instance
> > with 1GB memory:
> > hackbench -g 4 -f 2 -l 3000
> > 
> > Completion times in seconds. CONFIG_PROVE_LOCKING=y.
> > 
> > No patches (without this series)
> > Mean: 3.048
> > Median: 3.025
> > Std Dev: 0.064
> > 
> > With Lockdep using irq tracepoints with RCU implementation:
> > Mean: 3.451   (-11.66 %)
> > Median: 3.447 (-12.22%)
> > Std Dev: 0.049
> > 
> > With Lockdep using irq tracepoints with SRCU implementation (this series):
> > Mean: 3.020   (I would consider the improvement against the "without
> >	       this series" case as just noise).
> > Median: 3.013
> > Std Dev: 0.033
> > 
> > [1] https://patchwork.kernel.org/patch/10344297/
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>
> > ---
> > include/linux/tracepoint.h | 48 +++++++++++++++++++++++++++++++-------
> > kernel/tracepoint.c        | 15 +++++++++++-
> > 2 files changed, 54 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> > index c94f466d57ef..880794207921 100644
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -15,6 +15,7 @@
> >  */
> > 
> > #include <linux/smp.h>
> > +#include <linux/srcu.h>
> > #include <linux/errno.h>
> > #include <linux/types.h>
> > #include <linux/cpumask.h>
> > @@ -33,6 +34,8 @@ struct trace_eval_map {
> > 
> > #define TRACEPOINT_DEFAULT_PRIO	10
> > 
> > +extern struct srcu_struct tracepoint_srcu;
> > +
> > extern int
> > tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> > extern int
> > @@ -75,10 +78,15 @@ int unregister_tracepoint_module_notifier(struct
> > notifier_block *nb)
> >  * probe unregistration and the end of module exit to make sure there is no
> >  * caller executing a probe when it is freed.
> >  */
> > +#ifdef CONFIG_TRACEPOINTS
> > static inline void tracepoint_synchronize_unregister(void)
> > {
> > +	synchronize_srcu(&tracepoint_srcu);
> > 	synchronize_sched();
> > }
> > +#else
> > +static inline void tracepoint_synchronize_unregister(void) { }
> 
> In order to add some consistency to the style in this header file,
> this empty function should look like:
> 
> static inline void tracepoint_synchronize_unregister(void)
> { }
> 
> (with newline before "{")

Ok, I changed it to the style you're proposing.

> > +#endif
> > 
> > #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
> > extern int syscall_regfunc(void);
> > @@ -129,18 +137,38 @@ extern void syscall_unregfunc(void);
> >  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
> >  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
> >  */
> > -#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
> > +#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
> > 	do {								\
> > 		struct tracepoint_func *it_func_ptr;			\
> > 		void *it_func;						\
> > 		void *__data;						\
> > +		int __maybe_unused idx = 0;				\
> > 									\
> > 		if (!(cond))						\
> > 			return;						\
> > -		if (rcucheck)						\
> > -			rcu_irq_enter_irqson();				\
> > -		rcu_read_lock_sched_notrace();				\
> > -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> > +									\
> > +		/*							\
> > +		 * For rcuidle callers, use srcu since sched-rcu	\
> > +		 * doesn't work from the idle path.			\
> > +		 */							\
> > +		if (rcuidle) {						\
> > +			if (in_nmi()) {					\
> > +				WARN_ON_ONCE(1);			\
> > +				return; /* no srcu from nmi */		\
> 
> I find it odd to have a "return" in a macro that consists of a
> do { } while (0). I'm tempted to replace "return" by "break" here,
> to break the macro do/while (0) loop.

"return;" is also used from "if (!(cond))" above so I prefer be consistent
and just use return than break as done above, but please let me know if you
still object.

> > +			}						\
> > +									\
> > +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> > +			it_func_ptr =					\
> > +				srcu_dereference_notrace((tp)->funcs,	\
> > +						&tracepoint_srcu);	\
> > +			/* To keep it consistent with !rcuidle path */	\
> > +			preempt_disable_notrace();			\
> > +		} else {						\
> > +			rcu_read_lock_sched_notrace();			\
> > +			it_func_ptr =					\
> > +				rcu_dereference_sched((tp)->funcs);	\
> > +		}							\
> > +									\
> > 		if (it_func_ptr) {					\
> > 			do {						\
> > 				it_func = (it_func_ptr)->func;		\
> > @@ -148,9 +176,13 @@ extern void syscall_unregfunc(void);
> > 				((void(*)(proto))(it_func))(args);	\
> > 			} while ((++it_func_ptr)->func);		\
> > 		}							\
> > -		rcu_read_unlock_sched_notrace();			\
> > -		if (rcucheck)						\
> > -			rcu_irq_exit_irqson();				\
> > +									\
> > +		if (rcuidle) {						\
> > +			preempt_enable_notrace();			\
> > +			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
> > +		} else {						\
> > +			rcu_read_unlock_sched_notrace();		\
> > +		}							\
> > 	} while (0)
> > 
> > #ifndef MODULE
> > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> > index 1e37da2e0c25..54157792f5ab 100644
> > --- a/kernel/tracepoint.c
> > +++ b/kernel/tracepoint.c
> > @@ -31,6 +31,9 @@
> > extern struct tracepoint * const __start___tracepoints_ptrs[];
> > extern struct tracepoint * const __stop___tracepoints_ptrs[];
> > 
> > +DEFINE_SRCU(tracepoint_srcu);
> > +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> > +
> > /* Set to 1 to enable tracepoint debug output */
> > static const int tracepoint_debug;
> > 
> > @@ -67,16 +70,26 @@ static inline void *allocate_probes(int count)
> > 	return p == NULL ? NULL : p->probes;
> > }
> > 
> > -static void rcu_free_old_probes(struct rcu_head *head)
> > +static void srcu_free_old_probes(struct rcu_head *head)
> > {
> > 	kfree(container_of(head, struct tp_probes, rcu));
> > }
> > 
> > +static void rcu_free_old_probes(struct rcu_head *head)
> > +{
> > +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> > +}
> > +
> > static inline void release_probes(struct tracepoint_func *old)
> > {
> > 	if (old) {
> > 		struct tp_probes *tp_probes = container_of(old,
> > 			struct tp_probes, probes[0]);
> > +		/*
> > +		 * Tracepoint probes are protected by both sched RCU and SRCU,
> > +		 * by calling the SRCU callback in the sched RCU callback we
> > +		 * cover both cases. So lets chain the SRCU and RCU callbacks.
> 
> lets -> let's
> 
> But I would use a different wording for that sentence that does not include
> "let's" which is short for "let us", e.g.
> 
> "Chain the SRCU and sched RCU callbacks to wait for both grace periods."

Fixed, thanks.
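
To make the chaining idea in that comment concrete, here is a userspace toy
model, offered as a sketch under loud assumptions: it does not use any kernel
API, the "grace periods" are just lists that the caller drains by hand, and
every name in it (cb_head, queue_cb, end_grace_period, release_probes_sketch
and so on) is hypothetical. It only shows why queueing the second-stage
callback from the first-stage callback delays the free until both stages have
completed.

```c
#include <stdlib.h>

struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* One pending-callback list per simulated grace-period domain. */
static struct cb_head *sched_pending;
static struct cb_head *srcu_pending;

static void queue_cb(struct cb_head **list, struct cb_head *head,
		     void (*func)(struct cb_head *head))
{
	head->func = func;
	head->next = *list;
	*list = head;
}

/* Pretend a grace period of this domain has elapsed: run its callbacks. */
static void end_grace_period(struct cb_head **list)
{
	struct cb_head *head = *list;

	*list = NULL;
	while (head) {
		/* Save next first: the callback may requeue or free head. */
		struct cb_head *next = head->next;

		head->func(head);
		head = next;
	}
}

struct probes {
	struct cb_head cb;	/* first member, so a cast recovers the object */
	int *freed_flag;	/* lets the caller observe when the free happens */
};

/* Stage 2: runs after the simulated SRCU grace period and frees the object. */
static void srcu_stage(struct cb_head *head)
{
	struct probes *p = (struct probes *)head;

	*p->freed_flag = 1;
	free(p);
}

/* Stage 1: runs after the simulated sched-RCU grace period and chains on. */
static void sched_stage(struct cb_head *head)
{
	queue_cb(&srcu_pending, head, srcu_stage);
}

static void release_probes_sketch(struct probes *p)
{
	queue_cb(&sched_pending, &p->cb, sched_stage);
}
```

In this model, draining only one of the two lists never frees the object; it
is freed only once the sched list and then the SRCU list have both been
drained, which mirrors the ordering the comment describes.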

> With the comments above taken care of, please add my:
> 
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Will do, and thanks for the review!

Below is the updated patch.

 - Joel

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Thu, 26 Apr 2018 20:44:07 -0700
Subject: [PATCH v8.1] tracepoint: Make rcuidle tracepoint callers use SRCU

In recent tests with IRQ on/off tracepoints, a large performance
overhead of ~10% was observed when running hackbench. The root cause is
the calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that SRCU is a better alternative for use during RCU idle.
Although it does involve extra barriers, it's lighter than the sched-rcu
version, which has to make additional RCU calls to notify RCU idle about
entry into RCU sections.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has been shown to improve
performance a lot for the high-frequency irq enable/disable tracepoints.

Test: Tested idle and preempt/irq tracepoints.

Here are some performance numbers:

With a run of the following 30 times on a single core x86 Qemu instance
with 1GB memory:
hackbench -g 4 -f 2 -l 3000

Completion times in seconds. CONFIG_PROVE_LOCKING=y.

No patches (without this series)
Mean: 3.048
Median: 3.025
Std Dev: 0.064

With Lockdep using irq tracepoints with RCU implementation:
Mean: 3.451   (-11.66%)
Median: 3.447 (-12.22%)
Std Dev: 0.049

With Lockdep using irq tracepoints with SRCU implementation (this series):
Mean: 3.020   (I would consider the improvement against the "without
	       this series" case as just noise).
Median: 3.013
Std Dev: 0.033

[1] https://patchwork.kernel.org/patch/10344297/

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/tracepoint.h | 49 +++++++++++++++++++++++++++++++-------
 kernel/tracepoint.c        | 16 ++++++++++++-
 2 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c94f466d57ef..9cf2882d2ef4 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@
  */
 
 #include <linux/smp.h>
+#include <linux/srcu.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -33,6 +34,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO	10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -75,10 +78,16 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
  * probe unregistration and the end of module exit to make sure there is no
  * caller executing a probe when it is freed.
  */
+#ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
+#else
+static inline void tracepoint_synchronize_unregister(void)
+{ }
+#endif
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 extern int syscall_regfunc(void);
@@ -129,18 +138,38 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
+#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
 	do {								\
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
 		void *__data;						\
+		int __maybe_unused idx = 0;				\
 									\
 		if (!(cond))						\
 			return;						\
-		if (rcucheck)						\
-			rcu_irq_enter_irqson();				\
-		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+									\
+		/*							\
+		 * For rcuidle callers, use srcu since sched-rcu	\
+		 * doesn't work from the idle path.			\
+		 */							\
+		if (rcuidle) {						\
+			if (in_nmi()) {					\
+				WARN_ON_ONCE(1);			\
+				return; /* no srcu from nmi */		\
+			}						\
+									\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
+			it_func_ptr =					\
+				srcu_dereference_notrace((tp)->funcs,	\
+						&tracepoint_srcu);	\
+			/* To keep it consistent with !rcuidle path */	\
+			preempt_disable_notrace();			\
+		} else {						\
+			rcu_read_lock_sched_notrace();			\
+			it_func_ptr =					\
+				rcu_dereference_sched((tp)->funcs);	\
+		}							\
+									\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -148,9 +177,13 @@ extern void syscall_unregfunc(void);
 				((void(*)(proto))(it_func))(args);	\
 			} while ((++it_func_ptr)->func);		\
 		}							\
-		rcu_read_unlock_sched_notrace();			\
-		if (rcucheck)						\
-			rcu_irq_exit_irqson();				\
+									\
+		if (rcuidle) {						\
+			preempt_enable_notrace();			\
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
+		} else {						\
+			rcu_read_unlock_sched_notrace();		\
+		}							\
 	} while (0)
 
 #ifndef MODULE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 1e37da2e0c25..b2e3be589aff 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -31,6 +31,9 @@
 extern struct tracepoint * const __start___tracepoints_ptrs[];
 extern struct tracepoint * const __stop___tracepoints_ptrs[];
 
+DEFINE_SRCU(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
 
@@ -67,16 +70,27 @@ static inline void *allocate_probes(int count)
 	return p == NULL ? NULL : p->probes;
 }
 
-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }
 
+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
 static inline void release_probes(struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);
+		/*
+		 * Tracepoint probes are protected by both sched RCU and SRCU.
+		 * By calling the SRCU callback from the sched RCU callback we
+		 * cover both cases. Chain the SRCU and sched RCU callbacks to
+		 * wait for both grace periods.
+		 */
 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
 	}
 }
-- 
2.17.0.921.gf22659ad46-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 5/8] tracepoint: Make rcuidle tracepoint callers use SRCU
  2018-05-31 17:51       ` joel
  (?)
@ 2018-06-02  1:17         ` mathieu.desnoyers
  -1 siblings, 0 replies; 60+ messages in thread
From: Mathieu Desnoyers @ 2018-06-02  1:17 UTC (permalink / raw)
  To: Joel Fernandes, Google
  Cc: Joel Fernandes, linux-kernel, kernel-team, Boqun Feng,
	Byungchul Park, Erick Reyes, Ingo Molnar, Julia Cartwright,
	linux-kselftest, Masami Hiramatsu, Namhyung Kim,
	Paul E. McKenney, Peter Zijlstra, shuah, rostedt,
	Thomas Gleixner, Todd Kjos, Tom Zanussi

----- On May 31, 2018, at 1:51 PM, Joel Fernandes, Google joel@joelfernandes.org wrote:

>> I find it odd to have a "return" in a macro that consists of a
>> do { } while (0). I'm tempted to replace "return" by "break" here,
>> to break the macro do/while (0) loop.
> 
> "return;" is also used from "if (!(cond))" above so I prefer be consistent
> and just use return than break as done above, but please let me know if you
> still object.

It's fine by me,

Thanks!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers
  2018-05-30  0:04   ` joelaf
  (?)
@ 2018-06-05 23:01     ` joel
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-06-05 23:01 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, kernel-team, Boqun Feng, Byungchul Park,
	Erick Reyes, Ingo Molnar, Julia Cartwright, linux-kselftest,
	Masami Hiramatsu, Mathieu Desnoyers, Namhyung Kim, Paul McKenney,
	Peter Zijlstra, Shuah Khan, Steven Rostedt, Thomas Gleixner,
	Todd Kjos, Tom Zanussi

Hi,

On Tue, May 29, 2018 at 05:04:59PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> In this patch we introduce a test module for simulating a long atomic
> section in the kernel which the preemptoff or irqsoff tracers can
> detect. This module is to be used only for test purposes and is default
> disabled.
> 
> Following is the expected output (only briefly shown) that can be parsed
> to verify that the tracers are working correctly. We will use this from
> the kselftests in future patches.
> 
> For the preemptoff tracer:
> 
> echo preemptoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
> preempt -1066    2...2 500012us : <stack trace>
>  => kthread
>  => ret_from_fork
> 
> For the irqsoff tracer:
> 
> echo irqsoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> irq dis -1069    1d..1    0us@: atomic_sect_run
> irq dis -1069    1d..1 500001us : atomic_sect_run
> irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
> irq dis -1069    1d..1 500005us : <stack trace>
>  => ret_from_fork

Andy previously made some suggestions on this patch. The updated version is
below, and I am planning to send it along with this series as v9. I have
included it in advance for your review.

Andy, would you be OK with adding your Reviewed-by to it?
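
The commit message below notes that the trace output can be parsed to verify
the tracers. As a rough illustration of what such a check might look like,
here is a minimal userspace sketch. It assumes the latency-trace line layout
shown in the example output, where the timestamp is a field of the form
"<digits>us". The helper name trace_line_usecs() is made up here; it is not
part of the patch or of the kselftest.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the timestamp (in microseconds) found in one latency-trace line,
 * or -1 if the line has no "<digits>us" field.
 */
static long trace_line_usecs(const char *line)
{
	const char *p = line;

	while (*p) {
		/* Skip whitespace between fields. */
		while (*p && isspace((unsigned char)*p))
			p++;

		/* Only fields that start with a digit can be timestamps. */
		if (isdigit((unsigned char)*p)) {
			char *end;
			long val = strtol(p, &end, 10);

			/* Accept "500002us" and "0us@:", reject "2...2". */
			if (strncmp(end, "us", 2) == 0)
				return val;
		}

		/* Skip the rest of this field. */
		while (*p && !isspace((unsigned char)*p))
			p++;
	}

	return -1;
}
```

A test script could then require that at least one line reports a value at or
above the atomic_time passed to insmod (500000 in the examples).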

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Wed, 16 May 2018 23:46:06 -0700
Subject: [PATCH v9 7/8] lib: Add module to simulate atomic sections for testing
 preemptoff tracers

In this patch we introduce a test module for simulating a long atomic
section in the kernel which the preemptoff or irqsoff tracers can
detect. This module is to be used only for test purposes and is disabled
by default.

Following is the expected output (only briefly shown) that can be parsed
to verify that the tracers are working correctly. We will use this from
the kselftests in future patches.

For the preemptoff tracer:

echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066    2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork

For the irqsoff tracer:

echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069    1d..1    0us@: atomic_sect_run
irq dis -1069    1d..1 500001us : atomic_sect_run
irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069    1d..1 500005us : <stack trace>
 => ret_from_fork

Co-developed-by: Erick Reyes <erickreyes@google.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 77 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+)
 create mode 100644 lib/test_atomic_sections.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..faebf0fe3bcf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1933,6 +1933,14 @@ config TEST_KMOD
 
 	  If unsure, say N.
 
+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"
+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.
+
 config TEST_DEBUG_VIRTUAL
 	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
 	depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..e82cf5445b7b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,6 +46,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index 000000000000..1eef518f0974
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static ulong atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, ulong, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
+
+static void busy_wait(ulong time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+
+int atomic_sect_run(void *data)
+{
+	unsigned long flags;
+
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
+
+	return 0;
+}
+
+static int __init atomic_sect_init(void)
+{
+	char task_name[50];
+	struct task_struct *test_task;
+
+	snprintf(task_name, sizeof(task_name), "%s_test", atomic_mode);
+
+	test_task = kthread_run(atomic_sect_run, NULL, task_name);
+	return PTR_ERR_OR_ZERO(test_task);
+}
+
+static void __exit atomic_sect_exit(void)
+{
+	return;
+}
+
+module_init(atomic_sect_init)
+module_exit(atomic_sect_exit)
+MODULE_LICENSE("GPL v2");
-- 
2.17.1.1185.g55be947832-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers
@ 2018-06-05 23:01     ` joel
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-06-05 23:01 UTC (permalink / raw)


Hi,

On Tue, May 29, 2018 at 05:04:59PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> In this patch we introduce a test module for simulating a long atomic
> section in the kernel which the preemptoff or irqsoff tracers can
> detect. This module is to be used only for test purposes and is default
> disabled.
> 
> Following is the expected output (only briefly shown) that can be parsed
> to verify that the tracers are working correctly. We will use this from
> the kselftests in future patches.
> 
> For the preemptoff tracer:
> 
> echo preemptoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
> preempt -1066    2...2 500012us : <stack trace>
>  => kthread
>  => ret_from_fork
> 
> For the irqsoff tracer:
> 
> echo irqsoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> irq dis -1069    1d..1    0us@: atomic_sect_run
> irq dis -1069    1d..1 500001us : atomic_sect_run
> irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
> irq dis -1069    1d..1 500005us : <stack trace>
>  => ret_from_fork

Andy previously made some suggestions on this patch. The updated version is
below, and I plan to send it with this series as v9. I have included it
here in advance for your review.

Andy, would you be OK with adding your Reviewed-by to it?

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Wed, 16 May 2018 23:46:06 -0700
Subject: [PATCH v9 7/8] lib: Add module to simulate atomic sections for testing
 preemptoff tracers

Introduce a test module that simulates a long atomic section in the
kernel, which the preemptoff or irqsoff tracers can detect. The module
is intended for testing only and is disabled by default.

The following expected output (abbreviated) can be parsed to verify
that the tracers are working correctly. The kselftests in future
patches will use it.

For the preemptoff tracer:

echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066    2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork

For the irqsoff tracer:

echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069    1d..1    0us@: atomic_sect_run
irq dis -1069    1d..1 500001us : atomic_sect_run
irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069    1d..1 500005us : <stack trace>
 => ret_from_fork

Co-developed-by: Erick Reyes <erickreyes@google.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 77 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+)
 create mode 100644 lib/test_atomic_sections.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..faebf0fe3bcf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1933,6 +1933,14 @@ config TEST_KMOD
 
 	  If unsure, say N.
 
+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"
+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.
+
 config TEST_DEBUG_VIRTUAL
 	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
 	depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..e82cf5445b7b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,6 +46,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index 000000000000..1eef518f0974
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static ulong atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, ulong, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
+
+static void busy_wait(ulong time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+
+int atomic_sect_run(void *data)
+{
+	unsigned long flags;
+
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
+
+	return 0;
+}
+
+static int __init atomic_sect_init(void)
+{
+	char task_name[50];
+	struct task_struct *test_task;
+
+	snprintf(task_name, sizeof(task_name), "%s_test", atomic_mode);
+
+	test_task = kthread_run(atomic_sect_run, NULL, task_name);
+	return PTR_ERR_OR_ZERO(test_task);
+}
+
+static void __exit atomic_sect_exit(void)
+{
+	return;
+}
+
+module_init(atomic_sect_init)
+module_exit(atomic_sect_exit)
+MODULE_LICENSE("GPL v2");
-- 
2.17.1.1185.g55be947832-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers
  2018-05-30  0:04   ` joelaf
  (?)
@ 2018-06-05 23:53     ` joel
  -1 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-06-05 23:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Boqun Feng, Andy Shevchenko, Byungchul Park,
	Erick Reyes, Ingo Molnar, Julia Cartwright, linux-kselftest,
	Masami Hiramatsu, Mathieu Desnoyers, Namhyung Kim, Paul McKenney,
	Peter Zijlstra, Shuah Khan, Steven Rostedt, Thomas Glexiner,
	Todd Kjos, Tom Zanussi

(Resending since Andy wasn't on CC - sorry)

Hi,

On Tue, May 29, 2018 at 05:04:59PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> In this patch we introduce a test module for simulating a long atomic
> section in the kernel which the preemptoff or irqsoff tracers can
> detect. This module is to be used only for test purposes and is default
> disabled.
> 
> Following is the expected output (only briefly shown) that can be parsed
> to verify that the tracers are working correctly. We will use this from
> the kselftests in future patches.
> 
> For the preemptoff tracer:
> 
> echo preemptoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
> preempt -1066    2...2 500012us : <stack trace>
>  => kthread
>  => ret_from_fork
> 
> For the irqsoff tracer:
> 
> echo irqsoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> irq dis -1069    1d..1    0us@: atomic_sect_run
> irq dis -1069    1d..1 500001us : atomic_sect_run
> irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
> irq dis -1069    1d..1 500005us : <stack trace>
>  => ret_from_fork

Andy previously made some suggestions on this patch. The updated version is
below, and I plan to send it with this series as v9. I have included it
here in advance for your review.

Andy, would you be OK with adding your Reviewed-by to it?

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Wed, 16 May 2018 23:46:06 -0700
Subject: [PATCH v9 7/8] lib: Add module to simulate atomic sections for testing
 preemptoff tracers

Introduce a test module that simulates a long atomic section in the
kernel, which the preemptoff or irqsoff tracers can detect. The module
is intended for testing only and is disabled by default.

The following expected output (abbreviated) can be parsed to verify
that the tracers are working correctly. The kselftests in future
patches will use it.

For the preemptoff tracer:

echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066    2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork

For the irqsoff tracer:

echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069    1d..1    0us@: atomic_sect_run
irq dis -1069    1d..1 500001us : atomic_sect_run
irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069    1d..1 500005us : <stack trace>
 => ret_from_fork

Co-developed-by: Erick Reyes <erickreyes@google.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 77 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+)
 create mode 100644 lib/test_atomic_sections.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..faebf0fe3bcf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1933,6 +1933,14 @@ config TEST_KMOD
 
 	  If unsure, say N.
 
+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"
+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.
+
 config TEST_DEBUG_VIRTUAL
 	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
 	depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..e82cf5445b7b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,6 +46,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index 000000000000..1eef518f0974
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static ulong atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, ulong, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
+
+static void busy_wait(ulong time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+
+int atomic_sect_run(void *data)
+{
+	unsigned long flags;
+
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
+
+	return 0;
+}
+
+static int __init atomic_sect_init(void)
+{
+	char task_name[50];
+	struct task_struct *test_task;
+
+	snprintf(task_name, sizeof(task_name), "%s_test", atomic_mode);
+
+	test_task = kthread_run(atomic_sect_run, NULL, task_name);
+	return PTR_ERR_OR_ZERO(test_task);
+}
+
+static void __exit atomic_sect_exit(void)
+{
+	return;
+}
+
+module_init(atomic_sect_init)
+module_exit(atomic_sect_exit)
+MODULE_LICENSE("GPL v2");
-- 
2.17.1.1185.g55be947832-goog
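
The commit message above says the trace output can be parsed to verify
the tracers. The kselftest that does this is not shown in this message,
so the following is only a hedged sketch of one way to pull the "<N>us"
timestamp out of a latency-trace line and compare it with atomic_time.
The helper name trace_line_us() is hypothetical, and the line layout is
assumed to match the sample output quoted above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the "<N>us" timestamp of one latency-trace line, in
 * microseconds, or -1 if the line carries no such field. Only a digit
 * run that starts a whitespace-separated token and is directly followed
 * by "us" is accepted, so the PID ("-1066") and the flags field
 * ("2...2", "1d..1") are skipped.
 */
static long trace_line_us(const char *line)
{
	const char *p = line;

	while (*p) {
		if (isdigit((unsigned char)*p) &&
		    (p == line || isspace((unsigned char)p[-1]))) {
			char *end;
			long us = strtol(p, &end, 10);

			if (strncmp(end, "us", 2) == 0)
				return us;
			p = end;	/* not a timestamp, keep scanning */
		} else {
			p++;
		}
	}
	return -1;
}
```

A check built on this could require that some line of the trace reports
a value of at least the atomic_time passed to insmod (500000 in the
examples above).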

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers
@ 2018-06-05 23:53     ` joel
  0 siblings, 0 replies; 60+ messages in thread
From: joel @ 2018-06-05 23:53 UTC (permalink / raw)


(Resending since Andy wasn't on CC - sorry)

Hi,

On Tue, May 29, 2018 at 05:04:59PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
> 
> In this patch we introduce a test module for simulating a long atomic
> section in the kernel which the preemptoff or irqsoff tracers can
> detect. This module is to be used only for test purposes and is default
> disabled.
> 
> Following is the expected output (only briefly shown) that can be parsed
> to verify that the tracers are working correctly. We will use this from
> the kselftests in future patches.
> 
> For the preemptoff tracer:
> 
> echo preemptoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
> preempt -1066    2...2 500012us : <stack trace>
>  => kthread
>  => ret_from_fork
> 
> For the irqsoff tracer:
> 
> echo irqsoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> irq dis -1069    1d..1    0us@: atomic_sect_run
> irq dis -1069    1d..1 500001us : atomic_sect_run
> irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
> irq dis -1069    1d..1 500005us : <stack trace>
>  => ret_from_fork

Andy, previously made some suggestions to this patch. The updated version is
below and I am planning to send it along with this series as v9. I have
included it in advance below for your Review.

Andy, would you be Ok with adding your Reviewed-by to it?

---8<-----------------------

From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
Date: Wed, 16 May 2018 23:46:06 -0700
Subject: [PATCH v9 7/8] lib: Add module to simulate atomic sections for testing
 preemptoff tracers

In this patch we introduce a test module for simulating a long atomic
section in the kernel which the preemptoff or irqsoff tracers can
detect. This module is to be used only for test purposes and is default
disabled.

Following is the expected output (only briefly shown) that can be parsed
to verify that the tracers are working correctly. We will use this from
the kselftests in future patches.

For the preemptoff tracer:

echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066    2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork

For the irqsoff tracer:

echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069    1d..1    0us@: atomic_sect_run
irq dis -1069    1d..1 500001us : atomic_sect_run
irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069    1d..1 500005us : <stack trace>
 => ret_from_fork

Co-developed-by: Erick Reyes <erickreyes at google.com>
Cc: Andy Shevchenko <andriy.shevchenko at linux.intel.com>
Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>
---
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 77 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+)
 create mode 100644 lib/test_atomic_sections.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..faebf0fe3bcf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1933,6 +1933,14 @@ config TEST_KMOD
 
 	  If unsure, say N.
 
+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"
+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.
+
 config TEST_DEBUG_VIRTUAL
 	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
 	depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..e82cf5445b7b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,6 +46,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index 000000000000..1eef518f0974
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static ulong atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, ulong, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
+
+static void busy_wait(ulong time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+
+int atomic_sect_run(void *data)
+{
+	unsigned long flags;
+
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
+
+	return 0;
+}
+
+static int __init atomic_sect_init(void)
+{
+	char task_name[50];
+	struct task_struct *test_task;
+
+	snprintf(task_name, sizeof(task_name), "%s_test", atomic_mode);
+
+	test_task = kthread_run(atomic_sect_run, NULL, task_name);
+	return PTR_ERR_OR_ZERO(test_task);
+}
+
+static void __exit atomic_sect_exit(void)
+{
+	return;
+}
+
+module_init(atomic_sect_init)
+module_exit(atomic_sect_exit)
+MODULE_LICENSE("GPL v2");
-- 
2.17.1.1185.g55be947832-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers
@ 2018-06-05 23:53     ` joel
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2018-06-05 23:53 UTC (permalink / raw)


(Resending since Andy wasn't on CC - sorry)

Hi,

On Tue, May 29, 2018@05:04:59PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
> 
> In this patch we introduce a test module for simulating a long atomic
> section in the kernel which the preemptoff or irqsoff tracers can
> detect. This module is to be used only for test purposes and is default
> disabled.
> 
> Following is the expected output (only briefly shown) that can be parsed
> to verify that the tracers are working correctly. We will use this from
> the kselftests in future patches.
> 
> For the preemptoff tracer:
> 
> echo preemptoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
> preempt -1066    2...2 500012us : <stack trace>
>  => kthread
>  => ret_from_fork
> 
> For the irqsoff tracer:
> 
> echo irqsoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> irq dis -1069    1d..1    0us@: atomic_sect_run
> irq dis -1069    1d..1 500001us : atomic_sect_run
> irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
> irq dis -1069    1d..1 500005us : <stack trace>
>  => ret_from_fork
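
For illustration only, here is a rough sketch of how output like the above
could be parsed from a shell script. It is not the kselftest from patch 8/8:
the helper names max_trace_us and check_atomic_trace are made up for this
example, and it assumes the latency-format lines quoted above, where the
first field of the form <N>us is the timestamp.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this series) for checking tracer output.

# Print the largest "<N>us" timestamp found in the trace text on stdin.
max_trace_us() {
	awk '{
		for (i = 1; i <= NF; i++) {
			if ($i ~ /^[0-9]+us/) {
				t = $i + 0	# numeric prefix, e.g. "500002us" -> 500002
				if (t > max)
					max = t
				break
			}
		}
	} END { print max + 0 }'
}

# Succeed if the trace on stdin ends the section in function $2 and
# spans at least $1 microseconds.
check_atomic_trace() {
	want_us=$1
	end_fn=$2
	trace=$(cat)
	printf '%s\n' "$trace" | grep -q "$end_fn <-atomic_sect_run" || return 1
	got_us=$(printf '%s\n' "$trace" | max_trace_us)
	[ "$got_us" -ge "$want_us" ]
}

# Sample input copied from the preemptoff output quoted above.
check_atomic_trace 500000 tracer_preempt_on <<'EOF' && echo "preemptoff trace looks sane"
preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066    2...2 500012us : <stack trace>
EOF
```

On a test machine the same check would presumably read the live file instead
of the sample, e.g. check_atomic_trace 500000 tracer_preempt_on < /d/tracing/trace,
after loading the module as shown above.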

Andy previously made some suggestions on this patch. The updated version is
below, and I am planning to send it along with this series as v9. I have
included it here in advance for your review.

Andy, would you be OK with adding your Reviewed-by to it?

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Wed, 16 May 2018 23:46:06 -0700
Subject: [PATCH v9 7/8] lib: Add module to simulate atomic sections for testing
 preemptoff tracers

This patch introduces a test module that simulates a long atomic
section in the kernel, which the preemptoff or irqsoff tracers can
detect. The module is for test purposes only and is disabled by
default.

The following is the expected output (abridged), which can be parsed
to verify that the tracers are working correctly. We will use this from
the kselftests in future patches.

For the preemptoff tracer:

echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066    2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork

For the irqsoff tracer:

echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069    1d..1    0us@: atomic_sect_run
irq dis -1069    1d..1 500001us : atomic_sect_run
irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069    1d..1 500005us : <stack trace>
 => ret_from_fork

Co-developed-by: Erick Reyes <erickreyes@google.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 77 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+)
 create mode 100644 lib/test_atomic_sections.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..faebf0fe3bcf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1933,6 +1933,14 @@ config TEST_KMOD
 
 	  If unsure, say N.
 
+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"
+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.
+
 config TEST_DEBUG_VIRTUAL
 	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
 	depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..e82cf5445b7b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,6 +46,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index 000000000000..1eef518f0974
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static ulong atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, ulong, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
+
+static void busy_wait(ulong time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+
+int atomic_sect_run(void *data)
+{
+	unsigned long flags;
+
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
+
+	return 0;
+}
+
+static int __init atomic_sect_init(void)
+{
+	char task_name[50];
+	struct task_struct *test_task;
+
+	snprintf(task_name, sizeof(task_name), "%s_test", atomic_mode);
+
+	test_task = kthread_run(atomic_sect_run, NULL, task_name);
+	return PTR_ERR_OR_ZERO(test_task);
+}
+
+static void __exit atomic_sect_exit(void)
+{
+	return;
+}
+
+module_init(atomic_sect_init)
+module_exit(atomic_sect_exit)
+MODULE_LICENSE("GPL v2");
-- 
2.17.1.1185.g55be947832-goog

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers
  2018-06-05 23:53     ` joel
  (?)
@ 2018-06-06  7:48       ` andriy.shevchenko
  -1 siblings, 0 replies; 60+ messages in thread
From: Andy Shevchenko @ 2018-06-06  7:48 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel
  Cc: kernel-team, Boqun Feng, Byungchul Park, Erick Reyes,
	Ingo Molnar, Julia Cartwright, linux-kselftest, Masami Hiramatsu,
	Mathieu Desnoyers, Namhyung Kim, Paul McKenney, Peter Zijlstra,
	Shuah Khan, Steven Rostedt, Thomas Gleixner, Todd Kjos,
	Tom Zanussi

On Tue, 2018-06-05 at 16:53 -0700, Joel Fernandes wrote:
> (Resending since Andy wasn't on CC - sorry)

> Andy, previously made some suggestions to this patch. The updated version is
> below and I am planning to send it along with this series as v9. I have
> included it in advance below for your Review.
> 
> Andy, would you be Ok with adding your Reviewed-by to it?
> 

FWIW,
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

P.S. I'm not familiar enough with the topic to have any insights about
the implementation, though.

> ---8<-----------------------
> 
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> Date: Wed, 16 May 2018 23:46:06 -0700
> Subject: [PATCH v9 7/8] lib: Add module to simulate atomic sections for testing
>  preemptoff tracers
> 
> In this patch we introduce a test module for simulating a long atomic
> section in the kernel which the preemptoff or irqsoff tracers can
> detect. This module is to be used only for test purposes and is default
> disabled.
> 
> Following is the expected output (only briefly shown) that can be parsed
> to verify that the tracers are working correctly. We will use this from
> the kselftests in future patches.
> 
> For the preemptoff tracer:
> 
> echo preemptoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> preempt -1066    2...2    0us@: atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500002us : atomic_sect_run <-atomic_sect_run
> preempt -1066    2...2 500004us : tracer_preempt_on <-atomic_sect_run
> preempt -1066    2...2 500012us : <stack trace>
>  => kthread
>  => ret_from_fork
> 
> For the irqsoff tracer:
> 
> echo irqsoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
> sleep 1
> bash-4.3# cat /d/tracing/trace
> irq dis -1069    1d..1    0us@: atomic_sect_run
> irq dis -1069    1d..1 500001us : atomic_sect_run
> irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
> irq dis -1069    1d..1 500005us : <stack trace>
>  => ret_from_fork
> 
> Co-developed-by: Erick Reyes <erickreyes@google.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  lib/Kconfig.debug          |  8 ++++
>  lib/Makefile               |  1 +
>  lib/test_atomic_sections.c | 77 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 86 insertions(+)
>  create mode 100644 lib/test_atomic_sections.c
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index c40c7b734cd1..faebf0fe3bcf 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1933,6 +1933,14 @@ config TEST_KMOD
>  
>  	  If unsure, say N.
>  
> +config TEST_ATOMIC_SECTIONS
> +	tristate "Simulate atomic sections for tracers to detect"
> +	depends on m
> +	help
> +	  Select this option to build a test module that can help test atomic
> +	  sections by simulating them with a duration supplied as a module
> +	  parameter. Preempt disable and irq disable modes can be requested.
> +
>  config TEST_DEBUG_VIRTUAL
>  	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
>  	depends on DEBUG_VIRTUAL
> diff --git a/lib/Makefile b/lib/Makefile
> index ce20696d5a92..e82cf5445b7b 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -46,6 +46,7 @@ obj-y += string_helpers.o
>  obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
>  obj-y += hexdump.o
>  obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
> +obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
>  obj-y += kstrtox.o
>  obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
>  obj-$(CONFIG_TEST_BPF) += test_bpf.o
> diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
> new file mode 100644
> index 000000000000..1eef518f0974
> --- /dev/null
> +++ b/lib/test_atomic_sections.c
> @@ -0,0 +1,77 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Atomic section emulation test module
> + *
> + * Emulates atomic sections by disabling IRQs or preemption
> + * and doing a busy wait for a specified amount of time.
> + * This can be used for testing of different atomic section
> + * tracers such as irqsoff tracers.
> + *
> + * (c) 2018. Google LLC
> + */
> +
> +#include <linux/delay.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/kernel.h>
> +#include <linux/kthread.h>
> +#include <linux/ktime.h>
> +#include <linux/module.h>
> +#include <linux/printk.h>
> +#include <linux/string.h>
> +
> +static ulong atomic_time = 100;
> +static char atomic_mode[10] = "irq";
> +
> +module_param_named(atomic_time, atomic_time, ulong, S_IRUGO);
> +module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
> +MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
> +MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
> +
> +static void busy_wait(ulong time)
> +{
> +	ktime_t start, end;
> +	start = ktime_get();
> +	do {
> +		end = ktime_get();
> +		if (kthread_should_stop())
> +			break;
> +	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
> +}
> +
> +int atomic_sect_run(void *data)
> +{
> +	unsigned long flags;
> +
> +	if (!strcmp(atomic_mode, "irq")) {
> +		local_irq_save(flags);
> +		busy_wait(atomic_time);
> +		local_irq_restore(flags);
> +	} else if (!strcmp(atomic_mode, "preempt")) {
> +		preempt_disable();
> +		busy_wait(atomic_time);
> +		preempt_enable();
> +	}
> +
> +	return 0;
> +}
> +
> +static int __init atomic_sect_init(void)
> +{
> +	char task_name[50];
> +	struct task_struct *test_task;
> +
> +	snprintf(task_name, sizeof(task_name), "%s_test", atomic_mode);
> +
> +	test_task = kthread_run(atomic_sect_run, NULL, task_name);
> +	return PTR_ERR_OR_ZERO(test_task);
> +}
> +
> +static void __exit atomic_sect_exit(void)
> +{
> +	return;
> +}
> +
> +module_init(atomic_sect_init)
> +module_exit(atomic_sect_exit)
> +MODULE_LICENSE("GPL v2");

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2018-06-06  7:48 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
2018-05-30  0:04 [PATCH v8 0/8] Centralize and unify usage of preempt/irq tracepoints Joel Fernandes
2018-05-30  0:04 ` Joel Fernandes
2018-05-30  0:04 ` joelaf
2018-05-30  0:04 ` [PATCH v8 1/8] softirq: reorder trace_softirqs_on to prevent lockdep splat Joel Fernandes
2018-05-30  0:04   ` Joel Fernandes
2018-05-30  0:04   ` joelaf
2018-05-30  0:04 ` [PATCH v8 2/8] srcu: Add notrace variants of srcu_read_{lock,unlock} Joel Fernandes
2018-05-30  0:04   ` Joel Fernandes
2018-05-30  0:04   ` joelaf
2018-05-30  0:04 ` [PATCH v8 3/8] srcu: Add notrace variant of srcu_dereference Joel Fernandes
2018-05-30  0:04   ` Joel Fernandes
2018-05-30  0:04   ` joelaf
2018-05-30  0:04 ` [PATCH v8 4/8] trace/irqsoff: Split reset into separate functions Joel Fernandes
2018-05-30  0:04   ` Joel Fernandes
2018-05-30  0:04   ` joelaf
2018-05-30  0:04 ` [PATCH v8 5/8] tracepoint: Make rcuidle tracepoint callers use SRCU Joel Fernandes
2018-05-30  0:04   ` Joel Fernandes
2018-05-30  0:04   ` joelaf
2018-05-31  6:50   ` Mathieu Desnoyers
2018-05-31  6:50     ` Mathieu Desnoyers
2018-05-31  6:50     ` mathieu.desnoyers
2018-05-31 17:51     ` Joel Fernandes
2018-05-31 17:51       ` Joel Fernandes
2018-05-31 17:51       ` joel
2018-06-02  1:17       ` Mathieu Desnoyers
2018-06-02  1:17         ` Mathieu Desnoyers
2018-06-02  1:17         ` mathieu.desnoyers
2018-05-30  0:04 ` [PATCH v8 6/8] tracing: Centralize preemptirq tracepoints and unify their usage Joel Fernandes
2018-05-30  0:04   ` Joel Fernandes
2018-05-30  0:04   ` joelaf
2018-05-31  1:56   ` Namhyung Kim
2018-05-31  1:56     ` Namhyung Kim
2018-05-31  1:56     ` namhyung
2018-05-31  6:26     ` Joel Fernandes
2018-05-31  6:26       ` Joel Fernandes
2018-05-31  6:26       ` joel
2018-05-30  0:04 ` [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers Joel Fernandes
2018-05-30  0:04   ` Joel Fernandes
2018-05-30  0:04   ` joelaf
2018-06-05 23:01   ` Joel Fernandes
2018-06-05 23:01     ` Joel Fernandes
2018-06-05 23:01     ` joel
2018-06-05 23:53   ` Joel Fernandes
2018-06-05 23:53     ` Joel Fernandes
2018-06-05 23:53     ` joel
2018-06-06  7:48     ` Andy Shevchenko
2018-06-06  7:48       ` Andy Shevchenko
2018-06-06  7:48       ` andriy.shevchenko
2018-05-30  0:05 ` [PATCH v8 8/8] kselftests: Add tests for the preemptoff and irqsoff tracers Joel Fernandes
2018-05-30  0:05   ` Joel Fernandes
2018-05-30  0:05   ` joelaf
2018-05-31  6:45   ` kbuild test robot
2018-05-31  6:45     ` kbuild test robot
2018-05-31  6:45     ` lkp
2018-05-31  6:45   ` [PATCH] kselftests: fix ptr_ret.cocci warnings kbuild test robot
2018-05-31  6:45     ` kbuild test robot
2018-05-31  6:45     ` fengguang.wu
2018-05-31  7:14     ` Joel Fernandes
2018-05-31  7:14       ` Joel Fernandes
2018-05-31  7:14       ` joel
