* [PATCH 0/5] [GIT PULL] tracing,lockdep,x86: more updates for tip
From: Steven Rostedt @ 2009-04-17 21:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker

Ingo,

Please pull the latest tip/tracing/ftrace tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/ftrace


Steven Rostedt (5):
      tracing/events: enable code with EVENT_TRACING not EVENT_TRACER
      tracing: add EXPORT_SYMBOL_GPL for trace commits
      tracing: add same level recursion detection
      tracing: protect trace_printk from recursion
      lockdep,x86: account for irqs enabled in paranoid_exit

----
 arch/x86/kernel/entry_64.S   |    3 +++
 include/linux/ftrace.h       |    7 +++++++
 include/linux/init_task.h    |    1 +
 include/linux/sched.h        |    4 +++-
 include/trace/define_trace.h |    2 +-
 kernel/trace/ring_buffer.c   |   42 ++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.c         |   14 +++++++++++---
 7 files changed, 68 insertions(+), 5 deletions(-)

* [PATCH 1/5] tracing/events: enable code with EVENT_TRACING not EVENT_TRACER
From: Steven Rostedt @ 2009-04-17 21:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker


From: Steven Rostedt <srostedt@redhat.com>

CONFIG_EVENT_TRACER is the way to turn on event tracing when no
other tracing has been configured. All code that should be enabled for
event tracing must depend on CONFIG_EVENT_TRACING, which is what gets
enabled when TRACING (or CONFIG_EVENT_TRACER) is selected.

This patch enables the include/trace/ftrace.h file when
CONFIG_EVENT_TRACING is enabled.

[ Impact: fix warning in event tracer selftest ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/define_trace.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 1886941..7f1f23d 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -56,7 +56,7 @@
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
-#ifdef CONFIG_EVENT_TRACER
+#ifdef CONFIG_EVENT_TRACING
 #include <trace/ftrace.h>
 #endif
 
-- 
1.6.2.1


* [PATCH 2/5] tracing: add EXPORT_SYMBOL_GPL for trace commits
From: Steven Rostedt @ 2009-04-17 21:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker


From: Steven Rostedt <srostedt@redhat.com>

Not all the necessary symbols were exported to allow for tracing
by modules. This patch adds them in.

[ Impact: allow modules to commit data to the ring buffer ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f681f64..183d788 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -894,18 +894,20 @@ void trace_current_buffer_unlock_commit(struct ring_buffer_event *event,
 {
 	__trace_buffer_unlock_commit(&global_trace, event, flags, pc, 1);
 }
+EXPORT_SYMBOL_GPL(trace_current_buffer_unlock_commit);
 
 void trace_nowake_buffer_unlock_commit(struct ring_buffer_event *event,
 					unsigned long flags, int pc)
 {
 	__trace_buffer_unlock_commit(&global_trace, event, flags, pc, 0);
 }
+EXPORT_SYMBOL_GPL(trace_nowake_buffer_unlock_commit);
 
 void trace_current_buffer_discard_commit(struct ring_buffer_event *event)
 {
 	ring_buffer_discard_commit(global_trace.buffer, event);
 }
-EXPORT_SYMBOL_GPL(trace_nowake_buffer_unlock_commit);
+EXPORT_SYMBOL_GPL(trace_current_buffer_discard_commit);
 
 void
 trace_function(struct trace_array *tr,
-- 
1.6.2.1


* [PATCH 3/5] tracing: add same level recursion detection
From: Steven Rostedt @ 2009-04-17 21:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker


From: Steven Rostedt <srostedt@redhat.com>

The tracing infrastructure allows for recursion. That is, an interrupt
may interrupt the act of tracing an event, and that interrupt may very well
perform its own trace. This is a recursive trace, and is fine to do.

The problem arises when there is a bug, and the utility doing the trace
calls something that recurses back into the tracer. This recursion is not
caused by an external event like an interrupt, but by code that is not
expected to recurse. The result could be a lockup.

This patch adds a bitmask to the task structure that keeps track
of the trace recursion. To find the interrupt depth, the following
algorithm is used:

  level = hardirq_count() + softirq_count() + in_nmi();

Here, level will be the depth of interrupts and softirqs, and even handles
the NMI. Then the corresponding bit is set in the recursion bitmask.
If the bit was already set, we know we had a recursion at the same level
and we warn about it and fail the write to the buffer.

After the data has been committed to the buffer, we clear the bit.
No atomics are needed. The only races are with interrupts, and they reset
the bitmask before returning anyway.
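
As a standalone illustration (not part of the patch; the names here are
made up), the bit-per-level scheme reduces to:

	/* Model only: in the real patch the mask lives in task_struct
	 * and 'level' is derived from hardirq/softirq/NMI state. */
	static unsigned long recursion_mask;

	static int recursive_try_lock(int level)
	{
		if (recursion_mask & (1UL << level))
			return -1;	/* same level again: unexpected recursion */
		recursion_mask |= 1UL << level;
		return 0;
	}

	static void recursive_unlock(int level)
	{
		recursion_mask &= ~(1UL << level);
	}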

[ Impact: detect same irq level trace recursion ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace.h     |    7 +++++++
 include/linux/init_task.h  |    1 +
 include/linux/sched.h      |    4 +++-
 kernel/trace/ring_buffer.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 53 insertions(+), 1 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 97c83e1..39b95c5 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -488,8 +488,15 @@ static inline int test_tsk_trace_graph(struct task_struct *tsk)
 
 extern int ftrace_dump_on_oops;
 
+#ifdef CONFIG_PREEMPT
+#define INIT_TRACE_RECURSION		.trace_recursion = 0,
+#endif
+
 #endif /* CONFIG_TRACING */
 
+#ifndef INIT_TRACE_RECURSION
+#define INIT_TRACE_RECURSION
+#endif
 
 #ifdef CONFIG_HW_BRANCH_TRACER
 
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index dcfb933..6fc2185 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -187,6 +187,7 @@ extern struct cred init_cred;
 	INIT_TRACE_IRQFLAGS						\
 	INIT_LOCKDEP							\
 	INIT_FTRACE_GRAPH						\
+	INIT_TRACE_RECURSION						\
 }
 
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b4c38bc..7ede5e4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1428,7 +1428,9 @@ struct task_struct {
 #ifdef CONFIG_TRACING
 	/* state flags for use by tracers */
 	unsigned long trace;
-#endif
+	/* bitmask of trace recursion */
+	unsigned long trace_recursion;
+#endif /* CONFIG_TRACING */
 };
 
 /* Future-safe accessor for struct task_struct's cpus_allowed. */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 84a6055..b421b0e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1481,6 +1481,40 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
 	return event;
 }
 
+static int trace_irq_level(void)
+{
+	return hardirq_count() + softirq_count() + in_nmi();
+}
+
+static int trace_recursive_lock(void)
+{
+	int level;
+
+	level = trace_irq_level();
+
+	if (unlikely(current->trace_recursion & (1 << level))) {
+		/* Disable all tracing before we do anything else */
+		tracing_off_permanent();
+		WARN_ON_ONCE(1);
+		return -1;
+	}
+
+	current->trace_recursion |= 1 << level;
+
+	return 0;
+}
+
+static void trace_recursive_unlock(void)
+{
+	int level;
+
+	level = trace_irq_level();
+
+	WARN_ON_ONCE(!(current->trace_recursion & (1 << level)));
+
+	current->trace_recursion &= ~(1 << level);
+}
+
 static DEFINE_PER_CPU(int, rb_need_resched);
 
 /**
@@ -1514,6 +1548,9 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
 	/* If we are tracing schedule, we don't want to recurse */
 	resched = ftrace_preempt_disable();
 
+	if (trace_recursive_lock())
+		goto out_nocheck;
+
 	cpu = raw_smp_processor_id();
 
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
@@ -1543,6 +1580,9 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
 	return event;
 
  out:
+	trace_recursive_unlock();
+
+ out_nocheck:
 	ftrace_preempt_enable(resched);
 	return NULL;
 }
@@ -1581,6 +1621,8 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
 
 	rb_commit(cpu_buffer, event);
 
+	trace_recursive_unlock();
+
 	/*
 	 * Only the last preempt count needs to restore preemption.
 	 */
-- 
1.6.2.1


* [PATCH 4/5] tracing: protect trace_printk from recursion
From: Steven Rostedt @ 2009-04-17 21:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker


From: Steven Rostedt <srostedt@redhat.com>

trace_printk can be called from any context, including NMIs.
If this happens, then we must test for recursion before
grabbing any spinlocks.

This patch prevents trace_printk from being called recursively.

[ Impact: prevent hard lockup in lockdep event tracer ]

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 183d788..b9a3adc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1259,6 +1259,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
 	struct trace_array_cpu *data;
 	struct bprint_entry *entry;
 	unsigned long flags;
+	int disable;
 	int resched;
 	int cpu, len = 0, size, pc;
 
@@ -1273,7 +1274,8 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
 	cpu = raw_smp_processor_id();
 	data = tr->data[cpu];
 
-	if (unlikely(atomic_read(&data->disabled)))
+	disable = atomic_inc_return(&data->disabled);
+	if (unlikely(disable != 1))
 		goto out;
 
 	/* Lockdep uses trace_printk for lock tracing */
@@ -1301,6 +1303,7 @@ out_unlock:
 	local_irq_restore(flags);
 
 out:
+	atomic_dec_return(&data->disabled);
 	ftrace_preempt_enable(resched);
 	unpause_graph_tracing();
 
@@ -1320,6 +1323,7 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
 	int cpu, len = 0, size, pc;
 	struct print_entry *entry;
 	unsigned long irq_flags;
+	int disable;
 
 	if (tracing_disabled || tracing_selftest_running)
 		return 0;
@@ -1329,7 +1333,8 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
 	cpu = raw_smp_processor_id();
 	data = tr->data[cpu];
 
-	if (unlikely(atomic_read(&data->disabled)))
+	disable = atomic_inc_return(&data->disabled);
+	if (unlikely(disable != 1))
 		goto out;
 
 	pause_graph_tracing();
@@ -1357,6 +1362,7 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
 	raw_local_irq_restore(irq_flags);
 	unpause_graph_tracing();
  out:
+	atomic_dec_return(&data->disabled);
 	preempt_enable_notrace();
 
 	return len;
-- 
1.6.2.1


* [PATCH 5/5] lockdep,x86: account for irqs enabled in paranoid_exit
From: Steven Rostedt @ 2009-04-17 21:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker, Peter Zijlstra


From: Steven Rostedt <srostedt@redhat.com>

I hit the check_flags error of lockdep:

WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0()
[...]
hardirqs last  enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
softirqs last  enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0

The check_flags warning of lockdep tells me that lockdep thought interrupts
were disabled, but they were really enabled.

The numbers in parentheses above show the order of events:

12566: softirqs last enabled:  lock_sock_nested
12567: hardirqs last enabled:  local_bh_enable
12568: softirqs last disabled: tcp_prequeue_process
12569: hardirqs last disabled: int3

int3 is a breakpoint!

Examining this further, I have CONFIG_NET_TCPPROBE enabled, which adds
breakpoints into the kernel.

The paranoid_exit path taken on return from int3 does not account for
interrupts being enabled on return to the kernel. This code is a bit
tricky since it is also used by the NMI handler (when lockdep is off),
and we must be careful about the swapgs: we cannot call kernel code
after the swapgs has been performed.

[ Impact: fix lockdep check_flags warning ]

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kernel/entry_64.S |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1ac9986..987f91f 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1397,7 +1397,10 @@ ENTRY(paranoid_exit)
 paranoid_swapgs:
 	TRACE_IRQS_IRETQ 0
 	SWAPGS_UNSAFE_STACK
+	RESTORE_ALL 8
+	jmp irq_return
 paranoid_restore:
+	TRACE_IRQS_IRETQ 0
 	RESTORE_ALL 8
 	jmp irq_return
 paranoid_userspace:
-- 
1.6.2.1


* Re: [PATCH 5/5] lockdep,x86: account for irqs enabled in paranoid_exit
From: Ingo Molnar @ 2009-04-18  7:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker, Peter Zijlstra


* Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <srostedt@redhat.com>
> 
> I hit the check_flags error of lockdep:
> 
> WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0()
> [...]
> hardirqs last  enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
> hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
> softirqs last  enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
> softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
> 
> The check_flags warning of lockdep tells me that lockdep thought interrupts
> were disabled, but they were really enabled.
> 
> The numbers in parentheses above show the order of events:
> 
> 12566: softirqs last enabled:  lock_sock_nested
> 12567: hardirqs last enabled:  local_bh_enable
> 12568: softirqs last disabled: tcp_prequeue_process
> 12569: hardirqs last disabled: int3
> 
> int3 is a breakpoint!
> 
> Examining this further, I have CONFIG_NET_TCPPROBE enabled, which adds
> breakpoints into the kernel.
> 
> The paranoid_exit path taken on return from int3 does not account for
> interrupts being enabled on return to the kernel. This code is a bit
> tricky since it is also used by the NMI handler (when lockdep is off),
> and we must be careful about the swapgs: we cannot call kernel code
> after the swapgs has been performed.
> 
> [ Impact: fix lockdep check_flags warning ]
> 
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  arch/x86/kernel/entry_64.S |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 1ac9986..987f91f 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1397,7 +1397,10 @@ ENTRY(paranoid_exit)
>  paranoid_swapgs:
>  	TRACE_IRQS_IRETQ 0
>  	SWAPGS_UNSAFE_STACK
> +	RESTORE_ALL 8
> +	jmp irq_return
>  paranoid_restore:
> +	TRACE_IRQS_IRETQ 0
>  	RESTORE_ALL 8
>  	jmp irq_return
>  paranoid_userspace:

Nice catch! This is also for upstream really - i'll cherry-pick it 
into the lockdep urgent queue as well.

	Ingo


* Re: [PATCH 3/5] tracing: add same level recursion detection
From: Ingo Molnar @ 2009-04-18  7:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <srostedt@redhat.com>
> 
> The tracing infrastructure allows for recursion. That is, an interrupt
> may interrupt the act of tracing an event, and that interrupt may very well
> perform its own trace. This is a recursive trace, and is fine to do.
> 
> The problem arises when there is a bug, and the utility doing the trace
> calls something that recurses back into the tracer. This recursion is not
> caused by an external event like an interrupt, but by code that is not
> expected to recurse. The result could be a lockup.
> 
> This patch adds a bitmask to the task structure that keeps track
> of the trace recursion. To find the interrupt depth, the following
> algorithm is used:
> 
>   level = hardirq_count() + softirq_count() + in_nmi();
> 
> Here, level will be the depth of interrupts and softirqs, and even handles
> the NMI. Then the corresponding bit is set in the recursion bitmask.
> If the bit was already set, we know we had a recursion at the same level
> and we warn about it and fail the write to the buffer.
> 
> After the data has been committed to the buffer, we clear the bit.
> No atomics are needed. The only races are with interrupts, and they reset
> the bitmask before returning anyway.
> 
> [ Impact: detect same irq level trace recursion ]
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  include/linux/ftrace.h     |    7 +++++++
>  include/linux/init_task.h  |    1 +
>  include/linux/sched.h      |    4 +++-
>  kernel/trace/ring_buffer.c |   42 ++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 53 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 97c83e1..39b95c5 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -488,8 +488,15 @@ static inline int test_tsk_trace_graph(struct task_struct *tsk)
>  
>  extern int ftrace_dump_on_oops;
>  
> +#ifdef CONFIG_PREEMPT
> +#define INIT_TRACE_RECURSION		.trace_recursion = 0,
> +#endif
> +
>  #endif /* CONFIG_TRACING */
>  
> +#ifndef INIT_TRACE_RECURSION
> +#define INIT_TRACE_RECURSION
> +#endif
>  
>  #ifdef CONFIG_HW_BRANCH_TRACER
>  
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index dcfb933..6fc2185 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -187,6 +187,7 @@ extern struct cred init_cred;
>  	INIT_TRACE_IRQFLAGS						\
>  	INIT_LOCKDEP							\
>  	INIT_FTRACE_GRAPH						\
> +	INIT_TRACE_RECURSION						\
>  }
>  
>  
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b4c38bc..7ede5e4 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1428,7 +1428,9 @@ struct task_struct {
>  #ifdef CONFIG_TRACING
>  	/* state flags for use by tracers */
>  	unsigned long trace;
> -#endif
> +	/* bitmask of trace recursion */
> +	unsigned long trace_recursion;
> +#endif /* CONFIG_TRACING */
>  };
>  
>  /* Future-safe accessor for struct task_struct's cpus_allowed. */
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 84a6055..b421b0e 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1481,6 +1481,40 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
>  	return event;
>  }
>  
> +static int trace_irq_level(void)
> +{
> +	return hardirq_count() + softirq_count() + in_nmi();
> +}
> +
> +static int trace_recursive_lock(void)
> +{
> +	int level;
> +
> +	level = trace_irq_level();
> +
> +	if (unlikely(current->trace_recursion & (1 << level))) {
> +		/* Disable all tracing before we do anything else */
> +		tracing_off_permanent();
> +		WARN_ON_ONCE(1);
> +		return -1;
> +	}
> +
> +	current->trace_recursion |= 1 << level;
> +
> +	return 0;
> +}
> +
> +static void trace_recursive_unlock(void)
> +{
> +	int level;
> +
> +	level = trace_irq_level();
> +
> +	WARN_ON_ONCE(!(current->trace_recursion & (1 << level)));
> +
> +	current->trace_recursion &= ~(1 << level);
> +}
> +
>  static DEFINE_PER_CPU(int, rb_need_resched);
>  
>  /**
> @@ -1514,6 +1548,9 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
>  	/* If we are tracing schedule, we don't want to recurse */
>  	resched = ftrace_preempt_disable();
>  
> +	if (trace_recursive_lock())
> +		goto out_nocheck;

This is obviously a good idea, but i'm nervous about two things 
here: overhead and placement.

Firstly, the lock/unlock sequence adds to the critical path of all 
the tracing infrastructure and makes it slower. Secondly, we do this 
in ring_buffer_lock_reserve() only - instead of doing it at the 
outermost layer (where recursion protection is the most effective).

So while i've pulled it, it would be nice to rework the concepts 
here comprehensively. We have a number of atomics, layers, checks, 
and all that adds overhead and complexity.

The best would be to just face it and disable IRQs via 
raw_local_irq_save() plus set the trace recursion flag - and do this 
around all trace point facilities and as early as possible (they are 
not allowed to sleep anyway).

That would remove many of the "what if an IRQ recurses" uglinesses. 
CLI/POPF pairs are also very fast on most architectures (a lot of 
performance-sensitive code in Linux depends on that). This would 
also improve the end result: IRQs would not mix up the order of 
trace records occasionally.
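
In other words, the outermost entry point would look something like this
rough sketch (illustration only, not actual kernel code; it assumes the
per-task trace_recursion field added earlier in this series):

	unsigned long flags;

	raw_local_irq_save(flags);
	if (current->trace_recursion) {
		/* unexpected re-entry into the tracer: drop the event */
		raw_local_irq_restore(flags);
		return;
	}
	current->trace_recursion = 1;

	/* ... record the trace event ... */

	current->trace_recursion = 0;
	raw_local_irq_restore(flags);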

	Ingo


* Re: [PATCH 1/5] tracing/events: enable code with EVENT_TRACING not EVENT_TRACER
From: Ingo Molnar @ 2009-04-18  7:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> -#ifdef CONFIG_EVENT_TRACER
> +#ifdef CONFIG_EVENT_TRACING

Hm, my crystal ball predicts many such incidents in future years to 
come.

This:

  s/CONFIG_EVENT_TRACER/CONFIG_EVENT_TRACER_PLUGIN

might help. Or not.

	Ingo


* Re: [PATCH 3/5] tracing: add same level recursion detection
From: Steven Rostedt @ 2009-04-18 14:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker


On Sat, 18 Apr 2009, Ingo Molnar wrote:
> > +
> > +static void trace_recursive_unlock(void)
> > +{
> > +	int level;
> > +
> > +	level = trace_irq_level();
> > +
> > +	WARN_ON_ONCE(!(current->trace_recursion & (1 << level)));
> > +
> > +	current->trace_recursion &= ~(1 << level);
> > +}
> > +
> >  static DEFINE_PER_CPU(int, rb_need_resched);
> >  
> >  /**
> > @@ -1514,6 +1548,9 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
> >  	/* If we are tracing schedule, we don't want to recurse */
> >  	resched = ftrace_preempt_disable();
> >  
> > +	if (trace_recursive_lock())
> > +		goto out_nocheck;
> 
> This is obviously a good idea, but i'm nervous about two things 
> here: overhead and placement.

I'm not sure how much overhead is added. It modifies a variable in
current without the need for any atomic operations.

> 
> Firstly, the lock/unlock sequence adds to the critical path of all 
> the tracing infrastructure and makes it slower. Secondly, we do this 
> in ring_buffer_lock_reserve() only - instead of doing it at the 
> outermost layer (where recursion protection is the most effective).

The problem with doing it in the outermost layer is that I may need
to disable preemption. Once I do that, I need to be careful about
calling preempt_enable or preempt_enable_noresched. If you call
preempt_enable from within the scheduler, you may end up with
infinite recursion. If you use noresched, you may miss a preemption
point.

The ftrace_preempt_disable() helper handles this, but the enable needs to
be done in another function, and that means the state needs to be passed
to that function. It can be done (the ring buffer does this), but instead
of rewriting the code the ring buffer already has, I simply moved the
tests into the ring buffer.
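
For reference, the pair being discussed lives in kernel/trace/trace.h and
looks roughly like this (paraphrased):

	static inline int ftrace_preempt_disable(void)
	{
		int resched;

		resched = need_resched();
		preempt_disable_notrace();

		return resched;
	}

	static inline void ftrace_preempt_enable(int resched)
	{
		/* if a reschedule was pending when we disabled preemption,
		 * re-enable without a preemption check to avoid recursing
		 * into the scheduler */
		if (resched)
			preempt_enable_no_resched_notrace();
		else
			preempt_enable_notrace();
	}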

> 
> So while i've pulled it, it would be nice to rework the concepts 
> here comprehensively. We have a number of atomics, layers, checks, 
> and all that adds overhead and complexity.

Agreed. I added this because I thought the lockdep bug was due to
recursion. It was, but it was due to a spinlock in trace_printk.
I have yet to hit a bug that this code would have detected, so I'm not
sure yet that it is needed. Especially when the ring buffer becomes
lockless, the chance of recursion being a problem becomes even smaller.

> 
> The best would be to just face it and disable IRQs via 
> raw_local_irq_save() plus set the trace recursion flag - and do this 
> around all trace point facilities and as early as possible (they are 
> not allowed to sleep anyway).

The ring buffer already disables preemption, but disabling interrupts
would be an even greater hit on performance. And that gives no protection
to NMIs, which can also call tracing.

> 
> That would remove many of the "what if an IRQ recurses" uglinesses. 
> CLI/POPF pairs are also very fast on most architectures (a lot of 
> performance-sensitive code in Linux depends on that). This would 
> also improve the end result: IRQs would not mix up the order of 
> trace records occasionally.

Yes, a lot of code in the kernel depends on it, but tracing is an even
bigger hot path. The best tracers give the lowest overhead, and besides,
it gives no protection to NMIs.

The order is preserved even if an IRQ happens. The first one to hit the
real "reserve buffer" code will have that order.

I put a lot of work into making sure the buffer can be re-entrant. I would
hate to just throw it away. It is also needed for tracing NMIs. Letting
interrupts reenter the code gives better confidence that NMIs would
work too.

This recursion logic is something I've even thought about making a
compile-time option only (DEBUG), something to try out if we see a failure.
Again, I wrote this to see if the lockdep hang was caused by the ring
buffer or even the event tracer. I can move the checks into the events,
and use a similar approach to what the ring buffer does for preempt
disable. This did not help at all in the debugging, but it let me know
that the ring buffer and event tracer were not the reason for the hang.
It was the spinlock in trace_printk that caused the lockup. And I could
have sworn I wrote a patch for that code before; I think it was lost in
the rewrites.

-- Steve



* Re: [PATCH 1/5] tracing/events: enable code with EVENT_TRACING not EVENT_TRACER
From: Steven Rostedt @ 2009-04-18 14:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Frederic Weisbecker


On Sat, 18 Apr 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > -#ifdef CONFIG_EVENT_TRACER
> > +#ifdef CONFIG_EVENT_TRACING
> 
> Hm, my crystal ball predicts many such incidents in future years to 
> come.
> 
> This:
> 
>   s/CONFIG_EVENT_TRACER/CONFIG_EVENT_TRACER_PLUGIN
> 
> might help. Or not.

The problem is that a plugin is defined (by me anyway) as something in the 
"available_tracers" file. The event tracer does not qualify as one. But I 
see the confusion with the current naming.

What about:

	s/CONFIG_EVENT_TRACER/CONFIG_ENABLE_EVENT_TRACING/

It pretty much says what it does.

??

-- Steve


