* [PATCH 0/8][RFC] ftrace: Optimizing the function tracer
@ 2012-11-15 15:30 Steven Rostedt
  2012-11-15 15:30 ` [PATCH 1/8][RFC] ftrace: Fix global function tracers that are not recursion safe Steven Rostedt
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Steven Rostedt @ 2012-11-15 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

In the past I've worked on tricks to make the function tracer a bit
faster. I've lowered the overhead by 18%, which is quite an improvement.
Thus, instead of the kernel taking 11x longer with tracing enabled, it
now takes only 9x ;-)

Anyway, these are the latest patches to improve function tracing, and
they are pretty much ready to go mainline. I just wanted to do one last
RFC post before pushing them forward.

-- Steve


These patches can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
rfc/function-preempt-only-v2

Head SHA1: bf9d25e084984439cb32c1fc7cb0f1936c95c51a


Steven Rostedt (8):
      ftrace: Fix global function tracers that are not recursion safe
      ftrace: Fix function tracing recursion self test
      ftrace: Optimize the function tracer list loop
      ftrace: Add context level recursion bit checking
      tracing: Make the trace recursion bits into enums
      tracing: Avoid unnecessary multiple recursion checks
      ftrace: Use only the preempt version of function tracing
      ring-buffer: Use context bit recursion checking

----
 kernel/trace/ftrace.c          |   82 ++++++++++++++++---------
 kernel/trace/ring_buffer.c     |   85 +++++++++++++++++--------
 kernel/trace/trace.h           |  133 ++++++++++++++++++++++++++++++++++++----
 kernel/trace/trace_functions.c |   61 +++++-------------
 kernel/trace/trace_selftest.c  |    3 +-
 5 files changed, 248 insertions(+), 116 deletions(-)



* [PATCH 1/8][RFC] ftrace: Fix global function tracers that are not recursion safe
  2012-11-15 15:30 [PATCH 0/8][RFC] ftrace: Optimizing the function tracer Steven Rostedt
@ 2012-11-15 15:30 ` Steven Rostedt
  2012-11-15 15:30 ` [PATCH 2/8][RFC] ftrace: Fix function tracing recursion self test Steven Rostedt
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2012-11-15 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: 0001-ftrace-Fix-global-function-tracers-that-are-not-recu.patch --]
[-- Type: text/plain, Size: 1434 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

If one of the function tracers set by the global ops is not recursion
safe, it can still be called directly, without the added recursion
protection supplied by the ftrace infrastructure. Have update_global_ops()
propagate the callback's RECURSION_SAFE flag to global_ops when a single
callback is called directly, so that the proper protection is applied.
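
As a stand-alone illustration of the idea (not part of the patch; struct
my_ops, update_ops() and OPS_FL_RECURSION_SAFE are made-up stand-ins for
the real ftrace types), a sketch of the flag propagation:

	/* Stand-alone sketch; these names are made up for illustration. */
	#include <stdio.h>

	#define OPS_FL_RECURSION_SAFE	(1 << 0)

	struct my_ops {
		unsigned long flags;
		struct my_ops *next;
	};

	static struct my_ops list_end = { .flags = OPS_FL_RECURSION_SAFE };
	static struct my_ops global_ops;

	/*
	 * With exactly one callback registered, its func is called directly,
	 * so global_ops must take on that callback's recursion-safe flag.
	 * Otherwise the list function runs and provides its own protection.
	 */
	static void update_ops(struct my_ops *head)
	{
		if (head == &list_end || head->next == &list_end) {
			if (head->flags & OPS_FL_RECURSION_SAFE)
				global_ops.flags |= OPS_FL_RECURSION_SAFE;
			else
				global_ops.flags &= ~OPS_FL_RECURSION_SAFE;
		} else {
			global_ops.flags |= OPS_FL_RECURSION_SAFE;
		}
	}

	int main(void)
	{
		struct my_ops unsafe_cb = { .flags = 0, .next = &list_end };

		update_ops(&unsafe_cb);
		printf("global_ops recursion safe: %s\n",
		       global_ops.flags & OPS_FL_RECURSION_SAFE ? "yes" : "no");
		return 0;
	}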

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ftrace.c |   18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4451aa3..67e886e 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -221,10 +221,24 @@ static void update_global_ops(void)
 	 * registered callers.
 	 */
 	if (ftrace_global_list == &ftrace_list_end ||
-	    ftrace_global_list->next == &ftrace_list_end)
+	    ftrace_global_list->next == &ftrace_list_end) {
 		func = ftrace_global_list->func;
-	else
+		/*
+		 * As we are calling the function directly,
+		 * if it does not have recursion protection,
+		 * the function_trace_op needs to be updated
+		 * accordingly.
+		 */
+		if (ftrace_global_list->flags & FTRACE_OPS_FL_RECURSION_SAFE)
+			global_ops.flags |= FTRACE_OPS_FL_RECURSION_SAFE;
+		else
+			global_ops.flags &= ~FTRACE_OPS_FL_RECURSION_SAFE;
+	} else {
 		func = ftrace_global_list_func;
+		/* The list has its own recursion protection. */
+		global_ops.flags |= FTRACE_OPS_FL_RECURSION_SAFE;
+	}
+
 
 	/* If we filter on pids, update to use the pid function */
 	if (!list_empty(&ftrace_pids)) {
-- 
1.7.10.4




* [PATCH 2/8][RFC] ftrace: Fix function tracing recursion self test
  2012-11-15 15:30 [PATCH 0/8][RFC] ftrace: Optimizing the function tracer Steven Rostedt
  2012-11-15 15:30 ` [PATCH 1/8][RFC] ftrace: Fix global function tracers that are not recursion safe Steven Rostedt
@ 2012-11-15 15:30 ` Steven Rostedt
  2012-11-15 15:30 ` [PATCH 3/8][RFC] ftrace: Optimize the function tracer list loop Steven Rostedt
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2012-11-15 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: 0002-ftrace-Fix-function-tracing-recursion-self-test.patch --]
[-- Type: text/plain, Size: 935 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The function tracing recursion self test should not crash
the machine if the recursion test fails. If it detects that
function tracing is recursing when it should not be, then
bail out rather than going into an infinite recursive loop.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_selftest.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 4762316..d1f98ab 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -415,7 +415,8 @@ static void trace_selftest_test_recursion_func(unsigned long ip,
 	 * The ftrace infrastructure should provide the recursion
 	 * protection. If not, this will crash the kernel!
 	 */
-	trace_selftest_recursion_cnt++;
+	if (trace_selftest_recursion_cnt++ > 10)
+		return;
 	DYN_FTRACE_TEST_NAME();
 }
 
-- 
1.7.10.4




* [PATCH 3/8][RFC] ftrace: Optimize the function tracer list loop
  2012-11-15 15:30 [PATCH 0/8][RFC] ftrace: Optimizing the function tracer Steven Rostedt
  2012-11-15 15:30 ` [PATCH 1/8][RFC] ftrace: Fix global function tracers that are not recursion safe Steven Rostedt
  2012-11-15 15:30 ` [PATCH 2/8][RFC] ftrace: Fix function tracing recursion self test Steven Rostedt
@ 2012-11-15 15:30 ` Steven Rostedt
  2012-11-15 15:30 ` [PATCH 4/8][RFC] ftrace: Add context level recursion bit checking Steven Rostedt
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2012-11-15 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: 0003-ftrace-Optimize-the-function-tracer-list-loop.patch --]
[-- Type: text/plain, Size: 4886 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

There are lots of places that perform:

       op = rcu_dereference_raw(ftrace_control_list);
       while (op != &ftrace_list_end) {

Add a helper macro to do this, and also optimize for a single
entity. That is, gcc will optimize a loop for either no iterations
or more than one iteration, but usually only a single callback
is registered to the function tracer, so the optimized case
should be a single pass. To do this we now do:

	op = rcu_dereference_raw(list);
	do {
		[...]
	} while (likely(op = rcu_dereference_raw((op)->next)) &&
	       unlikely((op) != &ftrace_list_end));

An op is always registered (ftrace_list_end when no callback is
registered), thus when a single callback is registered, the linked
list looks like:

 top => callback => ftrace_list_end => NULL.

The likely(op = op->next) check must still be performed due to the
race of removing the callback, where the first op assignment could
equal ftrace_list_end. In that case, op->next would be NULL.
But this is unlikely (it only happens in a race condition when
removing the callback).

But it is very likely that the next op would be ftrace_list_end,
unless more than one callback has been registered. This tells
gcc what the most common case is and keeps the fast path to
the fewest number of branches.
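
For reference, a minimal user-space analog of the loop shape;
likely()/unlikely() are mapped to __builtin_expect(), the
rcu_dereference_raw() calls are dropped, and struct op/walk() are
illustrative names only:

	#include <stdio.h>

	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

	struct op {
		const char *name;
		struct op *next;
	};

	static struct op list_end = { "list_end", NULL };

	/*
	 * Same shape as do_for_each_ftrace_op()/while_for_each_ftrace_op():
	 * the body always runs at least once, and the expected exit is
	 * "next is list_end", so a single registered callback is one pass.
	 */
	static void walk(struct op *head)
	{
		struct op *op = head;

		do {
			printf("calling %s\n", op->name);
		} while (likely(op = op->next) && unlikely(op != &list_end));
	}

	int main(void)
	{
		struct op callback = { "callback", &list_end };

		walk(&callback);	/* one iteration, then stop at list_end */
		return 0;
	}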

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ftrace.c |   48 ++++++++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 67e886e..cd6cd3c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -111,6 +111,26 @@ static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip);
 #define ftrace_ops_list_func ((ftrace_func_t)ftrace_ops_no_ops)
 #endif
 
+/*
+ * Traverse the ftrace_global_list, invoking all entries.  The reason that we
+ * can use rcu_dereference_raw() is that elements removed from this list
+ * are simply leaked, so there is no need to interact with a grace-period
+ * mechanism.  The rcu_dereference_raw() calls are needed to handle
+ * concurrent insertions into the ftrace_global_list.
+ *
+ * Silly Alpha and silly pointer-speculation compiler optimizations!
+ */
+#define do_for_each_ftrace_op(op, list)			\
+	op = rcu_dereference_raw(list);			\
+	do
+
+/*
+ * Optimized for just a single item in the list (as that is the normal case).
+ */
+#define while_for_each_ftrace_op(op)				\
+	while (likely(op = rcu_dereference_raw((op)->next)) &&	\
+	       unlikely((op) != &ftrace_list_end))
+
 /**
  * ftrace_nr_registered_ops - return number of ops registered
  *
@@ -132,15 +152,6 @@ int ftrace_nr_registered_ops(void)
 	return cnt;
 }
 
-/*
- * Traverse the ftrace_global_list, invoking all entries.  The reason that we
- * can use rcu_dereference_raw() is that elements removed from this list
- * are simply leaked, so there is no need to interact with a grace-period
- * mechanism.  The rcu_dereference_raw() calls are needed to handle
- * concurrent insertions into the ftrace_global_list.
- *
- * Silly Alpha and silly pointer-speculation compiler optimizations!
- */
 static void
 ftrace_global_list_func(unsigned long ip, unsigned long parent_ip,
 			struct ftrace_ops *op, struct pt_regs *regs)
@@ -149,11 +160,9 @@ ftrace_global_list_func(unsigned long ip, unsigned long parent_ip,
 		return;
 
 	trace_recursion_set(TRACE_GLOBAL_BIT);
-	op = rcu_dereference_raw(ftrace_global_list); /*see above*/
-	while (op != &ftrace_list_end) {
+	do_for_each_ftrace_op(op, ftrace_global_list) {
 		op->func(ip, parent_ip, op, regs);
-		op = rcu_dereference_raw(op->next); /*see above*/
-	};
+	} while_for_each_ftrace_op(op);
 	trace_recursion_clear(TRACE_GLOBAL_BIT);
 }
 
@@ -4104,14 +4113,11 @@ ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip,
 	 */
 	preempt_disable_notrace();
 	trace_recursion_set(TRACE_CONTROL_BIT);
-	op = rcu_dereference_raw(ftrace_control_list);
-	while (op != &ftrace_list_end) {
+	do_for_each_ftrace_op(op, ftrace_control_list) {
 		if (!ftrace_function_local_disabled(op) &&
 		    ftrace_ops_test(op, ip))
 			op->func(ip, parent_ip, op, regs);
-
-		op = rcu_dereference_raw(op->next);
-	};
+	} while_for_each_ftrace_op(op);
 	trace_recursion_clear(TRACE_CONTROL_BIT);
 	preempt_enable_notrace();
 }
@@ -4139,12 +4145,10 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 	 * they must be freed after a synchronize_sched().
 	 */
 	preempt_disable_notrace();
-	op = rcu_dereference_raw(ftrace_ops_list);
-	while (op != &ftrace_list_end) {
+	do_for_each_ftrace_op(op, ftrace_ops_list) {
 		if (ftrace_ops_test(op, ip))
 			op->func(ip, parent_ip, op, regs);
-		op = rcu_dereference_raw(op->next);
-	};
+	} while_for_each_ftrace_op(op);
 	preempt_enable_notrace();
 	trace_recursion_clear(TRACE_INTERNAL_BIT);
 }
-- 
1.7.10.4




* [PATCH 4/8][RFC] ftrace: Add context level recursion bit checking
  2012-11-15 15:30 [PATCH 0/8][RFC] ftrace: Optimizing the function tracer Steven Rostedt
                   ` (2 preceding siblings ...)
  2012-11-15 15:30 ` [PATCH 3/8][RFC] ftrace: Optimize the function tracer list loop Steven Rostedt
@ 2012-11-15 15:30 ` Steven Rostedt
  2012-11-15 15:30 ` [PATCH 5/8][RFC] tracing: Make the trace recursion bits into enums Steven Rostedt
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2012-11-15 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: 0004-ftrace-Add-context-level-recursion-bit-checking.patch --]
[-- Type: text/plain, Size: 4084 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Currently, for recursion checking in the function tracer, ftrace
tests a task_struct bit to determine if the function tracer has
recursed or not. If it has, then it will return without going
further.

But this leads to races. If an interrupt came in after the bit
was set, the functions being traced would see that bit set and
think that the function tracer recursed on itself, and would return.

Instead add a bit for each context (normal, softirq, irq and nmi).

A check of which context the task is in is made before testing the
associated bit. Now if an interrupt preempts the function tracer
after the previous context has been set, the interrupt functions
can still be traced.
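
A simplified user-space model of the per-context bits; the context is
passed in explicitly here instead of being derived from
in_nmi()/in_irq()/in_interrupt(), and all names are illustrative:

	#include <stdio.h>

	enum ctx { CTX_NMI, CTX_IRQ, CTX_SIRQ, CTX_NORMAL };

	/* One recursion bit per context, like the *_NMI/IRQ/SIRQ bits. */
	static unsigned long trace_recursion;

	static int enter_tracer(enum ctx c)
	{
		unsigned long bit = 1UL << c;

		if (trace_recursion & bit)
			return 0;	/* real recursion in this context */
		trace_recursion |= bit;
		return 1;
	}

	static void exit_tracer(enum ctx c)
	{
		trace_recursion &= ~(1UL << c);
	}

	int main(void)
	{
		/* Normal context is tracing... */
		enter_tracer(CTX_NORMAL);

		/*
		 * ...an interrupt preempts it: its own bit is clear, so it
		 * can still be traced instead of looking like recursion.
		 */
		printf("irq traced: %d\n", enter_tracer(CTX_IRQ));
		exit_tracer(CTX_IRQ);

		/* A true recursion in normal context is still caught. */
		printf("recursion caught: %d\n", !enter_tracer(CTX_NORMAL));

		exit_tracer(CTX_NORMAL);
		return 0;
	}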

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ftrace.c |   40 +++++++++++++++++++++++++++++++++-------
 kernel/trace/trace.h  |   12 +++++++++---
 2 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index cd6cd3c..fbf25e7 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -156,14 +156,27 @@ static void
 ftrace_global_list_func(unsigned long ip, unsigned long parent_ip,
 			struct ftrace_ops *op, struct pt_regs *regs)
 {
-	if (unlikely(trace_recursion_test(TRACE_GLOBAL_BIT)))
+	int bit;
+
+	if (in_interrupt()) {
+		if (in_nmi())
+			bit = TRACE_GLOBAL_NMI_BIT;
+
+		else if (in_irq())
+			bit = TRACE_GLOBAL_IRQ_BIT;
+		else
+			bit = TRACE_GLOBAL_SIRQ_BIT;
+	} else
+		bit = TRACE_GLOBAL_BIT;
+
+	if (unlikely(trace_recursion_test(bit)))
 		return;
 
-	trace_recursion_set(TRACE_GLOBAL_BIT);
+	trace_recursion_set(bit);
 	do_for_each_ftrace_op(op, ftrace_global_list) {
 		op->func(ip, parent_ip, op, regs);
 	} while_for_each_ftrace_op(op);
-	trace_recursion_clear(TRACE_GLOBAL_BIT);
+	trace_recursion_clear(bit);
 }
 
 static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip,
@@ -4132,14 +4145,27 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *ignored, struct pt_regs *regs)
 {
 	struct ftrace_ops *op;
+	unsigned int bit;
 
 	if (function_trace_stop)
 		return;
 
-	if (unlikely(trace_recursion_test(TRACE_INTERNAL_BIT)))
-		return;
+	if (in_interrupt()) {
+		if (in_nmi())
+			bit = TRACE_INTERNAL_NMI_BIT;
+
+		else if (in_irq())
+			bit = TRACE_INTERNAL_IRQ_BIT;
+		else
+			bit = TRACE_INTERNAL_SIRQ_BIT;
+	} else
+		bit = TRACE_INTERNAL_BIT;
+
+	if (unlikely(trace_recursion_test(bit)))
+			return;
+
+	trace_recursion_set(bit);
 
-	trace_recursion_set(TRACE_INTERNAL_BIT);
 	/*
 	 * Some of the ops may be dynamically allocated,
 	 * they must be freed after a synchronize_sched().
@@ -4150,7 +4176,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 			op->func(ip, parent_ip, op, regs);
 	} while_for_each_ftrace_op(op);
 	preempt_enable_notrace();
-	trace_recursion_clear(TRACE_INTERNAL_BIT);
+	trace_recursion_clear(bit);
 }
 
 /*
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c75d798..fe6ccff 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -299,8 +299,14 @@ struct tracer {
 
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
-#define TRACE_GLOBAL_BIT		(1<<12)
-#define TRACE_CONTROL_BIT		(1<<13)
+#define TRACE_INTERNAL_NMI_BIT		(1<<12)
+#define TRACE_INTERNAL_IRQ_BIT		(1<<13)
+#define TRACE_INTERNAL_SIRQ_BIT		(1<<14)
+#define TRACE_GLOBAL_BIT		(1<<15)
+#define TRACE_GLOBAL_NMI_BIT		(1<<16)
+#define TRACE_GLOBAL_IRQ_BIT		(1<<17)
+#define TRACE_GLOBAL_SIRQ_BIT		(1<<18)
+#define TRACE_CONTROL_BIT		(1<<19)
 
 /*
  * Abuse of the trace_recursion.
@@ -309,7 +315,7 @@ struct tracer {
  * was called in irq context but we have irq tracing off. Since this
  * can only be modified by current, we can reuse trace_recursion.
  */
-#define TRACE_IRQ_BIT			(1<<13)
+#define TRACE_IRQ_BIT			(1<<20)
 
 #define trace_recursion_set(bit)	do { (current)->trace_recursion |= (bit); } while (0)
 #define trace_recursion_clear(bit)	do { (current)->trace_recursion &= ~(bit); } while (0)
-- 
1.7.10.4




* [PATCH 5/8][RFC] tracing: Make the trace recursion bits into enums
  2012-11-15 15:30 [PATCH 0/8][RFC] ftrace: Optimizing the function tracer Steven Rostedt
                   ` (3 preceding siblings ...)
  2012-11-15 15:30 ` [PATCH 4/8][RFC] ftrace: Add context level recursion bit checking Steven Rostedt
@ 2012-11-15 15:30 ` Steven Rostedt
  2012-11-15 15:30 ` [PATCH 6/8][RFC] tracing: Avoid unnecessary multiple recursion checks Steven Rostedt
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2012-11-15 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: 0005-tracing-Make-the-trace-recursion-bits-into-enums.patch --]
[-- Type: text/plain, Size: 2006 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Convert the bits into enums which makes the code a little easier
to maintain.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace.h |   30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index fe6ccff..5a095d6 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -298,15 +298,18 @@ struct tracer {
 #define trace_recursion_buffer() ((current)->trace_recursion & 0x3ff)
 
 /* for function tracing recursion */
-#define TRACE_INTERNAL_BIT		(1<<11)
-#define TRACE_INTERNAL_NMI_BIT		(1<<12)
-#define TRACE_INTERNAL_IRQ_BIT		(1<<13)
-#define TRACE_INTERNAL_SIRQ_BIT		(1<<14)
-#define TRACE_GLOBAL_BIT		(1<<15)
-#define TRACE_GLOBAL_NMI_BIT		(1<<16)
-#define TRACE_GLOBAL_IRQ_BIT		(1<<17)
-#define TRACE_GLOBAL_SIRQ_BIT		(1<<18)
-#define TRACE_CONTROL_BIT		(1<<19)
+enum {
+	TRACE_INTERNAL_BIT = 11,
+	TRACE_INTERNAL_NMI_BIT,
+	TRACE_INTERNAL_IRQ_BIT,
+	TRACE_INTERNAL_SIRQ_BIT,
+
+	TRACE_GLOBAL_BIT,
+	TRACE_GLOBAL_NMI_BIT,
+	TRACE_GLOBAL_IRQ_BIT,
+	TRACE_GLOBAL_SIRQ_BIT,
+
+	TRACE_CONTROL_BIT,
 
 /*
  * Abuse of the trace_recursion.
@@ -315,11 +318,12 @@ struct tracer {
  * was called in irq context but we have irq tracing off. Since this
  * can only be modified by current, we can reuse trace_recursion.
  */
-#define TRACE_IRQ_BIT			(1<<20)
+	TRACE_IRQ_BIT,
+};
 
-#define trace_recursion_set(bit)	do { (current)->trace_recursion |= (bit); } while (0)
-#define trace_recursion_clear(bit)	do { (current)->trace_recursion &= ~(bit); } while (0)
-#define trace_recursion_test(bit)	((current)->trace_recursion & (bit))
+#define trace_recursion_set(bit)	do { (current)->trace_recursion |= (1<<(bit)); } while (0)
+#define trace_recursion_clear(bit)	do { (current)->trace_recursion &= ~(1<<(bit)); } while (0)
+#define trace_recursion_test(bit)	((current)->trace_recursion & (1<<(bit)))
 
 #define TRACE_PIPE_ALL_CPU	-1
 
-- 
1.7.10.4




* [PATCH 6/8][RFC] tracing: Avoid unnecessary multiple recursion checks
  2012-11-15 15:30 [PATCH 0/8][RFC] ftrace: Optimizing the function tracer Steven Rostedt
                   ` (4 preceding siblings ...)
  2012-11-15 15:30 ` [PATCH 5/8][RFC] tracing: Make the trace recursion bits into enums Steven Rostedt
@ 2012-11-15 15:30 ` Steven Rostedt
  2012-11-15 15:30 ` [PATCH 7/8][RFC] ftrace: Use only the preempt version of function tracing Steven Rostedt
  2012-11-15 15:30 ` [PATCH 8/8][RFC] ring-buffer: Use context bit recursion checking Steven Rostedt
  7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2012-11-15 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: 0006-tracing-Avoid-unnecessary-multiple-recursion-checks.patch --]
[-- Type: text/plain, Size: 6695 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

When function tracing occurs, the following steps are made:
  If arch does not support a ftrace feature:
   call internal function (uses INTERNAL bits) which calls...
  If callback is registered to the "global" list, the list
   function is called and recursion checks the GLOBAL bits.
   then this function calls...
  The function callback, which can use the FTRACE bits to
   check for recursion.

Now if the arch does not support a feature, and it calls
the global list function which calls the ftrace callback,
all three of these steps will do their own recursion protection.
There's no reason to do one if the previous caller already
did. The recursion that we are protecting against will
go through the same steps again.

To prevent the multiple recursion checks, if a recursion
bit is set that is higher than the MAX bit of the current
check, then we know that the check was made by the previous
caller, and we can skip the current check.
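
A scaled-down, stand-alone model of the "skip the inner check" rule;
the bit layout and names here are illustrative (not the real TRACE_*
values), and the context is fixed to normal:

	#include <stdio.h>

	/*
	 * Lower bits belong to the innermost layer (the callback), higher
	 * bits to the outer layers, with 4 context bits per layer.
	 */
	#define CONTEXT_BITS	4
	#define FTRACE_START	0
	#define GLOBAL_START	(FTRACE_START + CONTEXT_BITS)
	#define LIST_START	(GLOBAL_START + CONTEXT_BITS)
	#define LAYER_MAX(s)	((1 << ((s) + CONTEXT_BITS)) - 1)
	#define CONTEXT_MASK	LAYER_MAX(LIST_START)

	static unsigned int recursion;

	/*
	 * Returns the bit taken, 0 if an outer layer already holds
	 * protection, or -1 on real recursion.
	 */
	static int test_and_set(int start, int max)
	{
		int bit;

		if ((recursion & CONTEXT_MASK) > (unsigned int)max)
			return 0;	/* a previous (outer) check covers us */

		bit = start + 3;	/* normal context */
		if (recursion & (1 << bit))
			return -1;
		recursion |= 1 << bit;
		return bit;
	}

	int main(void)
	{
		/* The outermost layer (the internal list func) sets its bit... */
		int outer = test_and_set(LIST_START, LAYER_MAX(LIST_START));

		/* ...so the global list and callback layers skip their checks. */
		printf("outer bit: %d\n", outer);
		printf("global check skipped: %d\n",
		       test_and_set(GLOBAL_START, LAYER_MAX(GLOBAL_START)) == 0);
		printf("ftrace check skipped: %d\n",
		       test_and_set(FTRACE_START, LAYER_MAX(FTRACE_START)) == 0);
		return 0;
	}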

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ftrace.c |   40 +++++--------------
 kernel/trace/trace.h  |  106 ++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 110 insertions(+), 36 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fbf25e7..d1ceaf4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -158,25 +158,15 @@ ftrace_global_list_func(unsigned long ip, unsigned long parent_ip,
 {
 	int bit;
 
-	if (in_interrupt()) {
-		if (in_nmi())
-			bit = TRACE_GLOBAL_NMI_BIT;
-
-		else if (in_irq())
-			bit = TRACE_GLOBAL_IRQ_BIT;
-		else
-			bit = TRACE_GLOBAL_SIRQ_BIT;
-	} else
-		bit = TRACE_GLOBAL_BIT;
-
-	if (unlikely(trace_recursion_test(bit)))
+	bit = trace_test_and_set_recursion(TRACE_GLOBAL_START, TRACE_GLOBAL_MAX);
+	if (bit < 0)
 		return;
 
-	trace_recursion_set(bit);
 	do_for_each_ftrace_op(op, ftrace_global_list) {
 		op->func(ip, parent_ip, op, regs);
 	} while_for_each_ftrace_op(op);
-	trace_recursion_clear(bit);
+
+	trace_clear_recursion(bit);
 }
 
 static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip,
@@ -4145,26 +4135,14 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *ignored, struct pt_regs *regs)
 {
 	struct ftrace_ops *op;
-	unsigned int bit;
+	int bit;
 
 	if (function_trace_stop)
 		return;
 
-	if (in_interrupt()) {
-		if (in_nmi())
-			bit = TRACE_INTERNAL_NMI_BIT;
-
-		else if (in_irq())
-			bit = TRACE_INTERNAL_IRQ_BIT;
-		else
-			bit = TRACE_INTERNAL_SIRQ_BIT;
-	} else
-		bit = TRACE_INTERNAL_BIT;
-
-	if (unlikely(trace_recursion_test(bit)))
-			return;
-
-	trace_recursion_set(bit);
+	bit = trace_test_and_set_recursion(TRACE_LIST_START, TRACE_LIST_MAX);
+	if (bit < 0)
+		return;
 
 	/*
 	 * Some of the ops may be dynamically allocated,
@@ -4176,7 +4154,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 			op->func(ip, parent_ip, op, regs);
 	} while_for_each_ftrace_op(op);
 	preempt_enable_notrace();
-	trace_recursion_clear(bit);
+	trace_clear_recursion(bit);
 }
 
 /*
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 5a095d6..c203a51 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -297,18 +297,49 @@ struct tracer {
 /* Ring buffer has the 10 LSB bits to count */
 #define trace_recursion_buffer() ((current)->trace_recursion & 0x3ff)
 
-/* for function tracing recursion */
+/*
+ * For function tracing recursion:
+ *  The order of these bits are important.
+ *
+ *  When function tracing occurs, the following steps are made:
+ *   If arch does not support a ftrace feature:
+ *    call internal function (uses INTERNAL bits) which calls...
+ *   If callback is registered to the "global" list, the list
+ *    function is called and recursion checks the GLOBAL bits.
+ *    then this function calls...
+ *   The function callback, which can use the FTRACE bits to
+ *    check for recursion.
+ *
+ * Now if the arch does not support a feature, and it calls
+ * the global list function which calls the ftrace callback,
+ * all three of these steps will do their own recursion protection.
+ * There's no reason to do one if the previous caller already
+ * did. The recursion that we are protecting against will
+ * go through the same steps again.
+ *
+ * To prevent the multiple recursion checks, if a recursion
+ * bit is set that is higher than the MAX bit of the current
+ * check, then we know that the check was made by the previous
+ * caller, and we can skip the current check.
+ */
 enum {
-	TRACE_INTERNAL_BIT = 11,
-	TRACE_INTERNAL_NMI_BIT,
-	TRACE_INTERNAL_IRQ_BIT,
-	TRACE_INTERNAL_SIRQ_BIT,
+	TRACE_FTRACE_BIT = 11,
+	TRACE_FTRACE_NMI_BIT,
+	TRACE_FTRACE_IRQ_BIT,
+	TRACE_FTRACE_SIRQ_BIT,
 
+	/* GLOBAL_BITs must be greater than FTRACE_BITs */
 	TRACE_GLOBAL_BIT,
 	TRACE_GLOBAL_NMI_BIT,
 	TRACE_GLOBAL_IRQ_BIT,
 	TRACE_GLOBAL_SIRQ_BIT,
 
+	/* INTERNAL_BITs must be greater than GLOBAL_BITs */
+	TRACE_INTERNAL_BIT,
+	TRACE_INTERNAL_NMI_BIT,
+	TRACE_INTERNAL_IRQ_BIT,
+	TRACE_INTERNAL_SIRQ_BIT,
+
 	TRACE_CONTROL_BIT,
 
 /*
@@ -325,6 +356,71 @@ enum {
 #define trace_recursion_clear(bit)	do { (current)->trace_recursion &= ~(1<<(bit)); } while (0)
 #define trace_recursion_test(bit)	((current)->trace_recursion & (1<<(bit)))
 
+#define TRACE_CONTEXT_BITS	4
+
+#define TRACE_FTRACE_START	TRACE_FTRACE_BIT
+#define TRACE_FTRACE_MAX	((1 << (TRACE_FTRACE_START + TRACE_CONTEXT_BITS)) - 1)
+
+#define TRACE_GLOBAL_START	TRACE_GLOBAL_BIT
+#define TRACE_GLOBAL_MAX	((1 << (TRACE_GLOBAL_START + TRACE_CONTEXT_BITS)) - 1)
+
+#define TRACE_LIST_START	TRACE_INTERNAL_BIT
+#define TRACE_LIST_MAX		((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1)
+
+#define TRACE_CONTEXT_MASK	TRACE_LIST_MAX
+
+static __always_inline int trace_get_context_bit(void)
+{
+	int bit;
+
+	if (in_interrupt()) {
+		if (in_nmi())
+			bit = 0;
+
+		else if (in_irq())
+			bit = 1;
+		else
+			bit = 2;
+	} else
+		bit = 3;
+
+	return bit;
+}
+
+static __always_inline int trace_test_and_set_recursion(int start, int max)
+{
+	unsigned int val = current->trace_recursion;
+	int bit;
+
+	/* A previous recursion check was made */
+	if ((val & TRACE_CONTEXT_MASK) > max)
+		return 0;
+
+	bit = trace_get_context_bit() + start;
+	if (unlikely(val & (1 << bit)))
+		return -1;
+
+	val |= 1 << bit;
+	current->trace_recursion = val;
+	barrier();
+
+	return bit;
+}
+
+static __always_inline void trace_clear_recursion(int bit)
+{
+	unsigned int val = current->trace_recursion;
+
+	if (!bit)
+		return;
+
+	bit = 1 << bit;
+	val &= ~bit;
+
+	barrier();
+	current->trace_recursion = val;
+}
+
 #define TRACE_PIPE_ALL_CPU	-1
 
 static inline struct ring_buffer_iter *
-- 
1.7.10.4




* [PATCH 7/8][RFC] ftrace: Use only the preempt version of function tracing
  2012-11-15 15:30 [PATCH 0/8][RFC] ftrace: Optimizing the function tracer Steven Rostedt
                   ` (5 preceding siblings ...)
  2012-11-15 15:30 ` [PATCH 6/8][RFC] tracing: Avoid unnecessary multiple recursion checks Steven Rostedt
@ 2012-11-15 15:30 ` Steven Rostedt
  2012-11-15 15:30 ` [PATCH 8/8][RFC] ring-buffer: Use context bit recursion checking Steven Rostedt
  7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2012-11-15 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: 0007-ftrace-Use-only-the-preempt-version-of-function-trac.patch --]
[-- Type: text/plain, Size: 4695 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The function tracer had two different versions of function tracing:
the irq disabling version and the preempt disabling version.

As function tracing is very intrusive and can cause nasty recursion
issues, it has its own recursion protection. But the old method to
do this was a flat layer: if it detected that a recursion was happening,
it would just return without recording.

This made the preempt version (much faster than the irq disabling one)
not very useful, because if an interrupt were to occur after the
recursion flag was set, the interrupt would not be traced at all,
because every function that was traced would think it recursed on
itself (due to the context it preempted setting the recursive flag).

Now that we have a recursion flag for every context level, we
no longer need to worry about that. We can disable preemption,
set the current context recursion check bit, and go on. If an
interrupt were to come along, it would check its own context bit
and happily continue to trace.

As the preempt version is faster than the irq disable version,
there's no more reason to keep the preempt version around.
And the irq disable version still had an issue with missing
out on tracing NMI code.

Remove the irq disable function tracer version and have the
preempt disable version be the default (and only version).

Before this patch, running:

 # echo function > /debug/tracing/current_tracer
 # for i in `seq 10`; do ./hackbench 50; done
Time: 12.028
Time: 11.945
Time: 11.925
Time: 11.964
Time: 12.002
Time: 11.910
Time: 11.944
Time: 11.929
Time: 11.941
Time: 11.924

(average: 11.9512)

Now we have:

 # echo function > /debug/tracing/current_tracer
 # for i in `seq 10`; do ./hackbench 50; done
Time: 10.285
Time: 10.407
Time: 10.243
Time: 10.372
Time: 10.380
Time: 10.198
Time: 10.272
Time: 10.354
Time: 10.248
Time: 10.253

(average: 10.3012)

 a 13.8% savings!

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_functions.c |   61 +++++++++-------------------------------
 1 file changed, 14 insertions(+), 47 deletions(-)

diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index bb227e3..1621f45 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -47,34 +47,6 @@ static void function_trace_start(struct trace_array *tr)
 	tracing_reset_online_cpus(tr);
 }
 
-static void
-function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip,
-				 struct ftrace_ops *op, struct pt_regs *pt_regs)
-{
-	struct trace_array *tr = func_trace;
-	struct trace_array_cpu *data;
-	unsigned long flags;
-	long disabled;
-	int cpu;
-	int pc;
-
-	if (unlikely(!ftrace_function_enabled))
-		return;
-
-	pc = preempt_count();
-	preempt_disable_notrace();
-	local_save_flags(flags);
-	cpu = raw_smp_processor_id();
-	data = tr->data[cpu];
-	disabled = atomic_inc_return(&data->disabled);
-
-	if (likely(disabled == 1))
-		trace_function(tr, ip, parent_ip, flags, pc);
-
-	atomic_dec(&data->disabled);
-	preempt_enable_notrace();
-}
-
 /* Our option */
 enum {
 	TRACE_FUNC_OPT_STACK	= 0x1,
@@ -85,34 +57,34 @@ static struct tracer_flags func_flags;
 static void
 function_trace_call(unsigned long ip, unsigned long parent_ip,
 		    struct ftrace_ops *op, struct pt_regs *pt_regs)
-
 {
 	struct trace_array *tr = func_trace;
 	struct trace_array_cpu *data;
 	unsigned long flags;
-	long disabled;
+	int bit;
 	int cpu;
 	int pc;
 
 	if (unlikely(!ftrace_function_enabled))
 		return;
 
-	/*
-	 * Need to use raw, since this must be called before the
-	 * recursive protection is performed.
-	 */
-	local_irq_save(flags);
-	cpu = raw_smp_processor_id();
-	data = tr->data[cpu];
-	disabled = atomic_inc_return(&data->disabled);
+	pc = preempt_count();
+	preempt_disable_notrace();
 
-	if (likely(disabled == 1)) {
-		pc = preempt_count();
+	bit = trace_test_and_set_recursion(TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+	if (bit < 0)
+		goto out;
+
+	cpu = smp_processor_id();
+	data = tr->data[cpu];
+	if (!atomic_read(&data->disabled)) {
+		local_save_flags(flags);
 		trace_function(tr, ip, parent_ip, flags, pc);
 	}
+	trace_clear_recursion(bit);
 
-	atomic_dec(&data->disabled);
-	local_irq_restore(flags);
+ out:
+	preempt_enable_notrace();
 }
 
 static void
@@ -185,11 +157,6 @@ static void tracing_start_function_trace(void)
 {
 	ftrace_function_enabled = 0;
 
-	if (trace_flags & TRACE_ITER_PREEMPTONLY)
-		trace_ops.func = function_trace_call_preempt_only;
-	else
-		trace_ops.func = function_trace_call;
-
 	if (func_flags.val & TRACE_FUNC_OPT_STACK)
 		register_ftrace_function(&trace_stack_ops);
 	else
-- 
1.7.10.4




* [PATCH 8/8][RFC] ring-buffer: Use context bit recursion checking
  2012-11-15 15:30 [PATCH 0/8][RFC] ftrace: Optimizing the function tracer Steven Rostedt
                   ` (6 preceding siblings ...)
  2012-11-15 15:30 ` [PATCH 7/8][RFC] ftrace: Use only the preempt version of function tracing Steven Rostedt
@ 2012-11-15 15:30 ` Steven Rostedt
  7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2012-11-15 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: 0008-ring-buffer-User-context-bit-recursion-checking.patch --]
[-- Type: text/plain, Size: 4823 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Using context bit recursion checking, we can help increase the
performance of the ring buffer.

Before this patch:

 # echo function > /debug/tracing/current_tracer
 # for i in `seq 10`; do ./hackbench 50; done
Time: 10.285
Time: 10.407
Time: 10.243
Time: 10.372
Time: 10.380
Time: 10.198
Time: 10.272
Time: 10.354
Time: 10.248
Time: 10.253

(average: 10.3012)

Now we have:

 # echo function > /debug/tracing/current_tracer
 # for i in `seq 10`; do ./hackbench 50; done
Time: 9.712
Time: 9.824
Time: 9.861
Time: 9.827
Time: 9.962
Time: 9.905
Time: 9.886
Time: 10.088
Time: 9.861
Time: 9.834

(average: 9.876)

 a 4% savings!
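
The heart of the change is the per-cpu context bitmask used as a
lightweight recursion lock. A single-threaded, stand-alone sketch of the
trick (the context bit is passed in by hand instead of being read from
in_interrupt(), and the names are illustrative):

	#include <stdio.h>

	/* bit 0 = NMI, bit 1 = IRQ, bit 2 = SoftIRQ, bit 3 = normal */
	static unsigned int current_context;

	static int recursive_lock(int bit)
	{
		unsigned int val = current_context;

		if (val & (1 << bit))
			return 1;	/* recursion in this context */
		current_context = val | (1 << bit);
		return 0;
	}

	/*
	 * Unlock without recomputing the context: subtracting 1 and ANDing
	 * with the old value clears the lowest set bit, which is always the
	 * bit of the current (innermost) context.
	 */
	static void recursive_unlock(void)
	{
		unsigned int val = current_context;

		current_context = (val - 1) & val;
	}

	int main(void)
	{
		recursive_lock(3);	/* normal context takes bit 3   */
		recursive_lock(1);	/* an IRQ nests and takes bit 1 */
		printf("%x\n", current_context);	/* 1010 -> a */
		recursive_unlock();	/* clears bit 1 (the IRQ)       */
		printf("%x\n", current_context);	/* 1000 -> 8 */
		recursive_unlock();	/* clears bit 3                 */
		printf("%x\n", current_context);	/* 0 */
		return 0;
	}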

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |   85 +++++++++++++++++++++++++++++++-------------
 kernel/trace/trace.h       |   13 +++----
 2 files changed, 67 insertions(+), 31 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 3c7834c..e91877c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2430,41 +2430,76 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 
 #ifdef CONFIG_TRACING
 
-#define TRACE_RECURSIVE_DEPTH 16
+/*
+ * The lock and unlock are done within a preempt disable section.
+ * The current_context per_cpu variable can only be modified
+ * by the current task between lock and unlock. But it can
+ * be modified more than once via an interrupt. To pass this
+ * information from the lock to the unlock without having to
+ * access the 'in_interrupt()' functions again (which do show
+ * a bit of overhead in something as critical as function tracing),
+ * we use a bitmask trick.
+ *
+ *  bit 0 =  NMI context
+ *  bit 1 =  IRQ context
+ *  bit 2 =  SoftIRQ context
+ *  bit 3 =  normal context.
+ *
+ * This works because this is the order of contexts that can
+ * preempt other contexts. A SoftIRQ never preempts an IRQ
+ * context.
+ *
+ * When the context is determined, the corresponding bit is
+ * checked and set (if it was set, then a recursion of that context
+ * happened).
+ *
+ * On unlock, we need to clear this bit. To do so, just subtract
+ * 1 from the current_context and AND it to itself.
+ *
+ * (binary)
+ *  101 - 1 = 100
+ *  101 & 100 = 100 (clearing bit zero)
+ *
+ *  1010 - 1 = 1001
+ *  1010 & 1001 = 1000 (clearing bit 1)
+ *
+ * The least significant bit can be cleared this way, and it
+ * just so happens that it is the same bit corresponding to
+ * the current context.
+ */
+static DEFINE_PER_CPU(unsigned int, current_context);
 
-/* Keep this code out of the fast path cache */
-static noinline void trace_recursive_fail(void)
+static __always_inline int trace_recursive_lock(void)
 {
-	/* Disable all tracing before we do anything else */
-	tracing_off_permanent();
-
-	printk_once(KERN_WARNING "Tracing recursion: depth[%ld]:"
-		    "HC[%lu]:SC[%lu]:NMI[%lu]\n",
-		    trace_recursion_buffer(),
-		    hardirq_count() >> HARDIRQ_SHIFT,
-		    softirq_count() >> SOFTIRQ_SHIFT,
-		    in_nmi());
+	unsigned int val = this_cpu_read(current_context);
+	int bit;
 
-	WARN_ON_ONCE(1);
-}
-
-static inline int trace_recursive_lock(void)
-{
-	trace_recursion_inc();
+	if (in_interrupt()) {
+		if (in_nmi())
+			bit = 0;
+		else if (in_irq())
+			bit = 1;
+		else
+			bit = 2;
+	} else
+		bit = 3;
 
-	if (likely(trace_recursion_buffer() < TRACE_RECURSIVE_DEPTH))
-		return 0;
+	if (unlikely(val & (1 << bit)))
+		return 1;
 
-	trace_recursive_fail();
+	val |= (1 << bit);
+	this_cpu_write(current_context, val);
 
-	return -1;
+	return 0;
 }
 
-static inline void trace_recursive_unlock(void)
+static __always_inline void trace_recursive_unlock(void)
 {
-	WARN_ON_ONCE(!trace_recursion_buffer());
+	unsigned int val = this_cpu_read(current_context);
 
-	trace_recursion_dec();
+	val--;
+	val &= this_cpu_read(current_context);
+	this_cpu_write(current_context, val);
 }
 
 #else
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c203a51..04a2c7a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -291,11 +291,6 @@ struct tracer {
 
 
 /* Only current can touch trace_recursion */
-#define trace_recursion_inc() do { (current)->trace_recursion++; } while (0)
-#define trace_recursion_dec() do { (current)->trace_recursion--; } while (0)
-
-/* Ring buffer has the 10 LSB bits to count */
-#define trace_recursion_buffer() ((current)->trace_recursion & 0x3ff)
 
 /*
  * For function tracing recursion:
@@ -323,7 +318,13 @@ struct tracer {
  * caller, and we can skip the current check.
  */
 enum {
-	TRACE_FTRACE_BIT = 11,
+	TRACE_BUFFER_BIT,
+	TRACE_BUFFER_NMI_BIT,
+	TRACE_BUFFER_IRQ_BIT,
+	TRACE_BUFFER_SIRQ_BIT,
+
+	/* Start of function recursion bits */
+	TRACE_FTRACE_BIT,
 	TRACE_FTRACE_NMI_BIT,
 	TRACE_FTRACE_IRQ_BIT,
 	TRACE_FTRACE_SIRQ_BIT,
-- 
1.7.10.4



