linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
@ 2018-11-22  1:27 Steven Rostedt
  2018-11-22  1:27 ` [RFC][PATCH 01/14] fgraph: Create a fgraph.c file to store function graph infrastructure Steven Rostedt
                   ` (15 more replies)
  0 siblings, 16 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland


I talked with many of you at Plumbers about rewriting the function graph
tracer. Well, this is it. I was originally going to produce just a
proof of concept, but when I found that I had to fix a design flaw
and that covered all the arch code anyway, I decided to do more of a
RFC patch set.

I probably should add more comments to the code, and update the 
function graph design documentation, but I wanted to get this out
before the US Turkey day for your enjoyment while you try to let your
pants buckle again.

Why the rewrite?

Well the fuction graph tracer is arguably the strongest of the tracers.
It shows both the entrance and exit of a function, can give the timings
of a function, and shows the execution of the code quite nicely.

But it has one major flaw.

It can't let more than one user access it at a time. The function
tracer has had that feature for years now, but due to the design of
the function graph tracer it was difficult to implement. Why?

Because you must maintain the state of a three-tuple.

 Task, Function, Callback

The state is determined at by the entryfunc and must be passed to the
retfunc when the function being traced returns. But this is not an
easy task, as that state can be different for each task, each function
and each callback.

What's the solution? I use the shadow stack that is already being
used to store the function return addresses.

  A big thanks to Masami Hiramatsu for suggesting this idea!

For now, I only allow an 16 users of the function graph tracer at a time.
That should be more than enough. I create an array of 16 fgraph_ops
pointers. When a user registers their fgraph_ops to the function graph
tracer, it is assigned an index into that array, which will hold a pointer
to the fgraph_ops being registered.

On entry of the function, the array is iterated and each entryfunc of
the fgraph_ops in the array is called. If the entryfunc returns non-zero,
then the index of that fgraph_ops is pushed on the shadow stack (along
with the index to the "ret_stack entry" structure, for fast access
to it). If the entryfunc returns zero, then it is ignored. If at least
one function returned non-zero then the return of the traced function
will also be traced.

On the return of the function, the shadow stack is examined and all
the indexes that were pushed on the stack is read, and each fgraph_ops
retfunc is called in the reverse order.

When a fgraph_ops is unregistered, its index in the array is set to point
to a "stub" fgraph_ops that holds stub functions that just return
"0" for the entryfunc and does nothing for the retfunc. This is because
the retfunc may be called literally days after the entryfunc is called
and we want to be able to free the fgraph_ops that is unregistered.

Note, if another fgraph_ops is registered in the same location, its
retfunc may be called that was set by a previous fgraph_ops. This
is not a regression because that's what can happen today if you unregister
a callback from the current function_graph tracer and register another
one. If this is an issue, there are ways to solve it.

This patch series is based on top of the one I just sent out to fix
the design flaw:

  http://lkml.kernel.org/r/20181122002801.501220343@goodmis.org

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
ftrace/fgraph-multi

Head SHA1: b177713619ad49f3dacacfcc11ed110da73ac857


Steven Rostedt (VMware) (14):
      fgraph: Create a fgraph.c file to store function graph infrastructure
      fgraph: Have set_graph_notrace only affect function_graph tracer
      arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH
      function_graph: Remove the use of FTRACE_NOTRACE_DEPTH
      ftrace: Create new ftrace-internal.h header
      fgraph: Move function graph specific code into fgraph.c
      fgraph: Add new fgraph_ops structure to enable function graph hooks
      function_graph: Remove unused task_curr_ret_stack()
      function_graph: Move ftrace_graph_get_addr() to fgraph.c
      function_graph: Have profiler use new helper ftrace_graph_get_ret_stack()
      function_graph: Convert ret_stack to a series of longs
      function_graph: Add an array structure that will allow multiple callbacks
      function_graph: Allow multiple users to attach to function graph
      function_graph: Allow for more than one callback to be registered

----
 arch/arm64/kernel/stacktrace.c       |   3 -
 include/linux/ftrace.h               |  42 +-
 include/linux/sched.h                |   2 +-
 kernel/trace/Makefile                |   1 +
 kernel/trace/fgraph.c                | 816 +++++++++++++++++++++++++++++++++++
 kernel/trace/ftrace.c                | 469 ++------------------
 kernel/trace/ftrace_internal.h       |  75 ++++
 kernel/trace/trace.h                 |   6 +
 kernel/trace/trace_functions_graph.c | 329 ++------------
 kernel/trace/trace_irqsoff.c         |  10 +-
 kernel/trace/trace_sched_wakeup.c    |  10 +-
 kernel/trace/trace_selftest.c        |   8 +-
 12 files changed, 1016 insertions(+), 755 deletions(-)
 create mode 100644 kernel/trace/fgraph.c
 create mode 100644 kernel/trace/ftrace_internal.h

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [RFC][PATCH 01/14] fgraph: Create a fgraph.c file to store function graph infrastructure
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-22  1:27 ` [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer Steven Rostedt
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

As the function graph infrastructure can be used by thing other than
tracing, moving the code to its own file out of the trace_functions_graph.c
code makes more sense.

The fgraph.c file will only contain the infrastructure required to hook into
functions and their return code.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/Makefile                |   1 +
 kernel/trace/fgraph.c                | 233 +++++++++++++++++++++++++++
 kernel/trace/trace_functions_graph.c | 221 -------------------------
 3 files changed, 234 insertions(+), 221 deletions(-)
 create mode 100644 kernel/trace/fgraph.c

diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index f81dadbc7c4a..c7ade7965464 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
 obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
 obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
+obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
 ifeq ($(CONFIG_BLOCK),y)
 obj-$(CONFIG_EVENT_TRACING) += blktrace.o
 endif
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
new file mode 100644
index 000000000000..e8fcf1b2b38c
--- /dev/null
+++ b/kernel/trace/fgraph.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Infrastructure to took into function calls and returns.
+ * Copyright (c) 2008-2009 Frederic Weisbecker <fweisbec@gmail.com>
+ * Mostly borrowed from function tracer which
+ * is Copyright (c) Steven Rostedt <srostedt@redhat.com>
+ *
+ * Highly modified by Steven Rostedt (VMware).
+ */
+#include <linux/ftrace.h>
+
+#include "trace.h"
+
+static bool kill_ftrace_graph;
+
+/**
+ * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
+ *
+ * ftrace_graph_stop() is called when a severe error is detected in
+ * the function graph tracing. This function is called by the critical
+ * paths of function graph to keep those paths from doing any more harm.
+ */
+bool ftrace_graph_is_dead(void)
+{
+	return kill_ftrace_graph;
+}
+
+/**
+ * ftrace_graph_stop - set to permanently disable function graph tracincg
+ *
+ * In case of an error int function graph tracing, this is called
+ * to try to keep function graph tracing from causing any more harm.
+ * Usually this is pretty severe and this is called to try to at least
+ * get a warning out to the user.
+ */
+void ftrace_graph_stop(void)
+{
+	kill_ftrace_graph = true;
+}
+
+/* Add a function return address to the trace stack on thread info.*/
+static int
+ftrace_push_return_trace(unsigned long ret, unsigned long func,
+			 unsigned long frame_pointer, unsigned long *retp)
+{
+	unsigned long long calltime;
+	int index;
+
+	if (unlikely(ftrace_graph_is_dead()))
+		return -EBUSY;
+
+	if (!current->ret_stack)
+		return -EBUSY;
+
+	/*
+	 * We must make sure the ret_stack is tested before we read
+	 * anything else.
+	 */
+	smp_rmb();
+
+	/* The return trace stack is full */
+	if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+		atomic_inc(&current->trace_overrun);
+		return -EBUSY;
+	}
+
+	/*
+	 * The curr_ret_stack is an index to ftrace return stack of
+	 * current task.  Its value should be in [0, FTRACE_RETFUNC_
+	 * DEPTH) when the function graph tracer is used.  To support
+	 * filtering out specific functions, it makes the index
+	 * negative by subtracting huge value (FTRACE_NOTRACE_DEPTH)
+	 * so when it sees a negative index the ftrace will ignore
+	 * the record.  And the index gets recovered when returning
+	 * from the filtered function by adding the FTRACE_NOTRACE_
+	 * DEPTH and then it'll continue to record functions normally.
+	 *
+	 * The curr_ret_stack is initialized to -1 and get increased
+	 * in this function.  So it can be less than -1 only if it was
+	 * filtered out via ftrace_graph_notrace_addr() which can be
+	 * set from set_graph_notrace file in tracefs by user.
+	 */
+	if (current->curr_ret_stack < -1)
+		return -EBUSY;
+
+	calltime = trace_clock_local();
+
+	index = ++current->curr_ret_stack;
+	if (ftrace_graph_notrace_addr(func))
+		current->curr_ret_stack -= FTRACE_NOTRACE_DEPTH;
+	barrier();
+	current->ret_stack[index].ret = ret;
+	current->ret_stack[index].func = func;
+	current->ret_stack[index].calltime = calltime;
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
+	current->ret_stack[index].fp = frame_pointer;
+#endif
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+	current->ret_stack[index].retp = retp;
+#endif
+	return 0;
+}
+
+int function_graph_enter(unsigned long ret, unsigned long func,
+			 unsigned long frame_pointer, unsigned long *retp)
+{
+	struct ftrace_graph_ent trace;
+
+	trace.func = func;
+	trace.depth = ++current->curr_ret_depth;
+
+	if (ftrace_push_return_trace(ret, func,
+				     frame_pointer, retp))
+		goto out;
+
+	/* Only trace if the calling function expects to */
+	if (!ftrace_graph_entry(&trace))
+		goto out_ret;
+
+	return 0;
+ out_ret:
+	current->curr_ret_stack--;
+ out:
+	current->curr_ret_depth--;
+	return -EBUSY;
+}
+
+/* Retrieve a function return address to the trace stack on thread info.*/
+static void
+ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
+			unsigned long frame_pointer)
+{
+	int index;
+
+	index = current->curr_ret_stack;
+
+	/*
+	 * A negative index here means that it's just returned from a
+	 * notrace'd function.  Recover index to get an original
+	 * return address.  See ftrace_push_return_trace().
+	 *
+	 * TODO: Need to check whether the stack gets corrupted.
+	 */
+	if (index < 0)
+		index += FTRACE_NOTRACE_DEPTH;
+
+	if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
+		ftrace_graph_stop();
+		WARN_ON(1);
+		/* Might as well panic, otherwise we have no where to go */
+		*ret = (unsigned long)panic;
+		return;
+	}
+
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
+	/*
+	 * The arch may choose to record the frame pointer used
+	 * and check it here to make sure that it is what we expect it
+	 * to be. If gcc does not set the place holder of the return
+	 * address in the frame pointer, and does a copy instead, then
+	 * the function graph trace will fail. This test detects this
+	 * case.
+	 *
+	 * Currently, x86_32 with optimize for size (-Os) makes the latest
+	 * gcc do the above.
+	 *
+	 * Note, -mfentry does not use frame pointers, and this test
+	 *  is not needed if CC_USING_FENTRY is set.
+	 */
+	if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
+		ftrace_graph_stop();
+		WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
+		     "  from func %ps return to %lx\n",
+		     current->ret_stack[index].fp,
+		     frame_pointer,
+		     (void *)current->ret_stack[index].func,
+		     current->ret_stack[index].ret);
+		*ret = (unsigned long)panic;
+		return;
+	}
+#endif
+
+	*ret = current->ret_stack[index].ret;
+	trace->func = current->ret_stack[index].func;
+	trace->calltime = current->ret_stack[index].calltime;
+	trace->overrun = atomic_read(&current->trace_overrun);
+	trace->depth = current->curr_ret_depth--;
+	/*
+	 * We still want to trace interrupts coming in if
+	 * max_depth is set to 1. Make sure the decrement is
+	 * seen before ftrace_graph_return.
+	 */
+	barrier();
+}
+
+/*
+ * Send the trace to the ring-buffer.
+ * @return the original return address.
+ */
+unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
+{
+	struct ftrace_graph_ret trace;
+	unsigned long ret;
+
+	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
+	trace.rettime = trace_clock_local();
+	ftrace_graph_return(&trace);
+	/*
+	 * The ftrace_graph_return() may still access the current
+	 * ret_stack structure, we need to make sure the update of
+	 * curr_ret_stack is after that.
+	 */
+	barrier();
+	current->curr_ret_stack--;
+	/*
+	 * The curr_ret_stack can be less than -1 only if it was
+	 * filtered out and it's about to return from the function.
+	 * Recover the index and continue to trace normal functions.
+	 */
+	if (current->curr_ret_stack < -1) {
+		current->curr_ret_stack += FTRACE_NOTRACE_DEPTH;
+		return ret;
+	}
+
+	if (unlikely(!ret)) {
+		ftrace_graph_stop();
+		WARN_ON(1);
+		/* Might as well panic. What else to do? */
+		ret = (unsigned long)panic;
+	}
+
+	return ret;
+}
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 2561460d7baf..af1759cd6eab 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -16,33 +16,6 @@
 #include "trace.h"
 #include "trace_output.h"
 
-static bool kill_ftrace_graph;
-
-/**
- * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
- *
- * ftrace_graph_stop() is called when a severe error is detected in
- * the function graph tracing. This function is called by the critical
- * paths of function graph to keep those paths from doing any more harm.
- */
-bool ftrace_graph_is_dead(void)
-{
-	return kill_ftrace_graph;
-}
-
-/**
- * ftrace_graph_stop - set to permanently disable function graph tracincg
- *
- * In case of an error int function graph tracing, this is called
- * to try to keep function graph tracing from causing any more harm.
- * Usually this is pretty severe and this is called to try to at least
- * get a warning out to the user.
- */
-void ftrace_graph_stop(void)
-{
-	kill_ftrace_graph = true;
-}
-
 /* When set, irq functions will be ignored */
 static int ftrace_graph_skip_irqs;
 
@@ -117,200 +90,6 @@ static void
 print_graph_duration(struct trace_array *tr, unsigned long long duration,
 		     struct trace_seq *s, u32 flags);
 
-/* Add a function return address to the trace stack on thread info.*/
-static int
-ftrace_push_return_trace(unsigned long ret, unsigned long func,
-			 unsigned long frame_pointer, unsigned long *retp)
-{
-	unsigned long long calltime;
-	int index;
-
-	if (unlikely(ftrace_graph_is_dead()))
-		return -EBUSY;
-
-	if (!current->ret_stack)
-		return -EBUSY;
-
-	/*
-	 * We must make sure the ret_stack is tested before we read
-	 * anything else.
-	 */
-	smp_rmb();
-
-	/* The return trace stack is full */
-	if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
-		atomic_inc(&current->trace_overrun);
-		return -EBUSY;
-	}
-
-	/*
-	 * The curr_ret_stack is an index to ftrace return stack of
-	 * current task.  Its value should be in [0, FTRACE_RETFUNC_
-	 * DEPTH) when the function graph tracer is used.  To support
-	 * filtering out specific functions, it makes the index
-	 * negative by subtracting huge value (FTRACE_NOTRACE_DEPTH)
-	 * so when it sees a negative index the ftrace will ignore
-	 * the record.  And the index gets recovered when returning
-	 * from the filtered function by adding the FTRACE_NOTRACE_
-	 * DEPTH and then it'll continue to record functions normally.
-	 *
-	 * The curr_ret_stack is initialized to -1 and get increased
-	 * in this function.  So it can be less than -1 only if it was
-	 * filtered out via ftrace_graph_notrace_addr() which can be
-	 * set from set_graph_notrace file in tracefs by user.
-	 */
-	if (current->curr_ret_stack < -1)
-		return -EBUSY;
-
-	calltime = trace_clock_local();
-
-	index = ++current->curr_ret_stack;
-	if (ftrace_graph_notrace_addr(func))
-		current->curr_ret_stack -= FTRACE_NOTRACE_DEPTH;
-	barrier();
-	current->ret_stack[index].ret = ret;
-	current->ret_stack[index].func = func;
-	current->ret_stack[index].calltime = calltime;
-#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-	current->ret_stack[index].fp = frame_pointer;
-#endif
-#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-	current->ret_stack[index].retp = retp;
-#endif
-	return 0;
-}
-
-int function_graph_enter(unsigned long ret, unsigned long func,
-			 unsigned long frame_pointer, unsigned long *retp)
-{
-	struct ftrace_graph_ent trace;
-
-	trace.func = func;
-	trace.depth = ++current->curr_ret_depth;
-
-	if (ftrace_push_return_trace(ret, func,
-				     frame_pointer, retp))
-		goto out;
-
-	/* Only trace if the calling function expects to */
-	if (!ftrace_graph_entry(&trace))
-		goto out_ret;
-
-	return 0;
- out_ret:
-	current->curr_ret_stack--;
- out:
-	current->curr_ret_depth--;
-	return -EBUSY;
-}
-
-/* Retrieve a function return address to the trace stack on thread info.*/
-static void
-ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
-			unsigned long frame_pointer)
-{
-	int index;
-
-	index = current->curr_ret_stack;
-
-	/*
-	 * A negative index here means that it's just returned from a
-	 * notrace'd function.  Recover index to get an original
-	 * return address.  See ftrace_push_return_trace().
-	 *
-	 * TODO: Need to check whether the stack gets corrupted.
-	 */
-	if (index < 0)
-		index += FTRACE_NOTRACE_DEPTH;
-
-	if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
-		ftrace_graph_stop();
-		WARN_ON(1);
-		/* Might as well panic, otherwise we have no where to go */
-		*ret = (unsigned long)panic;
-		return;
-	}
-
-#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-	/*
-	 * The arch may choose to record the frame pointer used
-	 * and check it here to make sure that it is what we expect it
-	 * to be. If gcc does not set the place holder of the return
-	 * address in the frame pointer, and does a copy instead, then
-	 * the function graph trace will fail. This test detects this
-	 * case.
-	 *
-	 * Currently, x86_32 with optimize for size (-Os) makes the latest
-	 * gcc do the above.
-	 *
-	 * Note, -mfentry does not use frame pointers, and this test
-	 *  is not needed if CC_USING_FENTRY is set.
-	 */
-	if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
-		ftrace_graph_stop();
-		WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
-		     "  from func %ps return to %lx\n",
-		     current->ret_stack[index].fp,
-		     frame_pointer,
-		     (void *)current->ret_stack[index].func,
-		     current->ret_stack[index].ret);
-		*ret = (unsigned long)panic;
-		return;
-	}
-#endif
-
-	*ret = current->ret_stack[index].ret;
-	trace->func = current->ret_stack[index].func;
-	trace->calltime = current->ret_stack[index].calltime;
-	trace->overrun = atomic_read(&current->trace_overrun);
-	trace->depth = current->curr_ret_depth--;
-	/*
-	 * We still want to trace interrupts coming in if
-	 * max_depth is set to 1. Make sure the decrement is
-	 * seen before ftrace_graph_return.
-	 */
-	barrier();
-}
-
-/*
- * Send the trace to the ring-buffer.
- * @return the original return address.
- */
-unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
-{
-	struct ftrace_graph_ret trace;
-	unsigned long ret;
-
-	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
-	trace.rettime = trace_clock_local();
-	ftrace_graph_return(&trace);
-	/*
-	 * The ftrace_graph_return() may still access the current
-	 * ret_stack structure, we need to make sure the update of
-	 * curr_ret_stack is after that.
-	 */
-	barrier();
-	current->curr_ret_stack--;
-	/*
-	 * The curr_ret_stack can be less than -1 only if it was
-	 * filtered out and it's about to return from the function.
-	 * Recover the index and continue to trace normal functions.
-	 */
-	if (current->curr_ret_stack < -1) {
-		current->curr_ret_stack += FTRACE_NOTRACE_DEPTH;
-		return ret;
-	}
-
-	if (unlikely(!ret)) {
-		ftrace_graph_stop();
-		WARN_ON(1);
-		/* Might as well panic. What else to do? */
-		ret = (unsigned long)panic;
-	}
-
-	return ret;
-}
-
 /**
  * ftrace_graph_ret_addr - convert a potentially modified stack return address
  *			   to its original value
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
  2018-11-22  1:27 ` [RFC][PATCH 01/14] fgraph: Create a fgraph.c file to store function graph infrastructure Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-23  0:01   ` Namhyung Kim
  2018-11-22  1:27 ` [RFC][PATCH 03/14] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH Steven Rostedt
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

In order to make the function graph infrastructure more generic, there can
not be code specific for the function_graph tracer in the generic code. This
includes the set_graph_notrace logic, that stops all graph calls when a
function in the set_graph_notrace is hit.

By using the trace_recursion mask, we can use a bit in the current
task_struct to implement the notrace code, and move the logic out of
fgraph.c and into trace_functions_graph.c and keeps it affecting only the
tracer and not all call graph callbacks.

Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/fgraph.c                | 21 ---------------------
 kernel/trace/trace.h                 |  6 ++++++
 kernel/trace/trace_functions_graph.c | 21 +++++++++++++++++++++
 3 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index e8fcf1b2b38c..c684968b87e7 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -64,30 +64,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 		return -EBUSY;
 	}
 
-	/*
-	 * The curr_ret_stack is an index to ftrace return stack of
-	 * current task.  Its value should be in [0, FTRACE_RETFUNC_
-	 * DEPTH) when the function graph tracer is used.  To support
-	 * filtering out specific functions, it makes the index
-	 * negative by subtracting huge value (FTRACE_NOTRACE_DEPTH)
-	 * so when it sees a negative index the ftrace will ignore
-	 * the record.  And the index gets recovered when returning
-	 * from the filtered function by adding the FTRACE_NOTRACE_
-	 * DEPTH and then it'll continue to record functions normally.
-	 *
-	 * The curr_ret_stack is initialized to -1 and get increased
-	 * in this function.  So it can be less than -1 only if it was
-	 * filtered out via ftrace_graph_notrace_addr() which can be
-	 * set from set_graph_notrace file in tracefs by user.
-	 */
-	if (current->curr_ret_stack < -1)
-		return -EBUSY;
-
 	calltime = trace_clock_local();
 
 	index = ++current->curr_ret_stack;
-	if (ftrace_graph_notrace_addr(func))
-		current->curr_ret_stack -= FTRACE_NOTRACE_DEPTH;
 	barrier();
 	current->ret_stack[index].ret = ret;
 	current->ret_stack[index].func = func;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3b8c0e24ab30..f3ad85830961 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -512,6 +512,12 @@ enum {
  * can only be modified by current, we can reuse trace_recursion.
  */
 	TRACE_IRQ_BIT,
+/*
+ * To implement set_graph_notrace, if this bit is set, we ignore
+ * function graph tracing of called functions, until the return
+ * function is called to clear it.
+ */
+	TRACE_GRAPH_NOTRACE_BIT,
 };
 
 #define trace_recursion_set(bit)	do { (current)->trace_recursion |= (1<<(bit)); } while (0)
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index af1759cd6eab..4748dc1bf5e1 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -188,6 +188,18 @@ int trace_graph_entry(struct ftrace_graph_ent *trace)
 	int cpu;
 	int pc;
 
+	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
+		return 0;
+
+	if (ftrace_graph_notrace_addr(trace->func)) {
+		trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
+		/*
+		 * Need to return 1 to have the return called
+		 * that will clear the NOTRACE bit.
+		 */
+		return 1;
+	}
+
 	if (!ftrace_trace_task(tr))
 		return 0;
 
@@ -288,6 +300,11 @@ void trace_graph_return(struct ftrace_graph_ret *trace)
 	int cpu;
 	int pc;
 
+	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
+		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+		return;
+	}
+
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
 	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
@@ -311,6 +328,10 @@ void set_graph_array(struct trace_array *tr)
 
 static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
 {
+	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
+		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+		return;
+	}
 	if (tracing_thresh &&
 	    (trace->rettime - trace->calltime < tracing_thresh))
 		return;
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 03/14] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
  2018-11-22  1:27 ` [RFC][PATCH 01/14] fgraph: Create a fgraph.c file to store function graph infrastructure Steven Rostedt
  2018-11-22  1:27 ` [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-27 19:31   ` Will Deacon
  2018-11-22  1:27 ` [RFC][PATCH 04/14] function_graph: Remove the " Steven Rostedt
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Catalin Marinas,
	Will Deacon, linux-arm-kernel

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The curr_ret_stack is no longer set to -1 when not tracing a function. It is
now done differently, and the FTRACE_NOTRACE_DEPTH value is no longer used.
Remove the reference to it.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 arch/arm64/kernel/stacktrace.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 4989f7ea1e59..7723dadf25be 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -61,9 +61,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 			(frame->pc == (unsigned long)return_to_handler)) {
 		if (WARN_ON_ONCE(frame->graph == -1))
 			return -EINVAL;
-		if (frame->graph < -1)
-			frame->graph += FTRACE_NOTRACE_DEPTH;
-
 		/*
 		 * This is a case where function graph tracer has
 		 * modified a return address (LR) in a stack frame
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 04/14] function_graph: Remove the use of FTRACE_NOTRACE_DEPTH
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (2 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 03/14] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-22  1:27 ` [RFC][PATCH 05/14] ftrace: Create new ftrace-internal.h header Steven Rostedt
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The curr_ret_stack is no longer set to a negative value when a function is
not to be traced by the function graph tracer. Remove the usage of
FTRACE_NOTRACE_DEPTH, as it is no longer needed.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/ftrace.h               |  1 -
 kernel/trace/fgraph.c                | 19 -------------------
 kernel/trace/trace_functions_graph.c | 11 -----------
 3 files changed, 31 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index dd16e8218db3..f98063e273e5 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -790,7 +790,6 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
  */
 #define __notrace_funcgraph		notrace
 
-#define FTRACE_NOTRACE_DEPTH 65536
 #define FTRACE_RETFUNC_DEPTH 50
 #define FTRACE_RETSTACK_ALLOC_SIZE 32
 extern int register_ftrace_graph(trace_func_graph_ret_t retfunc,
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index c684968b87e7..5b0d554dbf65 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -113,16 +113,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 
 	index = current->curr_ret_stack;
 
-	/*
-	 * A negative index here means that it's just returned from a
-	 * notrace'd function.  Recover index to get an original
-	 * return address.  See ftrace_push_return_trace().
-	 *
-	 * TODO: Need to check whether the stack gets corrupted.
-	 */
-	if (index < 0)
-		index += FTRACE_NOTRACE_DEPTH;
-
 	if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
 		ftrace_graph_stop();
 		WARN_ON(1);
@@ -191,15 +181,6 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 	 */
 	barrier();
 	current->curr_ret_stack--;
-	/*
-	 * The curr_ret_stack can be less than -1 only if it was
-	 * filtered out and it's about to return from the function.
-	 * Recover the index and continue to trace normal functions.
-	 */
-	if (current->curr_ret_stack < -1) {
-		current->curr_ret_stack += FTRACE_NOTRACE_DEPTH;
-		return ret;
-	}
 
 	if (unlikely(!ret)) {
 		ftrace_graph_stop();
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 4748dc1bf5e1..0e0ff08357cf 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -115,9 +115,6 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 	if (ret != (unsigned long)return_to_handler)
 		return ret;
 
-	if (index < -1)
-		index += FTRACE_NOTRACE_DEPTH;
-
 	if (index < 0)
 		return ret;
 
@@ -670,10 +667,6 @@ print_graph_entry_leaf(struct trace_iterator *iter,
 
 		cpu_data = per_cpu_ptr(data->cpu_data, cpu);
 
-		/* If a graph tracer ignored set_graph_notrace */
-		if (call->depth < -1)
-			call->depth += FTRACE_NOTRACE_DEPTH;
-
 		/*
 		 * Comments display at + 1 to depth. Since
 		 * this is a leaf function, keep the comments
@@ -716,10 +709,6 @@ print_graph_entry_nested(struct trace_iterator *iter,
 		struct fgraph_cpu_data *cpu_data;
 		int cpu = iter->cpu;
 
-		/* If a graph tracer ignored set_graph_notrace */
-		if (call->depth < -1)
-			call->depth += FTRACE_NOTRACE_DEPTH;
-
 		cpu_data = per_cpu_ptr(data->cpu_data, cpu);
 		cpu_data->depth = call->depth;
 
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 05/14] ftrace: Create new ftrace-internal.h header
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (3 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 04/14] function_graph: Remove the " Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-22  1:27 ` [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c Steven Rostedt
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

In order to move function graph infrastructure into its own file (fgraph.h)
it needs to access various functions and variables in ftrace.c that are
currently static. Create a new file called ftrace-internal.h that holds the
function prototypes and the extern declarations of the variables needed by
fgraph.c as well, and make them global in ftrace.c such that they can be
used outside that file.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/ftrace.c          | 74 ++++++---------------------------
 kernel/trace/ftrace_internal.h | 75 ++++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+), 61 deletions(-)
 create mode 100644 kernel/trace/ftrace_internal.h

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 77734451cb05..6e407e8278f1 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -40,6 +40,7 @@
 #include <asm/sections.h>
 #include <asm/setup.h>
 
+#include "ftrace-internal.h"
 #include "trace_output.h"
 #include "trace_stat.h"
 
@@ -77,7 +78,7 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
-static struct ftrace_ops ftrace_list_end __read_mostly = {
+struct ftrace_ops ftrace_list_end __read_mostly = {
 	.func		= ftrace_stub,
 	.flags		= FTRACE_OPS_FL_RECURSION_SAFE | FTRACE_OPS_FL_STUB,
 	INIT_OPS_HASH(ftrace_list_end)
@@ -112,11 +113,11 @@ static void ftrace_update_trampoline(struct ftrace_ops *ops);
  */
 static int ftrace_disabled __read_mostly;
 
-static DEFINE_MUTEX(ftrace_lock);
+DEFINE_MUTEX(ftrace_lock);
 
-static struct ftrace_ops __rcu *ftrace_ops_list __read_mostly = &ftrace_list_end;
+struct ftrace_ops __rcu *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
-static struct ftrace_ops global_ops;
+struct ftrace_ops global_ops;
 
 #if ARCH_SUPPORTS_FTRACE_OPS
 static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
@@ -127,26 +128,6 @@ static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip);
 #define ftrace_ops_list_func ((ftrace_func_t)ftrace_ops_no_ops)
 #endif
 
-/*
- * Traverse the ftrace_global_list, invoking all entries.  The reason that we
- * can use rcu_dereference_raw_notrace() is that elements removed from this list
- * are simply leaked, so there is no need to interact with a grace-period
- * mechanism.  The rcu_dereference_raw_notrace() calls are needed to handle
- * concurrent insertions into the ftrace_global_list.
- *
- * Silly Alpha and silly pointer-speculation compiler optimizations!
- */
-#define do_for_each_ftrace_op(op, list)			\
-	op = rcu_dereference_raw_notrace(list);			\
-	do
-
-/*
- * Optimized for just a single item in the list (as that is the normal case).
- */
-#define while_for_each_ftrace_op(op)				\
-	while (likely(op = rcu_dereference_raw_notrace((op)->next)) &&	\
-	       unlikely((op) != &ftrace_list_end))
-
 static inline void ftrace_ops_init(struct ftrace_ops *ops)
 {
 #ifdef CONFIG_DYNAMIC_FTRACE
@@ -187,17 +168,11 @@ static void ftrace_sync_ipi(void *data)
 }
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static void update_function_graph_func(void);
-
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 static bool fgraph_graph_time = true;
-
-#else
-static inline void update_function_graph_func(void) { }
 #endif
 
-
 static ftrace_func_t ftrace_ops_get_list_func(struct ftrace_ops *ops)
 {
 	/*
@@ -334,7 +309,7 @@ static int remove_ftrace_ops(struct ftrace_ops __rcu **list,
 
 static void ftrace_update_trampoline(struct ftrace_ops *ops);
 
-static int __register_ftrace_function(struct ftrace_ops *ops)
+int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ops->flags & FTRACE_OPS_FL_DELETED)
 		return -EINVAL;
@@ -375,7 +350,7 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	return 0;
 }
 
-static int __unregister_ftrace_function(struct ftrace_ops *ops)
+int __unregister_ftrace_function(struct ftrace_ops *ops)
 {
 	int ret;
 
@@ -1022,9 +997,7 @@ static __init void ftrace_profile_tracefs(struct dentry *d_tracer)
 #endif /* CONFIG_FUNCTION_PROFILER */
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static int ftrace_graph_active;
-#else
-# define ftrace_graph_active 0
+int ftrace_graph_active;
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
@@ -1067,7 +1040,7 @@ static const struct ftrace_hash empty_hash = {
 };
 #define EMPTY_HASH	((struct ftrace_hash *)&empty_hash)
 
-static struct ftrace_ops global_ops = {
+struct ftrace_ops global_ops = {
 	.func				= ftrace_stub,
 	.local_hash.notrace_hash	= EMPTY_HASH,
 	.local_hash.filter_hash		= EMPTY_HASH,
@@ -1503,7 +1476,7 @@ static bool hash_contains_ip(unsigned long ip,
  * This needs to be called with preemption disabled as
  * the hashes are freed with call_rcu_sched().
  */
-static int
+int
 ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
 {
 	struct ftrace_ops_hash hash;
@@ -2682,7 +2655,7 @@ static void ftrace_startup_all(int command)
 	update_all_ops = false;
 }
 
-static int ftrace_startup(struct ftrace_ops *ops, int command)
+int ftrace_startup(struct ftrace_ops *ops, int command)
 {
 	int ret;
 
@@ -2724,7 +2697,7 @@ static int ftrace_startup(struct ftrace_ops *ops, int command)
 	return 0;
 }
 
-static int ftrace_shutdown(struct ftrace_ops *ops, int command)
+int ftrace_shutdown(struct ftrace_ops *ops, int command)
 {
 	int ret;
 
@@ -6194,31 +6167,10 @@ core_initcall(ftrace_nodyn_init);
 static inline int ftrace_init_dyn_tracefs(struct dentry *d_tracer) { return 0; }
 static inline void ftrace_startup_enable(int command) { }
 static inline void ftrace_startup_all(int command) { }
-/* Keep as macros so we do not need to define the commands */
-# define ftrace_startup(ops, command)					\
-	({								\
-		int ___ret = __register_ftrace_function(ops);		\
-		if (!___ret)						\
-			(ops)->flags |= FTRACE_OPS_FL_ENABLED;		\
-		___ret;							\
-	})
-# define ftrace_shutdown(ops, command)					\
-	({								\
-		int ___ret = __unregister_ftrace_function(ops);		\
-		if (!___ret)						\
-			(ops)->flags &= ~FTRACE_OPS_FL_ENABLED;		\
-		___ret;							\
-	})
 
 # define ftrace_startup_sysctl()	do { } while (0)
 # define ftrace_shutdown_sysctl()	do { } while (0)
 
-static inline int
-ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
-{
-	return 1;
-}
-
 static void ftrace_update_trampoline(struct ftrace_ops *ops)
 {
 }
@@ -6930,7 +6882,7 @@ static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
  * function against the global ops, and not just trace any function
  * that any ftrace_ops registered.
  */
-static void update_function_graph_func(void)
+void update_function_graph_func(void)
 {
 	struct ftrace_ops *op;
 	bool do_test = false;
diff --git a/kernel/trace/ftrace_internal.h b/kernel/trace/ftrace_internal.h
new file mode 100644
index 000000000000..0515a2096f90
--- /dev/null
+++ b/kernel/trace/ftrace_internal.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_KERNEL_FTRACE_INTERNAL_H
+#define  _LINUX_KERNEL_FTRACE_INTERNAL_H
+
+#ifdef CONFIG_FUNCTION_TRACER
+
+/*
+ * Traverse the ftrace_global_list, invoking all entries.  The reason that we
+ * can use rcu_dereference_raw_notrace() is that elements removed from this list
+ * are simply leaked, so there is no need to interact with a grace-period
+ * mechanism.  The rcu_dereference_raw_notrace() calls are needed to handle
+ * concurrent insertions into the ftrace_global_list.
+ *
+ * Silly Alpha and silly pointer-speculation compiler optimizations!
+ */
+#define do_for_each_ftrace_op(op, list)			\
+	op = rcu_dereference_raw_notrace(list);			\
+	do
+
+/*
+ * Optimized for just a single item in the list (as that is the normal case).
+ */
+#define while_for_each_ftrace_op(op)				\
+	while (likely(op = rcu_dereference_raw_notrace((op)->next)) &&	\
+	       unlikely((op) != &ftrace_list_end))
+
+extern struct ftrace_ops __rcu *ftrace_ops_list;
+extern struct ftrace_ops ftrace_list_end;
+extern struct mutex ftrace_lock;
+extern struct ftrace_ops global_ops;
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+
+int ftrace_startup(struct ftrace_ops *ops, int command);
+int ftrace_shutdown(struct ftrace_ops *ops, int command);
+int ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs);
+
+#else /* !CONFIG_DYNAMIC_FTRACE */
+
+int __register_ftrace_function(struct ftrace_ops *ops);
+int __unregister_ftrace_function(struct ftrace_ops *ops);
+/* Keep as macros so we do not need to define the commands */
+# define ftrace_startup(ops, command)					\
+	({								\
+		int ___ret = __register_ftrace_function(ops);		\
+		if (!___ret)						\
+			(ops)->flags |= FTRACE_OPS_FL_ENABLED;		\
+		___ret;							\
+	})
+# define ftrace_shutdown(ops, command)					\
+	({								\
+		int ___ret = __unregister_ftrace_function(ops);		\
+		if (!___ret)						\
+			(ops)->flags &= ~FTRACE_OPS_FL_ENABLED;		\
+		___ret;							\
+	})
+static inline int
+ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
+{
+	return 1;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE */
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+extern int ftrace_graph_active;
+void update_function_graph_func(void);
+#else /* !CONFIG_FUNCTION_GRAPH_TRACER */
+# define ftrace_graph_active 0
+static inline void update_function_graph_func(void) { }
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+#else /* !CONFIG_FUNCTION_TRACER */
+#endif /* CONFIG_FUNCTION_TRACER */
+
+#endif
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (4 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 05/14] ftrace: Create new ftrace-internal.h header Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-23  6:11   ` Joel Fernandes
  2018-11-22  1:27 ` [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks Steven Rostedt
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

To make the function graph infrastructure more managable, the code needs to
be in its own file (fgraph.c). Move the code that is specific for managing
the function graph infrastructure out of ftrace.c and into fgraph.c

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/fgraph.c | 360 ++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/ftrace.c | 368 +-----------------------------------------
 2 files changed, 366 insertions(+), 362 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 5b0d554dbf65..b9c7dbbbdd96 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -7,11 +7,27 @@
  *
  * Highly modified by Steven Rostedt (VMware).
  */
+#include <linux/suspend.h>
 #include <linux/ftrace.h>
+#include <linux/slab.h>
 
-#include "trace.h"
+#include <trace/events/sched.h>
+
+#include "ftrace-internal.h"
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+#define ASSIGN_OPS_HASH(opsname, val) \
+	.func_hash		= val, \
+	.local_hash.regex_lock	= __MUTEX_INITIALIZER(opsname.local_hash.regex_lock),
+#else
+#define ASSIGN_OPS_HASH(opsname, val)
+#endif
 
 static bool kill_ftrace_graph;
+int ftrace_graph_active;
+
+/* Both enabled by default (can be cleared by function_graph tracer flags */
+static bool fgraph_sleep_time = true;
 
 /**
  * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
@@ -162,6 +178,31 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 	barrier();
 }
 
+/*
+ * Hibernation protection.
+ * The state of the current task is too much unstable during
+ * suspend/restore to disk. We want to protect against that.
+ */
+static int
+ftrace_suspend_notifier_call(struct notifier_block *bl, unsigned long state,
+							void *unused)
+{
+	switch (state) {
+	case PM_HIBERNATION_PREPARE:
+		pause_graph_tracing();
+		break;
+
+	case PM_POST_HIBERNATION:
+		unpause_graph_tracing();
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block ftrace_suspend_notifier = {
+	.notifier_call = ftrace_suspend_notifier_call,
+};
+
 /*
  * Send the trace to the ring-buffer.
  * @return the original return address.
@@ -191,3 +232,320 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 
 	return ret;
 }
+
+static struct ftrace_ops graph_ops = {
+	.func			= ftrace_stub,
+	.flags			= FTRACE_OPS_FL_RECURSION_SAFE |
+				   FTRACE_OPS_FL_INITIALIZED |
+				   FTRACE_OPS_FL_PID |
+				   FTRACE_OPS_FL_STUB,
+#ifdef FTRACE_GRAPH_TRAMP_ADDR
+	.trampoline		= FTRACE_GRAPH_TRAMP_ADDR,
+	/* trampoline_size is only needed for dynamically allocated tramps */
+#endif
+	ASSIGN_OPS_HASH(graph_ops, &global_ops.local_hash)
+};
+
+void ftrace_graph_sleep_time_control(bool enable)
+{
+	fgraph_sleep_time = enable;
+}
+
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+{
+	return 0;
+}
+
+/* The callbacks that hook a function */
+trace_func_graph_ret_t ftrace_graph_return =
+			(trace_func_graph_ret_t)ftrace_stub;
+trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
+static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
+
+/* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
+static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
+{
+	int i;
+	int ret = 0;
+	int start = 0, end = FTRACE_RETSTACK_ALLOC_SIZE;
+	struct task_struct *g, *t;
+
+	for (i = 0; i < FTRACE_RETSTACK_ALLOC_SIZE; i++) {
+		ret_stack_list[i] =
+			kmalloc_array(FTRACE_RETFUNC_DEPTH,
+				      sizeof(struct ftrace_ret_stack),
+				      GFP_KERNEL);
+		if (!ret_stack_list[i]) {
+			start = 0;
+			end = i;
+			ret = -ENOMEM;
+			goto free;
+		}
+	}
+
+	read_lock(&tasklist_lock);
+	do_each_thread(g, t) {
+		if (start == end) {
+			ret = -EAGAIN;
+			goto unlock;
+		}
+
+		if (t->ret_stack == NULL) {
+			atomic_set(&t->tracing_graph_pause, 0);
+			atomic_set(&t->trace_overrun, 0);
+			t->curr_ret_stack = -1;
+			t->curr_ret_depth = -1;
+			/* Make sure the tasks see the -1 first: */
+			smp_wmb();
+			t->ret_stack = ret_stack_list[start++];
+		}
+	} while_each_thread(g, t);
+
+unlock:
+	read_unlock(&tasklist_lock);
+free:
+	for (i = start; i < end; i++)
+		kfree(ret_stack_list[i]);
+	return ret;
+}
+
+static void
+ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
+			struct task_struct *prev, struct task_struct *next)
+{
+	unsigned long long timestamp;
+	int index;
+
+	/*
+	 * Does the user want to count the time a function was asleep.
+	 * If so, do not update the time stamps.
+	 */
+	if (fgraph_sleep_time)
+		return;
+
+	timestamp = trace_clock_local();
+
+	prev->ftrace_timestamp = timestamp;
+
+	/* only process tasks that we timestamped */
+	if (!next->ftrace_timestamp)
+		return;
+
+	/*
+	 * Update all the counters in next to make up for the
+	 * time next was sleeping.
+	 */
+	timestamp -= next->ftrace_timestamp;
+
+	for (index = next->curr_ret_stack; index >= 0; index--)
+		next->ret_stack[index].calltime += timestamp;
+}
+
+static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
+{
+	if (!ftrace_ops_test(&global_ops, trace->func, NULL))
+		return 0;
+	return __ftrace_graph_entry(trace);
+}
+
+/*
+ * The function graph tracer should only trace the functions defined
+ * by set_ftrace_filter and set_ftrace_notrace. If another function
+ * tracer ops is registered, the graph tracer requires testing the
+ * function against the global ops, and not just trace any function
+ * that any ftrace_ops registered.
+ */
+void update_function_graph_func(void)
+{
+	struct ftrace_ops *op;
+	bool do_test = false;
+
+	/*
+	 * The graph and global ops share the same set of functions
+	 * to test. If any other ops is on the list, then
+	 * the graph tracing needs to test if its the function
+	 * it should call.
+	 */
+	do_for_each_ftrace_op(op, ftrace_ops_list) {
+		if (op != &global_ops && op != &graph_ops &&
+		    op != &ftrace_list_end) {
+			do_test = true;
+			/* in double loop, break out with goto */
+			goto out;
+		}
+	} while_for_each_ftrace_op(op);
+ out:
+	if (do_test)
+		ftrace_graph_entry = ftrace_graph_entry_test;
+	else
+		ftrace_graph_entry = __ftrace_graph_entry;
+}
+
+static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
+
+static void
+graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
+{
+	atomic_set(&t->tracing_graph_pause, 0);
+	atomic_set(&t->trace_overrun, 0);
+	t->ftrace_timestamp = 0;
+	/* make curr_ret_stack visible before we add the ret_stack */
+	smp_wmb();
+	t->ret_stack = ret_stack;
+}
+
+/*
+ * Allocate a return stack for the idle task. May be the first
+ * time through, or it may be done by CPU hotplug online.
+ */
+void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
+{
+	t->curr_ret_stack = -1;
+	t->curr_ret_depth = -1;
+	/*
+	 * The idle task has no parent, it either has its own
+	 * stack or no stack at all.
+	 */
+	if (t->ret_stack)
+		WARN_ON(t->ret_stack != per_cpu(idle_ret_stack, cpu));
+
+	if (ftrace_graph_active) {
+		struct ftrace_ret_stack *ret_stack;
+
+		ret_stack = per_cpu(idle_ret_stack, cpu);
+		if (!ret_stack) {
+			ret_stack =
+				kmalloc_array(FTRACE_RETFUNC_DEPTH,
+					      sizeof(struct ftrace_ret_stack),
+					      GFP_KERNEL);
+			if (!ret_stack)
+				return;
+			per_cpu(idle_ret_stack, cpu) = ret_stack;
+		}
+		graph_init_task(t, ret_stack);
+	}
+}
+
+/* Allocate a return stack for newly created task */
+void ftrace_graph_init_task(struct task_struct *t)
+{
+	/* Make sure we do not use the parent ret_stack */
+	t->ret_stack = NULL;
+	t->curr_ret_stack = -1;
+	t->curr_ret_depth = -1;
+
+	if (ftrace_graph_active) {
+		struct ftrace_ret_stack *ret_stack;
+
+		ret_stack = kmalloc_array(FTRACE_RETFUNC_DEPTH,
+					  sizeof(struct ftrace_ret_stack),
+					  GFP_KERNEL);
+		if (!ret_stack)
+			return;
+		graph_init_task(t, ret_stack);
+	}
+}
+
+void ftrace_graph_exit_task(struct task_struct *t)
+{
+	struct ftrace_ret_stack	*ret_stack = t->ret_stack;
+
+	t->ret_stack = NULL;
+	/* NULL must become visible to IRQs before we free it: */
+	barrier();
+
+	kfree(ret_stack);
+}
+
+/* Allocate a return stack for each task */
+static int start_graph_tracing(void)
+{
+	struct ftrace_ret_stack **ret_stack_list;
+	int ret, cpu;
+
+	ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
+				       sizeof(struct ftrace_ret_stack *),
+				       GFP_KERNEL);
+
+	if (!ret_stack_list)
+		return -ENOMEM;
+
+	/* The cpu_boot init_task->ret_stack will never be freed */
+	for_each_online_cpu(cpu) {
+		if (!idle_task(cpu)->ret_stack)
+			ftrace_graph_init_idle_task(idle_task(cpu), cpu);
+	}
+
+	do {
+		ret = alloc_retstack_tasklist(ret_stack_list);
+	} while (ret == -EAGAIN);
+
+	if (!ret) {
+		ret = register_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
+		if (ret)
+			pr_info("ftrace_graph: Couldn't activate tracepoint"
+				" probe to kernel_sched_switch\n");
+	}
+
+	kfree(ret_stack_list);
+	return ret;
+}
+
+int register_ftrace_graph(trace_func_graph_ret_t retfunc,
+			trace_func_graph_ent_t entryfunc)
+{
+	int ret = 0;
+
+	mutex_lock(&ftrace_lock);
+
+	/* we currently allow only one tracer registered at a time */
+	if (ftrace_graph_active) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	register_pm_notifier(&ftrace_suspend_notifier);
+
+	ftrace_graph_active++;
+	ret = start_graph_tracing();
+	if (ret) {
+		ftrace_graph_active--;
+		goto out;
+	}
+
+	ftrace_graph_return = retfunc;
+
+	/*
+	 * Update the indirect function to the entryfunc, and the
+	 * function that gets called to the entry_test first. Then
+	 * call the update fgraph entry function to determine if
+	 * the entryfunc should be called directly or not.
+	 */
+	__ftrace_graph_entry = entryfunc;
+	ftrace_graph_entry = ftrace_graph_entry_test;
+	update_function_graph_func();
+
+	ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+out:
+	mutex_unlock(&ftrace_lock);
+	return ret;
+}
+
+void unregister_ftrace_graph(void)
+{
+	mutex_lock(&ftrace_lock);
+
+	if (unlikely(!ftrace_graph_active))
+		goto out;
+
+	ftrace_graph_active--;
+	ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
+	ftrace_graph_entry = ftrace_graph_entry_stub;
+	__ftrace_graph_entry = ftrace_graph_entry_stub;
+	ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
+	unregister_pm_notifier(&ftrace_suspend_notifier);
+	unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
+
+ out:
+	mutex_unlock(&ftrace_lock);
+}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 6e407e8278f1..64e635994648 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -19,7 +19,6 @@
 #include <linux/sched/task.h>
 #include <linux/kallsyms.h>
 #include <linux/seq_file.h>
-#include <linux/suspend.h>
 #include <linux/tracefs.h>
 #include <linux/hardirq.h>
 #include <linux/kthread.h>
@@ -167,12 +166,6 @@ static void ftrace_sync_ipi(void *data)
 	smp_rmb();
 }
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-/* Both enabled by default (can be cleared by function_graph tracer flags */
-static bool fgraph_sleep_time = true;
-static bool fgraph_graph_time = true;
-#endif
-
 static ftrace_func_t ftrace_ops_get_list_func(struct ftrace_ops *ops)
 {
 	/*
@@ -790,6 +783,13 @@ function_profile_call(unsigned long ip, unsigned long parent_ip,
 }
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
+static bool fgraph_graph_time = true;
+
+void ftrace_graph_graph_time_control(bool enable)
+{
+	fgraph_graph_time = enable;
+}
+
 static int profile_graph_entry(struct ftrace_graph_ent *trace)
 {
 	int index = current->curr_ret_stack;
@@ -996,10 +996,6 @@ static __init void ftrace_profile_tracefs(struct dentry *d_tracer)
 }
 #endif /* CONFIG_FUNCTION_PROFILER */
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-int ftrace_graph_active;
-#endif
-
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 static struct ftrace_ops *removed_ops;
@@ -6697,353 +6693,3 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 	mutex_unlock(&ftrace_lock);
 	return ret;
 }
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-
-static struct ftrace_ops graph_ops = {
-	.func			= ftrace_stub,
-	.flags			= FTRACE_OPS_FL_RECURSION_SAFE |
-				   FTRACE_OPS_FL_INITIALIZED |
-				   FTRACE_OPS_FL_PID |
-				   FTRACE_OPS_FL_STUB,
-#ifdef FTRACE_GRAPH_TRAMP_ADDR
-	.trampoline		= FTRACE_GRAPH_TRAMP_ADDR,
-	/* trampoline_size is only needed for dynamically allocated tramps */
-#endif
-	ASSIGN_OPS_HASH(graph_ops, &global_ops.local_hash)
-};
-
-void ftrace_graph_sleep_time_control(bool enable)
-{
-	fgraph_sleep_time = enable;
-}
-
-void ftrace_graph_graph_time_control(bool enable)
-{
-	fgraph_graph_time = enable;
-}
-
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
-{
-	return 0;
-}
-
-/* The callbacks that hook a function */
-trace_func_graph_ret_t ftrace_graph_return =
-			(trace_func_graph_ret_t)ftrace_stub;
-trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
-static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
-
-/* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
-static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
-{
-	int i;
-	int ret = 0;
-	int start = 0, end = FTRACE_RETSTACK_ALLOC_SIZE;
-	struct task_struct *g, *t;
-
-	for (i = 0; i < FTRACE_RETSTACK_ALLOC_SIZE; i++) {
-		ret_stack_list[i] =
-			kmalloc_array(FTRACE_RETFUNC_DEPTH,
-				      sizeof(struct ftrace_ret_stack),
-				      GFP_KERNEL);
-		if (!ret_stack_list[i]) {
-			start = 0;
-			end = i;
-			ret = -ENOMEM;
-			goto free;
-		}
-	}
-
-	read_lock(&tasklist_lock);
-	do_each_thread(g, t) {
-		if (start == end) {
-			ret = -EAGAIN;
-			goto unlock;
-		}
-
-		if (t->ret_stack == NULL) {
-			atomic_set(&t->tracing_graph_pause, 0);
-			atomic_set(&t->trace_overrun, 0);
-			t->curr_ret_stack = -1;
-			t->curr_ret_depth = -1;
-			/* Make sure the tasks see the -1 first: */
-			smp_wmb();
-			t->ret_stack = ret_stack_list[start++];
-		}
-	} while_each_thread(g, t);
-
-unlock:
-	read_unlock(&tasklist_lock);
-free:
-	for (i = start; i < end; i++)
-		kfree(ret_stack_list[i]);
-	return ret;
-}
-
-static void
-ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
-			struct task_struct *prev, struct task_struct *next)
-{
-	unsigned long long timestamp;
-	int index;
-
-	/*
-	 * Does the user want to count the time a function was asleep.
-	 * If so, do not update the time stamps.
-	 */
-	if (fgraph_sleep_time)
-		return;
-
-	timestamp = trace_clock_local();
-
-	prev->ftrace_timestamp = timestamp;
-
-	/* only process tasks that we timestamped */
-	if (!next->ftrace_timestamp)
-		return;
-
-	/*
-	 * Update all the counters in next to make up for the
-	 * time next was sleeping.
-	 */
-	timestamp -= next->ftrace_timestamp;
-
-	for (index = next->curr_ret_stack; index >= 0; index--)
-		next->ret_stack[index].calltime += timestamp;
-}
-
-/* Allocate a return stack for each task */
-static int start_graph_tracing(void)
-{
-	struct ftrace_ret_stack **ret_stack_list;
-	int ret, cpu;
-
-	ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
-				       sizeof(struct ftrace_ret_stack *),
-				       GFP_KERNEL);
-
-	if (!ret_stack_list)
-		return -ENOMEM;
-
-	/* The cpu_boot init_task->ret_stack will never be freed */
-	for_each_online_cpu(cpu) {
-		if (!idle_task(cpu)->ret_stack)
-			ftrace_graph_init_idle_task(idle_task(cpu), cpu);
-	}
-
-	do {
-		ret = alloc_retstack_tasklist(ret_stack_list);
-	} while (ret == -EAGAIN);
-
-	if (!ret) {
-		ret = register_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
-		if (ret)
-			pr_info("ftrace_graph: Couldn't activate tracepoint"
-				" probe to kernel_sched_switch\n");
-	}
-
-	kfree(ret_stack_list);
-	return ret;
-}
-
-/*
- * Hibernation protection.
- * The state of the current task is too much unstable during
- * suspend/restore to disk. We want to protect against that.
- */
-static int
-ftrace_suspend_notifier_call(struct notifier_block *bl, unsigned long state,
-							void *unused)
-{
-	switch (state) {
-	case PM_HIBERNATION_PREPARE:
-		pause_graph_tracing();
-		break;
-
-	case PM_POST_HIBERNATION:
-		unpause_graph_tracing();
-		break;
-	}
-	return NOTIFY_DONE;
-}
-
-static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
-{
-	if (!ftrace_ops_test(&global_ops, trace->func, NULL))
-		return 0;
-	return __ftrace_graph_entry(trace);
-}
-
-/*
- * The function graph tracer should only trace the functions defined
- * by set_ftrace_filter and set_ftrace_notrace. If another function
- * tracer ops is registered, the graph tracer requires testing the
- * function against the global ops, and not just trace any function
- * that any ftrace_ops registered.
- */
-void update_function_graph_func(void)
-{
-	struct ftrace_ops *op;
-	bool do_test = false;
-
-	/*
-	 * The graph and global ops share the same set of functions
-	 * to test. If any other ops is on the list, then
-	 * the graph tracing needs to test if its the function
-	 * it should call.
-	 */
-	do_for_each_ftrace_op(op, ftrace_ops_list) {
-		if (op != &global_ops && op != &graph_ops &&
-		    op != &ftrace_list_end) {
-			do_test = true;
-			/* in double loop, break out with goto */
-			goto out;
-		}
-	} while_for_each_ftrace_op(op);
- out:
-	if (do_test)
-		ftrace_graph_entry = ftrace_graph_entry_test;
-	else
-		ftrace_graph_entry = __ftrace_graph_entry;
-}
-
-static struct notifier_block ftrace_suspend_notifier = {
-	.notifier_call = ftrace_suspend_notifier_call,
-};
-
-int register_ftrace_graph(trace_func_graph_ret_t retfunc,
-			trace_func_graph_ent_t entryfunc)
-{
-	int ret = 0;
-
-	mutex_lock(&ftrace_lock);
-
-	/* we currently allow only one tracer registered at a time */
-	if (ftrace_graph_active) {
-		ret = -EBUSY;
-		goto out;
-	}
-
-	register_pm_notifier(&ftrace_suspend_notifier);
-
-	ftrace_graph_active++;
-	ret = start_graph_tracing();
-	if (ret) {
-		ftrace_graph_active--;
-		goto out;
-	}
-
-	ftrace_graph_return = retfunc;
-
-	/*
-	 * Update the indirect function to the entryfunc, and the
-	 * function that gets called to the entry_test first. Then
-	 * call the update fgraph entry function to determine if
-	 * the entryfunc should be called directly or not.
-	 */
-	__ftrace_graph_entry = entryfunc;
-	ftrace_graph_entry = ftrace_graph_entry_test;
-	update_function_graph_func();
-
-	ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
-out:
-	mutex_unlock(&ftrace_lock);
-	return ret;
-}
-
-void unregister_ftrace_graph(void)
-{
-	mutex_lock(&ftrace_lock);
-
-	if (unlikely(!ftrace_graph_active))
-		goto out;
-
-	ftrace_graph_active--;
-	ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
-	ftrace_graph_entry = ftrace_graph_entry_stub;
-	__ftrace_graph_entry = ftrace_graph_entry_stub;
-	ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
-	unregister_pm_notifier(&ftrace_suspend_notifier);
-	unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
-
- out:
-	mutex_unlock(&ftrace_lock);
-}
-
-static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
-
-static void
-graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
-{
-	atomic_set(&t->tracing_graph_pause, 0);
-	atomic_set(&t->trace_overrun, 0);
-	t->ftrace_timestamp = 0;
-	/* make curr_ret_stack visible before we add the ret_stack */
-	smp_wmb();
-	t->ret_stack = ret_stack;
-}
-
-/*
- * Allocate a return stack for the idle task. May be the first
- * time through, or it may be done by CPU hotplug online.
- */
-void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
-{
-	t->curr_ret_stack = -1;
-	t->curr_ret_depth = -1;
-	/*
-	 * The idle task has no parent, it either has its own
-	 * stack or no stack at all.
-	 */
-	if (t->ret_stack)
-		WARN_ON(t->ret_stack != per_cpu(idle_ret_stack, cpu));
-
-	if (ftrace_graph_active) {
-		struct ftrace_ret_stack *ret_stack;
-
-		ret_stack = per_cpu(idle_ret_stack, cpu);
-		if (!ret_stack) {
-			ret_stack =
-				kmalloc_array(FTRACE_RETFUNC_DEPTH,
-					      sizeof(struct ftrace_ret_stack),
-					      GFP_KERNEL);
-			if (!ret_stack)
-				return;
-			per_cpu(idle_ret_stack, cpu) = ret_stack;
-		}
-		graph_init_task(t, ret_stack);
-	}
-}
-
-/* Allocate a return stack for newly created task */
-void ftrace_graph_init_task(struct task_struct *t)
-{
-	/* Make sure we do not use the parent ret_stack */
-	t->ret_stack = NULL;
-	t->curr_ret_stack = -1;
-	t->curr_ret_depth = -1;
-
-	if (ftrace_graph_active) {
-		struct ftrace_ret_stack *ret_stack;
-
-		ret_stack = kmalloc_array(FTRACE_RETFUNC_DEPTH,
-					  sizeof(struct ftrace_ret_stack),
-					  GFP_KERNEL);
-		if (!ret_stack)
-			return;
-		graph_init_task(t, ret_stack);
-	}
-}
-
-void ftrace_graph_exit_task(struct task_struct *t)
-{
-	struct ftrace_ret_stack	*ret_stack = t->ret_stack;
-
-	t->ret_stack = NULL;
-	/* NULL must become visible to IRQs before we free it: */
-	barrier();
-
-	kfree(ret_stack);
-}
-#endif
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (5 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-23  2:59   ` Joel Fernandes
  2018-11-26 11:30   ` Masami Hiramatsu
  2018-11-22  1:27 ` [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack() Steven Rostedt
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Currently the registering of function graph is to pass in a entry and return
function. We need to have a way to associate those functions together where
the entry can determine to run the return hook. Having a structure that
contains both functions will facilitate the process of converting the code
to be able to do such.

This is similar to the way function hooks are enabled (it passes in
ftrace_ops). Instead of passing in the functions to use, a single structure
is passed in to the registering function.

The unregister function is now passed in the fgraph_ops handle. When we
allow more than one callback to the function graph hooks, this will let the
system know which one to remove.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/ftrace.h               | 24 +++++++++++++++++-------
 kernel/trace/fgraph.c                |  9 ++++-----
 kernel/trace/ftrace.c                | 10 +++++++---
 kernel/trace/trace_functions_graph.c | 21 ++++++++++++++++-----
 kernel/trace/trace_irqsoff.c         | 10 +++++++---
 kernel/trace/trace_sched_wakeup.c    | 10 +++++++---
 kernel/trace/trace_selftest.c        |  8 ++++++--
 7 files changed, 64 insertions(+), 28 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index f98063e273e5..477ff9412d26 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -749,6 +749,18 @@ typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
+struct fgraph_ops {
+	trace_func_graph_ent_t		entryfunc;
+	trace_func_graph_ret_t		retfunc;
+	struct fgraph_ops __rcu		*next;
+	unsigned long			flags;
+	void				*private;
+#ifdef CONFIG_DYNAMIC_FTRACE
+	struct ftrace_ops_hash		local_hash;
+	struct ftrace_ops_hash		*func_hash;
+#endif
+};
+
 /*
  * Stack of return addresses for functions
  * of a thread.
@@ -792,8 +804,9 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 
 #define FTRACE_RETFUNC_DEPTH 50
 #define FTRACE_RETSTACK_ALLOC_SIZE 32
-extern int register_ftrace_graph(trace_func_graph_ret_t retfunc,
-				trace_func_graph_ent_t entryfunc);
+
+extern int register_ftrace_graph(struct fgraph_ops *ops);
+extern void unregister_ftrace_graph(struct fgraph_ops *ops);
 
 extern bool ftrace_graph_is_dead(void);
 extern void ftrace_graph_stop(void);
@@ -802,8 +815,6 @@ extern void ftrace_graph_stop(void);
 extern trace_func_graph_ret_t ftrace_graph_return;
 extern trace_func_graph_ent_t ftrace_graph_entry;
 
-extern void unregister_ftrace_graph(void);
-
 extern void ftrace_graph_init_task(struct task_struct *t);
 extern void ftrace_graph_exit_task(struct task_struct *t);
 extern void ftrace_graph_init_idle_task(struct task_struct *t, int cpu);
@@ -830,12 +841,11 @@ static inline void ftrace_graph_init_task(struct task_struct *t) { }
 static inline void ftrace_graph_exit_task(struct task_struct *t) { }
 static inline void ftrace_graph_init_idle_task(struct task_struct *t, int cpu) { }
 
-static inline int register_ftrace_graph(trace_func_graph_ret_t retfunc,
-			  trace_func_graph_ent_t entryfunc)
+static inline int register_ftrace_graph(struct fgraph_ops *ops);
 {
 	return -1;
 }
-static inline void unregister_ftrace_graph(void) { }
+static inline void unregister_ftrace_graph(struct fgraph_ops *ops) { }
 
 static inline int task_curr_ret_stack(struct task_struct *tsk)
 {
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index b9c7dbbbdd96..f3a89ecac671 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -491,8 +491,7 @@ static int start_graph_tracing(void)
 	return ret;
 }
 
-int register_ftrace_graph(trace_func_graph_ret_t retfunc,
-			trace_func_graph_ent_t entryfunc)
+int register_ftrace_graph(struct fgraph_ops *gops)
 {
 	int ret = 0;
 
@@ -513,7 +512,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
 		goto out;
 	}
 
-	ftrace_graph_return = retfunc;
+	ftrace_graph_return = gops->retfunc;
 
 	/*
 	 * Update the indirect function to the entryfunc, and the
@@ -521,7 +520,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
 	 * call the update fgraph entry function to determine if
 	 * the entryfunc should be called directly or not.
 	 */
-	__ftrace_graph_entry = entryfunc;
+	__ftrace_graph_entry = gops->entryfunc;
 	ftrace_graph_entry = ftrace_graph_entry_test;
 	update_function_graph_func();
 
@@ -531,7 +530,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
 	return ret;
 }
 
-void unregister_ftrace_graph(void)
+void unregister_ftrace_graph(struct fgraph_ops *gops)
 {
 	mutex_lock(&ftrace_lock);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 64e635994648..d057dde081e7 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -849,15 +849,19 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
 	local_irq_restore(flags);
 }
 
+static struct fgraph_ops fprofiler_ops = {
+	.entryfunc = &profile_graph_entry,
+	.retfunc = &profile_graph_return,
+};
+
 static int register_ftrace_profiler(void)
 {
-	return register_ftrace_graph(&profile_graph_return,
-				     &profile_graph_entry);
+	return register_ftrace_graph(&fprofiler_ops);
 }
 
 static void unregister_ftrace_profiler(void)
 {
-	unregister_ftrace_graph();
+	unregister_ftrace_graph(&fprofiler_ops);
 }
 #else
 static struct ftrace_ops ftrace_profile_ops __read_mostly = {
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 0e0ff08357cf..7c7fd13d2373 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -336,17 +336,25 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
 		trace_graph_return(trace);
 }
 
+static struct fgraph_ops funcgraph_threash_ops = {
+	.entryfunc = &trace_graph_entry,
+	.retfunc = &trace_graph_thresh_return,
+};
+
+static struct fgraph_ops funcgraph_ops = {
+	.entryfunc = &trace_graph_entry,
+	.retfunc = &trace_graph_return,
+};
+
 static int graph_trace_init(struct trace_array *tr)
 {
 	int ret;
 
 	set_graph_array(tr);
 	if (tracing_thresh)
-		ret = register_ftrace_graph(&trace_graph_thresh_return,
-					    &trace_graph_entry);
+		ret = register_ftrace_graph(&funcgraph_threash_ops);
 	else
-		ret = register_ftrace_graph(&trace_graph_return,
-					    &trace_graph_entry);
+		ret = register_ftrace_graph(&funcgraph_ops);
 	if (ret)
 		return ret;
 	tracing_start_cmdline_record();
@@ -357,7 +365,10 @@ static int graph_trace_init(struct trace_array *tr)
 static void graph_trace_reset(struct trace_array *tr)
 {
 	tracing_stop_cmdline_record();
-	unregister_ftrace_graph();
+	if (tracing_thresh)
+		unregister_ftrace_graph(&funcgraph_threash_ops);
+	else
+		unregister_ftrace_graph(&funcgraph_ops);
 }
 
 static int graph_trace_update_thresh(struct trace_array *tr)
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index b7357f9f82a3..1edab493849b 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -457,6 +457,11 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
 #ifdef CONFIG_FUNCTION_TRACER
 static bool function_enabled;
 
+static struct fgraph_ops fgraph_ops = {
+	.entryfunc		= &irqsoff_graph_entry,
+	.retfunc		= &irqsoff_graph_return,
+};
+
 static int register_irqsoff_function(struct trace_array *tr, int graph, int set)
 {
 	int ret;
@@ -466,8 +471,7 @@ static int register_irqsoff_function(struct trace_array *tr, int graph, int set)
 		return 0;
 
 	if (graph)
-		ret = register_ftrace_graph(&irqsoff_graph_return,
-					    &irqsoff_graph_entry);
+		ret = register_ftrace_graph(&fgraph_ops);
 	else
 		ret = register_ftrace_function(tr->ops);
 
@@ -483,7 +487,7 @@ static void unregister_irqsoff_function(struct trace_array *tr, int graph)
 		return;
 
 	if (graph)
-		unregister_ftrace_graph();
+		unregister_ftrace_graph(&fgraph_ops);
 	else
 		unregister_ftrace_function(tr->ops);
 
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index a86b303e6c67..1cd61528137f 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -127,6 +127,11 @@ wakeup_tracer_call(unsigned long ip, unsigned long parent_ip,
 	preempt_enable_notrace();
 }
 
+static struct fgraph_ops fgraph_wakeup_ops = {
+	.entryfunc = &wakeup_graph_entry,
+	.retfunc = &wakeup_graph_return,
+};
+
 static int register_wakeup_function(struct trace_array *tr, int graph, int set)
 {
 	int ret;
@@ -136,8 +141,7 @@ static int register_wakeup_function(struct trace_array *tr, int graph, int set)
 		return 0;
 
 	if (graph)
-		ret = register_ftrace_graph(&wakeup_graph_return,
-					    &wakeup_graph_entry);
+		ret = register_ftrace_graph(&fgraph_wakeup_ops);
 	else
 		ret = register_ftrace_function(tr->ops);
 
@@ -153,7 +157,7 @@ static void unregister_wakeup_function(struct trace_array *tr, int graph)
 		return;
 
 	if (graph)
-		unregister_ftrace_graph();
+		unregister_ftrace_graph(&fgraph_wakeup_ops);
 	else
 		unregister_ftrace_function(tr->ops);
 
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 11e9daa4a568..9d402e7fc949 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -741,6 +741,11 @@ static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
 	return trace_graph_entry(trace);
 }
 
+static struct fgraph_ops fgraph_ops __initdata  = {
+	.entryfunc		= &trace_graph_entry_watchdog,
+	.retfunc		= &trace_graph_return,
+};
+
 /*
  * Pretty much the same than for the function tracer from which the selftest
  * has been borrowed.
@@ -765,8 +770,7 @@ trace_selftest_startup_function_graph(struct tracer *trace,
 	 */
 	tracing_reset_online_cpus(&tr->trace_buffer);
 	set_graph_array(tr);
-	ret = register_ftrace_graph(&trace_graph_return,
-				    &trace_graph_entry_watchdog);
+	ret = register_ftrace_graph(&fgraph_ops);
 	if (ret) {
 		warn_failed_init_tracer(trace, ret);
 		goto out;
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack()
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (6 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-26  7:40   ` Masami Hiramatsu
  2018-11-26 10:02   ` Joey Pabalinas
  2018-11-22  1:27 ` [RFC][PATCH 09/14] function_graph: Move ftrace_graph_get_addr() to fgraph.c Steven Rostedt
                   ` (7 subsequent siblings)
  15 siblings, 2 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The static inline function task_curr_ret_stack() is unused, remove it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/ftrace.h | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 477ff9412d26..5544df21a886 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -819,11 +819,6 @@ extern void ftrace_graph_init_task(struct task_struct *t);
 extern void ftrace_graph_exit_task(struct task_struct *t);
 extern void ftrace_graph_init_idle_task(struct task_struct *t, int cpu);
 
-static inline int task_curr_ret_stack(struct task_struct *t)
-{
-	return t->curr_ret_stack;
-}
-
 static inline void pause_graph_tracing(void)
 {
 	atomic_inc(&current->tracing_graph_pause);
@@ -847,11 +842,6 @@ static inline int register_ftrace_graph(struct fgraph_ops *ops);
 }
 static inline void unregister_ftrace_graph(struct fgraph_ops *ops) { }
 
-static inline int task_curr_ret_stack(struct task_struct *tsk)
-{
-	return -1;
-}
-
 static inline unsigned long
 ftrace_graph_ret_addr(struct task_struct *task, int *idx, unsigned long ret,
 		      unsigned long *retp)
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 09/14] function_graph: Move ftrace_graph_get_addr() to fgraph.c
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (7 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack() Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-23  3:13   ` Joel Fernandes
  2018-11-22  1:27 ` [RFC][PATCH 10/14] function_graph: Have profiler use new helper ftrace_graph_get_ret_stack() Steven Rostedt
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Move the function function_graph_get_addr() to fgraph.c, as the management
of the curr_ret_stack is going to change, and all the accesses to ret_stack
needs to be done in fgraph.c.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/fgraph.c                | 55 ++++++++++++++++++++++++++++
 kernel/trace/trace_functions_graph.c | 55 ----------------------------
 2 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index f3a89ecac671..c7d612897e33 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -233,6 +233,61 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 	return ret;
 }
 
+/**
+ * ftrace_graph_ret_addr - convert a potentially modified stack return address
+ *			   to its original value
+ *
+ * This function can be called by stack unwinding code to convert a found stack
+ * return address ('ret') to its original value, in case the function graph
+ * tracer has modified it to be 'return_to_handler'.  If the address hasn't
+ * been modified, the unchanged value of 'ret' is returned.
+ *
+ * 'idx' is a state variable which should be initialized by the caller to zero
+ * before the first call.
+ *
+ * 'retp' is a pointer to the return address on the stack.  It's ignored if
+ * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
+ */
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp)
+{
+	int index = task->curr_ret_stack;
+	int i;
+
+	if (ret != (unsigned long)return_to_handler)
+		return ret;
+
+	if (index < 0)
+		return ret;
+
+	for (i = 0; i <= index; i++)
+		if (task->ret_stack[i].retp == retp)
+			return task->ret_stack[i].ret;
+
+	return ret;
+}
+#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp)
+{
+	int task_idx;
+
+	if (ret != (unsigned long)return_to_handler)
+		return ret;
+
+	task_idx = task->curr_ret_stack;
+
+	if (!task->ret_stack || task_idx < *idx)
+		return ret;
+
+	task_idx -= *idx;
+	(*idx)++;
+
+	return task->ret_stack[task_idx].ret;
+}
+#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+
 static struct ftrace_ops graph_ops = {
 	.func			= ftrace_stub,
 	.flags			= FTRACE_OPS_FL_RECURSION_SAFE |
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 7c7fd13d2373..0f9cbc30645d 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -90,61 +90,6 @@ static void
 print_graph_duration(struct trace_array *tr, unsigned long long duration,
 		     struct trace_seq *s, u32 flags);
 
-/**
- * ftrace_graph_ret_addr - convert a potentially modified stack return address
- *			   to its original value
- *
- * This function can be called by stack unwinding code to convert a found stack
- * return address ('ret') to its original value, in case the function graph
- * tracer has modified it to be 'return_to_handler'.  If the address hasn't
- * been modified, the unchanged value of 'ret' is returned.
- *
- * 'idx' is a state variable which should be initialized by the caller to zero
- * before the first call.
- *
- * 'retp' is a pointer to the return address on the stack.  It's ignored if
- * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
- */
-#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
-				    unsigned long ret, unsigned long *retp)
-{
-	int index = task->curr_ret_stack;
-	int i;
-
-	if (ret != (unsigned long)return_to_handler)
-		return ret;
-
-	if (index < 0)
-		return ret;
-
-	for (i = 0; i <= index; i++)
-		if (task->ret_stack[i].retp == retp)
-			return task->ret_stack[i].ret;
-
-	return ret;
-}
-#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
-unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
-				    unsigned long ret, unsigned long *retp)
-{
-	int task_idx;
-
-	if (ret != (unsigned long)return_to_handler)
-		return ret;
-
-	task_idx = task->curr_ret_stack;
-
-	if (!task->ret_stack || task_idx < *idx)
-		return ret;
-
-	task_idx -= *idx;
-	(*idx)++;
-
-	return task->ret_stack[task_idx].ret;
-}
-#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
-
 int __trace_graph_entry(struct trace_array *tr,
 				struct ftrace_graph_ent *trace,
 				unsigned long flags,
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 10/14] function_graph: Have profiler use new helper ftrace_graph_get_ret_stack()
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (8 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 09/14] function_graph: Move ftrace_graph_get_addr() to fgraph.c Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-22  1:27 ` [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs Steven Rostedt
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The ret_stack processing is going to change, and that is going
to break anything that is accessing the ret_stack directly. One user is the
function graph profiler. By using the ftrace_graph_get_ret_stack() helper
function, the profiler can access the ret_stack entry without relying on the
implementation details of the stack itself.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/ftrace.h |  3 +++
 kernel/trace/fgraph.c  | 11 +++++++++++
 kernel/trace/ftrace.c  | 21 +++++++++++----------
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 5544df21a886..36a0fd1316dd 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -792,6 +792,9 @@ extern int
 function_graph_enter(unsigned long ret, unsigned long func,
 		     unsigned long frame_pointer, unsigned long *retp);
 
+struct ftrace_ret_stack *
+ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
+
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp);
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index c7d612897e33..9b85638ecded 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -233,6 +233,17 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 	return ret;
 }
 
+struct ftrace_ret_stack *
+ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
+{
+	idx = current->curr_ret_stack - idx;
+
+	if (idx >= 0 && idx <= task->curr_ret_stack)
+		return &current->ret_stack[idx];
+
+	return NULL;
+}
+
 /**
  * ftrace_graph_ret_addr - convert a potentially modified stack return address
  *			   to its original value
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index d057dde081e7..05568ad4e590 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -792,7 +792,7 @@ void ftrace_graph_graph_time_control(bool enable)
 
 static int profile_graph_entry(struct ftrace_graph_ent *trace)
 {
-	int index = current->curr_ret_stack;
+	struct ftrace_ret_stack *ret_stack;
 
 	function_profile_call(trace->func, 0, NULL, NULL);
 
@@ -800,14 +800,16 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace)
 	if (!current->ret_stack)
 		return 0;
 
-	if (index >= 0 && index < FTRACE_RETFUNC_DEPTH)
-		current->ret_stack[index].subtime = 0;
+	ret_stack = ftrace_graph_get_ret_stack(current, 0);
+	if (ret_stack)
+		ret_stack->subtime = 0;
 
 	return 1;
 }
 
 static void profile_graph_return(struct ftrace_graph_ret *trace)
 {
+	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_profile_stat *stat;
 	unsigned long long calltime;
 	struct ftrace_profile *rec;
@@ -825,16 +827,15 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
 	calltime = trace->rettime - trace->calltime;
 
 	if (!fgraph_graph_time) {
-		int index;
-
-		index = current->curr_ret_stack;
 
 		/* Append this call time to the parent time to subtract */
-		if (index)
-			current->ret_stack[index - 1].subtime += calltime;
+		ret_stack = ftrace_graph_get_ret_stack(current, 1);
+		if (ret_stack)
+			ret_stack->subtime += calltime;
 
-		if (current->ret_stack[index].subtime < calltime)
-			calltime -= current->ret_stack[index].subtime;
+		ret_stack = ftrace_graph_get_ret_stack(current, 0);
+		if (ret_stack && ret_stack->subtime < calltime)
+			calltime -= ret_stack->subtime;
 		else
 			calltime = 0;
 	}
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (9 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 10/14] function_graph: Have profiler use new helper ftrace_graph_get_ret_stack() Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-24  5:31   ` Joel Fernandes
  2018-11-22  1:27 ` [RFC][PATCH 12/14] function_graph: Add an array structure that will allow multiple callbacks Steven Rostedt
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

In order to make it possible to have multiple callbacks registered with the
function_graph tracer, the retstack needs to be converted from an array of
ftrace_ret_stack structures to an array of longs. This will allow to store
the list of callbacks on the stack for the return side of the functions.

[ Note, this currently breaks architectures that access the ret_stack of a
  task to handle unwinding when 'return_to_handler' is on the stack ]

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/sched.h |   2 +-
 kernel/trace/fgraph.c | 123 +++++++++++++++++++++++-------------------
 2 files changed, 70 insertions(+), 55 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d6183a55e8eb..71a084a300da 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1119,7 +1119,7 @@ struct task_struct {
 	int				curr_ret_depth;
 
 	/* Stack of return addresses for return function tracing: */
-	struct ftrace_ret_stack		*ret_stack;
+	unsigned long			*ret_stack;
 
 	/* Timestamp for last schedule: */
 	unsigned long long		ftrace_timestamp;
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 9b85638ecded..1389fe39f64c 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -23,6 +23,17 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
+#define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
+#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH * FGRAPH_RET_SIZE)
+#define SHADOW_STACK_INDEX			\
+	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+
+#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
+#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
+#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
+
 static bool kill_ftrace_graph;
 int ftrace_graph_active;
 
@@ -59,6 +70,7 @@ static int
 ftrace_push_return_trace(unsigned long ret, unsigned long func,
 			 unsigned long frame_pointer, unsigned long *retp)
 {
+	struct ftrace_ret_stack *ret_stack;
 	unsigned long long calltime;
 	int index;
 
@@ -75,23 +87,25 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 	smp_rmb();
 
 	/* The return trace stack is full */
-	if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+	if (current->curr_ret_stack > SHADOW_STACK_MAX_INDEX) {
 		atomic_inc(&current->trace_overrun);
 		return -EBUSY;
 	}
 
 	calltime = trace_clock_local();
 
-	index = ++current->curr_ret_stack;
+	index = current->curr_ret_stack;
+	RET_STACK_INC(current->curr_ret_stack);
+	ret_stack = RET_STACK(current, index);
 	barrier();
-	current->ret_stack[index].ret = ret;
-	current->ret_stack[index].func = func;
-	current->ret_stack[index].calltime = calltime;
+	ret_stack->ret = ret;
+	ret_stack->func = func;
+	ret_stack->calltime = calltime;
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-	current->ret_stack[index].fp = frame_pointer;
+	ret_stack->fp = frame_pointer;
 #endif
 #ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-	current->ret_stack[index].retp = retp;
+	ret_stack->retp = retp;
 #endif
 	return 0;
 }
@@ -114,7 +128,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 
 	return 0;
  out_ret:
-	current->curr_ret_stack--;
+	RET_STACK_DEC(current->curr_ret_stack);
  out:
 	current->curr_ret_depth--;
 	return -EBUSY;
@@ -125,11 +139,13 @@ static void
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 			unsigned long frame_pointer)
 {
+	struct ftrace_ret_stack *ret_stack;
 	int index;
 
 	index = current->curr_ret_stack;
+	RET_STACK_DEC(index);
 
-	if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
+	if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
 		ftrace_graph_stop();
 		WARN_ON(1);
 		/* Might as well panic, otherwise we have no where to go */
@@ -137,6 +153,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 		return;
 	}
 
+	ret_stack = RET_STACK(current, index);
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	/*
 	 * The arch may choose to record the frame pointer used
@@ -152,22 +169,22 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 	 * Note, -mfentry does not use frame pointers, and this test
 	 *  is not needed if CC_USING_FENTRY is set.
 	 */
-	if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
+	if (unlikely(ret_stack->fp != frame_pointer)) {
 		ftrace_graph_stop();
 		WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
 		     "  from func %ps return to %lx\n",
 		     current->ret_stack[index].fp,
 		     frame_pointer,
-		     (void *)current->ret_stack[index].func,
-		     current->ret_stack[index].ret);
+		     (void *)ret_stack->func,
+		     ret_stack->ret);
 		*ret = (unsigned long)panic;
 		return;
 	}
 #endif
 
-	*ret = current->ret_stack[index].ret;
-	trace->func = current->ret_stack[index].func;
-	trace->calltime = current->ret_stack[index].calltime;
+	*ret = ret_stack->ret;
+	trace->func = ret_stack->func;
+	trace->calltime = ret_stack->calltime;
 	trace->overrun = atomic_read(&current->trace_overrun);
 	trace->depth = current->curr_ret_depth--;
 	/*
@@ -221,7 +238,7 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 	 * curr_ret_stack is after that.
 	 */
 	barrier();
-	current->curr_ret_stack--;
+	RET_STACK_DEC(current->curr_ret_stack);
 
 	if (unlikely(!ret)) {
 		ftrace_graph_stop();
@@ -236,12 +253,13 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 struct ftrace_ret_stack *
 ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 {
-	idx = current->curr_ret_stack - idx;
+	int index = task->curr_ret_stack;
 
-	if (idx >= 0 && idx <= task->curr_ret_stack)
-		return &current->ret_stack[idx];
+	index -= FGRAPH_RET_INDEX * (idx + 1);
+	if (index < 0)
+		return NULL;
 
-	return NULL;
+	return RET_STACK(task, index);
 }
 
 /**
@@ -263,18 +281,20 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp)
 {
+	struct ftrace_ret_stack *ret_stack;
 	int index = task->curr_ret_stack;
 	int i;
 
 	if (ret != (unsigned long)return_to_handler)
 		return ret;
 
-	if (index < 0)
-		return ret;
+	RET_STACK_DEC(index);
 
-	for (i = 0; i <= index; i++)
-		if (task->ret_stack[i].retp == retp)
-			return task->ret_stack[i].ret;
+	for (i = index; i >= 0; RET_STACK_DEC(i)) {
+		ret_stack = RET_STACK(task, i);
+		if (ret_stack->retp == retp)
+			return ret_stack->ret;
+	}
 
 	return ret;
 }
@@ -288,14 +308,15 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 		return ret;
 
 	task_idx = task->curr_ret_stack;
+	RET_STACK_DEC(task_idx);
 
 	if (!task->ret_stack || task_idx < *idx)
 		return ret;
 
 	task_idx -= *idx;
-	(*idx)++;
+	RET_STACK_INC(*idx);
 
-	return task->ret_stack[task_idx].ret;
+	return RET_STACK(task, task_idx);
 }
 #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
 
@@ -329,7 +350,7 @@ trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
 static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
 
 /* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
-static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
+static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
 {
 	int i;
 	int ret = 0;
@@ -337,10 +358,7 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
 	struct task_struct *g, *t;
 
 	for (i = 0; i < FTRACE_RETSTACK_ALLOC_SIZE; i++) {
-		ret_stack_list[i] =
-			kmalloc_array(FTRACE_RETFUNC_DEPTH,
-				      sizeof(struct ftrace_ret_stack),
-				      GFP_KERNEL);
+		ret_stack_list[i] = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 		if (!ret_stack_list[i]) {
 			start = 0;
 			end = i;
@@ -359,9 +377,9 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
 		if (t->ret_stack == NULL) {
 			atomic_set(&t->tracing_graph_pause, 0);
 			atomic_set(&t->trace_overrun, 0);
-			t->curr_ret_stack = -1;
+			t->curr_ret_stack = 0;
 			t->curr_ret_depth = -1;
-			/* Make sure the tasks see the -1 first: */
+			/* Make sure the tasks see the 0 first: */
 			smp_wmb();
 			t->ret_stack = ret_stack_list[start++];
 		}
@@ -379,6 +397,7 @@ static void
 ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 			struct task_struct *prev, struct task_struct *next)
 {
+	struct ftrace_ret_stack *ret_stack;
 	unsigned long long timestamp;
 	int index;
 
@@ -403,8 +422,11 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 	 */
 	timestamp -= next->ftrace_timestamp;
 
-	for (index = next->curr_ret_stack; index >= 0; index--)
-		next->ret_stack[index].calltime += timestamp;
+	for (index = next->curr_ret_stack - FGRAPH_RET_INDEX; index >= 0; ) {
+		ret_stack = RET_STACK(next, index);
+		ret_stack->calltime += timestamp;
+		index -= FGRAPH_RET_INDEX;
+	}
 }
 
 static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
@@ -447,10 +469,10 @@ void update_function_graph_func(void)
 		ftrace_graph_entry = __ftrace_graph_entry;
 }
 
-static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
+static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);
 
 static void
-graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
+graph_init_task(struct task_struct *t, unsigned long *ret_stack)
 {
 	atomic_set(&t->tracing_graph_pause, 0);
 	atomic_set(&t->trace_overrun, 0);
@@ -466,7 +488,7 @@ graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
  */
 void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
 {
-	t->curr_ret_stack = -1;
+	t->curr_ret_stack = 0;
 	t->curr_ret_depth = -1;
 	/*
 	 * The idle task has no parent, it either has its own
@@ -476,14 +498,11 @@ void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
 		WARN_ON(t->ret_stack != per_cpu(idle_ret_stack, cpu));
 
 	if (ftrace_graph_active) {
-		struct ftrace_ret_stack *ret_stack;
+		unsigned long *ret_stack;
 
 		ret_stack = per_cpu(idle_ret_stack, cpu);
 		if (!ret_stack) {
-			ret_stack =
-				kmalloc_array(FTRACE_RETFUNC_DEPTH,
-					      sizeof(struct ftrace_ret_stack),
-					      GFP_KERNEL);
+			ret_stack = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 			if (!ret_stack)
 				return;
 			per_cpu(idle_ret_stack, cpu) = ret_stack;
@@ -497,15 +516,13 @@ void ftrace_graph_init_task(struct task_struct *t)
 {
 	/* Make sure we do not use the parent ret_stack */
 	t->ret_stack = NULL;
-	t->curr_ret_stack = -1;
+	t->curr_ret_stack = 0;
 	t->curr_ret_depth = -1;
 
 	if (ftrace_graph_active) {
-		struct ftrace_ret_stack *ret_stack;
+		unsigned long *ret_stack;
 
-		ret_stack = kmalloc_array(FTRACE_RETFUNC_DEPTH,
-					  sizeof(struct ftrace_ret_stack),
-					  GFP_KERNEL);
+		ret_stack = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 		if (!ret_stack)
 			return;
 		graph_init_task(t, ret_stack);
@@ -514,7 +531,7 @@ void ftrace_graph_init_task(struct task_struct *t)
 
 void ftrace_graph_exit_task(struct task_struct *t)
 {
-	struct ftrace_ret_stack	*ret_stack = t->ret_stack;
+	unsigned long *ret_stack = t->ret_stack;
 
 	t->ret_stack = NULL;
 	/* NULL must become visible to IRQs before we free it: */
@@ -526,12 +543,10 @@ void ftrace_graph_exit_task(struct task_struct *t)
 /* Allocate a return stack for each task */
 static int start_graph_tracing(void)
 {
-	struct ftrace_ret_stack **ret_stack_list;
+	unsigned long **ret_stack_list;
 	int ret, cpu;
 
-	ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
-				       sizeof(struct ftrace_ret_stack *),
-				       GFP_KERNEL);
+	ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 
 	if (!ret_stack_list)
 		return -ENOMEM;
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 12/14] function_graph: Add an array structure that will allow multiple callbacks
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (10 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-22  1:27 ` [RFC][PATCH 13/14] function_graph: Allow multiple users to attach to function graph Steven Rostedt
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add an array structure that will eventually allow the function graph tracer
to have up to 16 simultaneous callbacks attached. It's an array of 16
fgraph_ops pointers, that is assigned when one is registered. On entry of a
function the entry of the first item in the array is called, and if it
returns zero, then the callback returns non zero if it wants the return
callback to be called on exit of the function.

The array will simplify the process of having more than one callback
attached to the same function, as its index into the array can be stored on
the shadow stack. We need to only save the index, because this will allow
the fgraph_ops to be freed before the function returns (which may happen if
the function call schedule for a long time).

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/fgraph.c | 48 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 1389fe39f64c..826fa158f9b7 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -37,6 +37,14 @@
 static bool kill_ftrace_graph;
 int ftrace_graph_active;
 
+#define FGRAPH_ARRAY_SIZE	16
+
+#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH *			\
+	(sizeof(struct ftrace_ret_stack) +				\
+	FGRAPH_ARRAY_SIZE * sizeof(long)))
+
+static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
+
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
@@ -123,7 +131,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 		goto out;
 
 	/* Only trace if the calling function expects to */
-	if (!ftrace_graph_entry(&trace))
+	if (!fgraph_array[0]->entryfunc(&trace))
 		goto out_ret;
 
 	return 0;
@@ -231,7 +239,7 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 
 	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
 	trace.rettime = trace_clock_local();
-	ftrace_graph_return(&trace);
+	fgraph_array[0]->retfunc(&trace);
 	/*
 	 * The ftrace_graph_return() may still access the current
 	 * ret_stack structure, we need to make sure the update of
@@ -343,6 +351,11 @@ int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
 	return 0;
 }
 
+static struct fgraph_ops fgraph_stub = {
+	.entryfunc = ftrace_graph_entry_stub,
+	.retfunc = (trace_func_graph_ret_t)ftrace_stub,
+};
+
 /* The callbacks that hook a function */
 trace_func_graph_ret_t ftrace_graph_return =
 			(trace_func_graph_ret_t)ftrace_stub;
@@ -575,15 +588,29 @@ static int start_graph_tracing(void)
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
 	int ret = 0;
+	int i;
 
 	mutex_lock(&ftrace_lock);
 
-	/* we currently allow only one tracer registered at a time */
-	if (ftrace_graph_active) {
+	if (!fgraph_array[0]) {
+		/* The array must always have real data on it */
+		for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+			fgraph_array[i] = &fgraph_stub;
+		}
+	}
+
+	/* Look for an available spot */
+	for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+		if (fgraph_array[i] == &fgraph_stub)
+			break;
+	}
+	if (i >= FGRAPH_ARRAY_SIZE) {
 		ret = -EBUSY;
 		goto out;
 	}
 
+	fgraph_array[i] = gops;
+
 	register_pm_notifier(&ftrace_suspend_notifier);
 
 	ftrace_graph_active++;
@@ -594,7 +621,6 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 	}
 
 	ftrace_graph_return = gops->retfunc;
-
 	/*
 	 * Update the indirect function to the entryfunc, and the
 	 * function that gets called to the entry_test first. Then
@@ -613,11 +639,23 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 
 void unregister_ftrace_graph(struct fgraph_ops *gops)
 {
+	int i;
+
 	mutex_lock(&ftrace_lock);
 
 	if (unlikely(!ftrace_graph_active))
 		goto out;
 
+	/* Look for an available spot */
+	for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+		if (fgraph_array[i] == gops)
+			break;
+	}
+	if (i == FGRAPH_ARRAY_SIZE)
+		goto out;
+
+	fgraph_array[i] = &fgraph_stub;
+
 	ftrace_graph_active--;
 	ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
 	ftrace_graph_entry = ftrace_graph_entry_stub;
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 13/14] function_graph: Allow multiple users to attach to function graph
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (11 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 12/14] function_graph: Add an array structure that will allow multiple callbacks Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-22  1:27 ` [RFC][PATCH 14/14] function_graph: Allow for more than one callback to be registered Steven Rostedt
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Allow for multiple users to attach to function graph tracer at the same
time. Only 16 simultaneous users can attach to the tracer. This is because
there's an array that stores the pointers to the attached fgraph_ops. When a
a function being traced is entered, each of the ftrace_ops entryfunc is
called and if it returns non zero, its index into the array will be added to
the shadow stack.

On exit of the function being traced, the shadow stack will contain the
indexes of the ftrace_ops on the array that want their retfunc to be called.

Because a function may sleep for a long time (if a task sleeps itself), the
return of the function may be literally days later. If the ftrace_ops is
removed, its place on the array is replaced with a ftrace_ops that contains
the stub functions and that will be called when the function finally
returns.

If another ftrace_ops is added that happens to get the same index into the
array, its return function may be called. But that's actually the way things
current work with the old function graph tracer. If one tracer is removed
and another is added, the new one will get the return calls of the function
traced by the previous one, thus this is not a regression.

Note, being able to filter functions when both are called is not completely
handled yet, but that shouldn't be too hard to manage.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/ftrace.h |   4 +
 kernel/trace/fgraph.c  | 261 +++++++++++++++++++++++++++++++----------
 2 files changed, 206 insertions(+), 59 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 36a0fd1316dd..746b865c46b6 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -779,6 +779,8 @@ struct ftrace_ret_stack {
 #ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
 	unsigned long *retp;
 #endif
+	unsigned long callers;
+	unsigned long calls[];
 };
 
 /*
@@ -798,6 +800,8 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp);
 
+int function_graph_enter(unsigned long ret, unsigned long func,
+			 unsigned long frame_pointer, unsigned long *retp);
 /*
  * Sometimes we don't want to trace a function with the function
  * graph tracer but we want them to keep traced by the usual function
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 826fa158f9b7..6e5efe1663b9 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -25,26 +25,72 @@
 
 #define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
 #define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
-#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH * FGRAPH_RET_SIZE)
+
+#define FGRAPH_ARRAY_SIZE	16
+#define FGRAPH_ARRAY_MASK	0xff
+
+/* Currently the max stack index can't be more than register callers */
+#define FGRAPH_MAX_INDEX	FGRAPH_ARRAY_SIZE
+
+#define FGRAPH_INDEX_SHIFT	8
+
+#define FGRAPH_FRAME_SIZE (FGRAPH_RET_SIZE + FGRAPH_ARRAY_SIZE * (sizeof(long)))
+#define FGRAPH_FRAME_INDEX (ALIGN(FGRAPH_FRAME_SIZE,		\
+				  sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH * FGRAPH_FRAME_SIZE)
 #define SHADOW_STACK_INDEX			\
 	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
-#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_FRAME_INDEX)
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
-#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
-#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
 
 static bool kill_ftrace_graph;
 int ftrace_graph_active;
 
-#define FGRAPH_ARRAY_SIZE	16
-
-#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH *			\
-	(sizeof(struct ftrace_ret_stack) +				\
-	FGRAPH_ARRAY_SIZE * sizeof(long)))
+static int fgraph_array_cnt;
 
 static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
 
+static inline int get_stack_index(struct task_struct *t, int offset)
+{
+	return current->ret_stack[offset] >> FGRAPH_INDEX_SHIFT;
+}
+
+/*
+ * @offset: The index into @t->ret_stack to find the ret_stack entry
+ * @index: Where to place the index into @t->ret_stack of that entry
+ *
+ * Calling this with:
+ *
+ *   offset = task->curr_ret_stack;
+ *   do {
+ *	ret_stack = get_ret_stack(task, offset, &offset);
+ *   } while (ret_stack);
+ *
+ * Will iterate through all the ret_stack entries from curr_ret_stack
+ * down to the first one.
+ */
+static inline struct ftrace_ret_stack *
+get_ret_stack(struct task_struct *t, int offset, int *index)
+{
+	int idx;
+
+	if (offset <= 0)
+		return NULL;
+
+	idx = get_stack_index(t, offset - 1);
+
+	if (idx <= 0 || idx > FGRAPH_MAX_INDEX)
+		return NULL;
+
+	offset -= idx + FGRAPH_RET_INDEX;
+	if (offset < 0)
+		return NULL;
+
+	*index = offset;
+	return RET_STACK(current, offset);
+}
+
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
@@ -103,8 +149,28 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 	calltime = trace_clock_local();
 
 	index = current->curr_ret_stack;
-	RET_STACK_INC(current->curr_ret_stack);
+	current->ret_stack[index + FGRAPH_RET_INDEX] = 1 << FGRAPH_INDEX_SHIFT;
 	ret_stack = RET_STACK(current, index);
+	ret_stack->ret = ret;
+	/*
+	 * The undwinders expect curr_ret_stack to point to either zero
+	 * or an index where to find the next ret_stack. Even though the
+	 * ret stack might be bogus, we want to write the ret and the
+	 * index to find the ret_stack before we increment the stack point.
+	 * If an interrupt comes in now before we increment the curr_ret_stack
+	 * it may blow away what we wrote. But that's fine, because the
+	 * index will still be correct (even though the ret wont be).
+	 * What we worry about is the index being correct after we increment
+	 * the curr_ret_stack and before we update that index, as if an
+	 * interrupt comes in and does an unwind stack dump, it will need
+	 * at least a correct index!
+	 */
+	barrier();
+	current->curr_ret_stack += FGRAPH_RET_INDEX + 1;
+	/*
+	 * This next barrier is to ensure that an interrupt coming in
+	 * will not corrupt what we are about to write.
+	 */
 	barrier();
 	ret_stack->ret = ret;
 	ret_stack->func = func;
@@ -122,6 +188,11 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			 unsigned long frame_pointer, unsigned long *retp)
 {
 	struct ftrace_graph_ent trace;
+	int cnt = 0;
+	int offset;
+	int idx;
+	int val;
+	int i;
 
 	trace.func = func;
 	trace.depth = ++current->curr_ret_depth;
@@ -130,38 +201,73 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 				     frame_pointer, retp))
 		goto out;
 
-	/* Only trace if the calling function expects to */
-	if (!fgraph_array[0]->entryfunc(&trace))
+	offset = current->curr_ret_stack - 1;
+
+	for (i = 0, idx = 0; i < fgraph_array_cnt; i++) {
+		if (fgraph_array[i]->entryfunc(&trace)) {
+			val = i | ((++idx) << FGRAPH_INDEX_SHIFT);
+			current->ret_stack[offset] = val;
+			/*
+			 * ftrace_push_return_trace includes one
+			 * callback already (to ensure unwinders can
+			 * find the ret_stack.
+			 * We only need to increment curr_ret_stack
+			 * if there's more than one.
+			 */
+			if (idx == 1) {
+				offset++;
+				cnt++;
+				continue;
+			}
+			/*
+			 * Write the value before we increment, so that
+			 * if an interrupt comes in after we increment
+			 * it will still see the value and skip over
+			 * this.
+			 */
+			barrier();
+			/*
+			 * Have to write again, in case an interrupt
+			 * came in before the increment and after we
+			 * wrote the value.
+			 */
+			current->curr_ret_stack++;
+			barrier();
+			current->ret_stack[offset++] = val;
+			cnt++;
+		}
+	}
+
+	if (!cnt)
 		goto out_ret;
 
 	return 0;
  out_ret:
-	RET_STACK_DEC(current->curr_ret_stack);
+	current->curr_ret_stack -= FGRAPH_RET_INDEX + 1;
  out:
 	current->curr_ret_depth--;
 	return -EBUSY;
 }
 
 /* Retrieve a function return address to the trace stack on thread info.*/
-static void
+static struct ftrace_ret_stack *
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 			unsigned long frame_pointer)
 {
 	struct ftrace_ret_stack *ret_stack;
 	int index;
 
-	index = current->curr_ret_stack;
-	RET_STACK_DEC(index);
+	ret_stack = get_ret_stack(current, current->curr_ret_stack, &index);
 
-	if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
+	if (unlikely(!ret_stack)) {
 		ftrace_graph_stop();
-		WARN_ON(1);
+		WARN(1, "Bad function graph ret_stack pointer: %d",
+		     index);
 		/* Might as well panic, otherwise we have no where to go */
 		*ret = (unsigned long)panic;
-		return;
+		return NULL;
 	}
 
-	ret_stack = RET_STACK(current, index);
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	/*
 	 * The arch may choose to record the frame pointer used
@@ -181,12 +287,12 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 		ftrace_graph_stop();
 		WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
 		     "  from func %ps return to %lx\n",
-		     current->ret_stack[index].fp,
+		     ret_stack->fp,
 		     frame_pointer,
 		     (void *)ret_stack->func,
 		     ret_stack->ret);
 		*ret = (unsigned long)panic;
-		return;
+		return NULL;
 	}
 #endif
 
@@ -194,13 +300,15 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 	trace->func = ret_stack->func;
 	trace->calltime = ret_stack->calltime;
 	trace->overrun = atomic_read(&current->trace_overrun);
-	trace->depth = current->curr_ret_depth--;
+	trace->depth = current->curr_ret_depth;
 	/*
 	 * We still want to trace interrupts coming in if
 	 * max_depth is set to 1. Make sure the decrement is
 	 * seen before ftrace_graph_return.
 	 */
 	barrier();
+
+	return ret_stack;
 }
 
 /*
@@ -234,40 +342,63 @@ static struct notifier_block ftrace_suspend_notifier = {
  */
 unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 {
+	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_graph_ret trace;
 	unsigned long ret;
+	int offset;
+	int index;
+	int idx;
+	int val;
+	int i;
 
-	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
-	trace.rettime = trace_clock_local();
-	fgraph_array[0]->retfunc(&trace);
-	/*
-	 * The ftrace_graph_return() may still access the current
-	 * ret_stack structure, we need to make sure the update of
-	 * curr_ret_stack is after that.
-	 */
-	barrier();
-	RET_STACK_DEC(current->curr_ret_stack);
+	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer);
 
 	if (unlikely(!ret)) {
 		ftrace_graph_stop();
 		WARN_ON(1);
 		/* Might as well panic. What else to do? */
-		ret = (unsigned long)panic;
+		return (unsigned long)panic;
 	}
 
+	trace.rettime = trace_clock_local();
+
+	offset = current->curr_ret_stack - 1;
+	index = get_stack_index(current, offset);
+
+	/* index has to be at least one! Optimize for it */
+	i = 0;
+	do {
+		val = current->ret_stack[offset - i];
+		idx = val & FGRAPH_ARRAY_MASK;
+		fgraph_array[idx]->retfunc(&trace);
+		i++;
+	} while (i < index);
+
+	/*
+	 * The ftrace_graph_return() may still access the current
+	 * ret_stack structure, we need to make sure the update of
+	 * curr_ret_stack is after that.
+	 */
+	barrier();
+	current->curr_ret_stack -= index + FGRAPH_RET_INDEX;
+	current->curr_ret_depth--;
 	return ret;
 }
 
 struct ftrace_ret_stack *
 ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 {
+	struct ftrace_ret_stack *ret_stack = NULL;
 	int index = task->curr_ret_stack;
 
-	index -= FGRAPH_RET_INDEX * (idx + 1);
 	if (index < 0)
 		return NULL;
 
-	return RET_STACK(task, index);
+	do {
+		ret_stack = get_ret_stack(task, index, &index);
+	} while (ret_stack && --idx >= 0);
+
+	return ret_stack;
 }
 
 /**
@@ -290,16 +421,15 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp)
 {
 	struct ftrace_ret_stack *ret_stack;
-	int index = task->curr_ret_stack;
-	int i;
+	int i = task->curr_ret_stack;
 
 	if (ret != (unsigned long)return_to_handler)
 		return ret;
 
-	RET_STACK_DEC(index);
-
-	for (i = index; i >= 0; RET_STACK_DEC(i)) {
-		ret_stack = RET_STACK(task, i);
+	while (i > 0) {
+		ret_stack = get_ret_stack(current, i, &i);
+		if (!ret_stack)
+			break;
 		if (ret_stack->retp == retp)
 			return ret_stack->ret;
 	}
@@ -310,21 +440,26 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp)
 {
-	int task_idx;
+	struct ftrace_ret_stack *ret_stack;
+	int task_idx = task->curr_ret_stack;
+	int i;
 
 	if (ret != (unsigned long)return_to_handler)
 		return ret;
 
-	task_idx = task->curr_ret_stack;
-	RET_STACK_DEC(task_idx);
-
-	if (!task->ret_stack || task_idx < *idx)
+	if (!idx)
 		return ret;
 
-	task_idx -= *idx;
-	RET_STACK_INC(*idx);
+	i = *idx;
+	do {
+		ret_stack = get_ret_stack(task, task_idx, &task_idx);
+		i--;
+	} while (i >= 0 && ret_stack);
+
+	if (ret_stack)
+		return ret_stack->ret;
 
-	return RET_STACK(task, task_idx);
+	return ret;
 }
 #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
 
@@ -435,10 +570,10 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 	 */
 	timestamp -= next->ftrace_timestamp;
 
-	for (index = next->curr_ret_stack - FGRAPH_RET_INDEX; index >= 0; ) {
-		ret_stack = RET_STACK(next, index);
-		ret_stack->calltime += timestamp;
-		index -= FGRAPH_RET_INDEX;
+	for (index = next->curr_ret_stack; index > 0; ) {
+		ret_stack = get_ret_stack(next, index, &index);
+		if (ret_stack)
+			ret_stack->calltime += timestamp;
 	}
 }
 
@@ -490,6 +625,8 @@ graph_init_task(struct task_struct *t, unsigned long *ret_stack)
 	atomic_set(&t->tracing_graph_pause, 0);
 	atomic_set(&t->trace_overrun, 0);
 	t->ftrace_timestamp = 0;
+	t->curr_ret_stack = 0;
+	t->curr_ret_depth = -1;
 	/* make curr_ret_stack visible before we add the ret_stack */
 	smp_wmb();
 	t->ret_stack = ret_stack;
@@ -610,6 +747,8 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 	}
 
 	fgraph_array[i] = gops;
+	if (i + 1 > fgraph_array_cnt)
+		fgraph_array_cnt = i + 1;
 
 	register_pm_notifier(&ftrace_suspend_notifier);
 
@@ -646,15 +785,19 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
 	if (unlikely(!ftrace_graph_active))
 		goto out;
 
-	/* Look for an available spot */
-	for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
-		if (fgraph_array[i] == gops)
+	for (i = 0; i < fgraph_array_cnt; i++)
+		if (gops == fgraph_array[i])
 			break;
-	}
-	if (i == FGRAPH_ARRAY_SIZE)
+	if (i >= fgraph_array_cnt)
 		goto out;
 
 	fgraph_array[i] = &fgraph_stub;
+	if (i + 1 == fgraph_array_cnt) {
+		for (; i >= 0; i--)
+			if (fgraph_array[i] != &fgraph_stub)
+				break;
+		fgraph_array_cnt = i + 1;
+	}
 
 	ftrace_graph_active--;
 	ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC][PATCH 14/14] function_graph: Allow for more than one callback to be registered
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (12 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 13/14] function_graph: Allow multiple users to attach to function graph Steven Rostedt
@ 2018-11-22  1:27 ` Steven Rostedt
  2018-11-22 10:08 ` [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Peter Zijlstra
  2018-11-26  9:21 ` Masami Hiramatsu
  15 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/fgraph.c | 50 +++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 6e5efe1663b9..8a99eaa46b43 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -753,24 +753,27 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 	register_pm_notifier(&ftrace_suspend_notifier);
 
 	ftrace_graph_active++;
-	ret = start_graph_tracing();
-	if (ret) {
-		ftrace_graph_active--;
-		goto out;
-	}
+	if (ftrace_graph_active == 1) {
+		ret = start_graph_tracing();
+		if (ret) {
+			ftrace_graph_active--;
+			goto out;
+		}
 
-	ftrace_graph_return = gops->retfunc;
-	/*
-	 * Update the indirect function to the entryfunc, and the
-	 * function that gets called to the entry_test first. Then
-	 * call the update fgraph entry function to determine if
-	 * the entryfunc should be called directly or not.
-	 */
-	__ftrace_graph_entry = gops->entryfunc;
-	ftrace_graph_entry = ftrace_graph_entry_test;
-	update_function_graph_func();
+		ftrace_graph_return = gops->retfunc;
 
-	ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+		/*
+		 * Update the indirect function to the entryfunc, and the
+		 * function that gets called to the entry_test first. Then
+		 * call the update fgraph entry function to determine if
+		 * the entryfunc should be called directly or not.
+		 */
+		__ftrace_graph_entry = gops->entryfunc;
+		ftrace_graph_entry = ftrace_graph_entry_test;
+		update_function_graph_func();
+
+		ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+	}
 out:
 	mutex_unlock(&ftrace_lock);
 	return ret;
@@ -800,13 +803,14 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
 	}
 
 	ftrace_graph_active--;
-	ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
-	ftrace_graph_entry = ftrace_graph_entry_stub;
-	__ftrace_graph_entry = ftrace_graph_entry_stub;
-	ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
-	unregister_pm_notifier(&ftrace_suspend_notifier);
-	unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
-
+	if (ftrace_graph_active == 0) {
+		ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
+		ftrace_graph_entry = ftrace_graph_entry_stub;
+		__ftrace_graph_entry = ftrace_graph_entry_stub;
+		ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
+		unregister_pm_notifier(&ftrace_suspend_notifier);
+		unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
+	}
  out:
 	mutex_unlock(&ftrace_lock);
 }
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (13 preceding siblings ...)
  2018-11-22  1:27 ` [RFC][PATCH 14/14] function_graph: Allow for more than one callback to be registered Steven Rostedt
@ 2018-11-22 10:08 ` Peter Zijlstra
  2018-11-22 12:46   ` Steven Rostedt
  2018-11-26  9:21 ` Masami Hiramatsu
  15 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2018-11-22 10:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

On Wed, Nov 21, 2018 at 08:27:08PM -0500, Steven Rostedt wrote:
> Well the fuction graph tracer is arguably the strongest of the tracers.
> It shows both the entrance and exit of a function, can give the timings
> of a function, and shows the execution of the code quite nicely.
> 
> But it has one major flaw.
> 
> It can't let more than one user access it at a time.

The reason I 'never' use it is performance, it _sucks_.. I've never ran
into the multi-user issue.

So while I don't think the rewrite is bad, this argument here is.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-22 10:08 ` [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Peter Zijlstra
@ 2018-11-22 12:46   ` Steven Rostedt
  2018-11-22 13:42     ` Peter Zijlstra
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-22 12:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

On Thu, 22 Nov 2018 11:08:12 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Nov 21, 2018 at 08:27:08PM -0500, Steven Rostedt wrote:
> > Well the fuction graph tracer is arguably the strongest of the tracers.
> > It shows both the entrance and exit of a function, can give the timings
> > of a function, and shows the execution of the code quite nicely.
> > 
> > But it has one major flaw.
> > 
> > It can't let more than one user access it at a time.  
> 
> The reason I 'never' use it is performance, it _sucks_.. I've never ran
> into the multi-user issue.

And performance is also something to fix (it has improved lately, you
probably haven't noticed).

> 
> So while I don't think the rewrite is bad, this argument here is.

Except that we plan on merging kretprobe with function graph tracing.
This also solves the issue that Mark Rutland has with ret protection.
He has a solution for function graph tracing, but not with kretprobes.

And yes, there's also the case of being able to trace to different
buffers where you can have a full function graph tracing enabled and
also trace a subset that you want to have.

Just because you don't need it, doesn't mean it's not needed by others.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-22 12:46   ` Steven Rostedt
@ 2018-11-22 13:42     ` Peter Zijlstra
  0 siblings, 0 replies; 54+ messages in thread
From: Peter Zijlstra @ 2018-11-22 13:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

On Thu, Nov 22, 2018 at 07:46:05AM -0500, Steven Rostedt wrote:
> On Thu, 22 Nov 2018 11:08:12 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Nov 21, 2018 at 08:27:08PM -0500, Steven Rostedt wrote:
> > > Well the fuction graph tracer is arguably the strongest of the tracers.
> > > It shows both the entrance and exit of a function, can give the timings
> > > of a function, and shows the execution of the code quite nicely.
> > > 
> > > But it has one major flaw.
> > > 
> > > It can't let more than one user access it at a time.  
> > 
> > The reason I 'never' use it is performance, it _sucks_.. I've never ran
> > into the multi-user issue.
> 
> And performance is also something to fix (it has improved lately, you
> probably haven't noticed).

Indeed, I've not recently used it.

> > So while I don't think the rewrite is bad, this argument here is.
> 
> Except that we plan on merging kretprobe with function graph tracing.
> This also solves the issue that Mark Rutland has with ret protection.
> He has a solution for function graph tracing, but not with kretprobes.

Sure. But you didn't make that argument.

> And yes, there's also the case of being able to trace to different
> buffers where you can have a full function graph tracing enabled and
> also trace a subset that you want to have.
> 
> Just because you don't need it, doesn't mean it's not needed by others.

Not saying that; just saying the argument doesn't hold. It is not the
_one_ flaw.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer
  2018-11-22  1:27 ` [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer Steven Rostedt
@ 2018-11-23  0:01   ` Namhyung Kim
  2018-11-23 17:37     ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Namhyung Kim @ 2018-11-23  0:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland, kernel-team

Hi Steve,

On Wed, Nov 21, 2018 at 08:27:10PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> In order to make the function graph infrastructure more generic, there can
> not be code specific for the function_graph tracer in the generic code. This
> includes the set_graph_notrace logic, that stops all graph calls when a
> function in the set_graph_notrace is hit.
> 
> By using the trace_recursion mask, we can use a bit in the current
> task_struct to implement the notrace code, and move the logic out of
> fgraph.c and into trace_functions_graph.c and keeps it affecting only the
> tracer and not all call graph callbacks.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


> ---
>  kernel/trace/fgraph.c                | 21 ---------------------
>  kernel/trace/trace.h                 |  6 ++++++
>  kernel/trace/trace_functions_graph.c | 21 +++++++++++++++++++++
>  3 files changed, 27 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index e8fcf1b2b38c..c684968b87e7 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -64,30 +64,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
>  		return -EBUSY;
>  	}
>  
> -	/*
> -	 * The curr_ret_stack is an index to ftrace return stack of
> -	 * current task.  Its value should be in [0, FTRACE_RETFUNC_
> -	 * DEPTH) when the function graph tracer is used.  To support
> -	 * filtering out specific functions, it makes the index
> -	 * negative by subtracting huge value (FTRACE_NOTRACE_DEPTH)
> -	 * so when it sees a negative index the ftrace will ignore
> -	 * the record.  And the index gets recovered when returning
> -	 * from the filtered function by adding the FTRACE_NOTRACE_
> -	 * DEPTH and then it'll continue to record functions normally.
> -	 *
> -	 * The curr_ret_stack is initialized to -1 and get increased
> -	 * in this function.  So it can be less than -1 only if it was
> -	 * filtered out via ftrace_graph_notrace_addr() which can be
> -	 * set from set_graph_notrace file in tracefs by user.
> -	 */
> -	if (current->curr_ret_stack < -1)
> -		return -EBUSY;
> -
>  	calltime = trace_clock_local();
>  
>  	index = ++current->curr_ret_stack;
> -	if (ftrace_graph_notrace_addr(func))
> -		current->curr_ret_stack -= FTRACE_NOTRACE_DEPTH;
>  	barrier();
>  	current->ret_stack[index].ret = ret;
>  	current->ret_stack[index].func = func;
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 3b8c0e24ab30..f3ad85830961 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -512,6 +512,12 @@ enum {
>   * can only be modified by current, we can reuse trace_recursion.
>   */
>  	TRACE_IRQ_BIT,
> +/*
> + * To implement set_graph_notrace, if this bit is set, we ignore
> + * function graph tracing of called functions, until the return
> + * function is called to clear it.
> + */
> +	TRACE_GRAPH_NOTRACE_BIT,
>  };
>  
>  #define trace_recursion_set(bit)	do { (current)->trace_recursion |= (1<<(bit)); } while (0)
> diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
> index af1759cd6eab..4748dc1bf5e1 100644
> --- a/kernel/trace/trace_functions_graph.c
> +++ b/kernel/trace/trace_functions_graph.c
> @@ -188,6 +188,18 @@ int trace_graph_entry(struct ftrace_graph_ent *trace)
>  	int cpu;
>  	int pc;
>  
> +	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
> +		return 0;
> +
> +	if (ftrace_graph_notrace_addr(trace->func)) {
> +		trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
> +		/*
> +		 * Need to return 1 to have the return called
> +		 * that will clear the NOTRACE bit.
> +		 */
> +		return 1;
> +	}
> +
>  	if (!ftrace_trace_task(tr))
>  		return 0;
>  
> @@ -288,6 +300,11 @@ void trace_graph_return(struct ftrace_graph_ret *trace)
>  	int cpu;
>  	int pc;
>  
> +	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
> +		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
> +		return;
> +	}
> +
>  	local_irq_save(flags);
>  	cpu = raw_smp_processor_id();
>  	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
> @@ -311,6 +328,10 @@ void set_graph_array(struct trace_array *tr)
>  
>  static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
>  {
> +	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
> +		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
> +		return;
> +	}
>  	if (tracing_thresh &&
>  	    (trace->rettime - trace->calltime < tracing_thresh))
>  		return;
> -- 
> 2.19.1
> 
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks
  2018-11-22  1:27 ` [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks Steven Rostedt
@ 2018-11-23  2:59   ` Joel Fernandes
  2018-11-23 18:25     ` Steven Rostedt
  2018-11-26 11:30   ` Masami Hiramatsu
  1 sibling, 1 reply; 54+ messages in thread
From: Joel Fernandes @ 2018-11-23  2:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Wed, Nov 21, 2018 at 08:27:15PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Currently the registering of function graph is to pass in a entry and return
> function. We need to have a way to associate those functions together where
> the entry can determine to run the return hook. Having a structure that
> contains both functions will facilitate the process of converting the code
> to be able to do such.
> 
> This is similar to the way function hooks are enabled (it passes in
> ftrace_ops). Instead of passing in the functions to use, a single structure
> is passed in to the registering function.
> 
> The unregister function is now passed in the fgraph_ops handle. When we
> allow more than one callback to the function graph hooks, this will let the
> system know which one to remove.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  include/linux/ftrace.h               | 24 +++++++++++++++++-------
>  kernel/trace/fgraph.c                |  9 ++++-----
>  kernel/trace/ftrace.c                | 10 +++++++---
>  kernel/trace/trace_functions_graph.c | 21 ++++++++++++++++-----
>  kernel/trace/trace_irqsoff.c         | 10 +++++++---
>  kernel/trace/trace_sched_wakeup.c    | 10 +++++++---
>  kernel/trace/trace_selftest.c        |  8 ++++++--
>  7 files changed, 64 insertions(+), 28 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index f98063e273e5..477ff9412d26 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -749,6 +749,18 @@ typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  
> +struct fgraph_ops {
> +	trace_func_graph_ent_t		entryfunc;
> +	trace_func_graph_ret_t		retfunc;
> +	struct fgraph_ops __rcu		*next;
> +	unsigned long			flags;
> +	void				*private;
> +#ifdef CONFIG_DYNAMIC_FTRACE
> +	struct ftrace_ops_hash		local_hash;
> +	struct ftrace_ops_hash		*func_hash;
> +#endif
> +};
> +
>  /*
>   * Stack of return addresses for functions
>   * of a thread.
> @@ -792,8 +804,9 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
>  
>  #define FTRACE_RETFUNC_DEPTH 50
>  #define FTRACE_RETSTACK_ALLOC_SIZE 32
> -extern int register_ftrace_graph(trace_func_graph_ret_t retfunc,
> -				trace_func_graph_ent_t entryfunc);
> +
> +extern int register_ftrace_graph(struct fgraph_ops *ops);
> +extern void unregister_ftrace_graph(struct fgraph_ops *ops);
>  
>  extern bool ftrace_graph_is_dead(void);
>  extern void ftrace_graph_stop(void);
> @@ -802,8 +815,6 @@ extern void ftrace_graph_stop(void);
>  extern trace_func_graph_ret_t ftrace_graph_return;
>  extern trace_func_graph_ent_t ftrace_graph_entry;
>  
> -extern void unregister_ftrace_graph(void);
> -
>  extern void ftrace_graph_init_task(struct task_struct *t);
>  extern void ftrace_graph_exit_task(struct task_struct *t);
>  extern void ftrace_graph_init_idle_task(struct task_struct *t, int cpu);
> @@ -830,12 +841,11 @@ static inline void ftrace_graph_init_task(struct task_struct *t) { }
>  static inline void ftrace_graph_exit_task(struct task_struct *t) { }
>  static inline void ftrace_graph_init_idle_task(struct task_struct *t, int cpu) { }
>  
> -static inline int register_ftrace_graph(trace_func_graph_ret_t retfunc,
> -			  trace_func_graph_ent_t entryfunc)
> +static inline int register_ftrace_graph(struct fgraph_ops *ops);
>  {
>  	return -1;
>  }
> -static inline void unregister_ftrace_graph(void) { }
> +static inline void unregister_ftrace_graph(struct fgraph_ops *ops) { }
>  
>  static inline int task_curr_ret_stack(struct task_struct *tsk)
>  {
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index b9c7dbbbdd96..f3a89ecac671 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -491,8 +491,7 @@ static int start_graph_tracing(void)
>  	return ret;
>  }
>  
> -int register_ftrace_graph(trace_func_graph_ret_t retfunc,
> -			trace_func_graph_ent_t entryfunc)
> +int register_ftrace_graph(struct fgraph_ops *gops)
>  {
>  	int ret = 0;
>  
> @@ -513,7 +512,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
>  		goto out;
>  	}
>  
> -	ftrace_graph_return = retfunc;
> +	ftrace_graph_return = gops->retfunc;
>  
>  	/*
>  	 * Update the indirect function to the entryfunc, and the
> @@ -521,7 +520,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
>  	 * call the update fgraph entry function to determine if
>  	 * the entryfunc should be called directly or not.
>  	 */
> -	__ftrace_graph_entry = entryfunc;
> +	__ftrace_graph_entry = gops->entryfunc;
>  	ftrace_graph_entry = ftrace_graph_entry_test;
>  	update_function_graph_func();
>  
> @@ -531,7 +530,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
>  	return ret;
>  }
>  
> -void unregister_ftrace_graph(void)
> +void unregister_ftrace_graph(struct fgraph_ops *gops)
>  {
>  	mutex_lock(&ftrace_lock);
>  
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 64e635994648..d057dde081e7 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -849,15 +849,19 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
>  	local_irq_restore(flags);
>  }
>  
> +static struct fgraph_ops fprofiler_ops = {
> +	.entryfunc = &profile_graph_entry,
> +	.retfunc = &profile_graph_return,
> +};
> +
>  static int register_ftrace_profiler(void)
>  {
> -	return register_ftrace_graph(&profile_graph_return,
> -				     &profile_graph_entry);
> +	return register_ftrace_graph(&fprofiler_ops);
>  }
>  
>  static void unregister_ftrace_profiler(void)
>  {
> -	unregister_ftrace_graph();
> +	unregister_ftrace_graph(&fprofiler_ops);
>  }
>  #else
>  static struct ftrace_ops ftrace_profile_ops __read_mostly = {
> diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
> index 0e0ff08357cf..7c7fd13d2373 100644
> --- a/kernel/trace/trace_functions_graph.c
> +++ b/kernel/trace/trace_functions_graph.c
> @@ -336,17 +336,25 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
>  		trace_graph_return(trace);
>  }
>  
> +static struct fgraph_ops funcgraph_threash_ops = {

minor nit: should be funcgraph_thresh_ops ?

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 09/14] function_graph: Move ftrace_graph_get_addr() to fgraph.c
  2018-11-22  1:27 ` [RFC][PATCH 09/14] function_graph: Move ftrace_graph_get_addr() to fgraph.c Steven Rostedt
@ 2018-11-23  3:13   ` Joel Fernandes
  2018-11-23 19:25     ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Joel Fernandes @ 2018-11-23  3:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Wed, Nov 21, 2018 at 08:27:17PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Move the function function_graph_get_addr() to fgraph.c, as the management
> of the curr_ret_stack is going to change, and all the accesses to ret_stack
> needs to be done in fgraph.c.

s/ftrace_graph_get_addr/ftrace_graph_ret_addr/

thanks,

 - Joel

> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  kernel/trace/fgraph.c                | 55 ++++++++++++++++++++++++++++
>  kernel/trace/trace_functions_graph.c | 55 ----------------------------
>  2 files changed, 55 insertions(+), 55 deletions(-)
> 
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index f3a89ecac671..c7d612897e33 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -233,6 +233,61 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
>  	return ret;
>  }
>  
> +/**
> + * ftrace_graph_ret_addr - convert a potentially modified stack return address
> + *			   to its original value
> + *
> + * This function can be called by stack unwinding code to convert a found stack
> + * return address ('ret') to its original value, in case the function graph
> + * tracer has modified it to be 'return_to_handler'.  If the address hasn't
> + * been modified, the unchanged value of 'ret' is returned.
> + *
> + * 'idx' is a state variable which should be initialized by the caller to zero
> + * before the first call.
> + *
> + * 'retp' is a pointer to the return address on the stack.  It's ignored if
> + * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
> + */
> +#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
> +unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
> +				    unsigned long ret, unsigned long *retp)
> +{
> +	int index = task->curr_ret_stack;
> +	int i;
> +
> +	if (ret != (unsigned long)return_to_handler)
> +		return ret;
> +
> +	if (index < 0)
> +		return ret;
> +
> +	for (i = 0; i <= index; i++)
> +		if (task->ret_stack[i].retp == retp)
> +			return task->ret_stack[i].ret;
> +
> +	return ret;
> +}
> +#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
> +unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
> +				    unsigned long ret, unsigned long *retp)
> +{
> +	int task_idx;
> +
> +	if (ret != (unsigned long)return_to_handler)
> +		return ret;
> +
> +	task_idx = task->curr_ret_stack;
> +
> +	if (!task->ret_stack || task_idx < *idx)
> +		return ret;
> +
> +	task_idx -= *idx;
> +	(*idx)++;
> +
> +	return task->ret_stack[task_idx].ret;
> +}
> +#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
> +
>  static struct ftrace_ops graph_ops = {
>  	.func			= ftrace_stub,
>  	.flags			= FTRACE_OPS_FL_RECURSION_SAFE |
> diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
> index 7c7fd13d2373..0f9cbc30645d 100644
> --- a/kernel/trace/trace_functions_graph.c
> +++ b/kernel/trace/trace_functions_graph.c
> @@ -90,61 +90,6 @@ static void
>  print_graph_duration(struct trace_array *tr, unsigned long long duration,
>  		     struct trace_seq *s, u32 flags);
>  
> -/**
> - * ftrace_graph_ret_addr - convert a potentially modified stack return address
> - *			   to its original value
> - *
> - * This function can be called by stack unwinding code to convert a found stack
> - * return address ('ret') to its original value, in case the function graph
> - * tracer has modified it to be 'return_to_handler'.  If the address hasn't
> - * been modified, the unchanged value of 'ret' is returned.
> - *
> - * 'idx' is a state variable which should be initialized by the caller to zero
> - * before the first call.
> - *
> - * 'retp' is a pointer to the return address on the stack.  It's ignored if
> - * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
> - */
> -#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
> -unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
> -				    unsigned long ret, unsigned long *retp)
> -{
> -	int index = task->curr_ret_stack;
> -	int i;
> -
> -	if (ret != (unsigned long)return_to_handler)
> -		return ret;
> -
> -	if (index < 0)
> -		return ret;
> -
> -	for (i = 0; i <= index; i++)
> -		if (task->ret_stack[i].retp == retp)
> -			return task->ret_stack[i].ret;
> -
> -	return ret;
> -}
> -#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
> -unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
> -				    unsigned long ret, unsigned long *retp)
> -{
> -	int task_idx;
> -
> -	if (ret != (unsigned long)return_to_handler)
> -		return ret;
> -
> -	task_idx = task->curr_ret_stack;
> -
> -	if (!task->ret_stack || task_idx < *idx)
> -		return ret;
> -
> -	task_idx -= *idx;
> -	(*idx)++;
> -
> -	return task->ret_stack[task_idx].ret;
> -}
> -#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
> -
>  int __trace_graph_entry(struct trace_array *tr,
>  				struct ftrace_graph_ent *trace,
>  				unsigned long flags,
> -- 
> 2.19.1
> 
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c
  2018-11-22  1:27 ` [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c Steven Rostedt
@ 2018-11-23  6:11   ` Joel Fernandes
  2018-11-23 17:58     ` Steven Rostedt
  2018-11-26  7:25     ` Masami Hiramatsu
  0 siblings, 2 replies; 54+ messages in thread
From: Joel Fernandes @ 2018-11-23  6:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Wed, Nov 21, 2018 at 08:27:14PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> To make the function graph infrastructure more managable, the code needs to
> be in its own file (fgraph.c). Move the code that is specific for managing
> the function graph infrastructure out of ftrace.c and into fgraph.c
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

I think this patch causes a build error if CONFIG_FUNCTION_PROFILER is
disabled but function graph is enabled. The following diff fixes it for me.

thanks,

 - Joel

 ----8<------
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 3b8c307c7ff0..ce38bb962f91 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -382,6 +382,15 @@ static void ftrace_update_pid_func(void)
 	update_ftrace_function();
 }
 
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+static bool fgraph_graph_time = true;
+
+void ftrace_graph_graph_time_control(bool enable)
+{
+	fgraph_graph_time = enable;
+}
+#endif
+
 #ifdef CONFIG_FUNCTION_PROFILER
 struct ftrace_profile {
 	struct hlist_node		node;
@@ -783,12 +792,6 @@ function_profile_call(unsigned long ip, unsigned long parent_ip,
 }
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static bool fgraph_graph_time = true;
-
-void ftrace_graph_graph_time_control(bool enable)
-{
-	fgraph_graph_time = enable;
-}
 
 static int profile_graph_entry(struct ftrace_graph_ent *trace)
 {

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer
  2018-11-23  0:01   ` Namhyung Kim
@ 2018-11-23 17:37     ` Steven Rostedt
  2018-11-24  5:49       ` Namhyung Kim
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-23 17:37 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland, kernel-team

On Fri, 23 Nov 2018 09:01:18 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks Namhyung!

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c
  2018-11-23  6:11   ` Joel Fernandes
@ 2018-11-23 17:58     ` Steven Rostedt
  2018-11-23 18:11       ` Steven Rostedt
  2018-11-26  7:25     ` Masami Hiramatsu
  1 sibling, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-23 17:58 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Thu, 22 Nov 2018 22:11:33 -0800
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Wed, Nov 21, 2018 at 08:27:14PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > To make the function graph infrastructure more managable, the code needs to
> > be in its own file (fgraph.c). Move the code that is specific for managing
> > the function graph infrastructure out of ftrace.c and into fgraph.c
> > 
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>  
> 
> I think this patch causes a build error if CONFIG_FUNCTION_PROFILER is
> disabled but function graph is enabled. The following diff fixes it for me.

Thanks for reporting this.

> 
> thanks,
> 
>  - Joel
> 
>  ----8<------
>  
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 3b8c307c7ff0..ce38bb962f91 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -382,6 +382,15 @@ static void ftrace_update_pid_func(void)
>  	update_ftrace_function();
>  }
>  
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +static bool fgraph_graph_time = true;
> +
> +void ftrace_graph_graph_time_control(bool enable)
> +{
> +	fgraph_graph_time = enable;
> +}
> +#endif
> +

I think the better answer is to move it into trace_functions_graph.c.

I'll do that and give you the "reported-by".

-- Steve


>  #ifdef CONFIG_FUNCTION_PROFILER
>  struct ftrace_profile {
>  	struct hlist_node		node;
> @@ -783,12 +792,6 @@ function_profile_call(unsigned long ip, unsigned long parent_ip,
>  }
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> -static bool fgraph_graph_time = true;
> -
> -void ftrace_graph_graph_time_control(bool enable)
> -{
> -	fgraph_graph_time = enable;
> -}
>  
>  static int profile_graph_entry(struct ftrace_graph_ent *trace)
>  {


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c
  2018-11-23 17:58     ` Steven Rostedt
@ 2018-11-23 18:11       ` Steven Rostedt
  2018-11-23 22:13         ` Joel Fernandes
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-23 18:11 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Fri, 23 Nov 2018 12:58:34 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I think the better answer is to move it into trace_functions_graph.c.

I take that back. I think the better answer is to not call that
function if the profiler is not set, nor have that option even
available. Because it has no meaning without the profiler.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks
  2018-11-23  2:59   ` Joel Fernandes
@ 2018-11-23 18:25     ` Steven Rostedt
  0 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-23 18:25 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Thu, 22 Nov 2018 18:59:46 -0800
Joel Fernandes <joel@joelfernandes.org> wrote:

> >  static struct ftrace_ops ftrace_profile_ops __read_mostly = {
> > diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
> > index 0e0ff08357cf..7c7fd13d2373 100644
> > --- a/kernel/trace/trace_functions_graph.c
> > +++ b/kernel/trace/trace_functions_graph.c
> > @@ -336,17 +336,25 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
> >  		trace_graph_return(trace);
> >  }
> >  
> > +static struct fgraph_ops funcgraph_threash_ops = {  
> 
> minor nit: should be funcgraph_thresh_ops ?

At least I got it wrong consistently :-)

Fixed, thanks!

-- Steve


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 09/14] function_graph: Move ftrace_graph_get_addr() to fgraph.c
  2018-11-23  3:13   ` Joel Fernandes
@ 2018-11-23 19:25     ` Steven Rostedt
  0 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-23 19:25 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Thu, 22 Nov 2018 19:13:38 -0800
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Wed, Nov 21, 2018 at 08:27:17PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > Move the function function_graph_get_addr() to fgraph.c, as the management
> > of the curr_ret_stack is going to change, and all the accesses to ret_stack
> > needs to be done in fgraph.c.  
> 
> s/ftrace_graph_get_addr/ftrace_graph_ret_addr/
> 

Good catch. I wrote the change logs a bit after writing the patch. I
had "get" on my mind.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c
  2018-11-23 18:11       ` Steven Rostedt
@ 2018-11-23 22:13         ` Joel Fernandes
  0 siblings, 0 replies; 54+ messages in thread
From: Joel Fernandes @ 2018-11-23 22:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Fri, Nov 23, 2018 at 01:11:38PM -0500, Steven Rostedt wrote:
> On Fri, 23 Nov 2018 12:58:34 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > I think the better answer is to move it into trace_functions_graph.c.
> 
> I take that back. I think the better answer is to not call that
> function if the profiler is not set, nor have that option even
> available. Because it has no meaning without the profiler.

Agreed, that's better. Thanks,

 - Joel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs
  2018-11-22  1:27 ` [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs Steven Rostedt
@ 2018-11-24  5:31   ` Joel Fernandes
  2018-11-26 16:07     ` Masami Hiramatsu
  2018-11-26 21:31     ` Steven Rostedt
  0 siblings, 2 replies; 54+ messages in thread
From: Joel Fernandes @ 2018-11-24  5:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Wed, Nov 21, 2018 at 08:27:19PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> In order to make it possible to have multiple callbacks registered with the
> function_graph tracer, the retstack needs to be converted from an array of
> ftrace_ret_stack structures to an array of longs. This will allow to store
> the list of callbacks on the stack for the return side of the functions.
> 
> [ Note, this currently breaks architectures that access the ret_stack of a
>   task to handle unwinding when 'return_to_handler' is on the stack ]
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  include/linux/sched.h |   2 +-
>  kernel/trace/fgraph.c | 123 +++++++++++++++++++++++-------------------
>  2 files changed, 70 insertions(+), 55 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d6183a55e8eb..71a084a300da 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1119,7 +1119,7 @@ struct task_struct {
>  	int				curr_ret_depth;
>  
>  	/* Stack of return addresses for return function tracing: */
> -	struct ftrace_ret_stack		*ret_stack;
> +	unsigned long			*ret_stack;
>  
>  	/* Timestamp for last schedule: */
>  	unsigned long long		ftrace_timestamp;
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index 9b85638ecded..1389fe39f64c 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -23,6 +23,17 @@
>  #define ASSIGN_OPS_HASH(opsname, val)
>  #endif
>  
> +#define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
> +#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
> +#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH * FGRAPH_RET_SIZE)
> +#define SHADOW_STACK_INDEX			\
> +	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
> +#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
> +
> +#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
> +#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
> +#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
> +
[...]
> @@ -514,7 +531,7 @@ void ftrace_graph_init_task(struct task_struct *t)
>  
>  void ftrace_graph_exit_task(struct task_struct *t)
>  {
> -	struct ftrace_ret_stack	*ret_stack = t->ret_stack;
> +	unsigned long *ret_stack = t->ret_stack;
>  
>  	t->ret_stack = NULL;
>  	/* NULL must become visible to IRQs before we free it: */
> @@ -526,12 +543,10 @@ void ftrace_graph_exit_task(struct task_struct *t)
>  /* Allocate a return stack for each task */
>  static int start_graph_tracing(void)
>  {
> -	struct ftrace_ret_stack **ret_stack_list;
> +	unsigned long **ret_stack_list;
>  	int ret, cpu;
>  
> -	ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
> -				       sizeof(struct ftrace_ret_stack *),
> -				       GFP_KERNEL);
> +	ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
>  

I had dumped the fgraph size related macros to understand the patch better, I
got:
[    0.909528] val of FGRAPH_RET_SIZE is 40
[    0.910250] val of FGRAPH_RET_INDEX is 5
[    0.910866] val of FGRAPH_ARRAY_SIZE is 16
[    0.911488] val of FGRAPH_ARRAY_MASK is 255
[    0.912134] val of FGRAPH_MAX_INDEX is 16
[    0.912751] val of FGRAPH_INDEX_SHIFT is 8
[    0.913382] val of FGRAPH_FRAME_SIZE is 168
[    0.914033] val of FGRAPH_FRAME_INDEX is 21
                      FTRACE_RETFUNC_DEPTH is 50
[    0.914686] val of SHADOW_STACK_SIZE is 8400

I had a concern about memory overhead per-task. It seems the total memory
needed per task for the stack is 8400 bytes (with my configuration with
FUNCTION_PROFILE
turned off).

Where as before it would be 32 * 40 = 1280 bytes. That looks like ~7 times
more than before.

On my system with ~4000 threads, that becomes ~32MB which seems a bit
wasteful especially if there was only one or 2 function graph callbacks
registered and most of the callback array in the stack isn't used.

Could we make the array size configurable at compile time and start it with a
small number like 4 or 6?

Also for patches 1 through 10:
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer
  2018-11-23 17:37     ` Steven Rostedt
@ 2018-11-24  5:49       ` Namhyung Kim
  2018-11-24 18:41         ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Namhyung Kim @ 2018-11-24  5:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf, frederic,
	Joel Fernandes, Andy Lutomirski, mark.rutland, kernel-team

On Sat, Nov 24, 2018 at 2:37 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 23 Nov 2018 09:01:18 +0900
> Namhyung Kim <namhyung@kernel.org> wrote:
>
> > Acked-by: Namhyung Kim <namhyung@kernel.org>
>
> Thanks Namhyung!

It'd be nice if you cc me for the whole patchset (and other tracing
patches) next time.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer
  2018-11-24  5:49       ` Namhyung Kim
@ 2018-11-24 18:41         ` Steven Rostedt
  2018-11-26  4:54           ` Namhyung Kim
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-24 18:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf, frederic,
	Joel Fernandes, Andy Lutomirski, mark.rutland, kernel-team

On Sat, 24 Nov 2018 14:49:34 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> On Sat, Nov 24, 2018 at 2:37 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Fri, 23 Nov 2018 09:01:18 +0900
> > Namhyung Kim <namhyung@kernel.org> wrote:
> >  
> > > Acked-by: Namhyung Kim <namhyung@kernel.org>  
> >
> > Thanks Namhyung!  
> 
> It'd be nice if you cc me for the whole patchset (and other tracing
> patches) next time.
> 

Bah, Sorry, that was an oversight. You were one of the people I planned
on Cc'ing the entire patch set to, but in the process of adding names I
was more focused on who was at Plumbers that I talked to, and trying
to remember everyone that asked to be Cc'd. I didn't mean to leave you
off. That was a mistake.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer
  2018-11-24 18:41         ` Steven Rostedt
@ 2018-11-26  4:54           ` Namhyung Kim
  0 siblings, 0 replies; 54+ messages in thread
From: Namhyung Kim @ 2018-11-26  4:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf, frederic,
	Joel Fernandes, Andy Lutomirski, mark.rutland, kernel-team

On Sat, Nov 24, 2018 at 01:41:31PM -0500, Steven Rostedt wrote:
> On Sat, 24 Nov 2018 14:49:34 +0900
> Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > On Sat, Nov 24, 2018 at 2:37 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Fri, 23 Nov 2018 09:01:18 +0900
> > > Namhyung Kim <namhyung@kernel.org> wrote:
> > >  
> > > > Acked-by: Namhyung Kim <namhyung@kernel.org>  
> > >
> > > Thanks Namhyung!  
> > 
> > It'd be nice if you cc me for the whole patchset (and other tracing
> > patches) next time.
> > 
> 
> Bah, Sorry, that was an oversight. You were one of the people I planned
> on Cc'ing the entire patch set to, but in the process of adding names I
> was more focused on who was at Plumbers that I talked to, and trying
> to remember everyone that asked to be Cc'd. I didn't mean to leave you
> off. That was a mistake.

No problem!  Always thank you for the good work. :)

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c
  2018-11-23  6:11   ` Joel Fernandes
  2018-11-23 17:58     ` Steven Rostedt
@ 2018-11-26  7:25     ` Masami Hiramatsu
  1 sibling, 0 replies; 54+ messages in thread
From: Masami Hiramatsu @ 2018-11-26  7:25 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Josh Poimboeuf, Frederic Weisbecker, Andy Lutomirski,
	Mark Rutland

On Thu, 22 Nov 2018 22:11:33 -0800
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Wed, Nov 21, 2018 at 08:27:14PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > To make the function graph infrastructure more managable, the code needs to
> > be in its own file (fgraph.c). Move the code that is specific for managing
> > the function graph infrastructure out of ftrace.c and into fgraph.c
> > 
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> I think this patch causes a build error if CONFIG_FUNCTION_PROFILER is
> disabled but function graph is enabled. The following diff fixes it for me.

Good catch, I also confirmed it.

I think we'd better to define an empty stub of ftrace_graph_graph_time_control(enable)
in trace.h instead, since fgraph_graph_time is not used if CONFIG_FUNCTION_PROFILER=n.

Thank you,

> 
> thanks,
> 
>  - Joel
> 
>  ----8<------
>  
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 3b8c307c7ff0..ce38bb962f91 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -382,6 +382,15 @@ static void ftrace_update_pid_func(void)
>  	update_ftrace_function();
>  }
>  
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +static bool fgraph_graph_time = true;
> +
> +void ftrace_graph_graph_time_control(bool enable)
> +{
> +	fgraph_graph_time = enable;
> +}
> +#endif
> +
>  #ifdef CONFIG_FUNCTION_PROFILER
>  struct ftrace_profile {
>  	struct hlist_node		node;
> @@ -783,12 +792,6 @@ function_profile_call(unsigned long ip, unsigned long parent_ip,
>  }
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> -static bool fgraph_graph_time = true;
> -
> -void ftrace_graph_graph_time_control(bool enable)
> -{
> -	fgraph_graph_time = enable;
> -}
>  
>  static int profile_graph_entry(struct ftrace_graph_ent *trace)
>  {


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack()
  2018-11-22  1:27 ` [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack() Steven Rostedt
@ 2018-11-26  7:40   ` Masami Hiramatsu
  2018-11-26 21:26     ` Steven Rostedt
  2018-11-26 10:02   ` Joey Pabalinas
  1 sibling, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2018-11-26  7:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland

On Wed, 21 Nov 2018 20:27:16 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> The static inline function task_curr_ret_stack() is unused, remove it.

This looks able to be applied without this series. I think we should
apply this to for-next branch?

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>

Thanks,


> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  include/linux/ftrace.h | 10 ----------
>  1 file changed, 10 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 477ff9412d26..5544df21a886 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -819,11 +819,6 @@ extern void ftrace_graph_init_task(struct task_struct *t);
>  extern void ftrace_graph_exit_task(struct task_struct *t);
>  extern void ftrace_graph_init_idle_task(struct task_struct *t, int cpu);
>  
> -static inline int task_curr_ret_stack(struct task_struct *t)
> -{
> -	return t->curr_ret_stack;
> -}
> -
>  static inline void pause_graph_tracing(void)
>  {
>  	atomic_inc(&current->tracing_graph_pause);
> @@ -847,11 +842,6 @@ static inline int register_ftrace_graph(struct fgraph_ops *ops);
>  }
>  static inline void unregister_ftrace_graph(struct fgraph_ops *ops) { }
>  
> -static inline int task_curr_ret_stack(struct task_struct *tsk)
> -{
> -	return -1;
> -}
> -
>  static inline unsigned long
>  ftrace_graph_ret_addr(struct task_struct *task, int *idx, unsigned long ret,
>  		      unsigned long *retp)
> -- 
> 2.19.1
> 
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (14 preceding siblings ...)
  2018-11-22 10:08 ` [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Peter Zijlstra
@ 2018-11-26  9:21 ` Masami Hiramatsu
  2018-11-26 16:32   ` Steven Rostedt
  15 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2018-11-26  9:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland

Hi Steve,

On Wed, 21 Nov 2018 20:27:08 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> I talked with many of you at Plumbers about rewriting the function graph
> tracer. Well, this is it. I was originally going to produce just a
> proof of concept, but when I found that I had to fix a design flaw
> and that covered all the arch code anyway, I decided to do more of a
> RFC patch set.

Thank you for starting this work! This might be a good way to simplify
treating of shadow stacks in kretprobe and function-graph-tracers.
(And I hope this can help me to remove some kind of complexity in
 kretprobes)

> 
> I probably should add more comments to the code, and update the 
> function graph design documentation, but I wanted to get this out
> before the US Turkey day for your enjoyment while you try to let your
> pants buckle again.

:)

Let me try to review and port kretprobe on it. On the way, I will find
issues if there are.

> 
> Why the rewrite? 
> 
> Well the fuction graph tracer is arguably the strongest of the tracers.
> It shows both the entrance and exit of a function, can give the timings
> of a function, and shows the execution of the code quite nicely.
> 
> But it has one major flaw.
> 
> It can't let more than one user access it at a time. The function
> tracer has had that feature for years now, but due to the design of
> the function graph tracer it was difficult to implement. Why?
> 
> Because you must maintain the state of a three-tuple.
> 
>  Task, Function, Callback
> 
> The state is determined at by the entryfunc and must be passed to the
> retfunc when the function being traced returns. But this is not an
> easy task, as that state can be different for each task, each function
> and each callback.
> 
> What's the solution? I use the shadow stack that is already being
> used to store the function return addresses.
> 
>   A big thanks to Masami Hiramatsu for suggesting this idea!
> 
> For now, I only allow an 16 users of the function graph tracer at a time.
> That should be more than enough. I create an array of 16 fgraph_ops
> pointers. When a user registers their fgraph_ops to the function graph
> tracer, it is assigned an index into that array, which will hold a pointer
> to the fgraph_ops being registered.
> 
> On entry of the function, the array is iterated and each entryfunc of
> the fgraph_ops in the array is called. If the entryfunc returns non-zero,
> then the index of that fgraph_ops is pushed on the shadow stack (along
> with the index to the "ret_stack entry" structure, for fast access
> to it). If the entryfunc returns zero, then it is ignored. If at least
> one function returned non-zero then the return of the traced function
> will also be traced.
> 
> On the return of the function, the shadow stack is examined and all
> the indexes that were pushed on the stack is read, and each fgraph_ops
> retfunc is called in the reverse order.
> 
> When a fgraph_ops is unregistered, its index in the array is set to point
> to a "stub" fgraph_ops that holds stub functions that just return
> "0" for the entryfunc and does nothing for the retfunc. This is because
> the retfunc may be called literally days after the entryfunc is called
> and we want to be able to free the fgraph_ops that is unregistered.
> 
> Note, if another fgraph_ops is registered in the same location, its
> retfunc may be called that was set by a previous fgraph_ops. This
> is not a regression because that's what can happen today if you unregister
> a callback from the current function_graph tracer and register another
> one. If this is an issue, there are ways to solve it.

Yeah, I need the solution, maybe an API to get correct return address? :)

By the way, are there any way to hold a private data on each ret_stack entry?
Since kretprobe supports "entry data" passed from entry_handler to
return handler, we have to store the data or data-instance on the ret_stack.

This feature is used by systemtap to save the function entry data, like
function parameters etc., so that return handler analyzes the parameters
with return value.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack()
  2018-11-22  1:27 ` [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack() Steven Rostedt
  2018-11-26  7:40   ` Masami Hiramatsu
@ 2018-11-26 10:02   ` Joey Pabalinas
  2018-11-26 21:27     ` Steven Rostedt
  1 sibling, 1 reply; 54+ messages in thread
From: Joey Pabalinas @ 2018-11-26 10:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland, Joey Pabalinas

[-- Attachment #1: Type: text/plain, Size: 331 bytes --]

On Wed, Nov 21, 2018 at 08:27:16PM -0500, Steven Rostedt wrote:
> The static inline function task_curr_ret_stack() is unused, remove it.

Just want ot make sure I understand this correctly, instead of using this
function, the convention now is to just directly assign `t->curr_ret_stack = -1`?

-- 
Cheers,
Joey Pabalinas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks
  2018-11-22  1:27 ` [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks Steven Rostedt
  2018-11-23  2:59   ` Joel Fernandes
@ 2018-11-26 11:30   ` Masami Hiramatsu
  2018-11-26 21:06     ` Steven Rostedt
  1 sibling, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2018-11-26 11:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland

On Wed, 21 Nov 2018 20:27:15 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Currently the registering of function graph is to pass in a entry and return
> function. We need to have a way to associate those functions together where
> the entry can determine to run the return hook. Having a structure that
> contains both functions will facilitate the process of converting the code
> to be able to do such.
> 
> This is similar to the way function hooks are enabled (it passes in
> ftrace_ops). Instead of passing in the functions to use, a single structure
> is passed in to the registering function.
> 
> The unregister function is now passed in the fgraph_ops handle. When we
> allow more than one callback to the function graph hooks, this will let the
> system know which one to remove.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  include/linux/ftrace.h               | 24 +++++++++++++++++-------
>  kernel/trace/fgraph.c                |  9 ++++-----
>  kernel/trace/ftrace.c                | 10 +++++++---
>  kernel/trace/trace_functions_graph.c | 21 ++++++++++++++++-----
>  kernel/trace/trace_irqsoff.c         | 10 +++++++---
>  kernel/trace/trace_sched_wakeup.c    | 10 +++++++---
>  kernel/trace/trace_selftest.c        |  8 ++++++--
>  7 files changed, 64 insertions(+), 28 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index f98063e273e5..477ff9412d26 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -749,6 +749,18 @@ typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  
> +struct fgraph_ops {
> +	trace_func_graph_ent_t		entryfunc;
> +	trace_func_graph_ret_t		retfunc;

> +	struct fgraph_ops __rcu		*next;
> +	unsigned long			flags;
> +	void				*private;
> +#ifdef CONFIG_DYNAMIC_FTRACE
> +	struct ftrace_ops_hash		local_hash;
> +	struct ftrace_ops_hash		*func_hash;
> +#endif

Hmm, can we introduce these fields when we actually use it?

BTW, would you have any idea for using private field?

Thank you,




-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs
  2018-11-24  5:31   ` Joel Fernandes
@ 2018-11-26 16:07     ` Masami Hiramatsu
  2018-11-26 16:26       ` Steven Rostedt
  2018-11-26 21:31     ` Steven Rostedt
  1 sibling, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2018-11-26 16:07 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Josh Poimboeuf, Frederic Weisbecker, Andy Lutomirski,
	Mark Rutland

On Fri, 23 Nov 2018 21:31:38 -0800
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Wed, Nov 21, 2018 at 08:27:19PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > In order to make it possible to have multiple callbacks registered with the
> > function_graph tracer, the retstack needs to be converted from an array of
> > ftrace_ret_stack structures to an array of longs. This will allow to store
> > the list of callbacks on the stack for the return side of the functions.
> > 
> > [ Note, this currently breaks architectures that access the ret_stack of a
> >   task to handle unwinding when 'return_to_handler' is on the stack ]
> > 
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > ---
> >  include/linux/sched.h |   2 +-
> >  kernel/trace/fgraph.c | 123 +++++++++++++++++++++++-------------------
> >  2 files changed, 70 insertions(+), 55 deletions(-)
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index d6183a55e8eb..71a084a300da 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1119,7 +1119,7 @@ struct task_struct {
> >  	int				curr_ret_depth;
> >  
> >  	/* Stack of return addresses for return function tracing: */
> > -	struct ftrace_ret_stack		*ret_stack;
> > +	unsigned long			*ret_stack;
> >  
> >  	/* Timestamp for last schedule: */
> >  	unsigned long long		ftrace_timestamp;
> > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> > index 9b85638ecded..1389fe39f64c 100644
> > --- a/kernel/trace/fgraph.c
> > +++ b/kernel/trace/fgraph.c
> > @@ -23,6 +23,17 @@
> >  #define ASSIGN_OPS_HASH(opsname, val)
> >  #endif
> >  
> > +#define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
> > +#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
> > +#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH * FGRAPH_RET_SIZE)
> > +#define SHADOW_STACK_INDEX			\
> > +	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
> > +#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
> > +
> > +#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
> > +#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
> > +#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
> > +
> [...]
> > @@ -514,7 +531,7 @@ void ftrace_graph_init_task(struct task_struct *t)
> >  
> >  void ftrace_graph_exit_task(struct task_struct *t)
> >  {
> > -	struct ftrace_ret_stack	*ret_stack = t->ret_stack;
> > +	unsigned long *ret_stack = t->ret_stack;
> >  
> >  	t->ret_stack = NULL;
> >  	/* NULL must become visible to IRQs before we free it: */
> > @@ -526,12 +543,10 @@ void ftrace_graph_exit_task(struct task_struct *t)
> >  /* Allocate a return stack for each task */
> >  static int start_graph_tracing(void)
> >  {
> > -	struct ftrace_ret_stack **ret_stack_list;
> > +	unsigned long **ret_stack_list;
> >  	int ret, cpu;
> >  
> > -	ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
> > -				       sizeof(struct ftrace_ret_stack *),
> > -				       GFP_KERNEL);
> > +	ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
> >  
> 
> I had dumped the fgraph size related macros to understand the patch better, I
> got:
> [    0.909528] val of FGRAPH_RET_SIZE is 40
> [    0.910250] val of FGRAPH_RET_INDEX is 5
> [    0.910866] val of FGRAPH_ARRAY_SIZE is 16
> [    0.911488] val of FGRAPH_ARRAY_MASK is 255
> [    0.912134] val of FGRAPH_MAX_INDEX is 16
> [    0.912751] val of FGRAPH_INDEX_SHIFT is 8
> [    0.913382] val of FGRAPH_FRAME_SIZE is 168
> [    0.914033] val of FGRAPH_FRAME_INDEX is 21
>                       FTRACE_RETFUNC_DEPTH is 50
> [    0.914686] val of SHADOW_STACK_SIZE is 8400
> 
> I had a concern about memory overhead per-task. It seems the total memory
> needed per task for the stack is 8400 bytes (with my configuration with
> FUNCTION_PROFILE
> turned off).
> 
> Where as before it would be 32 * 40 = 1280 bytes. That looks like ~7 times
> more than before.

Hmm, this seems too big... I thought the shadow-stack size should be
smaller than 1 page (4kB). Steve, can we give a 4k page for shadow stack
and define FTRACE_RETFUNC_DEPTH = 4096 / FGRAPH_RET_SIZE ?

> On my system with ~4000 threads, that becomes ~32MB which seems a bit
> wasteful especially if there was only one or 2 function graph callbacks
> registered and most of the callback array in the stack isn't used.
> 
> Could we make the array size configurable at compile time and start it with a
> small number like 4 or 6?

Or, we can introduce online setting :)

Thank you,

> 
> Also for patches 1 through 10:
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> thanks,
> 
>  - Joel
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs
  2018-11-26 16:07     ` Masami Hiramatsu
@ 2018-11-26 16:26       ` Steven Rostedt
  2018-11-28  1:38         ` Joel Fernandes
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-26 16:26 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Joel Fernandes, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Tue, 27 Nov 2018 01:07:55 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -1119,7 +1119,7 @@ struct task_struct {
> > >  	int				curr_ret_depth;
> > >  
> > >  	/* Stack of return addresses for return function tracing: */
> > > -	struct ftrace_ret_stack		*ret_stack;
> > > +	unsigned long			*ret_stack;
> > >  
> > >  	/* Timestamp for last schedule: */
> > >  	unsigned long long		ftrace_timestamp;
> > > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> > > index 9b85638ecded..1389fe39f64c 100644
> > > --- a/kernel/trace/fgraph.c
> > > +++ b/kernel/trace/fgraph.c
> > > @@ -23,6 +23,17 @@
> > >  #define ASSIGN_OPS_HASH(opsname, val)
> > >  #endif
> > >  
> > > +#define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
> > > +#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
> > > +#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH * FGRAPH_RET_SIZE)
> > > +#define SHADOW_STACK_INDEX			\
> > > +	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
> > > +#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
> > > +
> > > +#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
> > > +#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
> > > +#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
> > > +  
> > [...]  
> > > @@ -514,7 +531,7 @@ void ftrace_graph_init_task(struct task_struct *t)
> > >  
> > >  void ftrace_graph_exit_task(struct task_struct *t)
> > >  {
> > > -	struct ftrace_ret_stack	*ret_stack = t->ret_stack;
> > > +	unsigned long *ret_stack = t->ret_stack;
> > >  
> > >  	t->ret_stack = NULL;
> > >  	/* NULL must become visible to IRQs before we free it: */
> > > @@ -526,12 +543,10 @@ void ftrace_graph_exit_task(struct task_struct *t)
> > >  /* Allocate a return stack for each task */
> > >  static int start_graph_tracing(void)
> > >  {
> > > -	struct ftrace_ret_stack **ret_stack_list;
> > > +	unsigned long **ret_stack_list;
> > >  	int ret, cpu;
> > >  
> > > -	ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
> > > -				       sizeof(struct ftrace_ret_stack *),
> > > -				       GFP_KERNEL);
> > > +	ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
> > >    
> > 
> > I had dumped the fgraph size related macros to understand the patch better, I
> > got:
> > [    0.909528] val of FGRAPH_RET_SIZE is 40
> > [    0.910250] val of FGRAPH_RET_INDEX is 5
> > [    0.910866] val of FGRAPH_ARRAY_SIZE is 16
> > [    0.911488] val of FGRAPH_ARRAY_MASK is 255
> > [    0.912134] val of FGRAPH_MAX_INDEX is 16
> > [    0.912751] val of FGRAPH_INDEX_SHIFT is 8
> > [    0.913382] val of FGRAPH_FRAME_SIZE is 168
> > [    0.914033] val of FGRAPH_FRAME_INDEX is 21
> >                       FTRACE_RETFUNC_DEPTH is 50
> > [    0.914686] val of SHADOW_STACK_SIZE is 8400
> > 
> > I had a concern about memory overhead per-task. It seems the total memory
> > needed per task for the stack is 8400 bytes (with my configuration with
> > FUNCTION_PROFILE
> > turned off).
> > 
> > Where as before it would be 32 * 40 = 1280 bytes. That looks like ~7 times
> > more than before.  
> 
> Hmm, this seems too big... I thought the shadow-stack size should be
> smaller than 1 page (4kB). Steve, can we give a 4k page for shadow stack
> and define FTRACE_RETFUNC_DEPTH = 4096 / FGRAPH_RET_SIZE ?

For the first pass, I decided not to worry about the size. It made the
code less complex :-)

Yes, I plan on working on making the size of the stack smaller, but
that will probably be added on patches to do so.

> 
> > On my system with ~4000 threads, that becomes ~32MB which seems a bit
> > wasteful especially if there was only one or 2 function graph callbacks
> > registered and most of the callback array in the stack isn't used.

Note, all 4000 threads could be doing those trace backs, and if you are
doing full function graph tracing, it will use a lot.

> > 
> > Could we make the array size configurable at compile time and start it with a
> > small number like 4 or 6?  
> 
> Or, we can introduce online setting :)

Yes, that can easily be added. I didn't try to make this into the
perfect solution, I wanted a solid one first, and then massage it into
something that is more efficient, both with memory consumption and
performance.

Joel and Masami, thanks for the feedback.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-26  9:21 ` Masami Hiramatsu
@ 2018-11-26 16:32   ` Steven Rostedt
  2018-11-29 14:29     ` Masami Hiramatsu
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-26 16:32 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

On Mon, 26 Nov 2018 18:21:12 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:


> > Note, if another fgraph_ops is registered in the same location, its
> > retfunc may be called that was set by a previous fgraph_ops. This
> > is not a regression because that's what can happen today if you unregister
> > a callback from the current function_graph tracer and register another
> > one. If this is an issue, there are ways to solve it.  
> 
> Yeah, I need the solution, maybe an API to get correct return address? :)

One way to solve this is to also have a counter array that gets updated
every time the index array gets updated. And save the counter to the
shadow stack index as well. This way, we only call the return if the
counter on the stack matches what's in the counter on the counter array
for the index.

> 
> By the way, are there any way to hold a private data on each ret_stack entry?
> Since kretprobe supports "entry data" passed from entry_handler to
> return handler, we have to store the data or data-instance on the ret_stack.
> 
> This feature is used by systemtap to save the function entry data, like
> function parameters etc., so that return handler analyzes the parameters
> with return value.

Yes, I remember you telling me about this at plumbers, and while I was
writing this code I had that in mind. It wouldn't be too hard to
implement, I just left it off for now. I also left it off because I
have some questions about what exactly is needed. What size do you
require to be stored. Especially if we want to keep the shadow stack
smaller. I was going to find a way to implement some of the data that
is already stored via the ret_stack with this instead, and make the
ret_stack entry smaller. Should we allow just sizeof(long)*3? or just
let user choose any size and if they run out of stack, then too bad. We
just wont let it crash.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks
  2018-11-26 11:30   ` Masami Hiramatsu
@ 2018-11-26 21:06     ` Steven Rostedt
  0 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-26 21:06 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

On Mon, 26 Nov 2018 20:30:49 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > index f98063e273e5..477ff9412d26 100644
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -749,6 +749,18 @@ typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
> >  
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >  
> > +struct fgraph_ops {
> > +	trace_func_graph_ent_t		entryfunc;
> > +	trace_func_graph_ret_t		retfunc;  
> 
> > +	struct fgraph_ops __rcu		*next;
> > +	unsigned long			flags;
> > +	void				*private;
> > +#ifdef CONFIG_DYNAMIC_FTRACE
> > +	struct ftrace_ops_hash		local_hash;
> > +	struct ftrace_ops_hash		*func_hash;
> > +#endif  
> 
> Hmm, can we introduce these fields when we actually use it?

Sure.

> 
> BTW, would you have any idea for using private field?

I believe I answered this in another email.

-- Steve


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack()
  2018-11-26  7:40   ` Masami Hiramatsu
@ 2018-11-26 21:26     ` Steven Rostedt
  0 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-26 21:26 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

On Mon, 26 Nov 2018 16:40:02 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Wed, 21 Nov 2018 20:27:16 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > The static inline function task_curr_ret_stack() is unused, remove it.  
> 
> This looks able to be applied without this series. I think we should
> apply this to for-next branch?

Well, my current "for-next" branch is really a "for-this" branch, as I
will be sending it up to Linus before the new merge window.

This is a clean up that can wait till the next merge window. I'll push
it to the front of the queue though, so that I can start adding some of
this for the next merge window.

> 
> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
> 

Thanks Masami,

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack()
  2018-11-26 10:02   ` Joey Pabalinas
@ 2018-11-26 21:27     ` Steven Rostedt
  2018-11-26 21:37       ` Joey Pabalinas
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-26 21:27 UTC (permalink / raw)
  To: Joey Pabalinas
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland

On Mon, 26 Nov 2018 00:02:07 -1000
Joey Pabalinas <joeypabalinas@gmail.com> wrote:

> On Wed, Nov 21, 2018 at 08:27:16PM -0500, Steven Rostedt wrote:
> > The static inline function task_curr_ret_stack() is unused, remove it.  
> 
> Just want ot make sure I understand this correctly, instead of using this
> function, the convention now is to just directly assign `t->curr_ret_stack = -1`?
> 

Not sure what you mean. This function would just return the value of
curr_ret_stack, it didn't modify it.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs
  2018-11-24  5:31   ` Joel Fernandes
  2018-11-26 16:07     ` Masami Hiramatsu
@ 2018-11-26 21:31     ` Steven Rostedt
  1 sibling, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-11-26 21:31 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Fri, 23 Nov 2018 21:31:38 -0800
Joel Fernandes <joel@joelfernandes.org> wrote:

> Also for patches 1 through 10:
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thanks!

I'll move these over to my ftrace/core branch and start testing them
for linux-next. The rest can probably wait till the next merge window
after that.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Re: [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack()
  2018-11-26 21:27     ` Steven Rostedt
@ 2018-11-26 21:37       ` Joey Pabalinas
  0 siblings, 0 replies; 54+ messages in thread
From: Joey Pabalinas @ 2018-11-26 21:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joey Pabalinas, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Josh Poimboeuf, Frederic Weisbecker, Joel Fernandes,
	Andy Lutomirski, Mark Rutland

[-- Attachment #1: Type: text/plain, Size: 1017 bytes --]

On Mon, Nov 26, 2018 at 04:27:50PM -0500, Steven Rostedt wrote:
> On Mon, 26 Nov 2018 00:02:07 -1000
> Joey Pabalinas <joeypabalinas@gmail.com> wrote:
> 
> > On Wed, Nov 21, 2018 at 08:27:16PM -0500, Steven Rostedt wrote:
> > > The static inline function task_curr_ret_stack() is unused, remove it.  
> > 
> > Just want ot make sure I understand this correctly, instead of using this
> > function, the convention now is to just directly assign `t->curr_ret_stack = -1`?
> > 
> 
> Not sure what you mean. This function would just return the value of
> curr_ret_stack, it didn't modify it.
> 
> -- Steve

Sorry for wording it a bit weird, I was mostly trying to figure out what
these functions were originally used for. The uses I found in the git
log were mostly things like:

	ret = trace_vprintk(ip, task_curr_ret_stack(current), fmt, ap);

I guess the 2nd argument for these trace functions are implicit now and
these functions are no longer needed anywhere?

-- 
Cheers,
Joey Pabalinas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 03/14] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH
  2018-11-22  1:27 ` [RFC][PATCH 03/14] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH Steven Rostedt
@ 2018-11-27 19:31   ` Will Deacon
  2018-11-27 19:50     ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Will Deacon @ 2018-11-27 19:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland, Catalin Marinas, linux-arm-kernel

Hi Steve,

On Wed, Nov 21, 2018 at 08:27:11PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> The curr_ret_stack is no longer set to -1 when not tracing a function. It is
> now done differently, and the FTRACE_NOTRACE_DEPTH value is no longer used.
> Remove the reference to it.

Do you have a pointer to the commit that changed that behaviour? I just want
to make sure we're not missing something in our unwind_frame() code.

> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  arch/arm64/kernel/stacktrace.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 4989f7ea1e59..7723dadf25be 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -61,9 +61,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  			(frame->pc == (unsigned long)return_to_handler)) {
>  		if (WARN_ON_ONCE(frame->graph == -1))
>  			return -EINVAL;

Hmm, so is this code redundant too ^^ ?

> -		if (frame->graph < -1)
> -			frame->graph += FTRACE_NOTRACE_DEPTH;
> -

Do we still need to initialise frame->graph in __save_stack_trace()?

Will

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 03/14] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH
  2018-11-27 19:31   ` Will Deacon
@ 2018-11-27 19:50     ` Steven Rostedt
  2018-12-05 21:50       ` [PATCH 03/14 v2] " Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-27 19:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland, Catalin Marinas, linux-arm-kernel

On Tue, 27 Nov 2018 19:31:50 +0000
Will Deacon <will.deacon@arm.com> wrote:

> Hi Steve,
> 
> On Wed, Nov 21, 2018 at 08:27:11PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > The curr_ret_stack is no longer set to -1 when not tracing a function. It is
> > now done differently, and the FTRACE_NOTRACE_DEPTH value is no longer used.
> > Remove the reference to it.  
> 
> Do you have a pointer to the commit that changed that behaviour? I just want
> to make sure we're not missing something in our unwind_frame() code.

The commit log is slightly wrong. The -1 goes away in patch 11 of this
series (and this is patch 3). But the FTRACE_NOTRACE_DEPTH is going
away, which is why we need this patch.

> 
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: linux-arm-kernel@lists.infradead.org
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > ---
> >  arch/arm64/kernel/stacktrace.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > index 4989f7ea1e59..7723dadf25be 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -61,9 +61,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> >  			(frame->pc == (unsigned long)return_to_handler)) {
> >  		if (WARN_ON_ONCE(frame->graph == -1))
> >  			return -EINVAL;  
> 
> Hmm, so is this code redundant too ^^ ?

Actually, it is fine before patch 11 (which I'm only going to take the
first 10 patches for the next merge window, after they are cleaned up).


> 
> > -		if (frame->graph < -1)
> > -			frame->graph += FTRACE_NOTRACE_DEPTH;
> > -  
> 
> Do we still need to initialise frame->graph in __save_stack_trace()?

You are fine till patch 11, which will probably break all this (another
reason why I'm not going to push that in the next merge window).

Ideally, we need to get rid of all users of curr_ret_stack outside of
the function graph logic. As it's going to change drastically. We need
this in order to combine function graph and kretprobes.

I'm working on a v2 that's going to be aimed at the next merge window,
that will only contain patches 1-10 of this series (and some other
updates). I'll Cc you on that entire set.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs
  2018-11-26 16:26       ` Steven Rostedt
@ 2018-11-28  1:38         ` Joel Fernandes
  0 siblings, 0 replies; 54+ messages in thread
From: Joel Fernandes @ 2018-11-28  1:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Josh Poimboeuf,
	Frederic Weisbecker, Andy Lutomirski, Mark Rutland

On Mon, Nov 26, 2018 at 11:26:03AM -0500, Steven Rostedt wrote:
> On Tue, 27 Nov 2018 01:07:55 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > > > --- a/include/linux/sched.h
> > > > +++ b/include/linux/sched.h
> > > > @@ -1119,7 +1119,7 @@ struct task_struct {
> > > >  	int				curr_ret_depth;
> > > >  
> > > >  	/* Stack of return addresses for return function tracing: */
> > > > -	struct ftrace_ret_stack		*ret_stack;
> > > > +	unsigned long			*ret_stack;
> > > >  
> > > >  	/* Timestamp for last schedule: */
> > > >  	unsigned long long		ftrace_timestamp;
> > > > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> > > > index 9b85638ecded..1389fe39f64c 100644
> > > > --- a/kernel/trace/fgraph.c
> > > > +++ b/kernel/trace/fgraph.c
> > > > @@ -23,6 +23,17 @@
> > > >  #define ASSIGN_OPS_HASH(opsname, val)
> > > >  #endif
> > > >  
> > > > +#define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
> > > > +#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
> > > > +#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH * FGRAPH_RET_SIZE)
> > > > +#define SHADOW_STACK_INDEX			\
> > > > +	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
> > > > +#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
> > > > +
> > > > +#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
> > > > +#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
> > > > +#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
> > > > +  
> > > [...]  
> > > > @@ -514,7 +531,7 @@ void ftrace_graph_init_task(struct task_struct *t)
> > > >  
> > > >  void ftrace_graph_exit_task(struct task_struct *t)
> > > >  {
> > > > -	struct ftrace_ret_stack	*ret_stack = t->ret_stack;
> > > > +	unsigned long *ret_stack = t->ret_stack;
> > > >  
> > > >  	t->ret_stack = NULL;
> > > >  	/* NULL must become visible to IRQs before we free it: */
> > > > @@ -526,12 +543,10 @@ void ftrace_graph_exit_task(struct task_struct *t)
> > > >  /* Allocate a return stack for each task */
> > > >  static int start_graph_tracing(void)
> > > >  {
> > > > -	struct ftrace_ret_stack **ret_stack_list;
> > > > +	unsigned long **ret_stack_list;
> > > >  	int ret, cpu;
> > > >  
> > > > -	ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
> > > > -				       sizeof(struct ftrace_ret_stack *),
> > > > -				       GFP_KERNEL);
> > > > +	ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
> > > >    
> > > 
> > > I had dumped the fgraph size related macros to understand the patch better, I
> > > got:
> > > [    0.909528] val of FGRAPH_RET_SIZE is 40
> > > [    0.910250] val of FGRAPH_RET_INDEX is 5
> > > [    0.910866] val of FGRAPH_ARRAY_SIZE is 16
> > > [    0.911488] val of FGRAPH_ARRAY_MASK is 255
> > > [    0.912134] val of FGRAPH_MAX_INDEX is 16
> > > [    0.912751] val of FGRAPH_INDEX_SHIFT is 8
> > > [    0.913382] val of FGRAPH_FRAME_SIZE is 168
> > > [    0.914033] val of FGRAPH_FRAME_INDEX is 21
> > >                       FTRACE_RETFUNC_DEPTH is 50
> > > [    0.914686] val of SHADOW_STACK_SIZE is 8400
> > > 
> > > I had a concern about memory overhead per-task. It seems the total memory
> > > needed per task for the stack is 8400 bytes (with my configuration with
> > > FUNCTION_PROFILE
> > > turned off).
> > > 
> > > Where as before it would be 32 * 40 = 1280 bytes. That looks like ~7 times
> > > more than before.  
> > 
> > Hmm, this seems too big... I thought the shadow-stack size should be
> > smaller than 1 page (4kB). Steve, can we give a 4k page for shadow stack
> > and define FTRACE_RETFUNC_DEPTH = 4096 / FGRAPH_RET_SIZE ?
> 
> For the first pass, I decided not to worry about the size. It made the
> code less complex :-)
> 
> Yes, I plan on working on making the size of the stack smaller, but
> that will probably be added on patches to do so.

Cool, sounds good.

> > > On my system with ~4000 threads, that becomes ~32MB which seems a bit
> > > wasteful especially if there was only one or 2 function graph callbacks
> > > registered and most of the callback array in the stack isn't used.
> 
> Note, all 4000 threads could be doing those trace backs, and if you are
> doing full function graph tracing, it will use a lot.

But I think each of the threads will only use N entries in the callback array
where N is the number of function graph callback users who registered, right?
So the remaining total-N allocated callback array entries per thread will not
be used.

> > > Could we make the array size configurable at compile time and start it with a
> > > small number like 4 or 6?  
> > 
> > Or, we can introduce online setting :)
> 
> Yes, that can easily be added. I didn't try to make this into the
> perfect solution, I wanted a solid one first, and then massage it into
> something that is more efficient, both with memory consumption and
> performance.
> 
> Joel and Masami, thanks for the feedback.

I agree the first step is good so far. Looking forward to the patches, thanks
a lot,

 - Joel


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-26 16:32   ` Steven Rostedt
@ 2018-11-29 14:29     ` Masami Hiramatsu
  2018-11-29 16:46       ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2018-11-29 14:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

Hi Steve,

On Mon, 26 Nov 2018 11:32:15 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 26 Nov 2018 18:21:12 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> 
> > > Note, if another fgraph_ops is registered in the same location, its
> > > retfunc may be called that was set by a previous fgraph_ops. This
> > > is not a regression because that's what can happen today if you unregister
> > > a callback from the current function_graph tracer and register another
> > > one. If this is an issue, there are ways to solve it.  
> > 
> > Yeah, I need the solution, maybe an API to get correct return address? :)
> 
> One way to solve this is to also have a counter array that gets updated
> every time the index array gets updated. And save the counter to the
> shadow stack index as well. This way, we only call the return if the
> counter on the stack matches what's in the counter on the counter array
> for the index.

Hmm, but we already know the current stack "header" entry when calling
handlers, don't we? I thought we just calcurate out from curr_ret_stack.

> > By the way, are there any way to hold a private data on each ret_stack entry?
> > Since kretprobe supports "entry data" passed from entry_handler to
> > return handler, we have to store the data or data-instance on the ret_stack.
> > 
> > This feature is used by systemtap to save the function entry data, like
> > function parameters etc., so that return handler analyzes the parameters
> > with return value.
> 
> Yes, I remember you telling me about this at plumbers, and while I was
> writing this code I had that in mind. It wouldn't be too hard to
> implement, I just left it off for now. I also left it off because I
> have some questions about what exactly is needed. What size do you
> require to be stored. Especially if we want to keep the shadow stack
> smaller. I was going to find a way to implement some of the data that
> is already stored via the ret_stack with this instead, and make the
> ret_stack entry smaller. Should we allow just sizeof(long)*3? or just
> let user choose any size and if they run out of stack, then too bad. We
> just wont let it crash.

I need only sizeof(unsigned long). If the kretprobe user requires more,
it will be fall back to current method -- get an "instance" and store
its address to the entry :-)

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-29 14:29     ` Masami Hiramatsu
@ 2018-11-29 16:46       ` Steven Rostedt
  2018-11-30  2:26         ` Masami Hiramatsu
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-29 16:46 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland

On Thu, 29 Nov 2018 23:29:27 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > One way to solve this is to also have a counter array that gets updated
> > every time the index array gets updated. And save the counter to the
> > shadow stack index as well. This way, we only call the return if the
> > counter on the stack matches what's in the counter on the counter array
> > for the index.  
> 
> Hmm, but we already know the current stack "header" entry when calling
> handlers, don't we? I thought we just calcurate out from curr_ret_stack.

Basically we have this:

 array: | &fgraph_ops_1 | &fgraph_ops_2 | &fgraph_ops_stub | ...

On entry of function we do:

	for (i = 0; i < array_entries; i++) {
		if (array[i]->entryfunc(...)) {
			push i onto ret_stack;
		}
	}

On the return side, we do:

	idx = pop ret_stack;

	array[idx]->retfunc(...);

We only call the retfunc of a fgraph_ops if it returned non-zero from
its entryfunc(). The return can happen a long time from now (which is
why I don't save the &fgraph_ops on the ret_stack, because then we would
never be able to free it).

In the mean time, lets say we unregistered (and freed) fgraph_ops_2 and
then added fgraph_ops_3, so the array looks like:

 array: | &fgraph_ops_1 | &fgraph_ops_3 | &fgraph_ops_stub | ...

Then a function that was called when fgraph_ops_2 was on the stack
returns, it will call array[1]->retfunc() which now belongs to
fgraph_ops_3 and not fgraph_ops_2.

But if we add a counter array that gets updated when new ops are added
to the array, we have this:

 cnt_array: |       4       |     2         |       0          |
     array: | &fgraph_ops_1 | &fgraph_ops_2 | &fgraph_ops_stub | ...

And do:

	for (i = 0; i < array_entries; i++) {
		if (array[i]->entryfunc(...)) {
			idx = cnt_array[i] << 8 | i;
			push idx onto ret_stack;
		}
	}

Then on return we have:

	idx = pop ret_stack;

	if (idx >> 8 == cnt_array[idx & 0xff])
		array[idx & 0xff]->retfunc(...);

It wouldn't call fgraph_ops_3 because we would change the cnt_array
when we remove fgraph_ops_2 and the return would not match, as
cnt_array[1] would then be "3".

> 
> > > By the way, are there any way to hold a private data on each ret_stack entry?
> > > Since kretprobe supports "entry data" passed from entry_handler to
> > > return handler, we have to store the data or data-instance on the ret_stack.
> > > 
> > > This feature is used by systemtap to save the function entry data, like
> > > function parameters etc., so that return handler analyzes the parameters
> > > with return value.  
> > 
> > Yes, I remember you telling me about this at plumbers, and while I was
> > writing this code I had that in mind. It wouldn't be too hard to
> > implement, I just left it off for now. I also left it off because I
> > have some questions about what exactly is needed. What size do you
> > require to be stored. Especially if we want to keep the shadow stack
> > smaller. I was going to find a way to implement some of the data that
> > is already stored via the ret_stack with this instead, and make the
> > ret_stack entry smaller. Should we allow just sizeof(long)*3? or just
> > let user choose any size and if they run out of stack, then too bad. We
> > just wont let it crash.  
> 
> I need only sizeof(unsigned long). If the kretprobe user requires more,
> it will be fall back to current method -- get an "instance" and store
> its address to the entry :-)

Awesome, then this shouldn't be too hard to implement.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-29 16:46       ` Steven Rostedt
@ 2018-11-30  2:26         ` Masami Hiramatsu
  2018-11-30  3:24           ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2018-11-30  2:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, systemtap,
	Alexei Starovoitov, Daniel Borkmann

On Thu, 29 Nov 2018 11:46:52 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 29 Nov 2018 23:29:27 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > > One way to solve this is to also have a counter array that gets updated
> > > every time the index array gets updated. And save the counter to the
> > > shadow stack index as well. This way, we only call the return if the
> > > counter on the stack matches what's in the counter on the counter array
> > > for the index.  
> > 
> > Hmm, but we already know the current stack "header" entry when calling
> > handlers, don't we? I thought we just calcurate out from curr_ret_stack.
> 
> Basically we have this:
> 
>  array: | &fgraph_ops_1 | &fgraph_ops_2 | &fgraph_ops_stub | ...
> 
> On entry of function we do:
> 
	push header(including original ret_addr) onto ret_stack

> 	for (i = 0; i < array_entries; i++) {
> 		if (array[i]->entryfunc(...)) {
> 			push i onto ret_stack;
> 		}
> 	}
> 
> On the return side, we do:
> 
> 	idx = pop ret_stack;
> 
> 	array[idx]->retfunc(...);

So at this point we have the header on ret_stack, don't we? :)

Anyway, I think we may provide an API for unwinder to find correct
original return address form ret_stack. That is OK for me.


> 
> We only call the retfunc of a fgraph_ops if it returned non-zero from
> its entryfunc(). The return can happen a long time from now (which is
> why I don't save the &fgraph_ops on the ret_stack, because then we would
> never be able to free it).
> 
> In the mean time, lets say we unregistered (and freed) fgraph_ops_2 and
> then added fgraph_ops_3, so the array looks like:
> 
>  array: | &fgraph_ops_1 | &fgraph_ops_3 | &fgraph_ops_stub | ...
> 
> Then a function that was called when fgraph_ops_2 was on the stack
> returns, it will call array[1]->retfunc() which now belongs to
> fgraph_ops_3 and not fgraph_ops_2.
> 
> But if we add a counter array that gets updated when new ops are added
> to the array, we have this:
> 
>  cnt_array: |       4       |     2         |       0          |
>      array: | &fgraph_ops_1 | &fgraph_ops_2 | &fgraph_ops_stub | ...
> 
> And do:
> 
> 	for (i = 0; i < array_entries; i++) {
> 		if (array[i]->entryfunc(...)) {
> 			idx = cnt_array[i] << 8 | i;
> 			push idx onto ret_stack;
> 		}
> 	}
> 
> Then on return we have:
> 
> 	idx = pop ret_stack;
> 
> 	if (idx >> 8 == cnt_array[idx & 0xff])
> 		array[idx & 0xff]->retfunc(...);
> 
> It wouldn't call fgraph_ops_3 because we would change the cnt_array
> when we remove fgraph_ops_2 and the return would not match, as
> cnt_array[1] would then be "3".
> 
> > 
> > > > By the way, are there any way to hold a private data on each ret_stack entry?
> > > > Since kretprobe supports "entry data" passed from entry_handler to
> > > > return handler, we have to store the data or data-instance on the ret_stack.
> > > > 
> > > > This feature is used by systemtap to save the function entry data, like
> > > > function parameters etc., so that return handler analyzes the parameters
> > > > with return value.  
> > > 
> > > Yes, I remember you telling me about this at plumbers, and while I was
> > > writing this code I had that in mind. It wouldn't be too hard to
> > > implement, I just left it off for now. I also left it off because I
> > > have some questions about what exactly is needed. What size do you
> > > require to be stored. Especially if we want to keep the shadow stack
> > > smaller. I was going to find a way to implement some of the data that
> > > is already stored via the ret_stack with this instead, and make the
> > > ret_stack entry smaller. Should we allow just sizeof(long)*3? or just
> > > let user choose any size and if they run out of stack, then too bad. We
> > > just wont let it crash.  
> > 
> > I need only sizeof(unsigned long). If the kretprobe user requires more,
> > it will be fall back to current method -- get an "instance" and store
> > its address to the entry :-)
> 
> Awesome, then this shouldn't be too hard to implement.

Oops, anyway I noticed that I must store a value on each area so that we can
identify which kretprobe is using that if there are several kretprobes on same
function. So, kretprobe implementation will be something like below.

kretprobe_retfunc(trace, regs)
{
	kp = get_kprobe(trace->func);

	if (private == to_kretprobe(kp)) // this is directly mapped to current kprobe
		goto found_kretprobe;

	if (!list_empty(&kp->list)) {	// we need to find from multiple kretprobes
		list_for_each_entry(kp, &kp->list, list)
			if (private == kp)
				goto found_kretprobe;
	}

	// Or this must be an instance
	struct kretprobe_instance *ri = trace->private;
	rp = ri->rp;
	if (valid_kretprobe(rp))
		rp->handler(ri, regs);
	kretprobe_recycle_instance(ri);
	goto out;

found_kretprobe:
	struct kretprobe_instance rii = {.rp = to_kretprobe(kp),
		.ret_addr=trace->ret, .task = current}
	rp->handler(&rii, regs);

out:
	return 0;
}

I think we talked about pt_regs, which is redundant for return probe, so it should
be just a return value. (but how we pass it? trace->retval?)
That is OK for ftrace (but the transition needs more code).
And I would like to ask ebpf and systemtap people that is OK since it will change
the kernel ABI.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-30  2:26         ` Masami Hiramatsu
@ 2018-11-30  3:24           ` Steven Rostedt
  2018-11-30 14:11             ` Masami Hiramatsu
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2018-11-30  3:24 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, systemtap,
	Alexei Starovoitov, Daniel Borkmann

On Fri, 30 Nov 2018 11:26:58 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Thu, 29 Nov 2018 11:46:52 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Thu, 29 Nov 2018 23:29:27 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >   
> > > > One way to solve this is to also have a counter array that gets updated
> > > > every time the index array gets updated. And save the counter to the
> > > > shadow stack index as well. This way, we only call the return if the
> > > > counter on the stack matches what's in the counter on the counter array
> > > > for the index.    
> > > 
> > > Hmm, but we already know the current stack "header" entry when calling
> > > handlers, don't we? I thought we just calcurate out from curr_ret_stack.  
> > 
> > Basically we have this:
> > 
> >  array: | &fgraph_ops_1 | &fgraph_ops_2 | &fgraph_ops_stub | ...
> > 
> > On entry of function we do:
> >   
> 	push header(including original ret_addr) onto ret_stack

We can't put the ret_addr of the callback on the stack. What if that
ret_addr is a module, and it gets unloaded? We must not call it.

> 
> > 	for (i = 0; i < array_entries; i++) {
> > 		if (array[i]->entryfunc(...)) {
> > 			push i onto ret_stack;
> > 		}
> > 	}
> > 
> > On the return side, we do:
> > 
> > 	idx = pop ret_stack;
> > 
> > 	array[idx]->retfunc(...);  
> 
> So at this point we have the header on ret_stack, don't we? :)
> 
> Anyway, I think we may provide an API for unwinder to find correct
> original return address form ret_stack. That is OK for me.

Yes. In fact, I have something that worked for that. I'll have to test
it some more.


> > > I need only sizeof(unsigned long). If the kretprobe user requires more,
> > > it will be fall back to current method -- get an "instance" and store
> > > its address to the entry :-)  
> > 
> > Awesome, then this shouldn't be too hard to implement.  
> 
> Oops, anyway I noticed that I must store a value on each area so that we can
> identify which kretprobe is using that if there are several kretprobes on same
> function. So, kretprobe implementation will be something like below.
> 
> kretprobe_retfunc(trace, regs)
> {
> 	kp = get_kprobe(trace->func);
> 
> 	if (private == to_kretprobe(kp)) // this is directly mapped to current kprobe
> 		goto found_kretprobe;
> 
> 	if (!list_empty(&kp->list)) {	// we need to find from multiple kretprobes
> 		list_for_each_entry(kp, &kp->list, list)
> 			if (private == kp)
> 				goto found_kretprobe;
> 	}
> 
> 	// Or this must be an instance
> 	struct kretprobe_instance *ri = trace->private;
> 	rp = ri->rp;
> 	if (valid_kretprobe(rp))
> 		rp->handler(ri, regs);
> 	kretprobe_recycle_instance(ri);
> 	goto out;
> 
> found_kretprobe:
> 	struct kretprobe_instance rii = {.rp = to_kretprobe(kp),
> 		.ret_addr=trace->ret, .task = current}
> 	rp->handler(&rii, regs);
> 
> out:
> 	return 0;
> }
> 
> I think we talked about pt_regs, which is redundant for return probe, so it should
> be just a return value. (but how we pass it? trace->retval?)

Yeah, we can add that.

> That is OK for ftrace (but the transition needs more code).
> And I would like to ask ebpf and systemtap people that is OK since it will change
> the kernel ABI.

I agree.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-30  3:24           ` Steven Rostedt
@ 2018-11-30 14:11             ` Masami Hiramatsu
  0 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu @ 2018-11-30 14:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, systemtap,
	Alexei Starovoitov, Daniel Borkmann

On Thu, 29 Nov 2018 22:24:35 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 30 Nov 2018 11:26:58 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > On Thu, 29 Nov 2018 11:46:52 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Thu, 29 Nov 2018 23:29:27 +0900
> > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >   
> > > > > One way to solve this is to also have a counter array that gets updated
> > > > > every time the index array gets updated. And save the counter to the
> > > > > shadow stack index as well. This way, we only call the return if the
> > > > > counter on the stack matches what's in the counter on the counter array
> > > > > for the index.    
> > > > 
> > > > Hmm, but we already know the current stack "header" entry when calling
> > > > handlers, don't we? I thought we just calcurate out from curr_ret_stack.  
> > > 
> > > Basically we have this:
> > > 
> > >  array: | &fgraph_ops_1 | &fgraph_ops_2 | &fgraph_ops_stub | ...
> > > 
> > > On entry of function we do:
> > >   
> > 	push header(including original ret_addr) onto ret_stack
> 
> We can't put the ret_addr of the callback on the stack. What if that
> ret_addr is a module, and it gets unloaded? We must not call it.

But in that case, how can we recover the original addr on the kernel (real)
stack? I don't call the entry, but kretprobe handler will need the info
to record as a caller-address.

> > 
> > > 	for (i = 0; i < array_entries; i++) {
> > > 		if (array[i]->entryfunc(...)) {
> > > 			push i onto ret_stack;
> > > 		}
> > > 	}
> > > 
> > > On the return side, we do:
> > > 
> > > 	idx = pop ret_stack;
> > > 
> > > 	array[idx]->retfunc(...);  
> > 
> > So at this point we have the header on ret_stack, don't we? :)
> > 
> > Anyway, I think we may provide an API for unwinder to find correct
> > original return address form ret_stack. That is OK for me.
> 
> Yes. In fact, I have something that worked for that. I'll have to test
> it some more.

Great! I think it will be enough for kretprobe.

> > > > I need only sizeof(unsigned long). If the kretprobe user requires more,
> > > > it will be fall back to current method -- get an "instance" and store
> > > > its address to the entry :-)  
> > > 
> > > Awesome, then this shouldn't be too hard to implement.  
> > 
> > Oops, anyway I noticed that I must store a value on each area so that we can
> > identify which kretprobe is using that if there are several kretprobes on same
> > function. So, kretprobe implementation will be something like below.
> > 
> > kretprobe_retfunc(trace, regs)
> > {
> > 	kp = get_kprobe(trace->func);
> > 
> > 	if (private == to_kretprobe(kp)) // this is directly mapped to current kprobe
> > 		goto found_kretprobe;
> > 
> > 	if (!list_empty(&kp->list)) {	// we need to find from multiple kretprobes
> > 		list_for_each_entry(kp, &kp->list, list)
> > 			if (private == kp)
> > 				goto found_kretprobe;
> > 	}
> > 
> > 	// Or this must be an instance
> > 	struct kretprobe_instance *ri = trace->private;
> > 	rp = ri->rp;
> > 	if (valid_kretprobe(rp))
> > 		rp->handler(ri, regs);
> > 	kretprobe_recycle_instance(ri);
> > 	goto out;
> > 
> > found_kretprobe:
> > 	struct kretprobe_instance rii = {.rp = to_kretprobe(kp),
> > 		.ret_addr=trace->ret, .task = current}
> > 	rp->handler(&rii, regs);
> > 
> > out:
> > 	return 0;
> > }
> > 
> > I think we talked about pt_regs, which is redundant for return probe, so it should
> > be just a return value. (but how we pass it? trace->retval?)
> 
> Yeah, we can add that.

OK, then I will start with making a fake pt_regs on stack and call handler,
which will be something like,

	struct pt_regs regs = {};
	regs_set_return_value(&regs, trace->retval);
	rp->handler(ri, &regs);

Thank you,

> > That is OK for ftrace (but the transition needs more code).
> > And I would like to ask ebpf and systemtap people that is OK since it will change
> > the kernel ABI.
> 
> I agree.
> 
> -- Steve


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 03/14 v2] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH
  2018-11-27 19:50     ` Steven Rostedt
@ 2018-12-05 21:50       ` Steven Rostedt
  0 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2018-12-05 21:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland, Catalin Marinas, linux-arm-kernel

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Functions in the set_graph_notrace no longer subtract FTRACE_NOTRACE_DEPTH
from curr_ret_stack, as that is now implemented via the trace_recursion
flags. Access to curr_ret_stack no longer needs to worry about checking for
this. curr_ret_stack is still initialized to -1, when there's not a shadow
stack allocated.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---

Changes since v1: A better change log. The patch itself is the same.

 arch/arm64/kernel/stacktrace.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 4989f7ea1e59..7723dadf25be 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -61,9 +61,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 			(frame->pc == (unsigned long)return_to_handler)) {
 		if (WARN_ON_ONCE(frame->graph == -1))
 			return -EINVAL;
-		if (frame->graph < -1)
-			frame->graph += FTRACE_NOTRACE_DEPTH;
-
 		/*
 		 * This is a case where function graph tracer has
 		 * modified a return address (LR) in a stack frame
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2018-12-05 21:50 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-22  1:27 [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 01/14] fgraph: Create a fgraph.c file to store function graph infrastructure Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer Steven Rostedt
2018-11-23  0:01   ` Namhyung Kim
2018-11-23 17:37     ` Steven Rostedt
2018-11-24  5:49       ` Namhyung Kim
2018-11-24 18:41         ` Steven Rostedt
2018-11-26  4:54           ` Namhyung Kim
2018-11-22  1:27 ` [RFC][PATCH 03/14] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH Steven Rostedt
2018-11-27 19:31   ` Will Deacon
2018-11-27 19:50     ` Steven Rostedt
2018-12-05 21:50       ` [PATCH 03/14 v2] " Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 04/14] function_graph: Remove the " Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 05/14] ftrace: Create new ftrace-internal.h header Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c Steven Rostedt
2018-11-23  6:11   ` Joel Fernandes
2018-11-23 17:58     ` Steven Rostedt
2018-11-23 18:11       ` Steven Rostedt
2018-11-23 22:13         ` Joel Fernandes
2018-11-26  7:25     ` Masami Hiramatsu
2018-11-22  1:27 ` [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks Steven Rostedt
2018-11-23  2:59   ` Joel Fernandes
2018-11-23 18:25     ` Steven Rostedt
2018-11-26 11:30   ` Masami Hiramatsu
2018-11-26 21:06     ` Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 08/14] function_graph: Remove unused task_curr_ret_stack() Steven Rostedt
2018-11-26  7:40   ` Masami Hiramatsu
2018-11-26 21:26     ` Steven Rostedt
2018-11-26 10:02   ` Joey Pabalinas
2018-11-26 21:27     ` Steven Rostedt
2018-11-26 21:37       ` Joey Pabalinas
2018-11-22  1:27 ` [RFC][PATCH 09/14] function_graph: Move ftrace_graph_get_addr() to fgraph.c Steven Rostedt
2018-11-23  3:13   ` Joel Fernandes
2018-11-23 19:25     ` Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 10/14] function_graph: Have profiler use new helper ftrace_graph_get_ret_stack() Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs Steven Rostedt
2018-11-24  5:31   ` Joel Fernandes
2018-11-26 16:07     ` Masami Hiramatsu
2018-11-26 16:26       ` Steven Rostedt
2018-11-28  1:38         ` Joel Fernandes
2018-11-26 21:31     ` Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 12/14] function_graph: Add an array structure that will allow multiple callbacks Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 13/14] function_graph: Allow multiple users to attach to function graph Steven Rostedt
2018-11-22  1:27 ` [RFC][PATCH 14/14] function_graph: Allow for more than one callback to be registered Steven Rostedt
2018-11-22 10:08 ` [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Peter Zijlstra
2018-11-22 12:46   ` Steven Rostedt
2018-11-22 13:42     ` Peter Zijlstra
2018-11-26  9:21 ` Masami Hiramatsu
2018-11-26 16:32   ` Steven Rostedt
2018-11-29 14:29     ` Masami Hiramatsu
2018-11-29 16:46       ` Steven Rostedt
2018-11-30  2:26         ` Masami Hiramatsu
2018-11-30  3:24           ` Steven Rostedt
2018-11-30 14:11             ` Masami Hiramatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).