* [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph
@ 2023-11-05 16:06 Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) @ 2023-11-05 16:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

Hi,

Here is the renewed series to re-implement fprobe on top of the
function-graph tracer.
It was previously discussed as the series to use ftrace_regs instead of
pt_regs in fprobe. The previous series is here:

https://lore.kernel.org/all/169556254640.146934.5654329452696494756.stgit@devnote2/

This RFC series makes two major changes: it enables multiple function
graphs on ftrace (e.g. allows function-graph on sub-instances), and it
rewrites fprobe on top of this function-graph.

The former change was first sent by Steven Rostedt 4 years ago (*); it
allows users to run the function-graph tracer (and other tracers based
on function-graph) with different settings in each trace instance at
the same time.

(*) https://lore.kernel.org/all/20190525031633.811342628@goodmis.org/

The purposes of the latter change are:

 1) Remove fprobe's dependency on rethook so that we can reduce the
   duplicated return-hook code and shadow stacks.

 2) Make 'ftrace_regs' the common trace interface for the function
   boundary.

1) Currently we have 2 (or 3) different function return hook codes:
 the function-graph tracer and rethook (and the legacy kretprobe).
 Since this is redundant and doubles the maintenance cost, I would
 like to unify them. From the user's viewpoint, the function-graph
 tracer is very useful for grasping the execution path. For this
 purpose, it is hard to use rethook in the function-graph tracer,
 but the opposite is possible. (Strictly speaking, kretprobe cannot
 use it because it requires 'pt_regs' for historical reasons.)

2) Currently fprobe provides 'pt_regs' to its handlers, but that is
 wrong for the function entry and exit. Moreover, depending on the
 architecture, there is no way to accurately reproduce 'pt_regs'
 outside of interrupt or exception handlers. This means fprobe should
 not use 'pt_regs' because it does not use such exceptions.
 (Conversely, kprobe should use 'pt_regs' because it is an abstract
  interface to the software breakpoint exception.)
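
For reference, handlers built on 'ftrace_regs' should only touch the
function-ABI registers via the existing accessors. A minimal sketch
(the handler prototypes here are illustrative, not the exact ones in
this series):

	/* entry side: only the arguments (and IP) are well-defined */
	static void my_entry_handler(unsigned long ip, struct ftrace_regs *fregs)
	{
		unsigned long arg0 = ftrace_regs_get_argument(fregs, 0);

		pr_debug("%ps(arg0=0x%lx)\n", (void *)ip, arg0);
	}

	/* exit side: only the return value is well-defined */
	static void my_exit_handler(unsigned long ip, struct ftrace_regs *fregs)
	{
		unsigned long retval = ftrace_regs_get_return_value(fregs);

		pr_debug("%ps returned 0x%lx\n", (void *)ip, retval);
	}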

This series changes fprobe to use the function-graph tracer for tracing
function entry and exit, instead of a mixture of ftrace and rethook.
Unlike rethook, which is a per-task list of system-wide allocated
nodes, the function graph's ret_stack is a per-task shadow stack.
Thus it does not need to set 'nr_maxactive' (which is the number of
pre-allocated nodes).
Also, the handlers will get 'ftrace_regs' instead of 'pt_regs'.
Since eBPF multi_kprobe/multi_kretprobe events still use 'pt_regs' as
their register interface, this series converts 'ftrace_regs' to
'pt_regs' for them. Of course this conversion produces an incomplete
'pt_regs', so users must only access the registers used for function
parameters or the return value.
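
For reference, the bridge for eBPF could look roughly like this (a
sketch built around the ftrace_partial_regs() helper this series adds;
the handler prototype and run_bpf_prog() are illustrative):

	static void multi_kprobe_handler(unsigned long ip, struct ftrace_regs *fregs)
	{
		struct pt_regs buf, *regs;

		/* Fill only the registers needed for function parameters
		 * and the return value; the rest of 'buf' is unreliable. */
		regs = ftrace_partial_regs(fregs, &buf);
		run_bpf_prog(regs);	/* illustrative consumer */
	}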

Design:
Instead of using ftrace's function entry hook directly, the new fprobe
is built on top of the function-graph's entry and return callbacks
with 'ftrace_regs'.

Since the fprobe requires access to 'ftrace_regs', the architecture
must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, which enables
calling the function-graph entry callback with 'ftrace_regs', and also
CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to
return_to_handler.

All fprobes share a single function-graph ops (which means they share
a common ftrace filter), similar to kprobe-on-ftrace. This needs
another layer to find the corresponding fprobe in the common
function-graph callbacks, but it scales much better, since the number
of registered function-graph ops is limited.

In the entry callback, the fprobe runs its entry_handler and saves the
address of 'fprobe' on the function-graph's shadow stack as data. The
return callback decodes the data to get the 'fprobe' address, and runs
the exit_handler.

The fprobe introduces two hash tables: one is for the entry callback,
which looks up the fprobes related to the function address passed to
that callback; the other is for the return callback, which checks
whether the given 'fprobe' data structure pointer is still valid. Note
that it is possible to unregister an fprobe before the return callback
runs, thus the address must be validated before it is used in the
return callback.
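
Condensed into pseudo code, the flow looks roughly like this (the
helper names here are illustrative, not the ones in the patches; see
the fprobe rewrite patches [29/32] to [31/32] for the real code):

	static int fprobe_fgraph_entry(struct ftrace_graph_ent *trace,
				       struct fgraph_ops *gops)
	{
		struct fprobe *fp;

		/* hash table #1: function address -> fprobe */
		fp = find_fprobe_by_addr(trace->func);	/* illustrative */
		if (!fp)
			return 0;

		call_entry_handler(fp, trace);		/* illustrative */

		/* Save the fprobe address on the shadow stack as data. */
		save_fprobe_on_shadow_stack(fp);	/* illustrative */

		return 1;	/* request the return callback */
	}

	static void fprobe_fgraph_return(struct ftrace_graph_ret *trace,
					 struct fgraph_ops *gops)
	{
		/* Decode the fprobe address saved at entry. */
		struct fprobe *fp = fetch_fprobe_from_shadow_stack();

		/* hash table #2: was this fprobe unregistered meanwhile? */
		if (!fprobe_address_is_valid(fp))	/* illustrative */
			return;

		call_exit_handler(fp, trace);		/* illustrative */
	}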

Series:
Patches [1/32] to [16/32] are a forward port of the multiple
function-graph series sent by Steven Rostedt.
[17/32] to [21/32] are bug fixes for the multiple function-graph series
(these will be merged into the corresponding patches; in this version
 I just split them out to show what is fixed/changed).
[22/32] and [24/32] add new function-graph callbacks with ftrace_regs
and the x86-64 implementation.
[25/32] to [28/32] are preparation (adding utility functions) for the
new fprobe and its users. [29/32] to [31/32] rewrite fprobe and update
its users. [32/32] is a documentation update.

This series can be applied against probes-fixes-v6.6-rc7 on the linux-trace tree.

This series can also be found in the branch below.

https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-on-fgraph

Thank you,

---

Christophe JAILLET (1):
      seq_buf: Export seq_buf_puts()

Masami Hiramatsu (Google) (16):
      function_graph: Fix to update curr_ret_stack with ARRAY entry size
      function_graph: Fix to initalize ftrace_ops for fgraph with ftrace_graph_func
      function_graph: Fix to check the return value of ftrace_pop_return_trace()
      function_graph: Pass the size of reserved data when retrieving it
      function_graph: Expand the max reserved data size
      function_graph: Add a new entry handler with parent_ip and ftrace_regs
      function_graph: Add a new exit handler with parent_ip and ftrace_regs
      x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
      fprobe: Use ftrace_regs in fprobe entry handler
      fprobe: Use ftrace_regs in fprobe exit handler
      tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
      tracing: Add ftrace_fill_perf_regs() for perf event
      fprobe: Rewrite fprobe on function-graph tracer
      tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
      bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
      Documentation: probes: Update fprobe on function-graph tracer

Steven Rostedt (VMware) (15):
      function_graph: Convert ret_stack to a series of longs
      fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long
      function_graph: Add an array structure that will allow multiple callbacks
      function_graph: Allow multiple users to attach to function graph
      function_graph: Remove logic around ftrace_graph_entry and return
      ftrace/function_graph: Pass fgraph_ops to function graph callbacks
      ftrace: Allow function_graph tracer to be enabled in instances
      ftrace: Allow ftrace startup flags exist without dynamic ftrace
      function_graph: Have the instances use their own ftrace_ops for filtering
      function_graph: Add "task variables" per task for fgraph_ops
      function_graph: Move set_graph_function tests to shadow stack global var
      function_graph: Move graph depth stored data to shadow stack global var
      function_graph: Move graph notrace bit to shadow stack global var
      function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
      function_graph: Add selftest for passing local variables


 Documentation/trace/fprobe.rst           |   42 +
 arch/arm64/include/asm/ftrace.h          |   18 +
 arch/arm64/kernel/ftrace.c               |    8 
 arch/loongarch/Kconfig                   |    1 
 arch/loongarch/kernel/ftrace_dyn.c       |    6 
 arch/powerpc/include/asm/ftrace.h        |    7 
 arch/powerpc/kernel/trace/ftrace.c       |    2 
 arch/powerpc/kernel/trace/ftrace_64_pg.c |   10 
 arch/s390/Kconfig                        |    1 
 arch/s390/include/asm/ftrace.h           |    5 
 arch/x86/Kconfig                         |    4 
 arch/x86/include/asm/ftrace.h            |    9 
 arch/x86/kernel/ftrace.c                 |   50 +-
 arch/x86/kernel/ftrace_64.S              |   30 +
 include/linux/fprobe.h                   |   58 +-
 include/linux/ftrace.h                   |  132 ++++
 include/linux/sched.h                    |    2 
 include/linux/trace_recursion.h          |   39 -
 kernel/trace/Kconfig                     |   19 -
 kernel/trace/bpf_trace.c                 |   14 
 kernel/trace/fgraph.c                    |  945 ++++++++++++++++++++++++------
 kernel/trace/fprobe.c                    |  631 ++++++++++++++------
 kernel/trace/ftrace.c                    |   13 
 kernel/trace/ftrace_internal.h           |    2 
 kernel/trace/trace.h                     |   93 ++-
 kernel/trace/trace_fprobe.c              |   70 +-
 kernel/trace/trace_functions.c           |    8 
 kernel/trace/trace_functions_graph.c     |   96 ++-
 kernel/trace/trace_irqsoff.c             |   10 
 kernel/trace/trace_probe_tmpl.h          |    2 
 kernel/trace/trace_sched_wakeup.c        |   10 
 kernel/trace/trace_selftest.c            |  176 +++++-
 lib/seq_buf.c                            |    1 
 lib/test_fprobe.c                        |   51 --
 samples/fprobe/fprobe_example.c          |    4 
 35 files changed, 1915 insertions(+), 654 deletions(-)

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* [RFC PATCH 01/32] seq_buf: Export seq_buf_puts()
From: Masami Hiramatsu (Google) @ 2023-11-05 16:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Mark seq_buf_puts(), which is part of the seq_buf API, as exported to
GPL kernel loadable modules.

Link: https://lkml.kernel.org/r/b9e3737f66ec2450221b492048ce0d9c65c84953.1698861216.git.christophe.jaillet@wanadoo.fr

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 lib/seq_buf.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 45c450f423fa..46a1b00c3815 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -189,6 +189,7 @@ int seq_buf_puts(struct seq_buf *s, const char *str)
 	seq_buf_set_overflow(s);
 	return -1;
 }
+EXPORT_SYMBOL_GPL(seq_buf_puts);
 
 /**
  * seq_buf_putc - sequence printing of simple character
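
With this export, a GPL module can use the full seq_buf pattern, for
example (illustrative usage, not part of the patch):

	#include <linux/seq_buf.h>

	static void seq_buf_demo(void)
	{
		char buf[64];
		struct seq_buf s;

		seq_buf_init(&s, buf, sizeof(buf));
		seq_buf_puts(&s, "hello from a module\n");
		pr_info("%.*s", (int)seq_buf_used(&s), buf);
	}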



* [RFC PATCH 02/32] function_graph: Convert ret_stack to a series of longs
From: Masami Hiramatsu (Google) @ 2023-11-05 16:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

In order to make it possible to have multiple callbacks registered with the
function_graph tracer, the ret_stack needs to be converted from an array of
ftrace_ret_stack structures to an array of longs. This will allow storing
the list of callbacks on the stack for the return side of the functions.
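
As a worked example of the new layout (assuming, for illustration, a
64-bit kernel where sizeof(struct ftrace_ret_stack) is 40 bytes):

	FGRAPH_RET_SIZE        = 40 bytes	(assumed)
	FGRAPH_RET_INDEX       = 40 / 8  = 5 longs per entry
	SHADOW_STACK_SIZE      = PAGE_SIZE (4096)
	SHADOW_STACK_INDEX     = 4096 / 8 = 512 longs
	SHADOW_STACK_MAX_INDEX = 512 - 5  = 507

so each push advances curr_ret_stack by 5 longs, and one page holds
about 100 nested frames per task.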

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/sched.h |    2 -
 kernel/trace/fgraph.c |  124 ++++++++++++++++++++++++++++---------------------
 2 files changed, 71 insertions(+), 55 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..3af00d726847 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1386,7 +1386,7 @@ struct task_struct {
 	int				curr_ret_depth;
 
 	/* Stack of return addresses for return function tracing: */
-	struct ftrace_ret_stack		*ret_stack;
+	unsigned long			*ret_stack;
 
 	/* Timestamp for last schedule: */
 	unsigned long long		ftrace_timestamp;
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index c83c005e654e..30edeb6d4aa9 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -25,6 +25,18 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
+#define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
+#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_SIZE (PAGE_SIZE)
+#define SHADOW_STACK_INDEX			\
+	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+/* Leave on a buffer at the end */
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+
+#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
+#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
+#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
+
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
@@ -69,6 +81,7 @@ static int
 ftrace_push_return_trace(unsigned long ret, unsigned long func,
 			 unsigned long frame_pointer, unsigned long *retp)
 {
+	struct ftrace_ret_stack *ret_stack;
 	unsigned long long calltime;
 	int index;
 
@@ -85,23 +98,25 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 	smp_rmb();
 
 	/* The return trace stack is full */
-	if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+	if (current->curr_ret_stack >= SHADOW_STACK_MAX_INDEX) {
 		atomic_inc(&current->trace_overrun);
 		return -EBUSY;
 	}
 
 	calltime = trace_clock_local();
 
-	index = ++current->curr_ret_stack;
+	index = current->curr_ret_stack;
+	RET_STACK_INC(current->curr_ret_stack);
+	ret_stack = RET_STACK(current, index);
 	barrier();
-	current->ret_stack[index].ret = ret;
-	current->ret_stack[index].func = func;
-	current->ret_stack[index].calltime = calltime;
+	ret_stack->ret = ret;
+	ret_stack->func = func;
+	ret_stack->calltime = calltime;
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-	current->ret_stack[index].fp = frame_pointer;
+	ret_stack->fp = frame_pointer;
 #endif
 #ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-	current->ret_stack[index].retp = retp;
+	ret_stack->retp = retp;
 #endif
 	return 0;
 }
@@ -148,7 +163,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 
 	return 0;
  out_ret:
-	current->curr_ret_stack--;
+	RET_STACK_DEC(current->curr_ret_stack);
  out:
 	current->curr_ret_depth--;
 	return -EBUSY;
@@ -159,11 +174,13 @@ static void
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 			unsigned long frame_pointer)
 {
+	struct ftrace_ret_stack *ret_stack;
 	int index;
 
 	index = current->curr_ret_stack;
+	RET_STACK_DEC(index);
 
-	if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
+	if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
 		ftrace_graph_stop();
 		WARN_ON(1);
 		/* Might as well panic, otherwise we have no where to go */
@@ -171,6 +188,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 		return;
 	}
 
+	ret_stack = RET_STACK(current, index);
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	/*
 	 * The arch may choose to record the frame pointer used
@@ -186,22 +204,22 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 	 * Note, -mfentry does not use frame pointers, and this test
 	 *  is not needed if CC_USING_FENTRY is set.
 	 */
-	if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
+	if (unlikely(ret_stack->fp != frame_pointer)) {
 		ftrace_graph_stop();
 		WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
 		     "  from func %ps return to %lx\n",
 		     current->ret_stack[index].fp,
 		     frame_pointer,
-		     (void *)current->ret_stack[index].func,
-		     current->ret_stack[index].ret);
+		     (void *)ret_stack->func,
+		     ret_stack->ret);
 		*ret = (unsigned long)panic;
 		return;
 	}
 #endif
 
-	*ret = current->ret_stack[index].ret;
-	trace->func = current->ret_stack[index].func;
-	trace->calltime = current->ret_stack[index].calltime;
+	*ret = ret_stack->ret;
+	trace->func = ret_stack->func;
+	trace->calltime = ret_stack->calltime;
 	trace->overrun = atomic_read(&current->trace_overrun);
 	trace->depth = current->curr_ret_depth--;
 	/*
@@ -262,7 +280,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 	 * curr_ret_stack is after that.
 	 */
 	barrier();
-	current->curr_ret_stack--;
+	RET_STACK_DEC(current->curr_ret_stack);
 
 	if (unlikely(!ret)) {
 		ftrace_graph_stop();
@@ -305,12 +323,13 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 struct ftrace_ret_stack *
 ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 {
-	idx = task->curr_ret_stack - idx;
+	int index = task->curr_ret_stack;
 
-	if (idx >= 0 && idx <= task->curr_ret_stack)
-		return &task->ret_stack[idx];
+	index -= FGRAPH_RET_INDEX * (idx + 1);
+	if (index < 0)
+		return NULL;
 
-	return NULL;
+	return RET_STACK(task, index);
 }
 
 /**
@@ -332,18 +351,20 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp)
 {
+	struct ftrace_ret_stack *ret_stack;
 	int index = task->curr_ret_stack;
 	int i;
 
 	if (ret != (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
 		return ret;
 
-	if (index < 0)
-		return ret;
+	RET_STACK_DEC(index);
 
-	for (i = 0; i <= index; i++)
-		if (task->ret_stack[i].retp == retp)
-			return task->ret_stack[i].ret;
+	for (i = index; i >= 0; RET_STACK_DEC(i)) {
+		ret_stack = RET_STACK(task, i);
+		if (ret_stack->retp == retp)
+			return ret_stack->ret;
+	}
 
 	return ret;
 }
@@ -357,14 +378,15 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 		return ret;
 
 	task_idx = task->curr_ret_stack;
+	RET_STACK_DEC(task_idx);
 
 	if (!task->ret_stack || task_idx < *idx)
 		return ret;
 
 	task_idx -= *idx;
-	(*idx)++;
+	RET_STACK_INC(*idx);
 
-	return task->ret_stack[task_idx].ret;
+	return RET_STACK(task, task_idx);
 }
 #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
 
@@ -402,7 +424,7 @@ trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
 static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
 
 /* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
-static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
+static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
 {
 	int i;
 	int ret = 0;
@@ -410,10 +432,7 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
 	struct task_struct *g, *t;
 
 	for (i = 0; i < FTRACE_RETSTACK_ALLOC_SIZE; i++) {
-		ret_stack_list[i] =
-			kmalloc_array(FTRACE_RETFUNC_DEPTH,
-				      sizeof(struct ftrace_ret_stack),
-				      GFP_KERNEL);
+		ret_stack_list[i] = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 		if (!ret_stack_list[i]) {
 			start = 0;
 			end = i;
@@ -431,9 +450,9 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
 
 		if (t->ret_stack == NULL) {
 			atomic_set(&t->trace_overrun, 0);
-			t->curr_ret_stack = -1;
+			t->curr_ret_stack = 0;
 			t->curr_ret_depth = -1;
-			/* Make sure the tasks see the -1 first: */
+			/* Make sure the tasks see the 0 first: */
 			smp_wmb();
 			t->ret_stack = ret_stack_list[start++];
 		}
@@ -453,6 +472,7 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 				struct task_struct *next,
 				unsigned int prev_state)
 {
+	struct ftrace_ret_stack *ret_stack;
 	unsigned long long timestamp;
 	int index;
 
@@ -477,8 +497,11 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 	 */
 	timestamp -= next->ftrace_timestamp;
 
-	for (index = next->curr_ret_stack; index >= 0; index--)
-		next->ret_stack[index].calltime += timestamp;
+	for (index = next->curr_ret_stack - FGRAPH_RET_INDEX; index >= 0; ) {
+		ret_stack = RET_STACK(next, index);
+		ret_stack->calltime += timestamp;
+		index -= FGRAPH_RET_INDEX;
+	}
 }
 
 static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
@@ -521,10 +544,10 @@ void update_function_graph_func(void)
 		ftrace_graph_entry = __ftrace_graph_entry;
 }
 
-static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
+static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);
 
 static void
-graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
+graph_init_task(struct task_struct *t, unsigned long *ret_stack)
 {
 	atomic_set(&t->trace_overrun, 0);
 	t->ftrace_timestamp = 0;
@@ -539,7 +562,7 @@ graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
  */
 void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
 {
-	t->curr_ret_stack = -1;
+	t->curr_ret_stack = 0;
 	t->curr_ret_depth = -1;
 	/*
 	 * The idle task has no parent, it either has its own
@@ -549,14 +572,11 @@ void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
 		WARN_ON(t->ret_stack != per_cpu(idle_ret_stack, cpu));
 
 	if (ftrace_graph_active) {
-		struct ftrace_ret_stack *ret_stack;
+		unsigned long *ret_stack;
 
 		ret_stack = per_cpu(idle_ret_stack, cpu);
 		if (!ret_stack) {
-			ret_stack =
-				kmalloc_array(FTRACE_RETFUNC_DEPTH,
-					      sizeof(struct ftrace_ret_stack),
-					      GFP_KERNEL);
+			ret_stack = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 			if (!ret_stack)
 				return;
 			per_cpu(idle_ret_stack, cpu) = ret_stack;
@@ -570,15 +590,13 @@ void ftrace_graph_init_task(struct task_struct *t)
 {
 	/* Make sure we do not use the parent ret_stack */
 	t->ret_stack = NULL;
-	t->curr_ret_stack = -1;
+	t->curr_ret_stack = 0;
 	t->curr_ret_depth = -1;
 
 	if (ftrace_graph_active) {
-		struct ftrace_ret_stack *ret_stack;
+		unsigned long *ret_stack;
 
-		ret_stack = kmalloc_array(FTRACE_RETFUNC_DEPTH,
-					  sizeof(struct ftrace_ret_stack),
-					  GFP_KERNEL);
+		ret_stack = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 		if (!ret_stack)
 			return;
 		graph_init_task(t, ret_stack);
@@ -587,7 +605,7 @@ void ftrace_graph_init_task(struct task_struct *t)
 
 void ftrace_graph_exit_task(struct task_struct *t)
 {
-	struct ftrace_ret_stack	*ret_stack = t->ret_stack;
+	unsigned long *ret_stack = t->ret_stack;
 
 	t->ret_stack = NULL;
 	/* NULL must become visible to IRQs before we free it: */
@@ -599,12 +617,10 @@ void ftrace_graph_exit_task(struct task_struct *t)
 /* Allocate a return stack for each task */
 static int start_graph_tracing(void)
 {
-	struct ftrace_ret_stack **ret_stack_list;
+	unsigned long **ret_stack_list;
 	int ret, cpu;
 
-	ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
-				       sizeof(struct ftrace_ret_stack *),
-				       GFP_KERNEL);
+	ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 
 	if (!ret_stack_list)
 		return -ENOMEM;



* [RFC PATCH 03/32] fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long
From: Masami Hiramatsu (Google) @ 2023-11-05 16:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Instead of using "ALIGN()", use BUILD_BUG_ON() as the structures should
always be divisible by sizeof(long).
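
BUILD_BUG_ON() fails the build when its condition evaluates to
non-zero, so (for example) if a new member ever made the structure's
size not a multiple of sizeof(long), compilation would stop at:

	BUILD_BUG_ON(FGRAPH_RET_SIZE % sizeof(long));

instead of ALIGN() silently rounding the index up.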

Link: http://lkml.kernel.org/r/20190524111144.GI2589@hirez.programming.kicks-ass.net

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/fgraph.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 30edeb6d4aa9..837daf929d2a 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -26,10 +26,9 @@
 #endif
 
 #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
-#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define FGRAPH_RET_INDEX (FGRAPH_RET_SIZE / sizeof(long))
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
-#define SHADOW_STACK_INDEX			\
-	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
 /* Leave on a buffer at the end */
 #define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
 
@@ -91,6 +90,8 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 	if (!current->ret_stack)
 		return -EBUSY;
 
+	BUILD_BUG_ON(SHADOW_STACK_SIZE % sizeof(long));
+
 	/*
 	 * We must make sure the ret_stack is tested before we read
 	 * anything else.
@@ -325,6 +326,8 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 {
 	int index = task->curr_ret_stack;
 
+	BUILD_BUG_ON(FGRAPH_RET_SIZE % sizeof(long));
+
 	index -= FGRAPH_RET_INDEX * (idx + 1);
 	if (index < 0)
 		return NULL;



* [RFC PATCH 04/32] function_graph: Add an array structure that will allow multiple callbacks
From: Masami Hiramatsu (Google) @ 2023-11-05 16:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Add an array structure that will eventually allow the function graph tracer
to have up to 16 simultaneous callbacks attached. It's an array of 16
fgraph_ops pointers, one of which is assigned when a callback is registered.
On entry to a function, the entryfunc of the first item in the array is
called, and it returns non-zero if it wants the return callback to be
called on exit of the function.

The array will simplify the process of having more than one callback
attached to the same function, as its index into the array can be stored on
the shadow stack. We only need to save the index, because this will allow
the fgraph_ops to be freed before the function returns (which may happen if
the function calls schedule() and sleeps for a long time).
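
With this in place, a user would register a callback pair roughly like
this (an illustrative sketch against the interface in this patch):

	static int my_entry(struct ftrace_graph_ent *trace)
	{
		return 1;	/* non-zero: call my_return on exit */
	}

	static void my_return(struct ftrace_graph_ret *trace)
	{
	}

	static struct fgraph_ops my_gops = {
		.entryfunc	= my_entry,
		.retfunc	= my_return,
	};

	/* register_ftrace_graph(&my_gops) now claims a free slot in
	 * fgraph_array[] instead of failing with -EBUSY whenever any
	 * other user is already active. */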

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/fgraph.c |  115 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 82 insertions(+), 33 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 837daf929d2a..b34ab7ecd7af 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -39,6 +39,11 @@
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
+static int fgraph_array_cnt;
+#define FGRAPH_ARRAY_SIZE	16
+
+static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
+
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
@@ -62,6 +67,20 @@ int __weak ftrace_disable_ftrace_graph_caller(void)
 }
 #endif
 
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+{
+	return 0;
+}
+
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+{
+}
+
+static struct fgraph_ops fgraph_stub = {
+	.entryfunc = ftrace_graph_entry_stub,
+	.retfunc = ftrace_graph_ret_stub,
+};
+
 /**
  * ftrace_graph_stop - set to permanently disable function graph tracing
  *
@@ -159,7 +178,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 		goto out;
 
 	/* Only trace if the calling function expects to */
-	if (!ftrace_graph_entry(&trace))
+	if (!fgraph_array[0]->entryfunc(&trace))
 		goto out_ret;
 
 	return 0;
@@ -274,7 +293,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 	trace.retval = fgraph_ret_regs_return_value(ret_regs);
 #endif
 	trace.rettime = trace_clock_local();
-	ftrace_graph_return(&trace);
+	fgraph_array[0]->retfunc(&trace);
 	/*
 	 * The ftrace_graph_return() may still access the current
 	 * ret_stack structure, we need to make sure the update of
@@ -410,11 +429,6 @@ void ftrace_graph_sleep_time_control(bool enable)
 	fgraph_sleep_time = enable;
 }
 
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
-{
-	return 0;
-}
-
 /*
  * Simply points to ftrace_stub, but with the proper protocol.
  * Defined by the linker script in linux/vmlinux.lds.h
@@ -652,37 +666,55 @@ static int start_graph_tracing(void)
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
 	int ret = 0;
+	int i;
 
 	mutex_lock(&ftrace_lock);
 
-	/* we currently allow only one tracer registered at a time */
-	if (ftrace_graph_active) {
+	if (!fgraph_array[0]) {
+		/* The array must always have real data on it */
+		for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+			fgraph_array[i] = &fgraph_stub;
+		}
+	}
+
+	/* Look for an available spot */
+	for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+		if (fgraph_array[i] == &fgraph_stub)
+			break;
+	}
+	if (i >= FGRAPH_ARRAY_SIZE) {
 		ret = -EBUSY;
 		goto out;
 	}
 
-	register_pm_notifier(&ftrace_suspend_notifier);
+	fgraph_array[i] = gops;
+	if (i + 1 > fgraph_array_cnt)
+		fgraph_array_cnt = i + 1;
 
 	ftrace_graph_active++;
-	ret = start_graph_tracing();
-	if (ret) {
-		ftrace_graph_active--;
-		goto out;
-	}
 
-	ftrace_graph_return = gops->retfunc;
+	if (ftrace_graph_active == 1) {
+		register_pm_notifier(&ftrace_suspend_notifier);
+		ret = start_graph_tracing();
+		if (ret) {
+			ftrace_graph_active--;
+			goto out;
+		}
 
-	/*
-	 * Update the indirect function to the entryfunc, and the
-	 * function that gets called to the entry_test first. Then
-	 * call the update fgraph entry function to determine if
-	 * the entryfunc should be called directly or not.
-	 */
-	__ftrace_graph_entry = gops->entryfunc;
-	ftrace_graph_entry = ftrace_graph_entry_test;
-	update_function_graph_func();
+		ftrace_graph_return = gops->retfunc;
+
+		/*
+		 * Update the indirect function to the entryfunc, and the
+		 * function that gets called to the entry_test first. Then
+		 * call the update fgraph entry function to determine if
+		 * the entryfunc should be called directly or not.
+		 */
+		__ftrace_graph_entry = gops->entryfunc;
+		ftrace_graph_entry = ftrace_graph_entry_test;
+		update_function_graph_func();
 
-	ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+		ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+	}
 out:
 	mutex_unlock(&ftrace_lock);
 	return ret;
@@ -690,19 +722,36 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 
 void unregister_ftrace_graph(struct fgraph_ops *gops)
 {
+	int i;
+
 	mutex_lock(&ftrace_lock);
 
 	if (unlikely(!ftrace_graph_active))
 		goto out;
 
-	ftrace_graph_active--;
-	ftrace_graph_return = ftrace_stub_graph;
-	ftrace_graph_entry = ftrace_graph_entry_stub;
-	__ftrace_graph_entry = ftrace_graph_entry_stub;
-	ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
-	unregister_pm_notifier(&ftrace_suspend_notifier);
-	unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
+	for (i = 0; i < fgraph_array_cnt; i++)
+		if (gops == fgraph_array[i])
+			break;
+	if (i >= fgraph_array_cnt)
+		goto out;
+
+	fgraph_array[i] = &fgraph_stub;
+	if (i + 1 == fgraph_array_cnt) {
+		for (; i >= 0; i--)
+			if (fgraph_array[i] != &fgraph_stub)
+				break;
+		fgraph_array_cnt = i + 1;
+	}
 
+	ftrace_graph_active--;
+	if (!ftrace_graph_active) {
+		ftrace_graph_return = ftrace_stub_graph;
+		ftrace_graph_entry = ftrace_graph_entry_stub;
+		__ftrace_graph_entry = ftrace_graph_entry_stub;
+		ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
+		unregister_pm_notifier(&ftrace_suspend_notifier);
+		unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
+	}
  out:
 	mutex_unlock(&ftrace_lock);
 }



* [RFC PATCH 05/32] function_graph: Allow multiple users to attach to function graph
From: Masami Hiramatsu (Google) @ 2023-11-05 16:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Allow for multiple users to attach to the function graph tracer at the same
time. Only 16 simultaneous users can attach to the tracer. This is because
there's an array that stores the pointers to the attached fgraph_ops. When
a function being traced is entered, each of the fgraph_ops entryfuncs is
called, and if it returns non-zero, its index into the array will be added
to the shadow stack.

On exit of the function being traced, the shadow stack will contain the
indexes of the fgraph_ops on the array that want their retfunc to be called.

Because a function may sleep for a long time (if a task sleeps itself), the
return of the function may literally be days later. If the fgraph_ops is
removed, its place on the array is replaced with an fgraph_ops that contains
the stub functions, and those will be called when the function finally
returns.

If another fgraph_ops is added that happens to get the same index into the
array, its return function may be called. But that's actually the way things
currently work with the old function graph tracer: if one tracer is removed
and another is added, the new one will get the return calls of the functions
traced by the previous one, so this is not a regression. This can be fixed
by adding a counter that is bumped each time the array item is updated,
saving it on the shadow stack as well, so that the retfunc won't be called
if the saved counter does not match the counter on the array.

Note, being able to filter functions when both are called is not completely
handled yet, but that shouldn't be too hard to manage.
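
As a worked example of the encoding this patch introduces (using the
FGRAPH_RET_INDEX_SIZE=14 / FGRAPH_TYPE_SHIFT=14 / FGRAPH_ARRAY_SHIFT=16
values defined below), the word pushed for fgraph_array[3] sitting 2
words above its ftrace_ret_stack is:

	word = (3 << FGRAPH_ARRAY_SHIFT)
	     | (FGRAPH_TYPE_ARRAY << FGRAPH_TYPE_SHIFT)
	     | 2
	     = 0x30000 | 0x4000 | 0x2 = 0x34002

	bits  0-13 = 2	offset in words back to the ftrace_ret_stack
	bits 14-15 = 1	FGRAPH_TYPE_ARRAY
	bits 16-23 = 3	index into fgraph_array[]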

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/fgraph.c |  332 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 280 insertions(+), 52 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index b34ab7ecd7af..58750a395313 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -27,23 +27,144 @@
 
 #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
 #define FGRAPH_RET_INDEX (FGRAPH_RET_SIZE / sizeof(long))
+
+/*
+ * On entry to a function (via function_graph_enter()), a new ftrace_ret_stack
+ * is allocated on the task's ret_stack, then each fgraph_ops on the
+ * fgraph_array[]'s entryfunc is called and if that returns non-zero, the
+ * index into the fgraph_array[] for that fgraph_ops is added to the ret_stack.
+ * As the associated ftrace_ret_stack saved for those fgraph_ops needs to
+ * be found, the index to it is also added to the ret_stack along with the
+ * index of the fgraph_array[] to each fgraph_ops that needs their retfunc
+ * called.
+ *
+ * The top of the ret_stack (when not empty) will always have a reference
+ * to the last ftrace_ret_stack saved. All references to the
+ * ftrace_ret_stack has the format of:
+ *
+ * bits:  0 - 13	Index in words from the previous ftrace_ret_stack
+ * bits: 14 - 15	Type of storage
+ *			  0 - reserved
+ *			  1 - fgraph_array index
+ * For fgraph_array_index:
+ *  bits: 16 - 23	The fgraph_ops fgraph_array index
+ *
+ * That is, at the end of function_graph_enter, if the first and fourth
+ * fgraph_ops on the fgraph_array[] (index 0 and 3) needs their retfunc called
+ * on the return of the function being traced, this is what will be on the
+ * task's shadow ret_stack: (the stack grows upward)
+ *
+ * |                                  | <- task->curr_ret_stack
+ * +----------------------------------+
+ * | (3 << FGRAPH_ARRAY_SHIFT)|(2)    | ( 3 for index of fourth fgraph_ops)
+ * +----------------------------------+
+ * | (0 << FGRAPH_ARRAY_SHIFT)|(1)    | ( 0 for index of first fgraph_ops)
+ * +----------------------------------+
+ * | struct ftrace_ret_stack          |
+ * |   (stores the saved ret pointer) |
+ * +----------------------------------+
+ * |             (X) | (N)            | ( N words away from previous ret_stack)
+ * |                                  |
+ *
+ * If a backtrace is required, and the real return pointer needs to be
+ * fetched, then it looks at the task's curr_ret_stack index, if it
+ * is greater than zero, it would subtract one, and then mask the value
+ * on the ret_stack by FGRAPH_RET_INDEX_MASK and subtract FGRAPH_RET_INDEX
+ * from that, to get the index of the ftrace_ret_stack structure stored
+ * on the shadow stack.
+ */
+
+#define FGRAPH_RET_INDEX_SIZE	14
+#define FGRAPH_RET_INDEX_MASK	((1 << FGRAPH_RET_INDEX_SIZE) - 1)
+
+
+#define FGRAPH_TYPE_SIZE	2
+#define FGRAPH_TYPE_MASK	((1 << FGRAPH_TYPE_SIZE) - 1)
+#define FGRAPH_TYPE_SHIFT	FGRAPH_RET_INDEX_SIZE
+
+enum {
+	FGRAPH_TYPE_RESERVED	= 0,
+	FGRAPH_TYPE_ARRAY	= 1,
+};
+
+#define FGRAPH_ARRAY_SIZE	16
+#define FGRAPH_ARRAY_MASK	((1 << FGRAPH_ARRAY_SIZE) - 1)
+#define FGRAPH_ARRAY_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
+
+/* Currently the max stack index can't be more than register callers */
+#define FGRAPH_MAX_INDEX	FGRAPH_ARRAY_SIZE
+
+#define FGRAPH_FRAME_SIZE (FGRAPH_RET_SIZE + FGRAPH_ARRAY_SIZE * (sizeof(long)))
+#define FGRAPH_FRAME_INDEX (ALIGN(FGRAPH_FRAME_SIZE,		\
+				  sizeof(long)) / sizeof(long))
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
 #define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
 /* Leave on a buffer at the end */
-#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1))
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
-#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
-#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
 
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
 static int fgraph_array_cnt;
-#define FGRAPH_ARRAY_SIZE	16
 
 static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
 
+static inline int get_ret_stack_index(struct task_struct *t, int offset)
+{
+	return current->ret_stack[offset] & FGRAPH_RET_INDEX_MASK;
+}
+
+static inline int get_fgraph_type(struct task_struct *t, int offset)
+{
+	return (current->ret_stack[offset] >> FGRAPH_TYPE_SHIFT) &
+		FGRAPH_TYPE_MASK;
+}
+
+static inline int get_fgraph_array(struct task_struct *t, int offset)
+{
+	return (current->ret_stack[offset] >> FGRAPH_ARRAY_SHIFT) &
+		FGRAPH_ARRAY_MASK;
+}
+
+/*
+ * @offset: The index into @t->ret_stack to find the ret_stack entry
+ * @index: Where to place the index into @t->ret_stack of that entry
+ *
+ * Calling this with:
+ *
+ *   offset = task->curr_ret_stack;
+ *   do {
+ *	ret_stack = get_ret_stack(task, offset, &offset);
+ *   } while (ret_stack);
+ *
+ * Will iterate through all the ret_stack entries from curr_ret_stack
+ * down to the first one.
+ */
+static inline struct ftrace_ret_stack *
+get_ret_stack(struct task_struct *t, int offset, int *index)
+{
+	int idx;
+
+	BUILD_BUG_ON(FGRAPH_RET_SIZE % sizeof(long));
+
+	if (offset <= 0)
+		return NULL;
+
+	idx = get_ret_stack_index(t, offset - 1);
+
+	if (idx <= 0 || idx > FGRAPH_MAX_INDEX)
+		return NULL;
+
+	offset -= idx + FGRAPH_RET_INDEX;
+	if (offset < 0)
+		return NULL;
+
+	*index = offset;
+	return RET_STACK(t, offset);
+}
+
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
@@ -126,9 +247,34 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 	calltime = trace_clock_local();
 
 	index = current->curr_ret_stack;
-	RET_STACK_INC(current->curr_ret_stack);
+	/* ret offset = 1 ; type = reserved */
+	current->ret_stack[index + FGRAPH_RET_INDEX] = 1;
 	ret_stack = RET_STACK(current, index);
+	ret_stack->ret = ret;
+	/*
+	 * The unwinders expect curr_ret_stack to point to either zero
+	 * or an index where to find the next ret_stack. Even though the
+	 * ret stack might be bogus, we want to write the ret and the
+	 * index to find the ret_stack before we increment the stack point.
+	 * If an interrupt comes in now before we increment the curr_ret_stack
+	 * it may blow away what we wrote. But that's fine, because the
+	 * index will still be correct (even though the ret won't be).
+	 * What we worry about is the index being correct after we increment
+	 * the curr_ret_stack and before we update that index, as if an
+	 * interrupt comes in and does an unwind stack dump, it will need
+	 * at least a correct index!
+	 */
+	barrier();
+	current->curr_ret_stack += FGRAPH_RET_INDEX + 1;
+	/*
+	 * This next barrier is to ensure that an interrupt coming in
+	 * will not corrupt what we are about to write.
+	 */
 	barrier();
+
+	/* Still keep it reserved even if an interrupt came in */
+	current->ret_stack[index + FGRAPH_RET_INDEX] = 1;
+
 	ret_stack->ret = ret;
 	ret_stack->func = func;
 	ret_stack->calltime = calltime;
@@ -159,6 +305,12 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			 unsigned long frame_pointer, unsigned long *retp)
 {
 	struct ftrace_graph_ent trace;
+	int offset;
+	int start;
+	int type;
+	int val;
+	int cnt = 0;
+	int i;
 
 #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
 	/*
@@ -177,38 +329,87 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 	if (ftrace_push_return_trace(ret, func, frame_pointer, retp))
 		goto out;
 
-	/* Only trace if the calling function expects to */
-	if (!fgraph_array[0]->entryfunc(&trace))
+	/* Use start for the distance to ret_stack (skipping over reserve) */
+	start = offset = current->curr_ret_stack - 2;
+
+	for (i = 0; i < fgraph_array_cnt; i++) {
+		struct fgraph_ops *gops = fgraph_array[i];
+
+		if (gops == &fgraph_stub)
+			continue;
+
+		if ((offset == start) &&
+		    (current->curr_ret_stack >= SHADOW_STACK_INDEX - 1)) {
+			atomic_inc(&current->trace_overrun);
+			break;
+		}
+		if (fgraph_array[i]->entryfunc(&trace)) {
+			offset = current->curr_ret_stack;
+			/* Check the top level stored word */
+			type = get_fgraph_type(current, offset - 1);
+
+			val = (i << FGRAPH_ARRAY_SHIFT) |
+				(FGRAPH_TYPE_ARRAY << FGRAPH_TYPE_SHIFT) |
+				((offset - start) - 1);
+
+			/* We can reuse the top word if it is reserved */
+			if (type == FGRAPH_TYPE_RESERVED) {
+				current->ret_stack[offset - 1] = val;
+				cnt++;
+				continue;
+			}
+			val++;
+
+			current->ret_stack[offset] = val;
+			/*
+			 * Write the value before we increment, so that
+			 * if an interrupt comes in after we increment
+			 * it will still see the value and skip over
+			 * this.
+			 */
+			barrier();
+			current->curr_ret_stack++;
+			/*
+			 * Have to write again, in case an interrupt
+			 * came in before the increment and after we
+			 * wrote the value.
+			 */
+			barrier();
+			current->ret_stack[offset] = val;
+			cnt++;
+		}
+	}
+
+	if (!cnt)
 		goto out_ret;
 
 	return 0;
  out_ret:
-	RET_STACK_DEC(current->curr_ret_stack);
+	current->curr_ret_stack -= FGRAPH_RET_INDEX + 1;
  out:
 	current->curr_ret_depth--;
 	return -EBUSY;
 }
 
 /* Retrieve a function return address to the trace stack on thread info.*/
-static void
+static struct ftrace_ret_stack *
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 			unsigned long frame_pointer)
 {
 	struct ftrace_ret_stack *ret_stack;
 	int index;
 
-	index = current->curr_ret_stack;
-	RET_STACK_DEC(index);
+	ret_stack = get_ret_stack(current, current->curr_ret_stack, &index);
 
-	if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
+	if (unlikely(!ret_stack)) {
 		ftrace_graph_stop();
-		WARN_ON(1);
+		WARN(1, "Bad function graph ret_stack pointer: %d",
+		     current->curr_ret_stack);
 		/* Might as well panic, otherwise we have no where to go */
 		*ret = (unsigned long)panic;
-		return;
+		return NULL;
 	}
 
-	ret_stack = RET_STACK(current, index);
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	/*
 	 * The arch may choose to record the frame pointer used
@@ -228,12 +429,12 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 		ftrace_graph_stop();
 		WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
 		     "  from func %ps return to %lx\n",
-		     current->ret_stack[index].fp,
+		     ret_stack->fp,
 		     frame_pointer,
 		     (void *)ret_stack->func,
 		     ret_stack->ret);
 		*ret = (unsigned long)panic;
-		return;
+		return NULL;
 	}
 #endif
 
@@ -241,13 +442,15 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 	trace->func = ret_stack->func;
 	trace->calltime = ret_stack->calltime;
 	trace->overrun = atomic_read(&current->trace_overrun);
-	trace->depth = current->curr_ret_depth--;
+	trace->depth = current->curr_ret_depth;
 	/*
 	 * We still want to trace interrupts coming in if
 	 * max_depth is set to 1. Make sure the decrement is
 	 * seen before ftrace_graph_return.
 	 */
 	barrier();
+
+	return ret_stack;
 }
 
 /*
@@ -285,30 +488,47 @@ struct fgraph_ret_regs;
 static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs,
 						unsigned long frame_pointer)
 {
+	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_graph_ret trace;
 	unsigned long ret;
+	int offset;
+	int index;
+	int idx;
+	int i;
+
+	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer);
+
+	if (unlikely(!ret)) {
+		ftrace_graph_stop();
+		WARN_ON(1);
+		/* Might as well panic. What else to do? */
+		return (unsigned long)panic;
+	}
 
-	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
+	trace.rettime = trace_clock_local();
 #ifdef CONFIG_FUNCTION_GRAPH_RETVAL
 	trace.retval = fgraph_ret_regs_return_value(ret_regs);
 #endif
-	trace.rettime = trace_clock_local();
-	fgraph_array[0]->retfunc(&trace);
+
+	offset = current->curr_ret_stack - 1;
+	index = get_ret_stack_index(current, offset);
+
+	/* index has to be at least one! Optimize for it */
+	i = 0;
+	do {
+		idx = get_fgraph_array(current, offset - i);
+		fgraph_array[idx]->retfunc(&trace);
+		i++;
+	} while (i < index);
+
 	/*
 	 * The ftrace_graph_return() may still access the current
 	 * ret_stack structure, we need to make sure the update of
 	 * curr_ret_stack is after that.
 	 */
 	barrier();
-	RET_STACK_DEC(current->curr_ret_stack);
-
-	if (unlikely(!ret)) {
-		ftrace_graph_stop();
-		WARN_ON(1);
-		/* Might as well panic. What else to do? */
-		ret = (unsigned long)panic;
-	}
-
+	current->curr_ret_stack -= index + FGRAPH_RET_INDEX;
+	current->curr_ret_depth--;
 	return ret;
 }
 
@@ -343,15 +563,17 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 struct ftrace_ret_stack *
 ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 {
+	struct ftrace_ret_stack *ret_stack = NULL;
 	int index = task->curr_ret_stack;
 
-	BUILD_BUG_ON(FGRAPH_RET_SIZE % sizeof(long));
-
-	index -= FGRAPH_RET_INDEX * (idx + 1);
 	if (index < 0)
 		return NULL;
 
-	return RET_STACK(task, index);
+	do {
+		ret_stack = get_ret_stack(task, index, &index);
+	} while (ret_stack && --idx >= 0);
+
+	return ret_stack;
 }
 
 /**
@@ -374,16 +596,15 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp)
 {
 	struct ftrace_ret_stack *ret_stack;
-	int index = task->curr_ret_stack;
-	int i;
+	int i = task->curr_ret_stack;
 
 	if (ret != (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
 		return ret;
 
-	RET_STACK_DEC(index);
-
-	for (i = index; i >= 0; RET_STACK_DEC(i)) {
-		ret_stack = RET_STACK(task, i);
+	while (i > 0) {
+		ret_stack = get_ret_stack(current, i, &i);
+		if (!ret_stack)
+			break;
 		if (ret_stack->retp == retp)
 			return ret_stack->ret;
 	}
@@ -394,21 +615,26 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp)
 {
-	int task_idx;
+	struct ftrace_ret_stack *ret_stack;
+	int task_idx = task->curr_ret_stack;
+	int i;
 
 	if (ret != (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
 		return ret;
 
-	task_idx = task->curr_ret_stack;
-	RET_STACK_DEC(task_idx);
-
-	if (!task->ret_stack || task_idx < *idx)
+	if (!idx)
 		return ret;
 
-	task_idx -= *idx;
-	RET_STACK_INC(*idx);
+	i = *idx;
+	do {
+		ret_stack = get_ret_stack(task, task_idx, &task_idx);
+		i--;
+	} while (i >= 0 && ret_stack);
+
+	if (ret_stack)
+		return ret_stack->ret;
 
-	return RET_STACK(task, task_idx);
+	return ret;
 }
 #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
 
@@ -514,10 +740,10 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 	 */
 	timestamp -= next->ftrace_timestamp;
 
-	for (index = next->curr_ret_stack - FGRAPH_RET_INDEX; index >= 0; ) {
-		ret_stack = RET_STACK(next, index);
-		ret_stack->calltime += timestamp;
-		index -= FGRAPH_RET_INDEX;
+	for (index = next->curr_ret_stack; index > 0; ) {
+		ret_stack = get_ret_stack(next, index, &index);
+		if (ret_stack)
+			ret_stack->calltime += timestamp;
 	}
 }
 
@@ -568,6 +794,8 @@ graph_init_task(struct task_struct *t, unsigned long *ret_stack)
 {
 	atomic_set(&t->trace_overrun, 0);
 	t->ftrace_timestamp = 0;
+	t->curr_ret_stack = 0;
+	t->curr_ret_depth = -1;
 	/* make curr_ret_stack visible before we add the ret_stack */
 	smp_wmb();
 	t->ret_stack = ret_stack;



* [RFC PATCH 06/32] function_graph: Remove logic around ftrace_graph_entry and return
From: Masami Hiramatsu (Google) @ 2023-11-05 16:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

The function pointers ftrace_graph_entry and ftrace_graph_return are no
longer called via the function_graph tracer. Instead, an array structure is
now used that allows for multiple users of the function_graph
infrastructure. The variables are still used by the architecture code for
non-dynamic ftrace configs, where a test is made against them to see if
they point to the default stub function or not. This is how the static
function tracing knows whether to call into the function graph tracer
infrastructure.

Two new stub functions are added, entry_run() and return_run(). The
ftrace_graph_entry and ftrace_graph_return pointers are set to them
respectively when the function graph tracer is enabled, and this will
trigger the architecture-specific function graph code to be executed.

This also requires checking the global_ops hash for all calls into the
function_graph tracer.
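
Conceptually, the static (non-dynamic ftrace) architecture code gates
its call into the graph machinery like this (an illustrative sketch,
not code from this patch):

	/* in the arch's static mcount/fentry path */
	if (ftrace_graph_return != ftrace_stub_graph ||
	    ftrace_graph_entry != ftrace_graph_entry_stub)
		function_graph_enter(ret, func, frame_pointer, retp);

entry_run()/return_run() exist only to make these pointers differ from
the stubs while any fgraph user is registered.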

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/fgraph.c          |   72 +++++++++++-----------------------------
 kernel/trace/ftrace.c          |    2 -
 kernel/trace/ftrace_internal.h |    2 -
 3 files changed, 20 insertions(+), 56 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 58750a395313..78335be91007 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -128,6 +128,18 @@ static inline int get_fgraph_array(struct task_struct *t, int offset)
 		FGRAPH_ARRAY_MASK;
 }
 
+/* ftrace_graph_entry set to this to tell some archs to run function graph */
+static int entry_run(struct ftrace_graph_ent *trace)
+{
+	return 0;
+}
+
+/* ftrace_graph_return set to this to tell some archs to run function graph */
+static void return_run(struct ftrace_graph_ret *trace)
+{
+	return;
+}
+
 /*
  * @offset: The index into @t->ret_stack to find the ret_stack entry
  * @index: Where to place the index into @t->ret_stack of that entry
@@ -323,6 +335,10 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 	    ftrace_find_rec_direct(ret - MCOUNT_INSN_SIZE))
 		return -EBUSY;
 #endif
+
+	if (!ftrace_ops_test(&global_ops, func, NULL))
+		return -EBUSY;
+
 	trace.func = func;
 	trace.depth = ++current->curr_ret_depth;
 
@@ -664,7 +680,6 @@ extern void ftrace_stub_graph(struct ftrace_graph_ret *);
 /* The callbacks that hook a function */
 trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph;
 trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
-static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
 
 /* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
 static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
@@ -747,46 +762,6 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 	}
 }
 
-static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
-{
-	if (!ftrace_ops_test(&global_ops, trace->func, NULL))
-		return 0;
-	return __ftrace_graph_entry(trace);
-}
-
-/*
- * The function graph tracer should only trace the functions defined
- * by set_ftrace_filter and set_ftrace_notrace. If another function
- * tracer ops is registered, the graph tracer requires testing the
- * function against the global ops, and not just trace any function
- * that any ftrace_ops registered.
- */
-void update_function_graph_func(void)
-{
-	struct ftrace_ops *op;
-	bool do_test = false;
-
-	/*
-	 * The graph and global ops share the same set of functions
-	 * to test. If any other ops is on the list, then
-	 * the graph tracing needs to test if its the function
-	 * it should call.
-	 */
-	do_for_each_ftrace_op(op, ftrace_ops_list) {
-		if (op != &global_ops && op != &graph_ops &&
-		    op != &ftrace_list_end) {
-			do_test = true;
-			/* in double loop, break out with goto */
-			goto out;
-		}
-	} while_for_each_ftrace_op(op);
- out:
-	if (do_test)
-		ftrace_graph_entry = ftrace_graph_entry_test;
-	else
-		ftrace_graph_entry = __ftrace_graph_entry;
-}
-
 static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);
 
 static void
@@ -928,18 +903,12 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 			ftrace_graph_active--;
 			goto out;
 		}
-
-		ftrace_graph_return = gops->retfunc;
-
 		/*
-		 * Update the indirect function to the entryfunc, and the
-		 * function that gets called to the entry_test first. Then
-		 * call the update fgraph entry function to determine if
-		 * the entryfunc should be called directly or not.
+		 * Some archs just test to see if these are not
+		 * the default function
 		 */
-		__ftrace_graph_entry = gops->entryfunc;
-		ftrace_graph_entry = ftrace_graph_entry_test;
-		update_function_graph_func();
+		ftrace_graph_return = return_run;
+		ftrace_graph_entry = entry_run;
 
 		ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
 	}
@@ -975,7 +944,6 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
 	if (!ftrace_graph_active) {
 		ftrace_graph_return = ftrace_stub_graph;
 		ftrace_graph_entry = ftrace_graph_entry_stub;
-		__ftrace_graph_entry = ftrace_graph_entry_stub;
 		ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
 		unregister_pm_notifier(&ftrace_suspend_notifier);
 		unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8de8bec5f366..fd64021ec52f 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -235,8 +235,6 @@ static void update_ftrace_function(void)
 		func = ftrace_ops_list_func;
 	}
 
-	update_function_graph_func();
-
 	/* If there's no change, then do nothing more here */
 	if (ftrace_trace_function == func)
 		return;
diff --git a/kernel/trace/ftrace_internal.h b/kernel/trace/ftrace_internal.h
index 5012c04f92c0..19eddcb91584 100644
--- a/kernel/trace/ftrace_internal.h
+++ b/kernel/trace/ftrace_internal.h
@@ -42,10 +42,8 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 extern int ftrace_graph_active;
-void update_function_graph_func(void);
 #else /* !CONFIG_FUNCTION_GRAPH_TRACER */
 # define ftrace_graph_active 0
-static inline void update_function_graph_func(void) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 #else /* !CONFIG_FUNCTION_TRACER */



* [RFC PATCH 07/32] ftrace/function_graph: Pass fgraph_ops to function graph callbacks
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (5 preceding siblings ...)
  2023-11-05 16:07 ` [RFC PATCH 06/32] function_graph: Remove logic around ftrace_graph_entry and return Masami Hiramatsu (Google)
@ 2023-11-05 16:07 ` Masami Hiramatsu (Google)
  2023-11-05 16:08 ` [RFC PATCH 08/32] ftrace: Allow function_graph tracer to be enabled in instances Masami Hiramatsu (Google)
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Pass the fgraph_ops structure to the function graph callbacks. This will
allow callbacks to attach a descriptor via a fgraph_ops private field
(added later in this series) and use it in the callbacks. This will be
useful when more than one callback can be registered with the function
graph tracer.
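
As a rough sketch of what the new prototype enables (my_graph_entry and
struct my_state are hypothetical, and the private field only arrives in
a later patch of this series):

	static int my_graph_entry(struct ftrace_graph_ent *trace,
				  struct fgraph_ops *gops)
	{
		/* each registered user can find its own state */
		struct my_state *state = gops->private;	/* hypothetical */

		return state != NULL;
	}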

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/ftrace.h               |   10 +++++++---
 kernel/trace/fgraph.c                |   17 ++++++++++-------
 kernel/trace/ftrace.c                |    6 ++++--
 kernel/trace/trace.h                 |    4 ++--
 kernel/trace/trace_functions_graph.c |   11 +++++++----
 kernel/trace/trace_irqsoff.c         |    6 ++++--
 kernel/trace/trace_sched_wakeup.c    |    6 ++++--
 kernel/trace/trace_selftest.c        |    5 +++--
 8 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index e8921871ef9a..ff32a0ec57aa 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1029,11 +1029,15 @@ struct ftrace_graph_ret {
 	unsigned long long rettime;
 } __packed;
 
+struct fgraph_ops;
+
 /* Type of the callback handlers for tracing function graph*/
-typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *); /* return */
-typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
+typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *,
+				       struct fgraph_ops *); /* return */
+typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *,
+				      struct fgraph_ops *); /* entry */
 
-extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace);
+extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 78335be91007..1e8c17f70b84 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -129,13 +129,13 @@ static inline int get_fgraph_array(struct task_struct *t, int offset)
 }
 
 /* ftrace_graph_entry set to this to tell some archs to run function graph */
-static int entry_run(struct ftrace_graph_ent *trace)
+static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops)
 {
 	return 0;
 }
 
 /* ftrace_graph_return set to this to tell some archs to run function graph */
-static void return_run(struct ftrace_graph_ret *trace)
+static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
 {
 	return;
 }
@@ -200,12 +200,14 @@ int __weak ftrace_disable_ftrace_graph_caller(void)
 }
 #endif
 
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace,
+			    struct fgraph_ops *gops)
 {
 	return 0;
 }
 
-static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace,
+				  struct fgraph_ops *gops)
 {
 }
 
@@ -359,7 +361,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			atomic_inc(&current->trace_overrun);
 			break;
 		}
-		if (fgraph_array[i]->entryfunc(&trace)) {
+		if (fgraph_array[i]->entryfunc(&trace, fgraph_array[i])) {
 			offset = current->curr_ret_stack;
 			/* Check the top level stored word */
 			type = get_fgraph_type(current, offset - 1);
@@ -533,7 +535,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 	i = 0;
 	do {
 		idx = get_fgraph_array(current, offset - i);
-		fgraph_array[idx]->retfunc(&trace);
+		fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
 		i++;
 	} while (i < index);
 
@@ -675,7 +677,8 @@ void ftrace_graph_sleep_time_control(bool enable)
  * Simply points to ftrace_stub, but with the proper protocol.
  * Defined by the linker script in linux/vmlinux.lds.h
  */
-extern void ftrace_stub_graph(struct ftrace_graph_ret *);
+extern void ftrace_stub_graph(struct ftrace_graph_ret *,
+			      struct fgraph_ops *);
 
 /* The callbacks that hook a function */
 trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph;
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fd64021ec52f..7ff5c454622a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -815,7 +815,8 @@ void ftrace_graph_graph_time_control(bool enable)
 	fgraph_graph_time = enable;
 }
 
-static int profile_graph_entry(struct ftrace_graph_ent *trace)
+static int profile_graph_entry(struct ftrace_graph_ent *trace,
+			       struct fgraph_ops *gops)
 {
 	struct ftrace_ret_stack *ret_stack;
 
@@ -832,7 +833,8 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace)
 	return 1;
 }
 
-static void profile_graph_return(struct ftrace_graph_ret *trace)
+static void profile_graph_return(struct ftrace_graph_ret *trace,
+				 struct fgraph_ops *gops)
 {
 	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_profile_stat *stat;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 77debe53f07c..ada49bf1fbc8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -670,8 +670,8 @@ void trace_latency_header(struct seq_file *m);
 void trace_default_header(struct seq_file *m);
 void print_trace_header(struct seq_file *m, struct trace_iterator *iter);
 
-void trace_graph_return(struct ftrace_graph_ret *trace);
-int trace_graph_entry(struct ftrace_graph_ent *trace);
+void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops);
+int trace_graph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
 void set_graph_array(struct trace_array *tr);
 
 void tracing_start_cmdline_record(void);
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index c35fbaab2a47..b7b142b65299 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -129,7 +129,8 @@ static inline int ftrace_graph_ignore_irqs(void)
 	return in_hardirq();
 }
 
-int trace_graph_entry(struct ftrace_graph_ent *trace)
+int trace_graph_entry(struct ftrace_graph_ent *trace,
+		      struct fgraph_ops *gops)
 {
 	struct trace_array *tr = graph_array;
 	struct trace_array_cpu *data;
@@ -238,7 +239,8 @@ void __trace_graph_return(struct trace_array *tr,
 		trace_buffer_unlock_commit_nostack(buffer, event);
 }
 
-void trace_graph_return(struct ftrace_graph_ret *trace)
+void trace_graph_return(struct ftrace_graph_ret *trace,
+			struct fgraph_ops *gops)
 {
 	struct trace_array *tr = graph_array;
 	struct trace_array_cpu *data;
@@ -275,7 +277,8 @@ void set_graph_array(struct trace_array *tr)
 	smp_mb();
 }
 
-static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
+static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
+				      struct fgraph_ops *gops)
 {
 	ftrace_graph_addr_finish(trace);
 
@@ -288,7 +291,7 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
 	    (trace->rettime - trace->calltime < tracing_thresh))
 		return;
 	else
-		trace_graph_return(trace);
+		trace_graph_return(trace, gops);
 }
 
 static struct fgraph_ops funcgraph_thresh_ops = {
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index ba37f768e2f2..5478f4c4f708 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -175,7 +175,8 @@ static int irqsoff_display_graph(struct trace_array *tr, int set)
 	return start_irqsoff_tracer(irqsoff_trace, set);
 }
 
-static int irqsoff_graph_entry(struct ftrace_graph_ent *trace)
+static int irqsoff_graph_entry(struct ftrace_graph_ent *trace,
+			       struct fgraph_ops *gops)
 {
 	struct trace_array *tr = irqsoff_trace;
 	struct trace_array_cpu *data;
@@ -205,7 +206,8 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent *trace)
 	return ret;
 }
 
-static void irqsoff_graph_return(struct ftrace_graph_ret *trace)
+static void irqsoff_graph_return(struct ftrace_graph_ret *trace,
+				 struct fgraph_ops *gops)
 {
 	struct trace_array *tr = irqsoff_trace;
 	struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 0469a04a355f..49bcc812652c 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -112,7 +112,8 @@ static int wakeup_display_graph(struct trace_array *tr, int set)
 	return start_func_tracer(tr, set);
 }
 
-static int wakeup_graph_entry(struct ftrace_graph_ent *trace)
+static int wakeup_graph_entry(struct ftrace_graph_ent *trace,
+			      struct fgraph_ops *gops)
 {
 	struct trace_array *tr = wakeup_trace;
 	struct trace_array_cpu *data;
@@ -141,7 +142,8 @@ static int wakeup_graph_entry(struct ftrace_graph_ent *trace)
 	return ret;
 }
 
-static void wakeup_graph_return(struct ftrace_graph_ret *trace)
+static void wakeup_graph_return(struct ftrace_graph_ret *trace,
+				struct fgraph_ops *gops)
 {
 	struct trace_array *tr = wakeup_trace;
 	struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 529590499b1f..914331d8242c 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -762,7 +762,8 @@ trace_selftest_startup_function(struct tracer *trace, struct trace_array *tr)
 static unsigned int graph_hang_thresh;
 
 /* Wrap the real function entry probe to avoid possible hanging */
-static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
+static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace,
+				      struct fgraph_ops *gops)
 {
 	/* This is harmlessly racy, we want to approximately detect a hang */
 	if (unlikely(++graph_hang_thresh > GRAPH_MAX_FUNC_TEST)) {
@@ -776,7 +777,7 @@ static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
 		return 0;
 	}
 
-	return trace_graph_entry(trace);
+	return trace_graph_entry(trace, gops);
 }
 
 static struct fgraph_ops fgraph_ops __initdata  = {



* [RFC PATCH 08/32] ftrace: Allow function_graph tracer to be enabled in instances
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (6 preceding siblings ...)
  2023-11-05 16:07 ` [RFC PATCH 07/32] ftrace/function_graph: Pass fgraph_ops to function graph callbacks Masami Hiramatsu (Google)
@ 2023-11-05 16:08 ` Masami Hiramatsu (Google)
  2023-11-05 16:08 ` [RFC PATCH 09/32] ftrace: Allow ftrace startup flags exist without dynamic ftrace Masami Hiramatsu (Google)
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Now that function graph tracing can handle more than one user, allow it to
be enabled in the ftrace instances. Note, filtering of the functions is
still tied to the top-level set_ftrace_filter and friends, as well as the
graph and nograph files.
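
A minimal sketch, assuming an init path like graph_trace_init(), of how
an instance now wires up its own fgraph_ops (mirroring the
allocate_fgraph_ops() hunk below; error handling trimmed to the
essential check):

	struct fgraph_ops *gops = kzalloc(sizeof(*gops), GFP_KERNEL);

	if (!gops)
		return -ENOMEM;
	gops->entryfunc = trace_graph_entry;
	gops->retfunc   = trace_graph_return;
	gops->private   = tr;	/* callbacks recover the instance from this */
	return register_ftrace_graph(gops);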

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/ftrace.h               |    1 +
 kernel/trace/ftrace.c                |    1 +
 kernel/trace/trace.h                 |   12 ++++++
 kernel/trace/trace_functions.c       |    8 ++++
 kernel/trace/trace_functions_graph.c |   65 +++++++++++++++++++++-------------
 kernel/trace/trace_selftest.c        |    2 +
 6 files changed, 63 insertions(+), 26 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index ff32a0ec57aa..4ad587b3525e 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1044,6 +1044,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph
 struct fgraph_ops {
 	trace_func_graph_ent_t		entryfunc;
 	trace_func_graph_ret_t		retfunc;
+	void				*private;
 };
 
 /*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7ff5c454622a..83fbfb7b48f8 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7319,6 +7319,7 @@ __init void ftrace_init_global_array_ops(struct trace_array *tr)
 	tr->ops = &global_ops;
 	tr->ops->private = tr;
 	ftrace_init_trace_array(tr);
+	init_array_fgraph_ops(tr);
 }
 
 void ftrace_init_array_ops(struct trace_array *tr, ftrace_func_t func)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index ada49bf1fbc8..73fe789ba143 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -395,6 +395,9 @@ struct trace_array {
 	struct ftrace_ops	*ops;
 	struct trace_pid_list	__rcu *function_pids;
 	struct trace_pid_list	__rcu *function_no_pids;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	struct fgraph_ops	*gops;
+#endif
 #ifdef CONFIG_DYNAMIC_FTRACE
 	/* All of these are protected by the ftrace_lock */
 	struct list_head	func_probes;
@@ -883,6 +886,9 @@ extern int __trace_graph_entry(struct trace_array *tr,
 extern void __trace_graph_return(struct trace_array *tr,
 				 struct ftrace_graph_ret *trace,
 				 unsigned int trace_ctx);
+extern void init_array_fgraph_ops(struct trace_array *tr);
+extern int allocate_fgraph_ops(struct trace_array *tr);
+extern void free_fgraph_ops(struct trace_array *tr);
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash __rcu *ftrace_graph_hash;
@@ -995,6 +1001,12 @@ print_graph_function_flags(struct trace_iterator *iter, u32 flags)
 {
 	return TRACE_TYPE_UNHANDLED;
 }
+static inline void init_array_fgraph_ops(struct trace_array *tr) { }
+static inline int allocate_fgraph_ops(struct trace_array *tr)
+{
+	return 0;
+}
+static inline void free_fgraph_ops(struct trace_array *tr) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 extern struct list_head ftrace_pids;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 9f1bfbe105e8..8e8da0d0ee52 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -80,6 +80,7 @@ void ftrace_free_ftrace_ops(struct trace_array *tr)
 int ftrace_create_function_files(struct trace_array *tr,
 				 struct dentry *parent)
 {
+	int ret;
 	/*
 	 * The top level array uses the "global_ops", and the files are
 	 * created on boot up.
@@ -90,6 +91,12 @@ int ftrace_create_function_files(struct trace_array *tr,
 	if (!tr->ops)
 		return -EINVAL;
 
+	ret = allocate_fgraph_ops(tr);
+	if (ret) {
+		kfree(tr->ops);
+		return ret;
+	}
+
 	ftrace_create_filter_files(tr->ops, parent);
 
 	return 0;
@@ -99,6 +106,7 @@ void ftrace_destroy_function_files(struct trace_array *tr)
 {
 	ftrace_destroy_filter_files(tr->ops);
 	ftrace_free_ftrace_ops(tr);
+	free_fgraph_ops(tr);
 }
 
 static ftrace_func_t select_trace_function(u32 flags_val)
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index b7b142b65299..9ccc904a7703 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -83,8 +83,6 @@ static struct tracer_flags tracer_flags = {
 	.opts = trace_opts
 };
 
-static struct trace_array *graph_array;
-
 /*
  * DURATION column is being also used to display IRQ signs,
  * following values are used by print_graph_irq and others
@@ -132,7 +130,7 @@ static inline int ftrace_graph_ignore_irqs(void)
 int trace_graph_entry(struct ftrace_graph_ent *trace,
 		      struct fgraph_ops *gops)
 {
-	struct trace_array *tr = graph_array;
+	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
 	unsigned long flags;
 	unsigned int trace_ctx;
@@ -242,7 +240,7 @@ void __trace_graph_return(struct trace_array *tr,
 void trace_graph_return(struct ftrace_graph_ret *trace,
 			struct fgraph_ops *gops)
 {
-	struct trace_array *tr = graph_array;
+	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
 	unsigned long flags;
 	unsigned int trace_ctx;
@@ -268,15 +266,6 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 	local_irq_restore(flags);
 }
 
-void set_graph_array(struct trace_array *tr)
-{
-	graph_array = tr;
-
-	/* Make graph_array visible before we start tracing */
-
-	smp_mb();
-}
-
 static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 				      struct fgraph_ops *gops)
 {
@@ -294,25 +283,53 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 		trace_graph_return(trace, gops);
 }
 
-static struct fgraph_ops funcgraph_thresh_ops = {
-	.entryfunc = &trace_graph_entry,
-	.retfunc = &trace_graph_thresh_return,
-};
-
 static struct fgraph_ops funcgraph_ops = {
 	.entryfunc = &trace_graph_entry,
 	.retfunc = &trace_graph_return,
 };
 
+int allocate_fgraph_ops(struct trace_array *tr)
+{
+	struct fgraph_ops *gops;
+
+	gops = kzalloc(sizeof(*gops), GFP_KERNEL);
+	if (!gops)
+		return -ENOMEM;
+
+	gops->entryfunc = &trace_graph_entry;
+	gops->retfunc = &trace_graph_return;
+
+	tr->gops = gops;
+	gops->private = tr;
+	return 0;
+}
+
+void free_fgraph_ops(struct trace_array *tr)
+{
+	kfree(tr->gops);
+}
+
+__init void init_array_fgraph_ops(struct trace_array *tr)
+{
+	tr->gops = &funcgraph_ops;
+	funcgraph_ops.private = tr;
+}
+
 static int graph_trace_init(struct trace_array *tr)
 {
 	int ret;
 
-	set_graph_array(tr);
+	tr->gops->entryfunc = trace_graph_entry;
+
 	if (tracing_thresh)
-		ret = register_ftrace_graph(&funcgraph_thresh_ops);
+		tr->gops->retfunc = trace_graph_thresh_return;
 	else
-		ret = register_ftrace_graph(&funcgraph_ops);
+		tr->gops->retfunc = trace_graph_return;
+
+	/* Make sure gops functions are visible before we start tracing */
+	smp_mb();
+
+	ret = register_ftrace_graph(tr->gops);
 	if (ret)
 		return ret;
 	tracing_start_cmdline_record();
@@ -323,10 +340,7 @@ static int graph_trace_init(struct trace_array *tr)
 static void graph_trace_reset(struct trace_array *tr)
 {
 	tracing_stop_cmdline_record();
-	if (tracing_thresh)
-		unregister_ftrace_graph(&funcgraph_thresh_ops);
-	else
-		unregister_ftrace_graph(&funcgraph_ops);
+	unregister_ftrace_graph(tr->gops);
 }
 
 static int graph_trace_update_thresh(struct trace_array *tr)
@@ -1365,6 +1379,7 @@ static struct tracer graph_trace __tracer_data = {
 	.print_header	= print_graph_headers,
 	.flags		= &tracer_flags,
 	.set_flag	= func_graph_set_flag,
+	.allow_instances = true,
 #ifdef CONFIG_FTRACE_SELFTEST
 	.selftest	= trace_selftest_startup_function_graph,
 #endif
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 914331d8242c..e05f7335525c 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -813,7 +813,7 @@ trace_selftest_startup_function_graph(struct tracer *trace,
 	 * to detect and recover from possible hangs
 	 */
 	tracing_reset_online_cpus(&tr->array_buffer);
-	set_graph_array(tr);
+	fgraph_ops.private = tr;
 	ret = register_ftrace_graph(&fgraph_ops);
 	if (ret) {
 		warn_failed_init_tracer(trace, ret);



* [RFC PATCH 09/32] ftrace: Allow ftrace startup flags exist without dynamic ftrace
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (7 preceding siblings ...)
  2023-11-05 16:08 ` [RFC PATCH 08/32] ftrace: Allow function_graph tracer to be enabled in instances Masami Hiramatsu (Google)
@ 2023-11-05 16:08 ` Masami Hiramatsu (Google)
  2023-11-05 16:08 ` [RFC PATCH 10/32] function_graph: Have the instances use their own ftrace_ops for filtering Masami Hiramatsu (Google)
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Some of the flags for ftrace_startup() may be exposed even when
CONFIG_DYNAMIC_FTRACE is not configured in. This is fine, as the
difference between dynamic ftrace and static ftrace is handled within the
internals of ftrace itself. There is no need to have use cases fail to
compile because dynamic ftrace is disabled.

This change is needed to move some of the logic of what is passed to
ftrace_startup() out of the parameters of ftrace_startup().
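
As a rough sketch of the intended use (do_start_func_ret is a
hypothetical condition; a later patch in this series applies this exact
pattern in register_ftrace_graph()):

	int command = 0;

	if (do_start_func_ret)		/* hypothetical: first fgraph user */
		command |= FTRACE_START_FUNC_RET;
	ret = ftrace_startup(&gops->ops, command);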

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/ftrace.h |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 4ad587b3525e..7fd044ae3da5 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -512,6 +512,15 @@ static inline void stack_tracer_disable(void) { }
 static inline void stack_tracer_enable(void) { }
 #endif
 
+enum {
+	FTRACE_UPDATE_CALLS		= (1 << 0),
+	FTRACE_DISABLE_CALLS		= (1 << 1),
+	FTRACE_UPDATE_TRACE_FUNC	= (1 << 2),
+	FTRACE_START_FUNC_RET		= (1 << 3),
+	FTRACE_STOP_FUNC_RET		= (1 << 4),
+	FTRACE_MAY_SLEEP		= (1 << 5),
+};
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 void ftrace_arch_code_modify_prepare(void);
@@ -606,15 +615,6 @@ void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
 void ftrace_free_filter(struct ftrace_ops *ops);
 void ftrace_ops_set_global_filter(struct ftrace_ops *ops);
 
-enum {
-	FTRACE_UPDATE_CALLS		= (1 << 0),
-	FTRACE_DISABLE_CALLS		= (1 << 1),
-	FTRACE_UPDATE_TRACE_FUNC	= (1 << 2),
-	FTRACE_START_FUNC_RET		= (1 << 3),
-	FTRACE_STOP_FUNC_RET		= (1 << 4),
-	FTRACE_MAY_SLEEP		= (1 << 5),
-};
-
 /*
  * The FTRACE_UPDATE_* enum is used to pass information back
  * from the ftrace_update_record() and ftrace_test_record()



* [RFC PATCH 10/32] function_graph: Have the instances use their own ftrace_ops for filtering
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (8 preceding siblings ...)
  2023-11-05 16:08 ` [RFC PATCH 09/32] ftrace: Allow ftrace startup flags exist without dynamic ftrace Masami Hiramatsu (Google)
@ 2023-11-05 16:08 ` Masami Hiramatsu (Google)
  2023-11-07  1:47   ` Masami Hiramatsu
  2023-11-05 16:08 ` [RFC PATCH 11/32] function_graph: Add "task variables" per task for fgraph_ops Masami Hiramatsu (Google)
                   ` (21 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Allow instances to have their own ftrace_ops as part of the fgraph_ops,
so that the function_graph tracer filters on the set_ftrace_filter file
of the instance and not the top instance.
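
Conceptually, the entry-time filtering changes like the sketch below
(record_return_hook() is a hypothetical stand-in for the shadow-stack
bookkeeping done in function_graph_enter()):

	/* each gops now carries its own ftrace_ops and filter hash */
	if (ftrace_ops_test(&gops->ops, func, NULL) &&
	    gops->entryfunc(&trace, gops))
		record_return_hook(gops);	/* hypothetical */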

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/ftrace.h               |    1 +
 kernel/trace/fgraph.c                |   60 +++++++++++++++++++++-------------
 kernel/trace/ftrace.c                |    6 ++-
 kernel/trace/trace.h                 |   16 +++++----
 kernel/trace/trace_functions.c       |    2 +
 kernel/trace/trace_functions_graph.c |    8 +++--
 6 files changed, 58 insertions(+), 35 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7fd044ae3da5..9dab365c6023 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1044,6 +1044,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph
 struct fgraph_ops {
 	trace_func_graph_ent_t		entryfunc;
 	trace_func_graph_ret_t		retfunc;
+	struct ftrace_ops		ops; /* for the hash lists */
 	void				*private;
 };
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 1e8c17f70b84..0642f3281b64 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -17,14 +17,6 @@
 #include "ftrace_internal.h"
 #include "trace.h"
 
-#ifdef CONFIG_DYNAMIC_FTRACE
-#define ASSIGN_OPS_HASH(opsname, val) \
-	.func_hash		= val, \
-	.local_hash.regex_lock	= __MUTEX_INITIALIZER(opsname.local_hash.regex_lock),
-#else
-#define ASSIGN_OPS_HASH(opsname, val)
-#endif
-
 #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
 #define FGRAPH_RET_INDEX (FGRAPH_RET_SIZE / sizeof(long))
 
@@ -338,9 +330,6 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 		return -EBUSY;
 #endif
 
-	if (!ftrace_ops_test(&global_ops, func, NULL))
-		return -EBUSY;
-
 	trace.func = func;
 	trace.depth = ++current->curr_ret_depth;
 
@@ -361,7 +350,8 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			atomic_inc(&current->trace_overrun);
 			break;
 		}
-		if (fgraph_array[i]->entryfunc(&trace, fgraph_array[i])) {
+		if (ftrace_ops_test(&gops->ops, func, NULL) &&
+		    gops->entryfunc(&trace, gops)) {
 			offset = current->curr_ret_stack;
 			/* Check the top level stored word */
 			type = get_fgraph_type(current, offset - 1);
@@ -656,17 +646,25 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 }
 #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
 
-static struct ftrace_ops graph_ops = {
-	.func			= ftrace_graph_func,
-	.flags			= FTRACE_OPS_FL_INITIALIZED |
-				   FTRACE_OPS_FL_PID |
-				   FTRACE_OPS_GRAPH_STUB,
+void fgraph_init_ops(struct ftrace_ops *dst_ops,
+		     struct ftrace_ops *src_ops)
+{
+	dst_ops->func = ftrace_stub;
+	dst_ops->flags = FTRACE_OPS_FL_PID | FTRACE_OPS_FL_STUB;
+
 #ifdef FTRACE_GRAPH_TRAMP_ADDR
-	.trampoline		= FTRACE_GRAPH_TRAMP_ADDR,
+	dst_ops->trampoline = FTRACE_GRAPH_TRAMP_ADDR;
 	/* trampoline_size is only needed for dynamically allocated tramps */
 #endif
-	ASSIGN_OPS_HASH(graph_ops, &global_ops.local_hash)
-};
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+	if (src_ops) {
+		dst_ops->func_hash = &src_ops->local_hash;
+		mutex_init(&dst_ops->local_hash.regex_lock);
+		dst_ops->flags |= FTRACE_OPS_FL_INITIALIZED;
+	}
+#endif
+}
 
 void ftrace_graph_sleep_time_control(bool enable)
 {
@@ -871,11 +869,20 @@ static int start_graph_tracing(void)
 
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
+	int command = 0;
 	int ret = 0;
 	int i;
 
 	mutex_lock(&ftrace_lock);
 
+	if (!gops->ops.func) {
+		gops->ops.flags |= FTRACE_OPS_FL_STUB;
+		gops->ops.func = ftrace_stub;
+#ifdef FTRACE_GRAPH_TRAMP_ADDR
+		gops->ops.trampoline = FTRACE_GRAPH_TRAMP_ADDR;
+#endif
+	}
+
 	if (!fgraph_array[0]) {
 		/* The array must always have real data on it */
 		for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
@@ -912,9 +919,10 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 		 */
 		ftrace_graph_return = return_run;
 		ftrace_graph_entry = entry_run;
-
-		ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+		command = FTRACE_START_FUNC_RET;
 	}
+
+	ret = ftrace_startup(&gops->ops, command);
 out:
 	mutex_unlock(&ftrace_lock);
 	return ret;
@@ -922,6 +930,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 
 void unregister_ftrace_graph(struct fgraph_ops *gops)
 {
+	int command = 0;
 	int i;
 
 	mutex_lock(&ftrace_lock);
@@ -944,10 +953,15 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
 	}
 
 	ftrace_graph_active--;
+
+	if (!ftrace_graph_active)
+		command = FTRACE_STOP_FUNC_RET;
+
+	ftrace_shutdown(&gops->ops, command);
+
 	if (!ftrace_graph_active) {
 		ftrace_graph_return = ftrace_stub_graph;
 		ftrace_graph_entry = ftrace_graph_entry_stub;
-		ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
 		unregister_pm_notifier(&ftrace_suspend_notifier);
 		unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
 	}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 83fbfb7b48f8..c4cc2a9d0047 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3050,6 +3050,8 @@ int ftrace_startup(struct ftrace_ops *ops, int command)
 	if (unlikely(ftrace_disabled))
 		return -ENODEV;
 
+	ftrace_ops_init(ops);
+
 	ret = __register_ftrace_function(ops);
 	if (ret)
 		return ret;
@@ -7319,7 +7321,7 @@ __init void ftrace_init_global_array_ops(struct trace_array *tr)
 	tr->ops = &global_ops;
 	tr->ops->private = tr;
 	ftrace_init_trace_array(tr);
-	init_array_fgraph_ops(tr);
+	init_array_fgraph_ops(tr, tr->ops);
 }
 
 void ftrace_init_array_ops(struct trace_array *tr, ftrace_func_t func)
@@ -8051,7 +8053,7 @@ static int register_ftrace_function_nolock(struct ftrace_ops *ops)
  */
 int register_ftrace_function(struct ftrace_ops *ops)
 {
-	int ret;
+	int ret = -1;
 
 	lock_direct_mutex();
 	ret = prepare_direct_functions_for_ipmodify(ops);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 73fe789ba143..531cfdb44911 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -886,8 +886,8 @@ extern int __trace_graph_entry(struct trace_array *tr,
 extern void __trace_graph_return(struct trace_array *tr,
 				 struct ftrace_graph_ret *trace,
 				 unsigned int trace_ctx);
-extern void init_array_fgraph_ops(struct trace_array *tr);
-extern int allocate_fgraph_ops(struct trace_array *tr);
+extern void init_array_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
+extern int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
 extern void free_fgraph_ops(struct trace_array *tr);
 
 #ifdef CONFIG_DYNAMIC_FTRACE
@@ -970,6 +970,7 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
 	preempt_enable_notrace();
 	return ret;
 }
+
 #else
 static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
 {
@@ -995,18 +996,19 @@ static inline bool ftrace_graph_ignore_func(struct ftrace_graph_ent *trace)
 		(fgraph_max_depth && trace->depth >= fgraph_max_depth);
 }
 
+void fgraph_init_ops(struct ftrace_ops *dst_ops,
+		     struct ftrace_ops *src_ops);
+
 #else /* CONFIG_FUNCTION_GRAPH_TRACER */
 static inline enum print_line_t
 print_graph_function_flags(struct trace_iterator *iter, u32 flags)
 {
 	return TRACE_TYPE_UNHANDLED;
 }
-static inline void init_array_fgraph_ops(struct trace_array *tr) { }
-static inline int allocate_fgraph_ops(struct trace_array *tr)
-{
-	return 0;
-}
 static inline void free_fgraph_ops(struct trace_array *tr) { }
+/* ftrace_ops may not be defined */
+#define init_array_fgraph_ops(tr, ops) do { } while (0)
+#define allocate_fgraph_ops(tr, ops) ({ 0; })
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 extern struct list_head ftrace_pids;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 8e8da0d0ee52..13bf2415245d 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -91,7 +91,7 @@ int ftrace_create_function_files(struct trace_array *tr,
 	if (!tr->ops)
 		return -EINVAL;
 
-	ret = allocate_fgraph_ops(tr);
+	ret = allocate_fgraph_ops(tr, tr->ops);
 	if (ret) {
 		kfree(tr->ops);
 		return ret;
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 9ccc904a7703..7f30652f0e97 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -288,7 +288,7 @@ static struct fgraph_ops funcgraph_ops = {
 	.retfunc = &trace_graph_return,
 };
 
-int allocate_fgraph_ops(struct trace_array *tr)
+int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops)
 {
 	struct fgraph_ops *gops;
 
@@ -301,6 +301,9 @@ int allocate_fgraph_ops(struct trace_array *tr)
 
 	tr->gops = gops;
 	gops->private = tr;
+
+	fgraph_init_ops(&gops->ops, ops);
+
 	return 0;
 }
 
@@ -309,10 +312,11 @@ void free_fgraph_ops(struct trace_array *tr)
 	kfree(tr->gops);
 }
 
-__init void init_array_fgraph_ops(struct trace_array *tr)
+__init void init_array_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops)
 {
 	tr->gops = &funcgraph_ops;
 	funcgraph_ops.private = tr;
+	fgraph_init_ops(&tr->gops->ops, ops);
 }
 
 static int graph_trace_init(struct trace_array *tr)



* [RFC PATCH 11/32] function_graph: Add "task variables" per task for fgraph_ops
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (9 preceding siblings ...)
  2023-11-05 16:08 ` [RFC PATCH 10/32] function_graph: Have the instances use their own ftrace_ops for filtering Masami Hiramatsu (Google)
@ 2023-11-05 16:08 ` Masami Hiramatsu (Google)
  2023-11-05 16:08 ` [RFC PATCH 12/32] function_graph: Move set_graph_function tests to shadow stack global var Masami Hiramatsu (Google)
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Add a "task variables" array on the task's shadow ret_stack, one long
for each possible registered fgraph_ops. That's a total of 16, taking up
16 * 8 = 128 bytes (out of a 4k page).

This will allow fgraph_ops to implement per-task features by providing a
way to maintain state for each task.
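
As a sketch of the intended use (my_graph_entry is a hypothetical
callback; fgraph_get_task_var() is the accessor this patch adds):

	static int my_graph_entry(struct ftrace_graph_ent *trace,
				  struct fgraph_ops *gops)
	{
		unsigned long *state = fgraph_get_task_var(gops);

		(*state)++;	/* e.g. count entries seen on this task */
		return 1;
	}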

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/ftrace.h |    2 +
 kernel/trace/fgraph.c  |   73 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9dab365c6023..31ad20e78522 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1046,6 +1046,7 @@ struct fgraph_ops {
 	trace_func_graph_ret_t		retfunc;
 	struct ftrace_ops		ops; /* for the hash lists */
 	void				*private;
+	int				idx;
 };
 
 /*
@@ -1084,6 +1085,7 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
 
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp);
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops);
 
 /*
  * Sometimes we don't want to trace a function with the function
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 0642f3281b64..748896e320e2 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -92,10 +92,18 @@ enum {
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
 #define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
 /* Leave on a buffer at the end */
-#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1))
+#define SHADOW_STACK_MAX_INDEX				\
+	(SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1 + FGRAPH_ARRAY_SIZE))
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
 
+/*
+ * Each fgraph_ops has a reserved unsigned long at the end (top) of the
+ * ret_stack to store task specific state.
+ */
+#define SHADOW_STACK_TASK_VARS(ret_stack) \
+	((unsigned long *)(&(ret_stack)[SHADOW_STACK_INDEX - FGRAPH_ARRAY_SIZE]))
+
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
@@ -132,6 +140,44 @@ static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
 	return;
 }
 
+static void ret_stack_set_task_var(struct task_struct *t, int idx, long val)
+{
+	unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+	gvals[idx] = val;
+}
+
+static unsigned long *
+ret_stack_get_task_var(struct task_struct *t, int idx)
+{
+	unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+	return &gvals[idx];
+}
+
+static void ret_stack_init_task_vars(unsigned long *ret_stack)
+{
+	unsigned long *gvals = SHADOW_STACK_TASK_VARS(ret_stack);
+
+	memset(gvals, 0, sizeof(*gvals) * FGRAPH_ARRAY_SIZE);
+}
+
+/**
+ * fgraph_get_task_var - retrieve a task specific state variable
+ * @gops: The fgraph_ops that owns the task specific variable
+ *
+ * Every registered fgraph_ops has a task state variable
+ * reserved on the task's ret_stack. This function returns the
+ * address to that variable.
+ *
+ * Returns the address of the fgraph_ops @gops task-specific
+ * unsigned long variable.
+ */
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops)
+{
+	return ret_stack_get_task_var(current, gops->idx);
+}
+
 /*
  * @offset: The index into @t->ret_stack to find the ret_stack entry
  * @index: Where to place the index into @t->ret_stack of that entry
@@ -709,6 +755,7 @@ static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
 
 		if (t->ret_stack == NULL) {
 			atomic_set(&t->trace_overrun, 0);
+			ret_stack_init_task_vars(ret_stack_list[start]);
 			t->curr_ret_stack = 0;
 			t->curr_ret_depth = -1;
 			/* Make sure the tasks see the 0 first: */
@@ -769,6 +816,7 @@ static void
 graph_init_task(struct task_struct *t, unsigned long *ret_stack)
 {
 	atomic_set(&t->trace_overrun, 0);
+	ret_stack_init_task_vars(ret_stack);
 	t->ftrace_timestamp = 0;
 	t->curr_ret_stack = 0;
 	t->curr_ret_depth = -1;
@@ -867,6 +915,24 @@ static int start_graph_tracing(void)
 	return ret;
 }
 
+static void init_task_vars(int idx)
+{
+	struct task_struct *g, *t;
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		if (idle_task(cpu)->ret_stack)
+			ret_stack_set_task_var(idle_task(cpu), idx, 0);
+	}
+
+	read_lock(&tasklist_lock);
+	for_each_process_thread(g, t) {
+		if (t->ret_stack)
+			ret_stack_set_task_var(t, idx, 0);
+	}
+	read_unlock(&tasklist_lock);
+}
+
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
 	int command = 0;
@@ -903,6 +969,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 	fgraph_array[i] = gops;
 	if (i + 1 > fgraph_array_cnt)
 		fgraph_array_cnt = i + 1;
+	gops->idx = i;
 
 	ftrace_graph_active++;
 
@@ -920,6 +987,8 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 		ftrace_graph_return = return_run;
 		ftrace_graph_entry = entry_run;
 		command = FTRACE_START_FUNC_RET;
+	} else {
+		init_task_vars(gops->idx);
 	}
 
 	ret = ftrace_startup(&gops->ops, command);
@@ -944,6 +1013,8 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
 	if (i >= fgraph_array_cnt)
 		goto out;
 
+	WARN_ON_ONCE(gops->idx != i);
+
 	fgraph_array[i] = &fgraph_stub;
 	if (i + 1 == fgraph_array_cnt) {
 		for (; i >= 0; i--)



* [RFC PATCH 12/32] function_graph: Move set_graph_function tests to shadow stack global var
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (10 preceding siblings ...)
  2023-11-05 16:08 ` [RFC PATCH 11/32] function_graph: Add "task variables" per task for fgraph_ops Masami Hiramatsu (Google)
@ 2023-11-05 16:08 ` Masami Hiramatsu (Google)
  2023-11-05 16:09 ` [RFC PATCH 13/32] function_graph: Move graph depth stored data " Masami Hiramatsu (Google)
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

The use of task->trace_recursion for the set_graph_function logic was a
bit of an abuse of that variable. Now that there exist per-stack global
variables for registered graph tracers, use those instead.
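
In sketch form, mirroring the ftrace_graph_addr() and
ftrace_graph_addr_finish() hunks below:

	/* on entry, when the function matches set_graph_function: */
	*task_var |= TRACE_GRAPH_FL;

	/* on return, once the recorded depth unwinds: */
	if ((*task_var & TRACE_GRAPH_FL) &&
	    trace->depth == trace_recursion_depth())
		*task_var &= ~TRACE_GRAPH_FL;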

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/trace_recursion.h      |    5 +----
 kernel/trace/trace.h                 |   32 +++++++++++++++++++++-----------
 kernel/trace/trace_functions_graph.c |    6 +++---
 kernel/trace/trace_irqsoff.c         |    4 ++--
 kernel/trace/trace_sched_wakeup.c    |    4 ++--
 5 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index d48cd92d2364..2efd5ec46d7f 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,9 +44,6 @@ enum {
  */
 	TRACE_IRQ_BIT,
 
-	/* Set if the function is in the set_graph_function file */
-	TRACE_GRAPH_BIT,
-
 	/*
 	 * In the very unlikely case that an interrupt came in
 	 * at a start of graph tracing, and we want to trace
@@ -60,7 +57,7 @@ enum {
 	 * that preempted a softirq start of a function that
 	 * preempted normal context!!!! Luckily, it can't be
 	 * greater than 3, so the next two bits are a mask
-	 * of what the depth is when we set TRACE_GRAPH_BIT
+	 * of what the depth is when we set TRACE_GRAPH_FL
 	 */
 
 	TRACE_GRAPH_DEPTH_START_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 531cfdb44911..219520621091 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -890,11 +890,16 @@ extern void init_array_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops
 extern int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
 extern void free_fgraph_ops(struct trace_array *tr);
 
+enum {
+	TRACE_GRAPH_FL		= 1,
+};
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash __rcu *ftrace_graph_hash;
 extern struct ftrace_hash __rcu *ftrace_graph_notrace_hash;
 
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int
+ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 {
 	unsigned long addr = trace->func;
 	int ret = 0;
@@ -916,12 +921,11 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
 	}
 
 	if (ftrace_lookup_ip(hash, addr)) {
-
 		/*
 		 * This needs to be cleared on the return functions
 		 * when the depth is zero.
 		 */
-		trace_recursion_set(TRACE_GRAPH_BIT);
+		*task_var |= TRACE_GRAPH_FL;
 		trace_recursion_set_depth(trace->depth);
 
 		/*
@@ -941,11 +945,14 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
 	return ret;
 }
 
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void
+ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace)
 {
-	if (trace_recursion_test(TRACE_GRAPH_BIT) &&
+	unsigned long *task_var = fgraph_get_task_var(gops);
+
+	if ((*task_var & TRACE_GRAPH_FL) &&
 	    trace->depth == trace_recursion_depth())
-		trace_recursion_clear(TRACE_GRAPH_BIT);
+		*task_var &= ~TRACE_GRAPH_FL;
 }
 
 static inline int ftrace_graph_notrace_addr(unsigned long addr)
@@ -972,7 +979,7 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
 }
 
 #else
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 {
 	return 1;
 }
@@ -981,17 +988,20 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
 {
 	return 0;
 }
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace)
 { }
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 extern unsigned int fgraph_max_depth;
 
-static inline bool ftrace_graph_ignore_func(struct ftrace_graph_ent *trace)
+static inline bool
+ftrace_graph_ignore_func(struct fgraph_ops *gops, struct ftrace_graph_ent *trace)
 {
+	unsigned long *task_var = fgraph_get_task_var(gops);
+
 	/* trace it when it is-nested-in or is a function enabled. */
-	return !(trace_recursion_test(TRACE_GRAPH_BIT) ||
-		 ftrace_graph_addr(trace)) ||
+	return !((*task_var & TRACE_GRAPH_FL) ||
+		 ftrace_graph_addr(task_var, trace)) ||
 		(trace->depth < 0) ||
 		(fgraph_max_depth && trace->depth >= fgraph_max_depth);
 }
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 7f30652f0e97..66cce73e94f8 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -160,7 +160,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
 	if (!ftrace_trace_task(tr))
 		return 0;
 
-	if (ftrace_graph_ignore_func(trace))
+	if (ftrace_graph_ignore_func(gops, trace))
 		return 0;
 
 	if (ftrace_graph_ignore_irqs())
@@ -247,7 +247,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 	long disabled;
 	int cpu;
 
-	ftrace_graph_addr_finish(trace);
+	ftrace_graph_addr_finish(gops, trace);
 
 	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
 		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
@@ -269,7 +269,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 				      struct fgraph_ops *gops)
 {
-	ftrace_graph_addr_finish(trace);
+	ftrace_graph_addr_finish(gops, trace);
 
 	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
 		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 5478f4c4f708..fce064e20570 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -184,7 +184,7 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent *trace,
 	unsigned int trace_ctx;
 	int ret;
 
-	if (ftrace_graph_ignore_func(trace))
+	if (ftrace_graph_ignore_func(gops, trace))
 		return 0;
 	/*
 	 * Do not trace a function if it's filtered by set_graph_notrace.
@@ -214,7 +214,7 @@ static void irqsoff_graph_return(struct ftrace_graph_ret *trace,
 	unsigned long flags;
 	unsigned int trace_ctx;
 
-	ftrace_graph_addr_finish(trace);
+	ftrace_graph_addr_finish(gops, trace);
 
 	if (!func_prolog_dec(tr, &data, &flags))
 		return;
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 49bcc812652c..130ca7e7787e 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -120,7 +120,7 @@ static int wakeup_graph_entry(struct ftrace_graph_ent *trace,
 	unsigned int trace_ctx;
 	int ret = 0;
 
-	if (ftrace_graph_ignore_func(trace))
+	if (ftrace_graph_ignore_func(gops, trace))
 		return 0;
 	/*
 	 * Do not trace a function if it's filtered by set_graph_notrace.
@@ -149,7 +149,7 @@ static void wakeup_graph_return(struct ftrace_graph_ret *trace,
 	struct trace_array_cpu *data;
 	unsigned int trace_ctx;
 
-	ftrace_graph_addr_finish(trace);
+	ftrace_graph_addr_finish(gops, trace);
 
 	if (!func_prolog_preempt_disable(tr, &data, &trace_ctx))
 		return;



* [RFC PATCH 13/32] function_graph: Move graph depth stored data to shadow stack global var
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (11 preceding siblings ...)
  2023-11-05 16:08 ` [RFC PATCH 12/32] function_graph: Move set_graph_function tests to shadow stack global var Masami Hiramatsu (Google)
@ 2023-11-05 16:09 ` Masami Hiramatsu (Google)
  2023-11-05 16:09 ` [RFC PATCH 14/32] function_graph: Move graph notrace bit " Masami Hiramatsu (Google)
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

The use of task->trace_recursion for the function graph depth logic was
a bit of an abuse of that variable. Now that there exist per-stack global
variables for registered graph tracers, use those instead.
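
A worked example with the helpers this patch introduces: the depth
(0-3) occupies the two bits starting at TRACE_GRAPH_DEPTH_START_BIT of
the per-task variable:

	ftrace_graph_set_depth(task_var, 2);		/* store depth 2 */
	WARN_ON(ftrace_graph_depth(task_var) != 2);	/* reads back as 2 */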

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/trace_recursion.h |   29 -----------------------------
 kernel/trace/trace.h            |   34 ++++++++++++++++++++++++++++++++--
 2 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 2efd5ec46d7f..00e792bf148d 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,25 +44,6 @@ enum {
  */
 	TRACE_IRQ_BIT,
 
-	/*
-	 * In the very unlikely case that an interrupt came in
-	 * at a start of graph tracing, and we want to trace
-	 * the function in that interrupt, the depth can be greater
-	 * than zero, because of the preempted start of a previous
-	 * trace. In an even more unlikely case, depth could be 2
-	 * if a softirq interrupted the start of graph tracing,
-	 * followed by an interrupt preempting a start of graph
-	 * tracing in the softirq, and depth can even be 3
-	 * if an NMI came in at the start of an interrupt function
-	 * that preempted a softirq start of a function that
-	 * preempted normal context!!!! Luckily, it can't be
-	 * greater than 3, so the next two bits are a mask
-	 * of what the depth is when we set TRACE_GRAPH_FL
-	 */
-
-	TRACE_GRAPH_DEPTH_START_BIT,
-	TRACE_GRAPH_DEPTH_END_BIT,
-
 	/*
 	 * To implement set_graph_notrace, if this bit is set, we ignore
 	 * function graph tracing of called functions, until the return
@@ -78,16 +59,6 @@ enum {
 #define trace_recursion_clear(bit)	do { (current)->trace_recursion &= ~(1<<(bit)); } while (0)
 #define trace_recursion_test(bit)	((current)->trace_recursion & (1<<(bit)))
 
-#define trace_recursion_depth() \
-	(((current)->trace_recursion >> TRACE_GRAPH_DEPTH_START_BIT) & 3)
-#define trace_recursion_set_depth(depth) \
-	do {								\
-		current->trace_recursion &=				\
-			~(3 << TRACE_GRAPH_DEPTH_START_BIT);		\
-		current->trace_recursion |=				\
-			((depth) & 3) << TRACE_GRAPH_DEPTH_START_BIT;	\
-	} while (0)
-
 #define TRACE_CONTEXT_BITS	4
 
 #define TRACE_FTRACE_START	TRACE_FTRACE_BIT
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 219520621091..8e5943d3d25c 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -892,8 +892,38 @@ extern void free_fgraph_ops(struct trace_array *tr);
 
 enum {
 	TRACE_GRAPH_FL		= 1,
+
+	/*
+	 * In the very unlikely case that an interrupt came in
+	 * at a start of graph tracing, and we want to trace
+	 * the function in that interrupt, the depth can be greater
+	 * than zero, because of the preempted start of a previous
+	 * trace. In an even more unlikely case, depth could be 2
+	 * if a softirq interrupted the start of graph tracing,
+	 * followed by an interrupt preempting a start of graph
+	 * tracing in the softirq, and depth can even be 3
+	 * if an NMI came in at the start of an interrupt function
+	 * that preempted a softirq start of a function that
+	 * preempted normal context!!!! Luckily, it can't be
+	 * greater than 3, so the next two bits are a mask
+	 * of what the depth is when we set TRACE_GRAPH_FL
+	 */
+
+	TRACE_GRAPH_DEPTH_START_BIT,
+	TRACE_GRAPH_DEPTH_END_BIT,
 };
 
+static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
+{
+	return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
+}
+
+static inline void ftrace_graph_set_depth(unsigned long *task_var, int depth)
+{
+	*task_var &= ~(3 << TRACE_GRAPH_DEPTH_START_BIT);
+	*task_var |= (depth & 3) << TRACE_GRAPH_DEPTH_START_BIT;
+}
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash __rcu *ftrace_graph_hash;
 extern struct ftrace_hash __rcu *ftrace_graph_notrace_hash;
@@ -926,7 +956,7 @@ ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 		 * when the depth is zero.
 		 */
 		*task_var |= TRACE_GRAPH_FL;
-		trace_recursion_set_depth(trace->depth);
+		ftrace_graph_set_depth(task_var, trace->depth);
 
 		/*
 		 * If no irqs are to be traced, but a set_graph_function
@@ -951,7 +981,7 @@ ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace
 	unsigned long *task_var = fgraph_get_task_var(gops);
 
 	if ((*task_var & TRACE_GRAPH_FL) &&
-	    trace->depth == trace_recursion_depth())
+	    trace->depth == ftrace_graph_depth(task_var))
 		*task_var &= ~TRACE_GRAPH_FL;
 }
 


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 14/32] function_graph: Move graph notrace bit to shadow stack global var
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (12 preceding siblings ...)
  2023-11-05 16:09 ` [RFC PATCH 13/32] function_graph: Move graph depth stored data " Masami Hiramatsu (Google)
@ 2023-11-05 16:09 ` Masami Hiramatsu (Google)
  2023-11-05 16:09 ` [RFC PATCH 15/32] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data() Masami Hiramatsu (Google)
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

The use of task->trace_recursion for the function graph no-trace logic
was a bit of an abuse of that variable. Now that per-stack global
variables exist for registered graph tracers, use those instead.
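
For reference, the resulting per-task pattern in the graph callbacks looks
roughly like this (a condensed sketch of the code in the diff below, not a
verbatim excerpt):

	unsigned long *task_var = fgraph_get_task_var(gops);

	/* On entry: skip everything called inside a notrace function */
	if (*task_var & TRACE_GRAPH_NOTRACE)
		return 0;
	if (ftrace_graph_notrace_addr(trace->func))
		*task_var |= TRACE_GRAPH_NOTRACE;

	/* On return: leaving the notrace function clears the flag */
	if (*task_var & TRACE_GRAPH_NOTRACE)
		*task_var &= ~TRACE_GRAPH_NOTRACE;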

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/trace_recursion.h      |    7 -------
 kernel/trace/trace.h                 |    9 +++++++++
 kernel/trace/trace_functions_graph.c |   10 ++++++----
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 00e792bf148d..cc11b0e9d220 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,13 +44,6 @@ enum {
  */
 	TRACE_IRQ_BIT,
 
-	/*
-	 * To implement set_graph_notrace, if this bit is set, we ignore
-	 * function graph tracing of called functions, until the return
-	 * function is called to clear it.
-	 */
-	TRACE_GRAPH_NOTRACE_BIT,
-
 	/* Used to prevent recursion recording from recursing. */
 	TRACE_RECORD_RECURSION_BIT,
 };
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8e5943d3d25c..350c889c2f1e 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -911,8 +911,17 @@ enum {
 
 	TRACE_GRAPH_DEPTH_START_BIT,
 	TRACE_GRAPH_DEPTH_END_BIT,
+
+	/*
+	 * To implement set_graph_notrace, if this bit is set, we ignore
+	 * function graph tracing of called functions, until the return
+	 * function is called to clear it.
+	 */
+	TRACE_GRAPH_NOTRACE_BIT,
 };
 
+#define TRACE_GRAPH_NOTRACE		(1 << TRACE_GRAPH_NOTRACE_BIT)
+
 static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
 {
 	return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 66cce73e94f8..13d0387ac6a6 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -130,6 +130,7 @@ static inline int ftrace_graph_ignore_irqs(void)
 int trace_graph_entry(struct ftrace_graph_ent *trace,
 		      struct fgraph_ops *gops)
 {
+	unsigned long *task_var = fgraph_get_task_var(gops);
 	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
 	unsigned long flags;
@@ -138,7 +139,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
 	int ret;
 	int cpu;
 
-	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
+	if (*task_var & TRACE_GRAPH_NOTRACE)
 		return 0;
 
 	/*
@@ -149,7 +150,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
 	 * returning from the function.
 	 */
 	if (ftrace_graph_notrace_addr(trace->func)) {
-		trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
+		*task_var |= TRACE_GRAPH_NOTRACE;
 		/*
 		 * Need to return 1 to have the return called
 		 * that will clear the NOTRACE bit.
@@ -240,6 +241,7 @@ void __trace_graph_return(struct trace_array *tr,
 void trace_graph_return(struct ftrace_graph_ret *trace,
 			struct fgraph_ops *gops)
 {
+	unsigned long *task_var = fgraph_get_task_var(gops);
 	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
 	unsigned long flags;
@@ -249,8 +251,8 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 
 	ftrace_graph_addr_finish(gops, trace);
 
-	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
-		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+	if (*task_var & TRACE_GRAPH_NOTRACE) {
+		*task_var &= ~TRACE_GRAPH_NOTRACE;
 		return;
 	}
 


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 15/32] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (13 preceding siblings ...)
  2023-11-05 16:09 ` [RFC PATCH 14/32] function_graph: Move graph notrace bit " Masami Hiramatsu (Google)
@ 2023-11-05 16:09 ` Masami Hiramatsu (Google)
  2023-11-05 16:09 ` [RFC PATCH 16/32] function_graph: Add selftest for passing local variables Masami Hiramatsu (Google)
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Add functions that can be called by a fgraph_ops entryfunc and retfunc to
store state between the entry of the function being traced and the exit of
the same function. The fgraph_ops entryfunc() may call fgraph_reserve_data()
to store up to 4 words onto the task's shadow ret_stack, which can then be
retrieved by fgraph_retrieve_data() called from the corresponding retfunc().
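
For example, a hypothetical fgraph_ops could use this pair of APIs to pass
a timestamp from function entry to function exit (a minimal sketch, not
part of this patch; the callback and variable names are made up):

	static int my_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops)
	{
		u64 *ts = fgraph_reserve_data(sizeof(*ts));

		if (!ts)
			return 0;	/* no space left: skip the return hook */
		*ts = trace_clock_local();
		return 1;
	}

	static void my_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops)
	{
		u64 *ts = fgraph_retrieve_data();

		if (ts)
			pr_info("%ps took %llu ns\n", (void *)trace->func,
				trace_clock_local() - *ts);
	}

	static struct fgraph_ops my_gops = {
		.entryfunc	= my_entry,
		.retfunc	= my_return,
	};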

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/ftrace.h |    3 +
 kernel/trace/fgraph.c  |  244 +++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 213 insertions(+), 34 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 31ad20e78522..115758fe9fec 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1049,6 +1049,9 @@ struct fgraph_ops {
 	int				idx;
 };
 
+void *fgraph_reserve_data(int size_bytes);
+void *fgraph_retrieve_data(void);
+
 /*
  * Stack of return addresses for functions
  * of a thread.
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 748896e320e2..0f8e9f22e9e0 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -38,25 +38,36 @@
  * bits: 14 - 15	Type of storage
  *			  0 - reserved
  *			  1 - fgraph_array index
+ *			  2 - reserved data
  * For fgraph_array_index:
  *  bits: 16 - 23	The fgraph_ops fgraph_array index
  *
+ * For reserved data:
+ *  bits: 16 - 17	The size in words that is stored
+ *
  * That is, at the end of function_graph_enter, if the first and fourth
  * fgraph_ops on the fgraph_array[] (index 0 and 3) needs their retfunc called
- * on the return of the function being traced, this is what will be on the
- * task's shadow ret_stack: (the stack grows upward)
+ * on the return of the function being traced, and the fourth fgraph_ops
+ * stored two words of data, this is what will be on the task's shadow
+ * ret_stack: (the stack grows upward)
+ *
+ * |                                     | <- task->curr_ret_stack
+ * +-------------------------------------+
+ * | (3 << FGRAPH_ARRAY_SHIFT)|type:1|(5)| ( 3 for index of fourth fgraph_ops)
+ * +-------------------------------------+
+ * | (3 << FGRAPH_DATA_SHIFT)|type:2|(4) | ( Data with size of 2 words)
+ * +-------------------------------------+ ( It is 4 words from the ret_stack)
+ * |         STORED DATA WORD 2          |
+ * |         STORED DATA WORD 1          |
+ * +-------------------------------------+
+ * | (0 << FGRAPH_ARRAY_SHIFT)|type:1|(1)| ( 0 for index of first fgraph_ops)
+ * +-------------------------------------+
+ * | struct ftrace_ret_stack             |
+ * |   (stores the saved ret pointer)    |
+ * +-------------------------------------+
+ * |             (X) | (N)               | ( N words away from last ret_stack)
+ * |                                     |
  *
- * |                                  | <- task->curr_ret_stack
- * +----------------------------------+
- * | (3 << FGRAPH_ARRAY_SHIFT)|(2)    | ( 3 for index of fourth fgraph_ops)
- * +----------------------------------+
- * | (0 << FGRAPH_ARRAY_SHIFT)|(1)    | ( 0 for index of first fgraph_ops)
- * +----------------------------------+
- * | struct ftrace_ret_stack          |
- * |   (stores the saved ret pointer) |
- * +----------------------------------+
- * |             (X) | (N)            | ( N words away from previous ret_stack)
- * |                                  |
  *
  * If a backtrace is required, and the real return pointer needs to be
  * fetched, then it looks at the task's curr_ret_stack index, if it
@@ -77,12 +88,17 @@
 enum {
 	FGRAPH_TYPE_RESERVED	= 0,
 	FGRAPH_TYPE_ARRAY	= 1,
+	FGRAPH_TYPE_DATA	= 2,
 };
 
 #define FGRAPH_ARRAY_SIZE	16
 #define FGRAPH_ARRAY_MASK	((1 << FGRAPH_ARRAY_SIZE) - 1)
 #define FGRAPH_ARRAY_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
 
+#define FGRAPH_DATA_SIZE	2
+#define FGRAPH_DATA_MASK	((1 << FGRAPH_DATA_SIZE) - 1)
+#define FGRAPH_DATA_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
+
 /* Currently the max stack index can't be more than register callers */
 #define FGRAPH_MAX_INDEX	FGRAPH_ARRAY_SIZE
 
@@ -97,6 +113,8 @@ enum {
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
 
+#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * 4)
+
 /*
  * Each fgraph_ops has a reserved unsigned long at the end (top) of the
  * ret_stack to store task specific state.
@@ -111,21 +129,44 @@ static int fgraph_array_cnt;
 
 static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
 
+static inline int __get_index(unsigned long val)
+{
+	return val & FGRAPH_RET_INDEX_MASK;
+}
+
+static inline int __get_type(unsigned long val)
+{
+	return (val >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK;
+}
+
+static inline int __get_array(unsigned long val)
+{
+	return (val >> FGRAPH_ARRAY_SHIFT) & FGRAPH_ARRAY_MASK;
+}
+
+static inline int __get_data(unsigned long val)
+{
+	return (val >> FGRAPH_DATA_SHIFT) & FGRAPH_DATA_MASK;
+}
+
 static inline int get_ret_stack_index(struct task_struct *t, int offset)
 {
-	return current->ret_stack[offset] & FGRAPH_RET_INDEX_MASK;
+	return __get_index(current->ret_stack[offset]);
 }
 
 static inline int get_fgraph_type(struct task_struct *t, int offset)
 {
-	return (current->ret_stack[offset] >> FGRAPH_TYPE_SHIFT) &
-		FGRAPH_TYPE_MASK;
+	return __get_type(current->ret_stack[offset]);
 }
 
 static inline int get_fgraph_array(struct task_struct *t, int offset)
 {
-	return (current->ret_stack[offset] >> FGRAPH_ARRAY_SHIFT) &
-		FGRAPH_ARRAY_MASK;
+	return __get_array(current->ret_stack[offset]);
+}
+
+static inline int get_data_idx(struct task_struct *t, int offset)
+{
+	return __get_data(current->ret_stack[offset]);
 }
 
 /* ftrace_graph_entry set to this to tell some archs to run function graph */
@@ -162,6 +203,121 @@ static void ret_stack_init_task_vars(unsigned long *ret_stack)
 	memset(gvals, 0, sizeof(*gvals) * FGRAPH_ARRAY_SIZE);
 }
 
+/**
+ * fgraph_reserve_data - Reserve storage on the task's ret_stack
+ * @size_bytes: The size in bytes to reserve (max of 4 words in size)
+ *
+ * Reserves space of up to 4 words (in word increments) on the
+ * task's ret_stack shadow stack, for a given fgraph_ops during
+ * the entryfunc() call. If entryfunc() returns zero, the storage
+ * is discarded. An entryfunc() can only call this once per iteration.
+ * The fgraph_ops retfunc() can retrieve this stored data with
+ * fgraph_retrieve_data().
+ *
+ * Returns: On success, a pointer to the data on the stack.
+ *   Otherwise, NULL if there's not enough space left on the
+ *   ret_stack for the data, or if fgraph_reserve_data() was called
+ *   more than once for a single entryfunc() call.
+ */
+void *fgraph_reserve_data(int size_bytes)
+{
+	unsigned long val;
+	void *data;
+	int curr_ret_stack = current->curr_ret_stack;
+	int data_size;
+	int size;
+
+	if (size_bytes > FGRAPH_MAX_DATA_SIZE)
+		return NULL;
+
+	/* Convert to the number of longs plus one data header word */
+	data_size = ALIGN(size_bytes, sizeof(long)) / sizeof(long) + 1;
+
+	/* The size to add to ret_stack (including the reserve word) */
+	size = data_size + 1;
+
+	val = current->ret_stack[curr_ret_stack - 1];
+
+	switch (__get_type(val)) {
+	case FGRAPH_TYPE_RESERVED:
+		/*
+		 * A reserve word is only saved after the ret_stack
+		 * or after a data storage, not after an fgraph_array
+	 * entry. It's OK if it's after the ret_stack, in which
+		 * case the index will be one, but if the index is
+		 * greater than 1 it means it's a double call to
+		 * fgraph_reserve_data()
+		 */
+		if (__get_index(val) > 1)
+			return NULL;
+		/*
+		 * Leave the reserve in case the entryfunc() doesn't
+		 * want to be recorded.
+		 */
+		break;
+	case FGRAPH_TYPE_ARRAY:
+		break;
+	default:
+		return NULL;
+	}
+	data = &current->ret_stack[curr_ret_stack];
+
+	curr_ret_stack += size;
+	if (unlikely(curr_ret_stack >= SHADOW_STACK_MAX_INDEX))
+		return NULL;
+
+	val = __get_index(val) + size;
+
+	/* Set the last word to be reserved */
+	current->ret_stack[curr_ret_stack - 1] = val;
+
+	/* Make sure interrupts see this */
+	barrier();
+	current->curr_ret_stack = curr_ret_stack;
+	/* Again sync with interrupts, and reset reserve */
+	current->ret_stack[curr_ret_stack - 1] = val;
+
+	val = (data_size << FGRAPH_DATA_SHIFT) |
+		(FGRAPH_TYPE_DATA << FGRAPH_TYPE_SHIFT) |
+		(val - 1);
+
+	/* Save the data header */
+	current->ret_stack[curr_ret_stack - 2] = val;
+
+	return data;
+}
+
+/**
+ * fgraph_retrieve_data - Retrieve stored data from fgraph_reserve_data()
+ *
+ * This is to be called by a fgraph_ops retfunc(), to retrieve data that
+ * was stored by the fgraph_ops entryfunc() on the function entry.
+ * That is, this will retrieve the data that was reserved on the
+ * entry of the function that corresponds to the exit of the function
+ * that the fgraph_ops retfunc() is called on.
+ *
+ * Returns: The stored data from fgraph_reserve_data() called by the
+ *    matching entryfunc() for the retfunc() this is called from.
+ *   Or NULL if there was nothing stored.
+ */
+void *fgraph_retrieve_data(void)
+{
+	unsigned long val;
+	int curr_ret_stack = current->curr_ret_stack;
+
+	/* Top of stack is the fgraph_ops */
+	val = current->ret_stack[curr_ret_stack - 1];
+	/* Check if there's nothing between the fgraph_ops and ret_stack */
+	if (__get_index(val) == 1)
+		return NULL;
+	val = current->ret_stack[curr_ret_stack - 2];
+	if (__get_type(val) != FGRAPH_TYPE_DATA)
+		return NULL;
+
+	return &current->ret_stack[curr_ret_stack -
+				   (__get_data(val) + 1)];
+}
+
 /**
  * fgraph_get_task_var - retrieve a task specific state variable
  * @gops: The ftrace_ops that owns the task specific variable
@@ -357,6 +513,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			 unsigned long frame_pointer, unsigned long *retp)
 {
 	struct ftrace_graph_ent trace;
+	int save_curr_ret_stack;
 	int offset;
 	int start;
 	int type;
@@ -396,8 +553,10 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			atomic_inc(&current->trace_overrun);
 			break;
 		}
+		save_curr_ret_stack = current->curr_ret_stack;
 		if (ftrace_ops_test(&gops->ops, func, NULL) &&
 		    gops->entryfunc(&trace, gops)) {
+			/* Note, curr_ret_stack could be changed by entryfunc() */
 			offset = current->curr_ret_stack;
 			/* Check the top level stored word */
 			type = get_fgraph_type(current, offset - 1);
@@ -431,6 +590,9 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			barrier();
 			current->ret_stack[offset] = val;
 			cnt++;
+		} else {
+			/* Clear out any saved storage */
+			current->curr_ret_stack = save_curr_ret_stack;
 		}
 	}
 
@@ -545,10 +707,10 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_graph_ret trace;
 	unsigned long ret;
-	int offset;
+	int curr_ret_stack;
+	int stop_at;
 	int index;
 	int idx;
-	int i;
 
 	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer);
 
@@ -564,24 +726,38 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 	trace.retval = fgraph_ret_regs_return_value(ret_regs);
 #endif
 
-	offset = current->curr_ret_stack - 1;
-	index = get_ret_stack_index(current, offset);
+	curr_ret_stack = current->curr_ret_stack;
+	index = get_ret_stack_index(current, curr_ret_stack - 1);
+
+	stop_at = curr_ret_stack - index;
 
 	/* index has to be at least one! Optimize for it */
-	i = 0;
 	do {
-		idx = get_fgraph_array(current, offset - i);
-		fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
-		i++;
-	} while (i < index);
+		unsigned long val;
+
+		val = current->ret_stack[curr_ret_stack - 1];
+		switch (__get_type(val)) {
+		case FGRAPH_TYPE_ARRAY:
+			idx = __get_array(val);
+			fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
+			fallthrough;
+		case FGRAPH_TYPE_RESERVED:
+			curr_ret_stack--;
+			break;
+		case FGRAPH_TYPE_DATA:
+			curr_ret_stack -= __get_data(val);
+			break;
+		default:
+			WARN_ONCE(1, "Bad fgraph ret_stack data type %d",
+				  __get_type(val));
+			curr_ret_stack--;
+		}
+		/* Make sure interrupts see the update after the above */
+		barrier();
+		current->curr_ret_stack = curr_ret_stack;
+	} while (curr_ret_stack > stop_at);
 
-	/*
-	 * The ftrace_graph_return() may still access the current
-	 * ret_stack structure, we need to make sure the update of
-	 * curr_ret_stack is after that.
-	 */
-	barrier();
-	current->curr_ret_stack -= index + FGRAPH_RET_INDEX;
+	current->curr_ret_stack -= FGRAPH_RET_INDEX;
 	current->curr_ret_depth--;
 	return ret;
 }


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 16/32] function_graph: Add selftest for passing local variables
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (14 preceding siblings ...)
  2023-11-05 16:09 ` [RFC PATCH 15/32] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data() Masami Hiramatsu (Google)
@ 2023-11-05 16:09 ` Masami Hiramatsu (Google)
  2023-11-05 16:09 ` [RFC PATCH 17/32] function_graph: Fix to update curr_ret_stack with ARRAY entry size Masami Hiramatsu (Google)
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Add a boot-up selftest that passes variables from the entry of a function
to its exit, and makes sure that they do get passed around correctly.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/trace_selftest.c |  161 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 161 insertions(+)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index e05f7335525c..14966d8509ed 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -756,6 +756,165 @@ trace_selftest_startup_function(struct tracer *trace, struct trace_array *tr)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+
+#define BYTE_NUMBER 123
+#define SHORT_NUMBER 12345
+#define WORD_NUMBER 1234567890
+#define LONG_NUMBER 1234567890123456789LL
+
+static int fgraph_store_size __initdata;
+static const char *fgraph_store_type_name __initdata;
+static char *fgraph_error_str __initdata;
+static char fgraph_error_str_buf[128] __initdata;
+
+static __init int store_entry(struct ftrace_graph_ent *trace,
+			      struct fgraph_ops *gops)
+{
+	const char *type = fgraph_store_type_name;
+	int size = fgraph_store_size;
+	void *p;
+
+	p = fgraph_reserve_data(size);
+	if (!p) {
+		snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+			 "Failed to reserve %s\n", type);
+		fgraph_error_str = fgraph_error_str_buf;
+		return 0;
+	}
+
+	switch (fgraph_store_size) {
+	case 1:
+		*(char *)p = BYTE_NUMBER;
+		break;
+	case 2:
+		*(short *)p = SHORT_NUMBER;
+		break;
+	case 4:
+		*(int *)p = WORD_NUMBER;
+		break;
+	case 8:
+		*(long long *)p = LONG_NUMBER;
+		break;
+	}
+
+	return 1;
+}
+
+static __init void store_return(struct ftrace_graph_ret *trace,
+				struct fgraph_ops *gops)
+{
+	const char *type = fgraph_store_type_name;
+	long long expect = 0;
+	long long found = -1;
+	char *p;
+
+	p = fgraph_retrieve_data();
+	if (!p) {
+		snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+			 "Failed to retrieve %s\n", type);
+		fgraph_error_str = fgraph_error_str_buf;
+		return;
+	}
+
+	switch (fgraph_store_size) {
+	case 1:
+		expect = BYTE_NUMBER;
+		found = *(char *)p;
+		break;
+	case 2:
+		expect = SHORT_NUMBER;
+		found = *(short *)p;
+		break;
+	case 4:
+		expect = WORD_NUMBER;
+		found = *(int *)p;
+		break;
+	case 8:
+		expect = LONG_NUMBER;
+		found = *(long long *)p;
+		break;
+	}
+
+	if (found != expect) {
+		snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+			 "%s returned not %lld but %lld\n", type, expect, found);
+		fgraph_error_str = fgraph_error_str_buf;
+		return;
+	}
+	fgraph_error_str = NULL;
+}
+
+static struct fgraph_ops store_bytes __initdata = {
+	.entryfunc		= store_entry,
+	.retfunc		= store_return,
+};
+
+static int __init test_graph_storage_type(const char *name, int size)
+{
+	char *func_name;
+	int len;
+	int ret;
+
+	fgraph_store_type_name = name;
+	fgraph_store_size = size;
+
+	snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+		 "Failed to execute storage %s\n", name);
+	fgraph_error_str = fgraph_error_str_buf;
+
+	printk(KERN_CONT "PASSED\n");
+	pr_info("Testing fgraph storage of %d byte%s: ", size, size > 1 ? "s" : "");
+
+	func_name = "*" __stringify(DYN_FTRACE_TEST_NAME);
+	len = strlen(func_name);
+
+	ret = ftrace_set_filter(&store_bytes.ops, func_name, len, 1);
+	if (ret && ret != -ENODEV) {
+		pr_cont("*Could not set filter* ");
+		return -1;
+	}
+
+	ret = register_ftrace_graph(&store_bytes);
+	if (ret) {
+		printk(KERN_WARNING "Failed to init store_bytes fgraph tracing\n");
+		return -1;
+	}
+
+	DYN_FTRACE_TEST_NAME();
+
+	unregister_ftrace_graph(&store_bytes);
+
+	if (fgraph_error_str) {
+		printk(KERN_CONT "*** %s ***", fgraph_error_str);
+		return -1;
+	}
+
+	return 0;
+}
+/* Test the storage passed across function_graph entry and return */
+static __init int test_graph_storage(void)
+{
+	int ret;
+
+	ret = test_graph_storage_type("byte", 1);
+	if (ret)
+		return ret;
+	ret = test_graph_storage_type("short", 2);
+	if (ret)
+		return ret;
+	ret = test_graph_storage_type("word", 4);
+	if (ret)
+		return ret;
+	ret = test_graph_storage_type("long long", 8);
+	if (ret)
+		return ret;
+	return 0;
+}
+#else
+static inline int test_graph_storage(void) { return 0; }
+#endif /* CONFIG_DYNAMIC_FTRACE */
+
 /* Maximum number of functions to trace before diagnosing a hang */
 #define GRAPH_MAX_FUNC_TEST	100000000
 
@@ -913,6 +1072,8 @@ trace_selftest_startup_function_graph(struct tracer *trace,
 	ftrace_set_global_filter(NULL, 0, 1);
 #endif
 
+	ret = test_graph_storage();
+
 	/* Don't test dynamic tracing, the function tracer already did */
 out:
 	/* Stop it if we failed */


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 17/32] function_graph: Fix to update curr_ret_stack with ARRAY entry size
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (15 preceding siblings ...)
  2023-11-05 16:09 ` [RFC PATCH 16/32] function_graph: Add selftest for passing local variables Masami Hiramatsu (Google)
@ 2023-11-05 16:09 ` Masami Hiramatsu (Google)
  2023-11-05 16:10 ` [RFC PATCH 18/32] function_graph: Fix to initialize ftrace_ops for fgraph with ftrace_graph_func Masami Hiramatsu (Google)
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

__ftrace_return_to_handler() missed updating curr_ret_stack when it
gets a FGRAPH_TYPE_ARRAY entry. Since that type of entry follows some
data words, it must update curr_ret_stack by the entry size value
(__get_index(val)) instead of 1.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/fgraph.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 0f8e9f22e9e0..597250bd30dc 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -740,7 +740,8 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 		case FGRAPH_TYPE_ARRAY:
 			idx = __get_array(val);
 			fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
-			fallthrough;
+			curr_ret_stack -= __get_index(val);
+			break;
 		case FGRAPH_TYPE_RESERVED:
 			curr_ret_stack--;
 			break;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 18/32] function_graph: Fix to initialize ftrace_ops for fgraph with ftrace_graph_func
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (16 preceding siblings ...)
  2023-11-05 16:09 ` [RFC PATCH 17/32] function_graph: Fix to update curr_ret_stack with ARRAY entry size Masami Hiramatsu (Google)
@ 2023-11-05 16:10 ` Masami Hiramatsu (Google)
  2023-11-06 10:04   ` Masami Hiramatsu
  2023-11-05 16:10 ` [RFC PATCH 19/32] function_graph: Fix to check the return value of ftrace_pop_return_trace() Masami Hiramatsu (Google)
                   ` (13 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Fix to initialize the ftrace_ops of fgraph_ops with ftrace_graph_func
instead of ftrace_stub.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/fgraph.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 597250bd30dc..858fb73440ec 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -872,7 +872,7 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 void fgraph_init_ops(struct ftrace_ops *dst_ops,
 		     struct ftrace_ops *src_ops)
 {
-	dst_ops->func = ftrace_stub;
+	dst_ops->func = ftrace_graph_func;
 	dst_ops->flags = FTRACE_OPS_FL_PID | FTRACE_OPS_FL_STUB;
 
 #ifdef FTRACE_GRAPH_TRAMP_ADDR
@@ -1120,7 +1120,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 
 	if (!gops->ops.func) {
 		gops->ops.flags |= FTRACE_OPS_FL_STUB;
-		gops->ops.func = ftrace_stub;
+		gops->ops.func = ftrace_graph_func;
 #ifdef FTRACE_GRAPH_TRAMP_ADDR
 		gops->ops.trampoline = FTRACE_GRAPH_TRAMP_ADDR;
 #endif


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 19/32] function_graph: Fix to check the return value of ftrace_pop_return_trace()
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (17 preceding siblings ...)
  2023-11-05 16:10 ` [RFC PATCH 18/32] function_graph: Fix to initialize ftrace_ops for fgraph with ftrace_graph_func Masami Hiramatsu (Google)
@ 2023-11-05 16:10 ` Masami Hiramatsu (Google)
  2023-11-05 16:10 ` [RFC PATCH 20/32] function_graph: Pass the size of reserved data when retrieving it Masami Hiramatsu (Google)
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Fix to check the return value ('ret_stack') of ftrace_pop_return_trace()
instead of the passed storage ('ret'), because 'ret_stack' becomes NULL
in the error case.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/fgraph.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 858fb73440ec..e51695441476 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -714,7 +714,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 
 	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer);
 
-	if (unlikely(!ret)) {
+	if (unlikely(!ret_stack)) {
 		ftrace_graph_stop();
 		WARN_ON(1);
 		/* Might as well panic. What else to do? */


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 20/32] function_graph: Pass the size of reserved data when retrieving it
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (18 preceding siblings ...)
  2023-11-05 16:10 ` [RFC PATCH 19/32] function_graph: Fix to check the return value of ftrace_pop_return_trace() Masami Hiramatsu (Google)
@ 2023-11-05 16:10 ` Masami Hiramatsu (Google)
  2023-11-05 16:10 ` [RFC PATCH 21/32] function_graph: Expand the max reserved data size Masami Hiramatsu (Google)
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Make fgraph_retrieve_data() return the reserved data size via the
'size_bytes' parameter.
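
With this change, a retfunc can sanity-check how much data was actually
stashed before using it. A minimal sketch, continuing the hypothetical
timestamp example from the earlier patch, inside the retfunc:

	int size;
	u64 *ts = fgraph_retrieve_data(&size);

	if (ts && size >= sizeof(*ts))
		pr_info("%ps took %llu ns\n", (void *)trace->func,
			trace_clock_local() - *ts);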

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/ftrace.h        |    2 +-
 kernel/trace/fgraph.c         |    5 ++++-
 kernel/trace/trace_selftest.c |   10 +++++++++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 115758fe9fec..2e24a2611ca8 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1050,7 +1050,7 @@ struct fgraph_ops {
 };
 
 void *fgraph_reserve_data(int size_bytes);
-void *fgraph_retrieve_data(void);
+void *fgraph_retrieve_data(int *size_bytes);
 
 /*
  * Stack of return addresses for functions
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index e51695441476..0b8a1daef733 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -289,6 +289,7 @@ void *fgraph_reserve_data(int size_bytes)
 
 /**
  * fgraph_retrieve_data - Retrieve stored data from fgraph_reserve_data()
+ * @size_bytes: pointer to retrieved data size.
  *
  * This is to be called by a fgraph_ops retfunc(), to retrieve data that
  * was stored by the fgraph_ops entryfunc() on the function entry.
@@ -300,7 +301,7 @@ void *fgraph_reserve_data(int size_bytes)
  *    matching entryfunc() for the retfunc() this is called from.
  *   Or NULL if there was nothing stored.
  */
-void *fgraph_retrieve_data(void)
+void *fgraph_retrieve_data(int *size_bytes)
 {
 	unsigned long val;
 	int curr_ret_stack = current->curr_ret_stack;
@@ -313,6 +314,8 @@ void *fgraph_retrieve_data(void)
 	val = current->ret_stack[curr_ret_stack - 2];
 	if (__get_type(val) != FGRAPH_TYPE_DATA)
 		return NULL;
+	if (size_bytes)
+		*size_bytes = (__get_data(val) - 1) * sizeof(long);
 
 	return &current->ret_stack[curr_ret_stack -
 				   (__get_data(val) + 1)];
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 14966d8509ed..c656ec5b3348 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -807,15 +807,23 @@ static __init void store_return(struct ftrace_graph_ret *trace,
 	const char *type = fgraph_store_type_name;
 	long long expect = 0;
 	long long found = -1;
+	int size;
 	char *p;
 
-	p = fgraph_retrieve_data();
+	p = fgraph_retrieve_data(&size);
 	if (!p) {
 		snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
 			 "Failed to retrieve %s\n", type);
 		fgraph_error_str = fgraph_error_str_buf;
 		return;
 	}
+	if (fgraph_store_size > size) {
+		snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+			 "Retrieved size %d is smaller than expected %d\n",
+			 size, (int)fgraph_store_size);
+		fgraph_error_str = fgraph_error_str_buf;
+		return;
+	}
 
 	switch (fgraph_store_size) {
 	case 1:


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 21/32] function_graph: Expand the max reserved data size
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (19 preceding siblings ...)
  2023-11-05 16:10 ` [RFC PATCH 20/32] function_graph: Pass the size of reserved data when retrieving it Masami Hiramatsu (Google)
@ 2023-11-05 16:10 ` Masami Hiramatsu (Google)
  2023-11-05 16:10 ` [RFC PATCH 22/32] function_graph: Add a new entry handler with parent_ip and ftrace_regs Masami Hiramatsu (Google)
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Expand the maximum reserved data size from 4 words to 32 words. On a
64-bit system, this allows reserving up to 256 bytes on the ret_stack,
though such a large reservation is not expected to happen.
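
Concretely, since FGRAPH_DATA_SIZE is the number of bits used to encode
the size field, the new maximum works out as:

	FGRAPH_MAX_DATA_SIZE = sizeof(long) * (1 << FGRAPH_DATA_SIZE)
	                     = 8 * 32 = 256 bytes (on a 64-bit system)
	                     = 4 * 32 = 128 bytes (on a 32-bit system)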

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/fgraph.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 0b8a1daef733..e7188c67356e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -95,7 +95,7 @@ enum {
 #define FGRAPH_ARRAY_MASK	((1 << FGRAPH_ARRAY_SIZE) - 1)
 #define FGRAPH_ARRAY_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
 
-#define FGRAPH_DATA_SIZE	2
+#define FGRAPH_DATA_SIZE	5
 #define FGRAPH_DATA_MASK	((1 << FGRAPH_DATA_SIZE) - 1)
 #define FGRAPH_DATA_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
 
@@ -113,7 +113,7 @@ enum {
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
 
-#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * 4)
+#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * (1 << FGRAPH_DATA_SIZE))
 
 /*
  * Each fgraph_ops has a reservered unsigned long at the end (top) of the


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 22/32] function_graph: Add a new entry handler with parent_ip and ftrace_regs
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (20 preceding siblings ...)
  2023-11-05 16:10 ` [RFC PATCH 21/32] function_graph: Expand the max reserved data size Masami Hiramatsu (Google)
@ 2023-11-05 16:10 ` Masami Hiramatsu (Google)
  2023-11-05 16:11 ` [RFC PATCH 23/32] function_graph: Add a new exit " Masami Hiramatsu (Google)
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Add a new entry handler to fgraph_ops as 'entryregfunc', which takes
parent_ip and ftrace_regs. Note that 'entryfunc' and 'entryregfunc'
are mutually exclusive; you can set only one of them.
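
A hypothetical handler could then inspect the traced function's arguments
directly from the saved registers. A minimal sketch (the handler name is
made up, and it assumes the arch saves the argument registers so that
ftrace_regs_get_argument() works):

	static int my_entry_regs(unsigned long func, unsigned long parent_ip,
				 struct ftrace_regs *fregs, struct fgraph_ops *gops)
	{
		/* First argument of the traced function */
		unsigned long arg0 = ftrace_regs_get_argument(fregs, 0);

		pr_debug("%ps(0x%lx) called from %ps\n",
			 (void *)func, arg0, (void *)parent_ip);
		return 1;	/* hook the function return too */
	}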

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 arch/arm64/kernel/ftrace.c               |    8 ++++-
 arch/loongarch/kernel/ftrace_dyn.c       |    6 +++-
 arch/powerpc/kernel/trace/ftrace.c       |    2 +
 arch/powerpc/kernel/trace/ftrace_64_pg.c |   10 ++++--
 arch/x86/kernel/ftrace.c                 |   50 ++++++++++++++++++++----------
 include/linux/ftrace.h                   |   17 +++++++++-
 kernel/trace/fgraph.c                    |   27 +++++++++++++---
 7 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index a650f5e11fc5..e94d9dc13d89 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -481,7 +481,13 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
-	prepare_ftrace_return(ip, &fregs->lr, fregs->fp);
+	if (unlikely(atomic_read(&current->tracing_graph_pause)))
+		return;
+
+	if (!function_graph_enter_regs(fregs->lr, ip, fregs->fp,
+				       (void *)fregs->fp, fregs)) {
+		fregs->lr = (unsigned long)&return_to_handler;
+	}
 }
 #else
 /*
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..39b3f09a5e0c 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -244,7 +244,11 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 	struct pt_regs *regs = &fregs->regs;
 	unsigned long *parent = (unsigned long *)&regs->regs[1];
 
-	prepare_ftrace_return(ip, (unsigned long *)parent);
+	if (unlikely(atomic_read(&current->tracing_graph_pause)))
+		return;
+
+	if (!function_graph_enter_regs(regs->regs[1], ip, 0, parent, fregs))
+		regs->regs[1] = (unsigned long)&return_to_handler;
 }
 #else
 static int ftrace_modify_graph_caller(bool enable)
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 82010629cf88..9bf1b6912116 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -422,7 +422,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 	if (bit < 0)
 		goto out;
 
-	if (!function_graph_enter(parent_ip, ip, 0, (unsigned long *)sp))
+	if (!function_graph_enter_regs(parent_ip, ip, 0, (unsigned long *)sp, fregs))
 		parent_ip = ppc_function_entry(return_to_handler);
 
 	ftrace_test_recursion_unlock(bit);
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 7b85c3b460a3..43f6cfaaf7db 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -795,7 +795,8 @@ int ftrace_disable_ftrace_graph_caller(void)
  * in current thread info. Return the address we want to divert to.
  */
 static unsigned long
-__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp)
+__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp,
+			struct ftrace_regs *fregs)
 {
 	unsigned long return_hooker;
 	int bit;
@@ -812,7 +813,7 @@ __prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp
 
 	return_hooker = ppc_function_entry(return_to_handler);
 
-	if (!function_graph_enter(parent, ip, 0, (unsigned long *)sp))
+	if (!function_graph_enter_regs(parent, ip, 0, (unsigned long *)sp, fregs))
 		parent = return_hooker;
 
 	ftrace_test_recursion_unlock(bit);
@@ -824,13 +825,14 @@ __prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
-	fregs->regs.link = __prepare_ftrace_return(parent_ip, ip, fregs->regs.gpr[1]);
+	fregs->regs.link = __prepare_ftrace_return(parent_ip, ip,
+						   fregs->regs.gpr[1], fregs);
 }
 #else
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
 				    unsigned long sp)
 {
-	return __prepare_ftrace_return(parent, ip, sp);
+	return __prepare_ftrace_return(parent, ip, sp, NULL);
 }
 #endif
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 12df54ff0e81..85247a8f265b 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -614,16 +614,8 @@ int ftrace_disable_ftrace_graph_caller(void)
 }
 #endif /* CONFIG_DYNAMIC_FTRACE && !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
 
-/*
- * Hook the return address and push it in the stack of return addrs
- * in current thread info.
- */
-void prepare_ftrace_return(unsigned long ip, unsigned long *parent,
-			   unsigned long frame_pointer)
+static inline bool skip_ftrace_return(void)
 {
-	unsigned long return_hooker = (unsigned long)&return_to_handler;
-	int bit;
-
 	/*
 	 * When resuming from suspend-to-ram, this function can be indirectly
 	 * called from early CPU startup code while the CPU is in real mode,
@@ -633,13 +625,28 @@ void prepare_ftrace_return(unsigned long ip, unsigned long *parent,
 	 * This check isn't as accurate as virt_addr_valid(), but it should be
 	 * good enough for this purpose, and it's fast.
 	 */
-	if (unlikely((long)__builtin_frame_address(0) >= 0))
-		return;
+	if ((long)__builtin_frame_address(0) >= 0)
+		return true;
 
-	if (unlikely(ftrace_graph_is_dead()))
-		return;
+	if (ftrace_graph_is_dead())
+		return true;
+
+	if (atomic_read(&current->tracing_graph_pause))
+		return true;
+	return false;
+}
+
+/*
+ * Hook the return address and push it in the stack of return addrs
+ * in current thread info.
+ */
+void prepare_ftrace_return(unsigned long ip, unsigned long *parent,
+			   unsigned long frame_pointer)
+{
+	unsigned long return_hooker = (unsigned long)&return_to_handler;
+	int bit;
 
-	if (unlikely(atomic_read(&current->tracing_graph_pause)))
+	if (unlikely(skip_ftrace_return()))
 		return;
 
 	bit = ftrace_test_recursion_trylock(ip, *parent);
@@ -657,9 +664,20 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
 	struct pt_regs *regs = &fregs->regs;
-	unsigned long *stack = (unsigned long *)kernel_stack_pointer(regs);
+	unsigned long *parent = (unsigned long *)kernel_stack_pointer(regs);
+	int bit;
 
-	prepare_ftrace_return(ip, (unsigned long *)stack, 0);
+	if (unlikely(skip_ftrace_return()))
+		return;
+
+	bit = ftrace_test_recursion_trylock(ip, *parent);
+	if (bit < 0)
+		return;
+
+	if (!function_graph_enter_regs(*parent, ip, 0, parent, fregs))
+		*parent = (unsigned long)&return_to_handler;
+
+	ftrace_test_recursion_unlock(bit);
 }
 #endif
 
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 2e24a2611ca8..a6fd930d3500 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1037,6 +1037,11 @@ typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *,
 typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *,
 				      struct fgraph_ops *); /* entry */
 
+typedef int (*trace_func_graph_regs_ent_t)(unsigned long func,
+					   unsigned long parent_ip,
+					   struct ftrace_regs *fregs,
+					   struct fgraph_ops *); /* entry w/ regs */
+
 extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -1044,6 +1049,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph
 struct fgraph_ops {
 	trace_func_graph_ent_t		entryfunc;
 	trace_func_graph_ret_t		retfunc;
+	trace_func_graph_regs_ent_t	entryregfunc;
 	struct ftrace_ops		ops; /* for the hash lists */
 	void				*private;
 	int				idx;
@@ -1080,8 +1086,15 @@ struct ftrace_ret_stack {
 extern void return_to_handler(void);
 
 extern int
-function_graph_enter(unsigned long ret, unsigned long func,
-		     unsigned long frame_pointer, unsigned long *retp);
+function_graph_enter_regs(unsigned long ret, unsigned long func,
+			  unsigned long frame_pointer, unsigned long *retp,
+			  struct ftrace_regs *fregs);
+
+static inline int function_graph_enter(unsigned long ret, unsigned long func,
+				       unsigned long fp, unsigned long *retp)
+{
+	return function_graph_enter_regs(ret, func, fp, retp, NULL);
+}
 
 struct ftrace_ret_stack *
 ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index e7188c67356e..2623a5987f52 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -512,8 +512,20 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 # define MCOUNT_INSN_SIZE 0
 #endif
 
-int function_graph_enter(unsigned long ret, unsigned long func,
-			 unsigned long frame_pointer, unsigned long *retp)
+static inline int call_entry_func(struct ftrace_graph_ent *trace,
+				  unsigned long func, unsigned long ret,
+				  struct ftrace_regs *fregs,
+				  struct fgraph_ops *gops)
+{
+	if (gops->entryregfunc)
+		return gops->entryregfunc(func, ret, fregs, gops);
+
+	return gops->entryfunc(trace, gops);
+}
+
+int function_graph_enter_regs(unsigned long ret, unsigned long func,
+			      unsigned long frame_pointer, unsigned long *retp,
+			      struct ftrace_regs *fregs)
 {
 	struct ftrace_graph_ent trace;
 	int save_curr_ret_stack;
@@ -558,7 +570,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 		}
 		save_curr_ret_stack = current->curr_ret_stack;
 		if (ftrace_ops_test(&gops->ops, func, NULL) &&
-		    gops->entryfunc(&trace, gops)) {
+		    call_entry_func(&trace, func, ret, fregs, gops)) {
 			/* Note, curr_ret_stack could be changed by entryfunc() */
 			offset = current->curr_ret_stack;
 			/* Check the top level stored word */
@@ -876,7 +888,8 @@ void fgraph_init_ops(struct ftrace_ops *dst_ops,
 		     struct ftrace_ops *src_ops)
 {
 	dst_ops->func = ftrace_graph_func;
-	dst_ops->flags = FTRACE_OPS_FL_PID | FTRACE_OPS_FL_STUB;
+	dst_ops->flags = FTRACE_OPS_FL_PID | FTRACE_OPS_FL_STUB |
+			 FTRACE_OPS_FL_SAVE_ARGS;
 
 #ifdef FTRACE_GRAPH_TRAMP_ADDR
 	dst_ops->trampoline = FTRACE_GRAPH_TRAMP_ADDR;
@@ -1119,10 +1132,14 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 	int ret = 0;
 	int i;
 
+	if (gops->entryfunc && gops->entryregfunc)
+		return -EINVAL;
+
 	mutex_lock(&ftrace_lock);
 
 	if (!gops->ops.func) {
-		gops->ops.flags |= FTRACE_OPS_FL_STUB;
+		gops->ops.flags |=
+			FTRACE_OPS_FL_STUB | FTRACE_OPS_FL_SAVE_ARGS;
 		gops->ops.func = ftrace_graph_func;
 #ifdef FTRACE_GRAPH_TRAMP_ADDR
 		gops->ops.trampoline = FTRACE_GRAPH_TRAMP_ADDR;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 23/32] function_graph: Add a new exit handler with parent_ip and ftrace_regs
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (21 preceding siblings ...)
  2023-11-05 16:10 ` [RFC PATCH 22/32] function_graph: Add a new entry handler with parent_ip and ftrace_regs Masami Hiramatsu (Google)
@ 2023-11-05 16:11 ` Masami Hiramatsu (Google)
  2023-11-05 16:11 ` [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS Masami Hiramatsu (Google)
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:11 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Add a new return handler to fgraph_ops as 'retregfunc', which takes
parent_ip and ftrace_regs instead of ftrace_graph_ret. This handler
is available only if the arch supports CONFIG_HAVE_FUNCTION_GRAPH_FREGS.
Note that 'retfunc' and 'retregfunc' are mutually exclusive;
you can set only one of them.
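
A matching exit handler can then read the return value directly from the
saved registers. A minimal sketch (the handler name is made up; this is
only usable where the arch selects HAVE_FUNCTION_GRAPH_FREGS):

	static void my_ret_regs(unsigned long func, unsigned long parent_ip,
				struct ftrace_regs *fregs, struct fgraph_ops *gops)
	{
		unsigned long retval = ftrace_regs_return_value(fregs);

		pr_debug("%ps returned 0x%lx to %ps\n",
			 (void *)func, retval, (void *)parent_ip);
	}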

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 arch/x86/include/asm/ftrace.h |    2 +
 include/linux/ftrace.h        |   10 +++++-
 kernel/trace/Kconfig          |    5 ++-
 kernel/trace/fgraph.c         |   74 +++++++++++++++++++++++++++--------------
 4 files changed, 63 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 897cf02c20b1..74b1d245c38b 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -66,6 +66,8 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
 	override_function_with_return(&(fregs)->regs)
 #define ftrace_regs_query_register_offset(name) \
 	regs_query_register_offset(name)
+#define ftrace_regs_get_frame_pointer(fregs) \
+	frame_pointer(&(fregs)->regs)
 
 struct ftrace_ops;
 #define ftrace_graph_func ftrace_graph_func
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index a6fd930d3500..0c036c5d7c12 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -43,7 +43,9 @@ struct dyn_ftrace;
 
 char *arch_ftrace_match_adjust(char *str, const char *search);
 
-#ifdef CONFIG_HAVE_FUNCTION_GRAPH_RETVAL
+#ifdef CONFIG_HAVE_FUNCTION_GRAPH_FREGS
+unsigned long ftrace_return_to_handler(struct ftrace_regs *fregs);
+#elif defined(CONFIG_HAVE_FUNCTION_GRAPH_RETVAL)
 struct fgraph_ret_regs;
 unsigned long ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs);
 #else
@@ -131,6 +133,7 @@ struct ftrace_regs {
 #define ftrace_regs_set_instruction_pointer(fregs, ip) do { } while (0)
 #endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
 
+
 static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs *fregs)
 {
 	if (!fregs)
@@ -1041,6 +1044,10 @@ typedef int (*trace_func_graph_regs_ent_t)(unsigned long func,
 					   unsigned long parent_ip,
 					   struct ftrace_regs *fregs,
 					   struct fgraph_ops *); /* entry w/ regs */
+typedef void (*trace_func_graph_regs_ret_t)(unsigned long func,
+					    unsigned long parent_ip,
+					    struct ftrace_regs *,
+					    struct fgraph_ops *); /* return w/ regs */
 
 extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
 
@@ -1050,6 +1057,7 @@ struct fgraph_ops {
 	trace_func_graph_ent_t		entryfunc;
 	trace_func_graph_ret_t		retfunc;
 	trace_func_graph_regs_ent_t	entryregfunc;
+	trace_func_graph_regs_ret_t	retregfunc;
 	struct ftrace_ops		ops; /* for the hash lists */
 	void				*private;
 	int				idx;
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 61c541c36596..308b3bec01b1 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -34,6 +34,9 @@ config HAVE_FUNCTION_GRAPH_TRACER
 config HAVE_FUNCTION_GRAPH_RETVAL
 	bool
 
+config HAVE_FUNCTION_GRAPH_FREGS
+	bool
+
 config HAVE_DYNAMIC_FTRACE
 	bool
 	help
@@ -232,7 +235,7 @@ config FUNCTION_GRAPH_TRACER
 
 config FUNCTION_GRAPH_RETVAL
 	bool "Kernel Function Graph Return Value"
-	depends on HAVE_FUNCTION_GRAPH_RETVAL
+	depends on HAVE_FUNCTION_GRAPH_RETVAL || HAVE_FUNCTION_GRAPH_FREGS
 	depends on FUNCTION_GRAPH_TRACER
 	default n
 	help
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 2623a5987f52..c76fb2f33437 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -624,8 +624,7 @@ int function_graph_enter_regs(unsigned long ret, unsigned long func,
 
 /* Retrieve a function return address to the trace stack on thread info.*/
 static struct ftrace_ret_stack *
-ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
-			unsigned long frame_pointer)
+ftrace_pop_return_trace(unsigned long *ret, unsigned long frame_pointer)
 {
 	struct ftrace_ret_stack *ret_stack;
 	int index;
@@ -670,10 +669,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 #endif
 
 	*ret = ret_stack->ret;
-	trace->func = ret_stack->func;
-	trace->calltime = ret_stack->calltime;
-	trace->overrun = atomic_read(&current->trace_overrun);
-	trace->depth = current->curr_ret_depth;
 	/*
 	 * We still want to trace interrupts coming in if
 	 * max_depth is set to 1. Make sure the decrement is
@@ -712,22 +707,43 @@ static struct notifier_block ftrace_suspend_notifier = {
 /* fgraph_ret_regs is not defined without CONFIG_FUNCTION_GRAPH_RETVAL */
 struct fgraph_ret_regs;
 
+static void fgraph_call_retfunc(struct ftrace_regs *fregs,
+				struct fgraph_ret_regs *ret_regs,
+				struct ftrace_ret_stack *ret_stack,
+				struct fgraph_ops *gops)
+{
+	struct ftrace_graph_ret trace;
+
+	trace.func = ret_stack->func;
+	trace.calltime = ret_stack->calltime;
+	trace.overrun = atomic_read(&current->trace_overrun);
+	trace.depth = current->curr_ret_depth;
+	trace.rettime = trace_clock_local();
+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+	if (fregs)
+		trace.retval = ftrace_regs_return_value(fregs);
+	else
+		trace.retval = fgraph_ret_regs_return_value(ret_regs);
+#endif
+	gops->retfunc(&trace, gops);
+}
+
 /*
  * Send the trace to the ring-buffer.
  * @return the original return address.
  */
-static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs,
+static unsigned long __ftrace_return_to_handler(struct ftrace_regs *fregs,
+						struct fgraph_ret_regs *ret_regs,
 						unsigned long frame_pointer)
 {
 	struct ftrace_ret_stack *ret_stack;
-	struct ftrace_graph_ret trace;
-	unsigned long ret;
+	struct fgraph_ops *gops;
 	int curr_ret_stack;
+	unsigned long ret;
 	int stop_at;
 	int index;
-	int idx;
 
-	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer);
+	ret_stack = ftrace_pop_return_trace(&ret, frame_pointer);
 
 	if (unlikely(!ret_stack)) {
 		ftrace_graph_stop();
@@ -736,11 +752,6 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 		return (unsigned long)panic;
 	}
 
-	trace.rettime = trace_clock_local();
-#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
-	trace.retval = fgraph_ret_regs_return_value(ret_regs);
-#endif
-
 	curr_ret_stack = current->curr_ret_stack;
 	index = get_ret_stack_index(current, curr_ret_stack - 1);
 
@@ -753,8 +764,11 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 		val = current->ret_stack[curr_ret_stack - 1];
 		switch (__get_type(val)) {
 		case FGRAPH_TYPE_ARRAY:
-			idx = __get_array(val);
-			fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
+			gops = fgraph_array[__get_array(val)];
+			if (gops->retregfunc)
+				gops->retregfunc(ret_stack->func, ret, fregs, gops);
+			else
+				fgraph_call_retfunc(fregs, ret_regs, ret_stack, gops);
 			curr_ret_stack -= __get_index(val);
 			break;
 		case FGRAPH_TYPE_RESERVED:
@@ -778,20 +792,22 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 	return ret;
 }
 
-/*
- * After all architecures have selected HAVE_FUNCTION_GRAPH_RETVAL, we can
- * leave only ftrace_return_to_handler(ret_regs).
- */
-#ifdef CONFIG_HAVE_FUNCTION_GRAPH_RETVAL
+#ifdef CONFIG_HAVE_FUNCTION_GRAPH_FREGS
+unsigned long ftrace_return_to_handler(struct ftrace_regs *fregs)
+{
+	return __ftrace_return_to_handler(fregs, NULL,
+				ftrace_regs_get_frame_pointer(fregs));
+}
+#elif defined(CONFIG_HAVE_FUNCTION_GRAPH_RETVAL)
 unsigned long ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs)
 {
-	return __ftrace_return_to_handler(ret_regs,
+	return __ftrace_return_to_handler(NULL, ret_regs,
 				fgraph_ret_regs_frame_pointer(ret_regs));
 }
 #else
 unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 {
-	return __ftrace_return_to_handler(NULL, frame_pointer);
+	return __ftrace_return_to_handler(NULL, NULL, frame_pointer);
 }
 #endif
 
@@ -1132,9 +1148,15 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 	int ret = 0;
 	int i;
 
-	if (gops->entryfunc && gops->entryregfunc)
+	if ((gops->entryfunc && gops->entryregfunc) ||
+	    (gops->retfunc && gops->retregfunc))
 		return -EINVAL;
 
+#ifndef CONFIG_HAVE_FUNCTION_GRAPH_FREGS
+	if (gops->retregfunc)
+		return -EOPNOTSUPP;
+#endif
+
 	mutex_lock(&ftrace_lock);
 
 	if (!gops->ops.func) {


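For illustration, registering the new callbacks could look like the
sketch below. The callback names are hypothetical; the signatures
follow the fgraph_ops usage introduced later in this series, and
register_ftrace_graph() returns -EOPNOTSUPP for retregfunc on an arch
without HAVE_FUNCTION_GRAPH_FREGS.

	static int my_entry(unsigned long func, unsigned long ret_ip,
			    struct ftrace_regs *fregs, struct fgraph_ops *gops)
	{
		return 1;	/* non-zero: hook this function's return too */
	}

	static void my_return(unsigned long func, unsigned long ret_ip,
			      struct ftrace_regs *fregs, struct fgraph_ops *gops)
	{
		pr_info("%ps returned 0x%lx\n", (void *)func,
			ftrace_regs_return_value(fregs));
	}

	static struct fgraph_ops my_gops = {
		.entryregfunc	= my_entry,	/* mutually exclusive with .entryfunc */
		.retregfunc	= my_return,	/* mutually exclusive with .retfunc */
	};

	/* ... */
	ret = register_ftrace_graph(&my_gops);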
^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (22 preceding siblings ...)
  2023-11-05 16:11 ` [RFC PATCH 23/32] function_graph: Add a new exit " Masami Hiramatsu (Google)
@ 2023-11-05 16:11 ` Masami Hiramatsu (Google)
  2023-11-05 17:25   ` Peter Zijlstra
  2023-11-05 16:11 ` [RFC PATCH 25/32] fprobe: Use ftrace_regs in fprobe entry handler Masami Hiramatsu (Google)
                   ` (7 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:11 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Support HAVE_FUNCTION_GRAPH_FREGS on x86-64, which saves ftrace_regs
on the stack in the ftrace_graph return trampoline so that the
callbacks can access registers via the ftrace_regs APIs.

Note that this only restores the 'rax' and 'rdx' registers because the
other registers are no longer used and are recovered by the caller.
'rax' and 'rdx' are used for passing the return value.
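As a concrete effect (a sketch, assuming the x86-64 ftrace_regs layout
that wraps pt_regs), a return callback can now do

	unsigned long retval = ftrace_regs_return_value(fregs);

which reads the RAX slot saved by this trampoline, instead of peeking
at raw stack offsets.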

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 arch/x86/Kconfig            |    3 ++-
 arch/x86/kernel/ftrace_64.S |   30 ++++++++++++++++++++++--------
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 66bfabae8814..4b4c2f9d67da 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -219,7 +219,8 @@ config X86
 	select HAVE_FAST_GUP
 	select HAVE_FENTRY			if X86_64 || DYNAMIC_FTRACE
 	select HAVE_FTRACE_MCOUNT_RECORD
-	select HAVE_FUNCTION_GRAPH_RETVAL	if HAVE_FUNCTION_GRAPH_TRACER
+	select HAVE_FUNCTION_GRAPH_FREGS	if HAVE_DYNAMIC_FTRACE_WITH_ARGS
+	select HAVE_FUNCTION_GRAPH_RETVAL	if !HAVE_DYNAMIC_FTRACE_WITH_ARGS
 	select HAVE_FUNCTION_GRAPH_TRACER	if X86_32 || (X86_64 && DYNAMIC_FTRACE)
 	select HAVE_FUNCTION_TRACER
 	select HAVE_GCC_PLUGINS
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 945cfa5f7239..41351a621753 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -348,21 +348,35 @@ STACK_FRAME_NON_STANDARD_FP(__fentry__)
 SYM_CODE_START(return_to_handler)
 	UNWIND_HINT_UNDEFINED
 	ANNOTATE_NOENDBR
-	subq  $24, %rsp
+	/*
+	 * Allocate enough stack to hold all registers, but save only the
+	 * registers used for function parameters.
+	 */
+	subq $(FRAME_SIZE), %rsp
+	movq %rax, RAX(%rsp)
+	movq %rcx, RCX(%rsp)
+	movq %rdx, RDX(%rsp)
+	movq %rsi, RSI(%rsp)
+	movq %rdi, RDI(%rsp)
+	movq %r8, R8(%rsp)
+	movq %r9, R9(%rsp)
+	movq $0, ORIG_RAX(%rsp)
+	movq %rbp, RBP(%rsp)
 
-	/* Save the return values */
-	movq %rax, (%rsp)
-	movq %rdx, 8(%rsp)
-	movq %rbp, 16(%rsp)
 	movq %rsp, %rdi
 
 	call ftrace_return_to_handler
 
 	movq %rax, %rdi
-	movq 8(%rsp), %rdx
-	movq (%rsp), %rax
 
-	addq $24, %rsp
+	/*
+	 * Restore only rax and rdx because the other registers are not used
+	 * for the return value nor callee-saved. The caller will recover them.
+	 */
+	movq RDX(%rsp), %rdx
+	movq RAX(%rsp), %rax
+
+	addq $(FRAME_SIZE), %rsp
 	/*
 	 * Jump back to the old return address. This cannot be JMP_NOSPEC rdi
 	 * since IBT would demand that contain ENDBR, which simply isn't so for


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 25/32] fprobe: Use ftrace_regs in fprobe entry handler
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (23 preceding siblings ...)
  2023-11-05 16:11 ` [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS Masami Hiramatsu (Google)
@ 2023-11-05 16:11 ` Masami Hiramatsu (Google)
  2023-11-05 16:11 ` [RFC PATCH 26/32] fprobe: Use ftrace_regs in fprobe exit handler Masami Hiramatsu (Google)
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:11 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

This allows fprobes to be available with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
instead of CONFIG_DYNAMIC_FTRACE_WITH_REGS, so that fprobe can be
enabled on arm64.
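For illustration, an entry handler under the new signature might look
like this (a minimal sketch; the handler name is hypothetical, and the
accessors are the ftrace_regs APIs from <linux/ftrace.h>):

	static int my_entry_handler(struct fprobe *fp, unsigned long ip,
				    unsigned long ret_ip, struct ftrace_regs *fregs,
				    void *data)
	{
		/* Arguments are read via the ftrace_regs API ... */
		unsigned long arg0 = ftrace_regs_get_argument(fregs, 0);
		/* ... and a full pt_regs is only available if the arch saved one. */
		struct pt_regs *regs = ftrace_get_regs(fregs);

		pr_info("%ps(0x%lx) regs=%s\n", (void *)ip, arg0,
			regs ? "full" : "unavailable");
		return 0;
	}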

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Florent Revest <revest@chromium.org>
---
 Changes from previous series: NOTHING, just forward ported.
---
 include/linux/fprobe.h          |    2 +-
 kernel/trace/Kconfig            |    3 ++-
 kernel/trace/bpf_trace.c        |   10 +++++++---
 kernel/trace/fprobe.c           |    4 ++--
 kernel/trace/trace_fprobe.c     |    6 +++++-
 lib/test_fprobe.c               |    4 ++--
 samples/fprobe/fprobe_example.c |    2 +-
 7 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 3e03758151f4..36c0595f7b93 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -35,7 +35,7 @@ struct fprobe {
 	int			nr_maxactive;
 
 	int (*entry_handler)(struct fprobe *fp, unsigned long entry_ip,
-			     unsigned long ret_ip, struct pt_regs *regs,
+			     unsigned long ret_ip, struct ftrace_regs *regs,
 			     void *entry_data);
 	void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip,
 			     unsigned long ret_ip, struct pt_regs *regs,
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 308b3bec01b1..805d72ab77c6 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -290,7 +290,7 @@ config DYNAMIC_FTRACE_WITH_ARGS
 config FPROBE
 	bool "Kernel Function Probe (fprobe)"
 	depends on FUNCTION_TRACER
-	depends on DYNAMIC_FTRACE_WITH_REGS
+	depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
 	depends on HAVE_RETHOOK
 	select RETHOOK
 	default n
@@ -675,6 +675,7 @@ config FPROBE_EVENTS
 	select TRACING
 	select PROBE_EVENTS
 	select DYNAMIC_EVENTS
+	depends on DYNAMIC_FTRACE_WITH_REGS
 	default y
 	help
 	  This allows user to add tracing events on the function entry and
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 868008f56fec..b0b98f0a0e52 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2501,7 +2501,7 @@ static int __init bpf_event_init(void)
 fs_initcall(bpf_event_init);
 #endif /* CONFIG_MODULES */
 
-#ifdef CONFIG_FPROBE
+#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
 struct bpf_kprobe_multi_link {
 	struct bpf_link link;
 	struct fprobe fp;
@@ -2729,10 +2729,14 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
 
 static int
 kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
-			  unsigned long ret_ip, struct pt_regs *regs,
+			  unsigned long ret_ip, struct ftrace_regs *fregs,
 			  void *data)
 {
 	struct bpf_kprobe_multi_link *link;
+	struct pt_regs *regs = ftrace_get_regs(fregs);
+
+	if (!regs)
+		return 0;
 
 	link = container_of(fp, struct bpf_kprobe_multi_link, fp);
 	kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
@@ -3004,7 +3008,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 	kvfree(cookies);
 	return err;
 }
-#else /* !CONFIG_FPROBE */
+#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	return -EOPNOTSUPP;
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 881f90f0cbcf..df323cb7bed1 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -46,7 +46,7 @@ static inline void __fprobe_handler(unsigned long ip, unsigned long parent_ip,
 	}
 
 	if (fp->entry_handler)
-		ret = fp->entry_handler(fp, ip, parent_ip, ftrace_get_regs(fregs), entry_data);
+		ret = fp->entry_handler(fp, ip, parent_ip, fregs, entry_data);
 
 	/* If entry_handler returns !0, nmissed is not counted. */
 	if (rh) {
@@ -182,7 +182,7 @@ static void fprobe_init(struct fprobe *fp)
 		fp->ops.func = fprobe_kprobe_handler;
 	else
 		fp->ops.func = fprobe_handler;
-	fp->ops.flags |= FTRACE_OPS_FL_SAVE_REGS;
+	fp->ops.flags |= FTRACE_OPS_FL_SAVE_ARGS;
 }
 
 static int fprobe_init_rethook(struct fprobe *fp, int num)
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 8bfe23af9c73..71bf38d698f1 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -320,12 +320,16 @@ NOKPROBE_SYMBOL(fexit_perf_func);
 #endif	/* CONFIG_PERF_EVENTS */
 
 static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
-			     unsigned long ret_ip, struct pt_regs *regs,
+			     unsigned long ret_ip, struct ftrace_regs *fregs,
 			     void *entry_data)
 {
 	struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
+	struct pt_regs *regs = ftrace_get_regs(fregs);
 	int ret = 0;
 
+	if (!regs)
+		return 0;
+
 	if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
 		fentry_trace_func(tf, entry_ip, regs);
 #ifdef CONFIG_PERF_EVENTS
diff --git a/lib/test_fprobe.c b/lib/test_fprobe.c
index 24de0e5ff859..ff607babba18 100644
--- a/lib/test_fprobe.c
+++ b/lib/test_fprobe.c
@@ -40,7 +40,7 @@ static noinline u32 fprobe_selftest_nest_target(u32 value, u32 (*nest)(u32))
 
 static notrace int fp_entry_handler(struct fprobe *fp, unsigned long ip,
 				    unsigned long ret_ip,
-				    struct pt_regs *regs, void *data)
+				    struct ftrace_regs *fregs, void *data)
 {
 	KUNIT_EXPECT_FALSE(current_test, preemptible());
 	/* This can be called on the fprobe_selftest_target and the fprobe_selftest_target2 */
@@ -81,7 +81,7 @@ static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip,
 
 static notrace int nest_entry_handler(struct fprobe *fp, unsigned long ip,
 				      unsigned long ret_ip,
-				      struct pt_regs *regs, void *data)
+				      struct ftrace_regs *fregs, void *data)
 {
 	KUNIT_EXPECT_FALSE(current_test, preemptible());
 	return 0;
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index 64e715e7ed11..1545a1aac616 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -50,7 +50,7 @@ static void show_backtrace(void)
 
 static int sample_entry_handler(struct fprobe *fp, unsigned long ip,
 				unsigned long ret_ip,
-				struct pt_regs *regs, void *data)
+				struct ftrace_regs *fregs, void *data)
 {
 	if (use_trace)
 		/*


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 26/32] fprobe: Use ftrace_regs in fprobe exit handler
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (24 preceding siblings ...)
  2023-11-05 16:11 ` [RFC PATCH 25/32] fprobe: Use ftrace_regs in fprobe entry handler Masami Hiramatsu (Google)
@ 2023-11-05 16:11 ` Masami Hiramatsu (Google)
  2023-11-05 16:11 ` [RFC PATCH 27/32] tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs Masami Hiramatsu (Google)
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:11 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Change the fprobe exit handler to use the ftrace_regs structure instead
of pt_regs. This also introduces HAVE_PT_REGS_TO_FTRACE_REGS_CAST, which
means the memory layout of ftrace_regs is identical to pt_regs, so one
can be cast to the other. Fprobe gains a new dependency on this.
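A minimal sketch of an exit handler under the new signature (the
handler name is hypothetical; ftrace_regs_return_value() is used the
same way in the lib/test_fprobe.c hunk below):

	static void my_exit_handler(struct fprobe *fp, unsigned long ip,
				    unsigned long ret_ip, struct ftrace_regs *fregs,
				    void *data)
	{
		pr_info("%ps returned 0x%lx\n", (void *)ip,
			ftrace_regs_return_value(fregs));
	}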

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
  Changes from previous series: NOTHING, just forward ported.
---
 arch/loongarch/Kconfig          |    1 +
 arch/s390/Kconfig               |    1 +
 arch/x86/Kconfig                |    1 +
 include/linux/fprobe.h          |    2 +-
 include/linux/ftrace.h          |    5 +++++
 kernel/trace/Kconfig            |    8 ++++++++
 kernel/trace/bpf_trace.c        |    6 +++++-
 kernel/trace/fprobe.c           |    3 ++-
 kernel/trace/trace_fprobe.c     |    6 +++++-
 lib/test_fprobe.c               |    6 +++---
 samples/fprobe/fprobe_example.c |    2 +-
 11 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index e14396a2ddcb..258e9bee1503 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -108,6 +108,7 @@ config LOONGARCH
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS
+	select HAVE_PT_REGS_TO_FTRACE_REGS_CAST
 	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
 	select HAVE_EBPF_JIT
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index ae29e4392664..5aedb4320e7c 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -167,6 +167,7 @@ config S390
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS
+	select HAVE_PT_REGS_TO_FTRACE_REGS_CAST
 	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
 	select HAVE_EBPF_JIT if HAVE_MARCH_Z196_FEATURES
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4b4c2f9d67da..e11ba9a33afe 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -209,6 +209,7 @@ config X86
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if X86_64
+	select HAVE_PT_REGS_TO_FTRACE_REGS_CAST	if X86_64
 	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_SAMPLE_FTRACE_DIRECT	if X86_64
 	select HAVE_SAMPLE_FTRACE_DIRECT_MULTI	if X86_64
diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 36c0595f7b93..879a30956009 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -38,7 +38,7 @@ struct fprobe {
 			     unsigned long ret_ip, struct ftrace_regs *regs,
 			     void *entry_data);
 	void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip,
-			     unsigned long ret_ip, struct pt_regs *regs,
+			     unsigned long ret_ip, struct ftrace_regs *fregs,
 			     void *entry_data);
 };
 
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 0c036c5d7c12..894117e909dd 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -133,6 +133,11 @@ struct ftrace_regs {
 #define ftrace_regs_set_instruction_pointer(fregs, ip) do { } while (0)
 #endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
 
+#ifdef CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST
+
+static_assert(sizeof(struct pt_regs) == sizeof(struct ftrace_regs));
+
+#endif /* CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
 
 static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs *fregs)
 {
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 805d72ab77c6..1a2544712690 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -60,6 +60,13 @@ config HAVE_DYNAMIC_FTRACE_WITH_ARGS
 	 This allows for use of ftrace_regs_get_argument() and
 	 ftrace_regs_get_stack_pointer().
 
+config HAVE_PT_REGS_TO_FTRACE_REGS_CAST
+	bool
+	help
+	 If this is set, the memory layout of the ftrace_regs data structure
+	 is the same as pt_regs. So a pt_regs can be cast to
+	 ftrace_regs.
+
 config HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
 	bool
 	help
@@ -291,6 +298,7 @@ config FPROBE
 	bool "Kernel Function Probe (fprobe)"
 	depends on FUNCTION_TRACER
 	depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
+	depends on HAVE_PT_REGS_TO_FTRACE_REGS_CAST || !HAVE_DYNAMIC_FTRACE_WITH_ARGS
 	depends on HAVE_RETHOOK
 	select RETHOOK
 	default n
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b0b98f0a0e52..a4b5e34b0419 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2745,10 +2745,14 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
 
 static void
 kprobe_multi_link_exit_handler(struct fprobe *fp, unsigned long fentry_ip,
-			       unsigned long ret_ip, struct pt_regs *regs,
+			       unsigned long ret_ip, struct ftrace_regs *fregs,
 			       void *data)
 {
 	struct bpf_kprobe_multi_link *link;
+	struct pt_regs *regs = ftrace_get_regs(fregs);
+
+	if (!regs)
+		return;
 
 	link = container_of(fp, struct bpf_kprobe_multi_link, fp);
 	kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index df323cb7bed1..38fe6a19450b 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -124,6 +124,7 @@ static void fprobe_exit_handler(struct rethook_node *rh, void *data,
 {
 	struct fprobe *fp = (struct fprobe *)data;
 	struct fprobe_rethook_node *fpr;
+	struct ftrace_regs *fregs = (struct ftrace_regs *)regs;
 	int bit;
 
 	if (!fp || fprobe_disabled(fp))
@@ -141,7 +142,7 @@ static void fprobe_exit_handler(struct rethook_node *rh, void *data,
 		return;
 	}
 
-	fp->exit_handler(fp, fpr->entry_ip, ret_ip, regs,
+	fp->exit_handler(fp, fpr->entry_ip, ret_ip, fregs,
 			 fp->entry_data_size ? (void *)fpr->data : NULL);
 	ftrace_test_recursion_unlock(bit);
 }
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 71bf38d698f1..c60d0d9f1a95 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -341,10 +341,14 @@ static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
 NOKPROBE_SYMBOL(fentry_dispatcher);
 
 static void fexit_dispatcher(struct fprobe *fp, unsigned long entry_ip,
-			     unsigned long ret_ip, struct pt_regs *regs,
+			     unsigned long ret_ip, struct ftrace_regs *fregs,
 			     void *entry_data)
 {
 	struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
+	struct pt_regs *regs = ftrace_get_regs(fregs);
+
+	if (!regs)
+		return;
 
 	if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
 		fexit_trace_func(tf, entry_ip, ret_ip, regs);
diff --git a/lib/test_fprobe.c b/lib/test_fprobe.c
index ff607babba18..d1e80653bf0c 100644
--- a/lib/test_fprobe.c
+++ b/lib/test_fprobe.c
@@ -59,9 +59,9 @@ static notrace int fp_entry_handler(struct fprobe *fp, unsigned long ip,
 
 static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip,
 				    unsigned long ret_ip,
-				    struct pt_regs *regs, void *data)
+				    struct ftrace_regs *fregs, void *data)
 {
-	unsigned long ret = regs_return_value(regs);
+	unsigned long ret = ftrace_regs_return_value(fregs);
 
 	KUNIT_EXPECT_FALSE(current_test, preemptible());
 	if (ip != target_ip) {
@@ -89,7 +89,7 @@ static notrace int nest_entry_handler(struct fprobe *fp, unsigned long ip,
 
 static notrace void nest_exit_handler(struct fprobe *fp, unsigned long ip,
 				      unsigned long ret_ip,
-				      struct pt_regs *regs, void *data)
+				      struct ftrace_regs *fregs, void *data)
 {
 	KUNIT_EXPECT_FALSE(current_test, preemptible());
 	KUNIT_EXPECT_EQ(current_test, ip, target_nest_ip);
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index 1545a1aac616..d476d1f07538 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -67,7 +67,7 @@ static int sample_entry_handler(struct fprobe *fp, unsigned long ip,
 }
 
 static void sample_exit_handler(struct fprobe *fp, unsigned long ip,
-				unsigned long ret_ip, struct pt_regs *regs,
+				unsigned long ret_ip, struct ftrace_regs *regs,
 				void *data)
 {
 	unsigned long rip = ret_ip;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 27/32] tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (25 preceding siblings ...)
  2023-11-05 16:11 ` [RFC PATCH 26/32] fprobe: Use ftrace_regs in fprobe exit handler Masami Hiramatsu (Google)
@ 2023-11-05 16:11 ` Masami Hiramatsu (Google)
  2023-11-05 16:12 ` [RFC PATCH 28/32] tracing: Add ftrace_fill_perf_regs() for perf event Masami Hiramatsu (Google)
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:11 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Add ftrace_partial_regs() which converts ftrace_regs to pt_regs.
If the architecture defines its own ftrace_regs, this copies the
partial registers into the given pt_regs and returns it. If not,
ftrace_regs has the same layout as pt_regs and ftrace_partial_regs()
will simply return ftrace_regs::regs.
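Typical usage would be (a sketch; the local buffer name is arbitrary):

	struct pt_regs regs_buf, *regs;

	regs = ftrace_partial_regs(fregs, &regs_buf);
	/* On arm64 this fills x0-x8, sp, pc, fp and lr into regs_buf;
	 * where ftrace_regs and pt_regs share a layout, it just returns
	 * &fregs->regs and regs_buf is untouched. */

Note that, unlike ftrace_get_regs(), this never returns NULL, but the
resulting pt_regs is only partially populated.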

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Florent Revest <revest@chromium.org>
---
 Changes from previous series: NOTHING, just forward ported.
---
 arch/arm64/include/asm/ftrace.h |   11 +++++++++++
 include/linux/ftrace.h          |   17 +++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index ab158196480c..5ad24f315d52 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -137,6 +137,17 @@ ftrace_override_function_with_return(struct ftrace_regs *fregs)
 	fregs->pc = fregs->lr;
 }
 
+static __always_inline struct pt_regs *
+ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+	memcpy(regs->regs, fregs->regs, sizeof(u64) * 9);
+	regs->sp = fregs->sp;
+	regs->pc = fregs->pc;
+	regs->regs[29] = fregs->fp;
+	regs->regs[30] = fregs->lr;
+	return regs;
+}
+
 int ftrace_regs_query_register_offset(const char *name);
 
 int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 894117e909dd..3d0ff6e2c4bf 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -147,6 +147,23 @@ static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs *fregs
 	return arch_ftrace_get_regs(fregs);
 }
 
+#if !defined(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS) || \
+	defined(CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST)
+
+static __always_inline struct pt_regs *
+ftrace_partial_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+	/*
+	 * If CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y, the ftrace_regs memory
+	 * layout is the same as pt_regs, so always return that address.
+	 * Since arch_ftrace_get_regs() checks some members and may return
+	 * NULL, we cannot use it here.
+	 */
+	return &fregs->regs;
+}
+
+#endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
+
 /*
  * When true, the ftrace_regs_{get,set}_*() functions may be used on fregs.
  * Note: this can be true even when ftrace_get_regs() cannot provide a pt_regs.


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 28/32] tracing: Add ftrace_fill_perf_regs() for perf event
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (26 preceding siblings ...)
  2023-11-05 16:11 ` [RFC PATCH 27/32] tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs Masami Hiramatsu (Google)
@ 2023-11-05 16:12 ` Masami Hiramatsu (Google)
  2023-11-05 16:12 ` [RFC PATCH 29/32] fprobe: Rewrite fprobe on function-graph tracer Masami Hiramatsu (Google)
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:12 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Add ftrace_fill_perf_regs() which should be compatible with
perf_fetch_caller_regs(). In other words, the pt_regs returned from
ftrace_fill_perf_regs() must satisfy 'user_mode(regs) == false' and
must be usable for stack tracing.
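Typical usage would be (a sketch; local names are arbitrary):

	struct pt_regs regs_buf, *regs;

	regs = ftrace_fill_perf_regs(fregs, &regs_buf);
	WARN_ON_ONCE(user_mode(regs));	/* must always be kernel mode */
	/* 'regs' can now be handed to the perf event code the same way
	 * as the output of perf_fetch_caller_regs(). */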

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
  Changes from previous series: NOTHING, just forward ported.
---
 arch/arm64/include/asm/ftrace.h   |    7 +++++++
 arch/powerpc/include/asm/ftrace.h |    7 +++++++
 arch/s390/include/asm/ftrace.h    |    5 +++++
 arch/x86/include/asm/ftrace.h     |    7 +++++++
 include/linux/ftrace.h            |   31 +++++++++++++++++++++++++++++++
 5 files changed, 57 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 5ad24f315d52..5fbe8fe3e8a5 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -148,6 +148,13 @@ ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs)
 	return regs;
 }
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {		\
+		(_regs)->pc = (fregs)->pc;			\
+		(_regs)->regs[29] = (fregs)->fp;		\
+		(_regs)->sp = (fregs)->sp;			\
+		(_regs)->pstate = PSR_MODE_EL1h;		\
+	} while (0)
+
 int ftrace_regs_query_register_offset(const char *name);
 
 int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 9e5a39b6a311..53fa39ef476b 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -52,6 +52,13 @@ static __always_inline struct pt_regs *arch_ftrace_get_regs(struct ftrace_regs *
 	return fregs->regs.msr ? &fregs->regs : NULL;
 }
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {		\
+		(_regs)->result = 0;				\
+		(_regs)->nip = (fregs)->regs.nip;		\
+		(_regs)->gpr[1] = (fregs)->regs.gpr[1];		\
+		asm volatile("mfmsr %0" : "=r" ((_regs)->msr));	\
+	} while (0)
+
 static __always_inline void
 ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs,
 				    unsigned long ip)
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index 5a82b08f03cd..f36ee3510af8 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -97,6 +97,11 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs,
 #define ftrace_regs_query_register_offset(name) \
 	regs_query_register_offset(name)
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs)	 do {		\
+		(_regs)->psw.addr = (fregs)->regs.psw.addr;		\
+		(_regs)->gprs[15] = (fregs)->regs.gprs[15];		\
+	} while (0)
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 /*
  * When an ftrace registered caller is tracing a function that is
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 74b1d245c38b..66e7d2df7916 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -48,6 +48,13 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
 	return &fregs->regs;
 }
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {	\
+		(_regs)->ip = (fregs)->regs.ip;		\
+		(_regs)->sp = (fregs)->regs.sp;		\
+		(_regs)->cs = __KERNEL_CS;		\
+		(_regs)->flags = 0;			\
+	} while (0)
+
 #define ftrace_regs_set_instruction_pointer(fregs, _ip)	\
 	do { (fregs)->regs.ip = (_ip); } while (0)
 
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 3d0ff6e2c4bf..7db38341bc9e 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -164,6 +164,37 @@ ftrace_partial_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
 
 #endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
 
+#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
+
+/*
+ * Please define an arch-dependent pt_regs which is compatible with
+ * perf_arch_fetch_caller_regs(), but based on ftrace_regs.
+ * This requires:
+ *   - user_mode(_regs) returns false (always kernel mode).
+ *   - the _regs can be used for a stack trace.
+ */
+#ifndef arch_ftrace_fill_perf_regs
+/* Same as perf_arch_fetch_caller_regs(): do nothing by default */
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {} while (0)
+#endif
+
+static __always_inline struct pt_regs *
+ftrace_fill_perf_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+	arch_ftrace_fill_perf_regs(fregs, regs);
+	return regs;
+}
+
+#else /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
+
+static __always_inline struct pt_regs *
+ftrace_fill_perf_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+	return &fregs->regs;
+}
+
+#endif
+
 /*
  * When true, the ftrace_regs_{get,set}_*() functions may be used on fregs.
  * Note: this can be true even when ftrace_get_regs() cannot provide a pt_regs.


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 29/32] fprobe: Rewrite fprobe on function-graph tracer
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (27 preceding siblings ...)
  2023-11-05 16:12 ` [RFC PATCH 28/32] tracing: Add ftrace_fill_perf_regs() for perf event Masami Hiramatsu (Google)
@ 2023-11-05 16:12 ` Masami Hiramatsu (Google)
  2023-11-05 16:12 ` [RFC PATCH 30/32] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS Masami Hiramatsu (Google)
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:12 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Rewrite the fprobe implementation on top of the function-graph tracer.
Major API changes are:
 -  The 'nr_maxactive' field is deprecated.
 -  This depends on CONFIG_DYNAMIC_FTRACE_WITH_ARGS or
    !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and
    CONFIG_HAVE_FUNCTION_GRAPH_FREGS. So it currently works only
    on x86_64.
 -  Currently the entry data size is limited to 15 * sizeof(long)
    (see the encoding sketch below).
 -  If too many fprobe exit handlers are set on the same
    function, probing will fail.
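The 15 * sizeof(long) limit comes from the shadow-stack header this
patch stores per exit-hooked probe: one long packing the fprobe
pointer together with its entry data size in words. A worked sketch of
the round trip (mirroring encode_fprobe_header()/decode_fprobe_header()
in the diff below, on a 64-bit kernel where the top bits of a valid
kernel pointer are all ones):

	/* encode: 4 size bits on top of 60 pointer bits */
	header = ((unsigned long)size_words << 60) |
		 ((unsigned long)fp & GENMASK(59, 0));
	/* decode: */
	size_words = header >> 60;
	fp = (struct fprobe *)((header & GENMASK(59, 0)) | ~GENMASK(59, 0));

With 4 size bits, size_words can be at most 15, hence the limit.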

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/fprobe.h |   54 +++-
 kernel/trace/Kconfig   |    8 -
 kernel/trace/fprobe.c  |  632 ++++++++++++++++++++++++++++++++++--------------
 lib/test_fprobe.c      |   45 ---
 4 files changed, 487 insertions(+), 252 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 879a30956009..08b37b0d1d05 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -5,32 +5,56 @@
 
 #include <linux/compiler.h>
 #include <linux/ftrace.h>
-#include <linux/rethook.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+
+struct fprobe;
+
+/**
+ * struct fprobe_hlist_node - address based hash list node for fprobe.
+ *
+ * @hlist: The hlist node for the address search hash table.
+ * @addr: The probed address represented by this node.
+ * @fp: The fprobe which owns this node.
+ */
+struct fprobe_hlist_node {
+	struct hlist_node	hlist;
+	unsigned long		addr;
+	struct fprobe		*fp;
+};
+
+/**
+ * struct fprobe_hlist - hash list nodes for fprobe.
+ *
+ * @hlist: The hlist node for existence checking hash table.
+ * @rcu: rcu_head for RCU deferred release.
+ * @fp: The fprobe which owns this fprobe_hlist.
+ * @size: The size of @array.
+ * @array: The fprobe_hlist_node for each address to probe.
+ */
+struct fprobe_hlist {
+	struct hlist_node		hlist;
+	struct rcu_head			rcu;
+	struct fprobe			*fp;
+	int				size;
+	struct fprobe_hlist_node	array[];
+};
 
 /**
  * struct fprobe - ftrace based probe.
- * @ops: The ftrace_ops.
+ *
  * @nmissed: The counter for missing events.
  * @flags: The status flag.
- * @rethook: The rethook data structure. (internal data)
  * @entry_data_size: The private data storage size.
- * @nr_maxactive: The max number of active functions.
+ * @nr_maxactive: The max number of active functions. (*deprecated)
  * @entry_handler: The callback function for function entry.
  * @exit_handler: The callback function for function exit.
+ * @hlist_array: The fprobe_hlist for fprobe search from IP hash table.
  */
 struct fprobe {
-#ifdef CONFIG_FUNCTION_TRACER
-	/*
-	 * If CONFIG_FUNCTION_TRACER is not set, CONFIG_FPROBE is disabled too.
-	 * But user of fprobe may keep embedding the struct fprobe on their own
-	 * code. To avoid build error, this will keep the fprobe data structure
-	 * defined here, but remove ftrace_ops data structure.
-	 */
-	struct ftrace_ops	ops;
-#endif
 	unsigned long		nmissed;
 	unsigned int		flags;
-	struct rethook		*rethook;
 	size_t			entry_data_size;
 	int			nr_maxactive;
 
@@ -40,6 +64,8 @@ struct fprobe {
 	void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip,
 			     unsigned long ret_ip, struct ftrace_regs *fregs,
 			     void *entry_data);
+
+	struct fprobe_hlist	*hlist_array;
 };
 
 /* This fprobe is soft-disabled. */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 1a2544712690..11a96275b68c 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -296,11 +296,9 @@ config DYNAMIC_FTRACE_WITH_ARGS
 
 config FPROBE
 	bool "Kernel Function Probe (fprobe)"
-	depends on FUNCTION_TRACER
-	depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
-	depends on HAVE_PT_REGS_TO_FTRACE_REGS_CAST || !HAVE_DYNAMIC_FTRACE_WITH_ARGS
-	depends on HAVE_RETHOOK
-	select RETHOOK
+	depends on FUNCTION_GRAPH_TRACER
+	depends on HAVE_FUNCTION_GRAPH_FREGS
+	depends on DYNAMIC_FTRACE_WITH_ARGS || !HAVE_DYNAMIC_FTRACE_WITH_ARGS
 	default n
 	help
 	  This option enables kernel function probe (fprobe) based on ftrace.
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 38fe6a19450b..7a0c7a32bd03 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -8,98 +8,194 @@
 #include <linux/fprobe.h>
 #include <linux/kallsyms.h>
 #include <linux/kprobes.h>
-#include <linux/rethook.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
 #include <linux/slab.h>
 #include <linux/sort.h>
 
 #include "trace.h"
 
-struct fprobe_rethook_node {
-	struct rethook_node node;
-	unsigned long entry_ip;
-	unsigned long entry_parent_ip;
-	char data[];
-};
+#define FPROBE_IP_HASH_BITS 8
+#define FPROBE_IP_TABLE_SIZE (1 << FPROBE_IP_HASH_BITS)
 
-static inline void __fprobe_handler(unsigned long ip, unsigned long parent_ip,
-			struct ftrace_ops *ops, struct ftrace_regs *fregs)
-{
-	struct fprobe_rethook_node *fpr;
-	struct rethook_node *rh = NULL;
-	struct fprobe *fp;
-	void *entry_data = NULL;
-	int ret = 0;
+#define FPROBE_HASH_BITS 6
+#define FPROBE_TABLE_SIZE (1 << FPROBE_HASH_BITS)
 
-	fp = container_of(ops, struct fprobe, ops);
+/*
+ * fprobe_table: holds 'fprobe_hlist::hlist' for checking whether the fprobe
+ *   still exists. The key is the address of the fprobe instance.
+ * fprobe_ip_table: holds 'fprobe_hlist::array[*]' for searching the fprobe
+ *   instance related to the function address. The key is the ftrace IP
+ *   address.
+ *
+ * When unregistering the fprobe, fprobe_hlist::fp and fprobe_hlist::array[*].fp
+ * are set to NULL and the entries are deleted from both hash tables (by
+ * hlist_del_rcu). After an RCU grace period, the fprobe_hlist itself is freed.
+ *
+ * fprobe_table and fprobe_ip_table can be accessed from either
+ *  - normal hlist traversal and RCU add/del while 'fprobe_mutex' is held, or
+ *  - RCU hlist traversal with preemption disabled.
+ */
+static struct hlist_head fprobe_table[FPROBE_TABLE_SIZE];
+static struct hlist_head fprobe_ip_table[FPROBE_IP_TABLE_SIZE];
+static DEFINE_MUTEX(fprobe_mutex);
 
-	if (fp->exit_handler) {
-		rh = rethook_try_get(fp->rethook);
-		if (!rh) {
-			fp->nmissed++;
-			return;
-		}
-		fpr = container_of(rh, struct fprobe_rethook_node, node);
-		fpr->entry_ip = ip;
-		fpr->entry_parent_ip = parent_ip;
-		if (fp->entry_data_size)
-			entry_data = fpr->data;
+/*
+ * Find the first fprobe in the hlist. The list is iterated twice in the
+ * entry probe: once for computing the total required size and a second
+ * time for calling back the user handlers.
+ * Thus the hlist in the fprobe_ip_table must be sorted, and a new probe
+ * needs to be added *before* the first fprobe with the same address.
+ */
+static struct fprobe_hlist_node *find_first_fprobe_node(unsigned long ip)
+{
+	struct fprobe_hlist_node *node;
+	struct hlist_head *head;
+
+	head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)];
+	hlist_for_each_entry_rcu(node, head, hlist,
+				 lockdep_is_held(&fprobe_mutex)) {
+		if (node->addr == ip)
+			return node;
 	}
+	return NULL;
+}
 
-	if (fp->entry_handler)
-		ret = fp->entry_handler(fp, ip, parent_ip, fregs, entry_data);
+/* Node insertion and deletion requires the fprobe_mutex */
+static void insert_fprobe_node(struct fprobe_hlist_node *node)
+{
+	unsigned long ip = node->addr;
+	struct fprobe_hlist_node *next;
+	struct hlist_head *head;
 
-	/* If entry_handler returns !0, nmissed is not counted. */
-	if (rh) {
-		if (ret)
-			rethook_recycle(rh);
-		else
-			rethook_hook(rh, ftrace_get_regs(fregs), true);
+	lockdep_assert_held(&fprobe_mutex);
+
+	next = find_first_fprobe_node(ip);
+	if (next) {
+		hlist_add_before_rcu(&node->hlist, &next->hlist);
+		return;
 	}
+	head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)];
+	hlist_add_head_rcu(&node->hlist, head);
 }
 
-static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
-		struct ftrace_ops *ops, struct ftrace_regs *fregs)
+/* Return true if there are synonyms (other nodes with the same address) */
+static bool delete_fprobe_node(struct fprobe_hlist_node *node)
 {
-	struct fprobe *fp;
-	int bit;
+	lockdep_assert_held(&fprobe_mutex);
 
-	fp = container_of(ops, struct fprobe, ops);
-	if (fprobe_disabled(fp))
-		return;
+	node->fp = NULL;
+	hlist_del_rcu(&node->hlist);
+	return !!find_first_fprobe_node(node->addr);
+}
 
-	/* recursion detection has to go before any traceable function and
-	 * all functions before this point should be marked as notrace
-	 */
-	bit = ftrace_test_recursion_trylock(ip, parent_ip);
-	if (bit < 0) {
-		fp->nmissed++;
-		return;
+/* Check existence of the fprobe */
+static bool is_fprobe_still_exist(struct fprobe *fp)
+{
+	struct hlist_head *head;
+	struct fprobe_hlist *fph;
+
+	head = &fprobe_table[hash_ptr(fp, FPROBE_HASH_BITS)];
+	hlist_for_each_entry_rcu(fph, head, hlist,
+				 lockdep_is_held(&fprobe_mutex)) {
+		if (fph->fp == fp)
+			return true;
 	}
-	__fprobe_handler(ip, parent_ip, ops, fregs);
-	ftrace_test_recursion_unlock(bit);
+	return false;
+}
+
+static int add_fprobe_hash(struct fprobe *fp)
+{
+	struct fprobe_hlist *fph = fp->hlist_array;
+	struct hlist_head *head;
+
+	if (WARN_ON_ONCE(!fph))
+		return -EINVAL;
 
+	if (is_fprobe_still_exist(fp))
+		return -EEXIST;
+
+	head = &fprobe_table[hash_ptr(fp, FPROBE_HASH_BITS)];
+	hlist_add_head_rcu(&fp->hlist_array->hlist, head);
+	return 0;
 }
-NOKPROBE_SYMBOL(fprobe_handler);
 
-static void fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip,
-				  struct ftrace_ops *ops, struct ftrace_regs *fregs)
+static int del_fprobe_hash(struct fprobe *fp)
 {
-	struct fprobe *fp;
-	int bit;
+	struct fprobe_hlist *fph = fp->hlist_array;
 
-	fp = container_of(ops, struct fprobe, ops);
-	if (fprobe_disabled(fp))
-		return;
+	if (WARN_ON_ONCE(!fph))
+		return -EINVAL;
 
-	/* recursion detection has to go before any traceable function and
-	 * all functions called before this point should be marked as notrace
-	 */
-	bit = ftrace_test_recursion_trylock(ip, parent_ip);
-	if (bit < 0) {
-		fp->nmissed++;
-		return;
+	if (!is_fprobe_still_exist(fp))
+		return -ENOENT;
+
+	fph->fp = NULL;
+	hlist_del_rcu(&fph->hlist);
+	return 0;
+}
+
+/* The entry data size is encoded in 4 bits, so at most 15 * sizeof(long) */
+#define FPROBE_HEADER_SIZE_BITS		4
+#define MAX_FPROBE_DATA_SIZE_WORD	((1L << FPROBE_HEADER_SIZE_BITS) - 1)
+#define MAX_FPROBE_DATA_SIZE		(MAX_FPROBE_DATA_SIZE_WORD * sizeof(long))
+#define FPROBE_HEADER_PTR_BITS		(BITS_PER_LONG - FPROBE_HEADER_SIZE_BITS)
+#define FPROBE_HEADER_PTR_MASK		GENMASK(FPROBE_HEADER_PTR_BITS - 1, 0)
+#define FPROBE_HEADER_SIZE		sizeof(unsigned long)
+
+static inline unsigned long encode_fprobe_header(struct fprobe *fp, int size_words)
+{
+	if (WARN_ON_ONCE(size_words > MAX_FPROBE_DATA_SIZE_WORD ||
+	    ((unsigned long)fp & ~FPROBE_HEADER_PTR_MASK) !=
+	    ~FPROBE_HEADER_PTR_MASK)) {
+		return 0;
 	}
+	return ((unsigned long)size_words << FPROBE_HEADER_PTR_BITS) |
+		((unsigned long)fp & FPROBE_HEADER_PTR_MASK);
+}
 
+/* Return reserved data size in words */
+static inline int decode_fprobe_header(unsigned long val, struct fprobe **fp)
+{
+	unsigned long ptr;
+
+	ptr = (val & FPROBE_HEADER_PTR_MASK) | ~FPROBE_HEADER_PTR_MASK;
+	if (fp)
+		*fp = (struct fprobe *)ptr;
+	return val >> FPROBE_HEADER_PTR_BITS;
+}
+
+/*
+ * fprobe shadow stack management:
+ * Since fprobe shares a single fgraph_ops, it needs to share the stack entry
+ * among the probes on the same function exit. Since a new probe can be
+ * registered before a target function returns, we cannot use the hash
+ * table to find the corresponding probes. Thus the probe address is stored
+ * on the shadow stack together with its entry data size.
+ *
+ */
+static inline int __fprobe_handler(unsigned long ip, unsigned long parent_ip,
+				   struct fprobe *fp, struct ftrace_regs *fregs,
+				   void *data)
+{
+	int ret = 0;
+
+	if (fp->entry_handler) {
+		if (fp->exit_handler && fp->entry_data_size)
+			data += sizeof(unsigned long);
+		else
+			data = NULL;
+		ret = fp->entry_handler(fp, ip, parent_ip, fregs, data);
+	}
+
+	return ret;
+}
+
+static inline int __fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip,
+					  struct fprobe *fp, struct ftrace_regs *fregs,
+					  void *data)
+{
+	int ret;
 	/*
 	 * This user handler is shared with other kprobes and is not expected to be
 	 * called recursively. So if any other kprobe handler is running, this will
@@ -108,45 +204,173 @@ static void fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip,
 	 */
 	if (unlikely(kprobe_running())) {
 		fp->nmissed++;
-		goto recursion_unlock;
+		return 0;
 	}
 
 	kprobe_busy_begin();
-	__fprobe_handler(ip, parent_ip, ops, fregs);
+	ret = __fprobe_handler(ip, parent_ip, fp, fregs, data);
 	kprobe_busy_end();
-
-recursion_unlock:
-	ftrace_test_recursion_unlock(bit);
+	return ret;
 }
 
-static void fprobe_exit_handler(struct rethook_node *rh, void *data,
-				unsigned long ret_ip, struct pt_regs *regs)
+static int fprobe_entry(unsigned long func, unsigned long ret_ip,
+			struct ftrace_regs *fregs, struct fgraph_ops *gops)
 {
-	struct fprobe *fp = (struct fprobe *)data;
-	struct fprobe_rethook_node *fpr;
-	struct ftrace_regs *fregs = (struct ftrace_regs *)regs;
-	int bit;
+	struct fprobe_hlist_node *node, *first;
+	unsigned long header;
+	void *fgraph_data = NULL;
+	struct fprobe *fp;
+	int size, used, ret;
 
-	if (!fp || fprobe_disabled(fp))
-		return;
+	if (WARN_ON_ONCE(!fregs))
+		return 0;
 
-	fpr = container_of(rh, struct fprobe_rethook_node, node);
+	first = node = find_first_fprobe_node(func);
+	if (unlikely(!first))
+		return 0;
+
+	size = 0;
+	hlist_for_each_entry_from_rcu(node, hlist) {
+		if (node->addr != func)
+			break;
+		fp = node->fp;
+		/*
+		 * Since an fprobe may become enabled before the next loop, we
+		 * ignore the disabled flag in this loop.
+		 */
+		if (fp && fp->exit_handler)
+			size += FPROBE_HEADER_SIZE + fp->entry_data_size;
+	}
+	node = first;
+	/* size can be 0 if the fprobes only have entry_handlers. */
+	if (size) {
+		fgraph_data = fgraph_reserve_data(size);
+		if (unlikely(!fgraph_data)) {
+			hlist_for_each_entry_from_rcu(node, hlist) {
+				if (node->addr != func)
+					break;
+				fp = node->fp;
+				if (fp && !fprobe_disabled(fp))
+					fp->nmissed++;
+			}
+			return 0;
+		}
+	}
 
 	/*
-	 * we need to assure no calls to traceable functions in-between the
-	 * end of fprobe_handler and the beginning of fprobe_exit_handler.
+	 * TODO: recursion detection has been done in the fgraph. Thus we need
+	 * to add a callback to increment missed counter.
 	 */
-	bit = ftrace_test_recursion_trylock(fpr->entry_ip, fpr->entry_parent_ip);
-	if (bit < 0) {
-		fp->nmissed++;
+	used = 0;
+	hlist_for_each_entry_from_rcu(node, hlist) {
+		if (node->addr != func)
+			break;
+		fp = node->fp;
+		if (!fp || fprobe_disabled(fp))
+			continue;
+
+		if (fprobe_shared_with_kprobes(fp))
+			ret = __fprobe_kprobe_handler(func, ret_ip,
+					fp, fregs, fgraph_data + used);
+		else
+			ret = __fprobe_handler(func, ret_ip, fp,
+					fregs, fgraph_data + used);
+		/* If entry_handler returns !0, nmissed is not counted and the exit_handler is skipped. */
+		if (!ret && fp->exit_handler) {
+			header = encode_fprobe_header(fp, fp->entry_data_size / sizeof(long));
+			if (likely(header)) {
+				*(unsigned long *)fgraph_data = header;
+				used += fp->entry_data_size + sizeof(header);
+			}
+			if (WARN_ON_ONCE(size - used < 0))
+				return 0;
+		}
+	}
+	if (used < size)
+		memset(fgraph_data + used, 0, size - used);
+
+	/* If any exit_handler is set, data must be used. */
+	return used != 0;
+}
+NOKPROBE_SYMBOL(fprobe_entry);
+
+static void fprobe_return(unsigned long func, unsigned long ret_ip,
+			  struct ftrace_regs *fregs, struct fgraph_ops *gops)
+{
+	unsigned long val;
+	struct fprobe *fp;
+	char *data;
+	int esize, size = 0;
+
+	data = fgraph_retrieve_data(&size);
+	if (!data)
+		return;
+
+	preempt_disable();
+	while (size >= sizeof(val)) {
+		val = *(unsigned long *)data;
+		if (!val)
+			break;
+		data += sizeof(val);
+		size -= sizeof(val);
+
+		esize = decode_fprobe_header(val, &fp) * sizeof(long);
+		if (WARN_ON_ONCE(esize > size))
+			break;
+
+		if (fp && is_fprobe_still_exist(fp) && !fprobe_disabled(fp)) {
+			fp->exit_handler(fp, func, ret_ip, fregs,
+					 esize ? data : NULL);
+		}
+		data += esize;
+		size -= esize;
+	}
+	preempt_enable();
+}
+NOKPROBE_SYMBOL(fprobe_return);
+
+static struct fgraph_ops fprobe_graph_ops = {
+	.entryregfunc	= fprobe_entry,
+	.retregfunc	= fprobe_return,
+};
+static int fprobe_graph_active;
+
+/* Add @addrs to the ftrace filter and register fgraph if needed. */
+static int fprobe_graph_add_ips(unsigned long *addrs, int num)
+{
+	int ret;
+
+	lockdep_assert_held(&fprobe_mutex);
+
+	ret = ftrace_set_filter_ips(&fprobe_graph_ops.ops, addrs, num, 0, 0);
+	if (ret)
+		return ret;
+
+	if (!fprobe_graph_active) {
+		ret = register_ftrace_graph(&fprobe_graph_ops);
+		if (WARN_ON_ONCE(ret)) {
+			ftrace_free_filter(&fprobe_graph_ops.ops);
+			return ret;
+		}
+	}
+	fprobe_graph_active++;
+	return 0;
+}
+
+/* Remove @addrs from the ftrace filter and unregister fgraph if possible. */
+static void fprobe_graph_remove_ips(unsigned long *addrs, int num)
+{
+	lockdep_assert_held(&fprobe_mutex);
+
+	fprobe_graph_active--;
+	if (!fprobe_graph_active) {
+		/* Q: should we unregister it ? */
+		unregister_ftrace_graph(&fprobe_graph_ops);
 		return;
 	}
 
-	fp->exit_handler(fp, fpr->entry_ip, ret_ip, fregs,
-			 fp->entry_data_size ? (void *)fpr->data : NULL);
-	ftrace_test_recursion_unlock(bit);
+	ftrace_set_filter_ips(&fprobe_graph_ops.ops, addrs, num, 1, 0);
 }
-NOKPROBE_SYMBOL(fprobe_exit_handler);
 
 static int symbols_cmp(const void *a, const void *b)
 {
@@ -176,62 +400,96 @@ static unsigned long *get_ftrace_locations(const char **syms, int num)
 	return ERR_PTR(-ENOENT);
 }
 
-static void fprobe_init(struct fprobe *fp)
+struct filter_match_data {
+	const char *filter;
+	const char *notfilter;
+	size_t index;
+	size_t size;
+	unsigned long *addrs;
+};
+
+static int filter_match_callback(void *data, const char *name, unsigned long addr)
 {
-	fp->nmissed = 0;
-	if (fprobe_shared_with_kprobes(fp))
-		fp->ops.func = fprobe_kprobe_handler;
-	else
-		fp->ops.func = fprobe_handler;
-	fp->ops.flags |= FTRACE_OPS_FL_SAVE_ARGS;
+	struct filter_match_data *match = data;
+
+	if (!glob_match(match->filter, name) ||
+	    (match->notfilter && glob_match(match->notfilter, name)))
+		return 0;
+
+	if (!ftrace_location(addr))
+		return 0;
+
+	if (match->addrs)
+		match->addrs[match->index] = addr;
+
+	match->index++;
+	return match->index == match->size;
 }
 
-static int fprobe_init_rethook(struct fprobe *fp, int num)
+/*
+ * Make an IP list from the filter/notfilter glob patterns.
+ * Return the number of matched symbols, or -ENOENT.
+ */
+static int ip_list_from_filter(const char *filter, const char *notfilter,
+			       unsigned long *addrs, size_t size)
 {
-	int i, size;
+	struct filter_match_data match = { .filter = filter, .notfilter = notfilter,
+		.index = 0, .size = size, .addrs = addrs};
+	int ret;
 
-	if (num <= 0)
-		return -EINVAL;
+	ret = kallsyms_on_each_symbol(filter_match_callback, &match);
+	if (ret < 0)
+		return ret;
+	ret = module_kallsyms_on_each_symbol(NULL, filter_match_callback, &match);
+	if (ret < 0)
+		return ret;
 
-	if (!fp->exit_handler) {
-		fp->rethook = NULL;
-		return 0;
-	}
+	return match.index ?: -ENOENT;
+}
 
-	/* Initialize rethook if needed */
-	if (fp->nr_maxactive)
-		size = fp->nr_maxactive;
-	else
-		size = num * num_possible_cpus() * 2;
-	if (size <= 0)
+static void fprobe_fail_cleanup(struct fprobe *fp)
+{
+	kfree(fp->hlist_array);
+	fp->hlist_array = NULL;
+}
+
+/* Initialize the fprobe data structure. */
+static int fprobe_init(struct fprobe *fp, unsigned long *addrs, int num)
+{
+	struct fprobe_hlist *hlist_array;
+	unsigned long addr;
+	int size, i;
+
+	if (!fp || !addrs || num <= 0)
 		return -EINVAL;
 
-	fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler);
-	if (!fp->rethook)
+	size = ALIGN(fp->entry_data_size, sizeof(long));
+	if (size > MAX_FPROBE_DATA_SIZE)
+		return -E2BIG;
+	fp->entry_data_size = size;
+
+	hlist_array = kzalloc(struct_size(hlist_array, array, num), GFP_KERNEL);
+	if (!hlist_array)
 		return -ENOMEM;
-	for (i = 0; i < size; i++) {
-		struct fprobe_rethook_node *node;
-
-		node = kzalloc(sizeof(*node) + fp->entry_data_size, GFP_KERNEL);
-		if (!node) {
-			rethook_free(fp->rethook);
-			fp->rethook = NULL;
-			return -ENOMEM;
+
+	fp->nmissed = 0;
+
+	hlist_array->size = num;
+	fp->hlist_array = hlist_array;
+	hlist_array->fp = fp;
+	for (i = 0; i < num; i++) {
+		hlist_array->array[i].fp = fp;
+		addr = ftrace_location(addrs[i]);
+		if (!addr) {
+			fprobe_fail_cleanup(fp);
+			return -ENOENT;
 		}
-		rethook_add_node(fp->rethook, &node->node);
+		hlist_array->array[i].addr = addr;
 	}
 	return 0;
 }
 
-static void fprobe_fail_cleanup(struct fprobe *fp)
-{
-	if (fp->rethook) {
-		/* Don't need to cleanup rethook->handler because this is not used. */
-		rethook_free(fp->rethook);
-		fp->rethook = NULL;
-	}
-	ftrace_free_filter(&fp->ops);
-}
+#define FPROBE_IPS_MAX	INT_MAX
 
 /**
  * register_fprobe() - Register fprobe to ftrace by pattern.
@@ -246,46 +504,24 @@ static void fprobe_fail_cleanup(struct fprobe *fp)
  */
 int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
 {
-	struct ftrace_hash *hash;
-	unsigned char *str;
-	int ret, len;
+	unsigned long *addrs;
+	int ret;
 
 	if (!fp || !filter)
 		return -EINVAL;
 
-	fprobe_init(fp);
-
-	len = strlen(filter);
-	str = kstrdup(filter, GFP_KERNEL);
-	ret = ftrace_set_filter(&fp->ops, str, len, 0);
-	kfree(str);
-	if (ret)
+	ret = ip_list_from_filter(filter, notfilter, NULL, FPROBE_IPS_MAX);
+	if (ret < 0)
 		return ret;
 
-	if (notfilter) {
-		len = strlen(notfilter);
-		str = kstrdup(notfilter, GFP_KERNEL);
-		ret = ftrace_set_notrace(&fp->ops, str, len, 0);
-		kfree(str);
-		if (ret)
-			goto out;
-	}
-
-	/* TODO:
-	 * correctly calculate the total number of filtered symbols
-	 * from both filter and notfilter.
-	 */
-	hash = rcu_access_pointer(fp->ops.local_hash.filter_hash);
-	if (WARN_ON_ONCE(!hash))
-		goto out;
-
-	ret = fprobe_init_rethook(fp, (int)hash->count);
-	if (!ret)
-		ret = register_ftrace_function(&fp->ops);
+	addrs = kcalloc(ret, sizeof(unsigned long), GFP_KERNEL);
+	if (!addrs)
+		return -ENOMEM;
+	ret = ip_list_from_filter(filter, notfilter, addrs, ret);
+	if (ret > 0)
+		ret = register_fprobe_ips(fp, addrs, ret);
 
-out:
-	if (ret)
-		fprobe_fail_cleanup(fp);
+	kfree(addrs);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(register_fprobe);
@@ -293,7 +529,7 @@ EXPORT_SYMBOL_GPL(register_fprobe);
 /**
  * register_fprobe_ips() - Register fprobe to ftrace by address.
  * @fp: A fprobe data structure to be registered.
- * @addrs: An array of target ftrace location addresses.
+ * @addrs: An array of target function addresses.
  * @num: The number of entries of @addrs.
  *
  * Register @fp to ftrace for enabling the probe on the address given by @addrs.
@@ -305,23 +541,27 @@ EXPORT_SYMBOL_GPL(register_fprobe);
  */
 int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
 {
-	int ret;
-
-	if (!fp || !addrs || num <= 0)
-		return -EINVAL;
+	struct fprobe_hlist *hlist_array;
+	int ret, i;
 
-	fprobe_init(fp);
-
-	ret = ftrace_set_filter_ips(&fp->ops, addrs, num, 0, 0);
+	ret = fprobe_init(fp, addrs, num);
 	if (ret)
 		return ret;
 
-	ret = fprobe_init_rethook(fp, num);
-	if (!ret)
-		ret = register_ftrace_function(&fp->ops);
+	mutex_lock(&fprobe_mutex);
+
+	hlist_array = fp->hlist_array;
+	ret = fprobe_graph_add_ips(addrs, num);
+	if (!ret) {
+		add_fprobe_hash(fp);
+		for (i = 0; i < hlist_array->size; i++)
+			insert_fprobe_node(&hlist_array->array[i]);
+	}
+	mutex_unlock(&fprobe_mutex);
 
 	if (ret)
 		fprobe_fail_cleanup(fp);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(register_fprobe_ips);
@@ -359,14 +599,13 @@ EXPORT_SYMBOL_GPL(register_fprobe_syms);
 
 bool fprobe_is_registered(struct fprobe *fp)
 {
-	if (!fp || (fp->ops.saved_func != fprobe_handler &&
-		    fp->ops.saved_func != fprobe_kprobe_handler))
+	if (!fp || !fp->hlist_array)
 		return false;
 	return true;
 }
 
 /**
- * unregister_fprobe() - Unregister fprobe from ftrace
+ * unregister_fprobe() - Unregister fprobe.
  * @fp: A fprobe data structure to be unregistered.
  *
  * Unregister fprobe (and remove ftrace hooks from the function entries).
@@ -375,23 +614,40 @@ bool fprobe_is_registered(struct fprobe *fp)
  */
 int unregister_fprobe(struct fprobe *fp)
 {
-	int ret;
+	struct fprobe_hlist *hlist_array;
+	unsigned long *addrs = NULL;
+	int ret = 0, i, count;
 
-	if (!fprobe_is_registered(fp))
-		return -EINVAL;
+	mutex_lock(&fprobe_mutex);
+	if (!fp || !is_fprobe_still_exist(fp)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
-	if (fp->rethook)
-		rethook_stop(fp->rethook);
+	hlist_array = fp->hlist_array;
+	addrs = kcalloc(hlist_array->size, sizeof(unsigned long), GFP_KERNEL);
+	if (!addrs) {
+		ret = -ENOMEM;	/* TODO: Fallback to one-by-one loop */
+		goto out;
+	}
 
-	ret = unregister_ftrace_function(&fp->ops);
-	if (ret < 0)
-		return ret;
+	/* Remove non-synonym ips from table and hash */
+	count = 0;
+	for (i = 0; i < hlist_array->size; i++) {
+		if (!delete_fprobe_node(&hlist_array->array[i]))
+			addrs[count++] = hlist_array->array[i].addr;
+	}
+	del_fprobe_hash(fp);
 
-	if (fp->rethook)
-		rethook_free(fp->rethook);
+	fprobe_graph_remove_ips(addrs, count);
 
-	ftrace_free_filter(&fp->ops);
+	kfree_rcu(hlist_array, rcu);
+	fp->hlist_array = NULL;
 
+out:
+	mutex_unlock(&fprobe_mutex);
+
+	kfree(addrs);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(unregister_fprobe);
diff --git a/lib/test_fprobe.c b/lib/test_fprobe.c
index d1e80653bf0c..ddffb212fba5 100644
--- a/lib/test_fprobe.c
+++ b/lib/test_fprobe.c
@@ -17,10 +17,8 @@ static u32 rand1, entry_val, exit_val;
 /* Use indirect calls to avoid inlining the target functions */
 static u32 (*target)(u32 value);
 static u32 (*target2)(u32 value);
-static u32 (*target_nest)(u32 value, u32 (*nest)(u32));
 static unsigned long target_ip;
 static unsigned long target2_ip;
-static unsigned long target_nest_ip;
 static int entry_return_value;
 
 static noinline u32 fprobe_selftest_target(u32 value)
@@ -33,11 +31,6 @@ static noinline u32 fprobe_selftest_target2(u32 value)
 	return (value / div_factor) + 1;
 }
 
-static noinline u32 fprobe_selftest_nest_target(u32 value, u32 (*nest)(u32))
-{
-	return nest(value + 2);
-}
-
 static notrace int fp_entry_handler(struct fprobe *fp, unsigned long ip,
 				    unsigned long ret_ip,
 				    struct ftrace_regs *fregs, void *data)
@@ -79,22 +72,6 @@ static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip,
 		KUNIT_EXPECT_NULL(current_test, data);
 }
 
-static notrace int nest_entry_handler(struct fprobe *fp, unsigned long ip,
-				      unsigned long ret_ip,
-				      struct ftrace_regs *fregs, void *data)
-{
-	KUNIT_EXPECT_FALSE(current_test, preemptible());
-	return 0;
-}
-
-static notrace void nest_exit_handler(struct fprobe *fp, unsigned long ip,
-				      unsigned long ret_ip,
-				      struct ftrace_regs *fregs, void *data)
-{
-	KUNIT_EXPECT_FALSE(current_test, preemptible());
-	KUNIT_EXPECT_EQ(current_test, ip, target_nest_ip);
-}
-
 /* Test entry only (no rethook) */
 static void test_fprobe_entry(struct kunit *test)
 {
@@ -191,25 +168,6 @@ static void test_fprobe_data(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
 }
 
-/* Test nr_maxactive */
-static void test_fprobe_nest(struct kunit *test)
-{
-	static const char *syms[] = {"fprobe_selftest_target", "fprobe_selftest_nest_target"};
-	struct fprobe fp = {
-		.entry_handler = nest_entry_handler,
-		.exit_handler = nest_exit_handler,
-		.nr_maxactive = 1,
-	};
-
-	current_test = test;
-	KUNIT_EXPECT_EQ(test, 0, register_fprobe_syms(&fp, syms, 2));
-
-	target_nest(rand1, target);
-	KUNIT_EXPECT_EQ(test, 1, fp.nmissed);
-
-	KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
-}
-
 static void test_fprobe_skip(struct kunit *test)
 {
 	struct fprobe fp = {
@@ -247,10 +205,8 @@ static int fprobe_test_init(struct kunit *test)
 	rand1 = get_random_u32_above(div_factor);
 	target = fprobe_selftest_target;
 	target2 = fprobe_selftest_target2;
-	target_nest = fprobe_selftest_nest_target;
 	target_ip = get_ftrace_location(target);
 	target2_ip = get_ftrace_location(target2);
-	target_nest_ip = get_ftrace_location(target_nest);
 
 	return 0;
 }
@@ -260,7 +216,6 @@ static struct kunit_case fprobe_testcases[] = {
 	KUNIT_CASE(test_fprobe),
 	KUNIT_CASE(test_fprobe_syms),
 	KUNIT_CASE(test_fprobe_data),
-	KUNIT_CASE(test_fprobe_nest),
 	KUNIT_CASE(test_fprobe_skip),
 	{}
 };


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 30/32] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (28 preceding siblings ...)
  2023-11-05 16:12 ` [RFC PATCH 29/32] fprobe: Rewrite fprobe on function-graph tracer Masami Hiramatsu (Google)
@ 2023-11-05 16:12 ` Masami Hiramatsu (Google)
  2023-11-05 16:12 ` [RFC PATCH 31/32] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled Masami Hiramatsu (Google)
  2023-11-05 16:12 ` [RFC PATCH 32/32] Documentation: probes: Update fprobe on function-graph tracer Masami Hiramatsu (Google)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:12 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Allow fprobe events to be enabled with CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
With this change, fprobe events mostly use ftrace_regs instead of pt_regs.
Note that if the arch doesn't enable HAVE_PT_REGS_COMPAT_FTRACE_REGS,
fprobe events cannot be used from perf.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes from previous series: Update against the new series.
---
 include/linux/ftrace.h          |   15 ++++++++
 kernel/trace/Kconfig            |    1 -
 kernel/trace/trace_fprobe.c     |   74 ++++++++++++++++++++-------------------
 kernel/trace/trace_probe_tmpl.h |    2 +
 4 files changed, 53 insertions(+), 39 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7db38341bc9e..34108f0610f7 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -224,6 +224,21 @@ static __always_inline bool ftrace_regs_has_args(struct ftrace_regs *fregs)
 	regs_query_register_offset(name)
 #endif
 
+#ifdef CONFIG_HAVE_REGS_AND_STACK_ACCESS_API
+static __always_inline unsigned long
+ftrace_regs_get_kernel_stack_nth(struct ftrace_regs *fregs, unsigned int nth)
+{
+	unsigned long *stackp;
+
+	stackp = (unsigned long *)ftrace_regs_get_stack_pointer(fregs);
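+	/*
+	 * Kernel stacks are THREAD_SIZE aligned, so if (stackp + nth) has the
+	 * same THREAD_SIZE-masked base as stackp, the slot is still on this
+	 * task's stack and is safe to read.
+	 */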
+	if (((unsigned long)(stackp + nth) & ~(THREAD_SIZE - 1)) ==
+	    ((unsigned long)stackp & ~(THREAD_SIZE - 1)))
+		return *(stackp + nth);
+
+	return 0;
+}
+#endif /* CONFIG_HAVE_REGS_AND_STACK_ACCESS_API */
+
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
 			      struct ftrace_ops *op, struct ftrace_regs *fregs);
 
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 11a96275b68c..169588021d90 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -681,7 +681,6 @@ config FPROBE_EVENTS
 	select TRACING
 	select PROBE_EVENTS
 	select DYNAMIC_EVENTS
-	depends on DYNAMIC_FTRACE_WITH_REGS
 	default y
 	help
 	  This allows user to add tracing events on the function entry and
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index c60d0d9f1a95..882c220333cc 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -132,7 +132,7 @@ static int
 process_fetch_insn(struct fetch_insn *code, void *rec, void *dest,
 		   void *base)
 {
-	struct pt_regs *regs = rec;
+	struct ftrace_regs *fregs = rec;
 	unsigned long val;
 	int ret;
 
@@ -140,17 +140,17 @@ process_fetch_insn(struct fetch_insn *code, void *rec, void *dest,
 	/* 1st stage: get value from context */
 	switch (code->op) {
 	case FETCH_OP_STACK:
-		val = regs_get_kernel_stack_nth(regs, code->param);
+		val = ftrace_regs_get_kernel_stack_nth(fregs, code->param);
 		break;
 	case FETCH_OP_STACKP:
-		val = kernel_stack_pointer(regs);
+		val = ftrace_regs_get_stack_pointer(fregs);
 		break;
 	case FETCH_OP_RETVAL:
-		val = regs_return_value(regs);
+		val = ftrace_regs_return_value(fregs);
 		break;
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
 	case FETCH_OP_ARG:
-		val = regs_get_kernel_argument(regs, code->param);
+		val = ftrace_regs_get_argument(fregs, code->param);
 		break;
 #endif
 	case FETCH_NOP_SYMBOL:	/* Ignore a place holder */
@@ -170,7 +170,7 @@ NOKPROBE_SYMBOL(process_fetch_insn)
 /* function entry handler */
 static nokprobe_inline void
 __fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
-		    struct pt_regs *regs,
+		    struct ftrace_regs *fregs,
 		    struct trace_event_file *trace_file)
 {
 	struct fentry_trace_entry_head *entry;
@@ -184,36 +184,36 @@ __fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
 	if (trace_trigger_soft_disabled(trace_file))
 		return;
 
-	dsize = __get_data_size(&tf->tp, regs);
+	dsize = __get_data_size(&tf->tp, fregs);
 
 	entry = trace_event_buffer_reserve(&fbuffer, trace_file,
 					   sizeof(*entry) + tf->tp.size + dsize);
 	if (!entry)
 		return;
 
-	fbuffer.regs = regs;
+	fbuffer.regs = ftrace_get_regs(fregs);
 	entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
 	entry->ip = entry_ip;
-	store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
+	store_trace_args(&entry[1], &tf->tp, fregs, sizeof(*entry), dsize);
 
 	trace_event_buffer_commit(&fbuffer);
 }
 
 static void
 fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
-		  struct pt_regs *regs)
+		  struct ftrace_regs *fregs)
 {
 	struct event_file_link *link;
 
 	trace_probe_for_each_link_rcu(link, &tf->tp)
-		__fentry_trace_func(tf, entry_ip, regs, link->file);
+		__fentry_trace_func(tf, entry_ip, fregs, link->file);
 }
 NOKPROBE_SYMBOL(fentry_trace_func);
 
 /* Kretprobe handler */
 static nokprobe_inline void
 __fexit_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
-		   unsigned long ret_ip, struct pt_regs *regs,
+		   unsigned long ret_ip, struct ftrace_regs *fregs,
 		   struct trace_event_file *trace_file)
 {
 	struct fexit_trace_entry_head *entry;
@@ -227,60 +227,63 @@ __fexit_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
 	if (trace_trigger_soft_disabled(trace_file))
 		return;
 
-	dsize = __get_data_size(&tf->tp, regs);
+	dsize = __get_data_size(&tf->tp, fregs);
 
 	entry = trace_event_buffer_reserve(&fbuffer, trace_file,
 					   sizeof(*entry) + tf->tp.size + dsize);
 	if (!entry)
 		return;
 
-	fbuffer.regs = regs;
+	fbuffer.regs = ftrace_get_regs(fregs);
 	entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
 	entry->func = entry_ip;
 	entry->ret_ip = ret_ip;
-	store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
+	store_trace_args(&entry[1], &tf->tp, fregs, sizeof(*entry), dsize);
 
 	trace_event_buffer_commit(&fbuffer);
 }
 
 static void
 fexit_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
-		 unsigned long ret_ip, struct pt_regs *regs)
+		 unsigned long ret_ip, struct ftrace_regs *fregs)
 {
 	struct event_file_link *link;
 
 	trace_probe_for_each_link_rcu(link, &tf->tp)
-		__fexit_trace_func(tf, entry_ip, ret_ip, regs, link->file);
+		__fexit_trace_func(tf, entry_ip, ret_ip, fregs, link->file);
 }
 NOKPROBE_SYMBOL(fexit_trace_func);
 
 #ifdef CONFIG_PERF_EVENTS
 
 static int fentry_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
-			    struct pt_regs *regs)
+			    struct ftrace_regs *fregs)
 {
 	struct trace_event_call *call = trace_probe_event_call(&tf->tp);
 	struct fentry_trace_entry_head *entry;
 	struct hlist_head *head;
 	int size, __size, dsize;
+	struct pt_regs *regs;
 	int rctx;
 
 	head = this_cpu_ptr(call->perf_events);
 	if (hlist_empty(head))
 		return 0;
 
-	dsize = __get_data_size(&tf->tp, regs);
+	dsize = __get_data_size(&tf->tp, fregs);
 	__size = sizeof(*entry) + tf->tp.size + dsize;
 	size = ALIGN(__size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	entry = perf_trace_buf_alloc(size, NULL, &rctx);
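+	/* perf hands back a pt_regs buffer here; it is filled from fregs below */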
+	entry = perf_trace_buf_alloc(size, &regs, &rctx);
 	if (!entry)
 		return 0;
 
+	regs = ftrace_fill_perf_regs(fregs, regs);
+
 	entry->ip = entry_ip;
 	memset(&entry[1], 0, dsize);
-	store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
+	store_trace_args(&entry[1], &tf->tp, fregs, sizeof(*entry), dsize);
 	perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
 			      head, NULL);
 	return 0;
@@ -289,30 +292,33 @@ NOKPROBE_SYMBOL(fentry_perf_func);
 
 static void
 fexit_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
-		unsigned long ret_ip, struct pt_regs *regs)
+		unsigned long ret_ip, struct ftrace_regs *fregs)
 {
 	struct trace_event_call *call = trace_probe_event_call(&tf->tp);
 	struct fexit_trace_entry_head *entry;
 	struct hlist_head *head;
 	int size, __size, dsize;
+	struct pt_regs *regs;
 	int rctx;
 
 	head = this_cpu_ptr(call->perf_events);
 	if (hlist_empty(head))
 		return;
 
-	dsize = __get_data_size(&tf->tp, regs);
+	dsize = __get_data_size(&tf->tp, fregs);
 	__size = sizeof(*entry) + tf->tp.size + dsize;
 	size = ALIGN(__size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	entry = perf_trace_buf_alloc(size, NULL, &rctx);
+	entry = perf_trace_buf_alloc(size, &regs, &rctx);
 	if (!entry)
 		return;
 
+	regs = ftrace_fill_perf_regs(fregs, regs);
+
 	entry->func = entry_ip;
 	entry->ret_ip = ret_ip;
-	store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
+	store_trace_args(&entry[1], &tf->tp, fregs, sizeof(*entry), dsize);
 	perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
 			      head, NULL);
 }
@@ -324,17 +330,14 @@ static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
 			     void *entry_data)
 {
 	struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
-	struct pt_regs *regs = ftrace_get_regs(fregs);
 	int ret = 0;
 
-	if (!regs)
-		return 0;
-
 	if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
-		fentry_trace_func(tf, entry_ip, regs);
+		fentry_trace_func(tf, entry_ip, fregs);
+
 #ifdef CONFIG_PERF_EVENTS
 	if (trace_probe_test_flag(&tf->tp, TP_FLAG_PROFILE))
-		ret = fentry_perf_func(tf, entry_ip, regs);
+		ret = fentry_perf_func(tf, entry_ip, fregs);
 #endif
 	return ret;
 }
@@ -345,16 +348,13 @@ static void fexit_dispatcher(struct fprobe *fp, unsigned long entry_ip,
 			     void *entry_data)
 {
 	struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
-	struct pt_regs *regs = ftrace_get_regs(fregs);
-
-	if (!regs)
-		return;
 
 	if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
-		fexit_trace_func(tf, entry_ip, ret_ip, regs);
+		fexit_trace_func(tf, entry_ip, ret_ip, fregs);
+
 #ifdef CONFIG_PERF_EVENTS
 	if (trace_probe_test_flag(&tf->tp, TP_FLAG_PROFILE))
-		fexit_perf_func(tf, entry_ip, ret_ip, regs);
+		fexit_perf_func(tf, entry_ip, ret_ip, fregs);
 #endif
 }
 NOKPROBE_SYMBOL(fexit_dispatcher);
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 3935b347f874..05445a745a07 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -232,7 +232,7 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 
 /* Sum up total data length for dynamic arrays (strings) */
 static nokprobe_inline int
-__get_data_size(struct trace_probe *tp, struct pt_regs *regs)
+__get_data_size(struct trace_probe *tp, void *regs)
 {
 	struct probe_arg *arg;
 	int i, len, ret = 0;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 31/32] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (29 preceding siblings ...)
  2023-11-05 16:12 ` [RFC PATCH 30/32] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS Masami Hiramatsu (Google)
@ 2023-11-05 16:12 ` Masami Hiramatsu (Google)
  2023-11-05 16:12 ` [RFC PATCH 32/32] Documentation: probes: Update fprobe on function-graph tracer Masami Hiramatsu (Google)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:12 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Enable kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is
converted from ftrace_regs by ftrace_partial_regs(), thus some registers
may always return 0. But it should be enough for function entry (accessing
arguments) and exit (accessing the return value).

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Florent Revest <revest@chromium.org>
---
 Changes from previous series: NOTHING, Update against the new series.
---
 kernel/trace/bpf_trace.c |   22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index a4b5e34b0419..96d6e9993f75 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2501,7 +2501,7 @@ static int __init bpf_event_init(void)
 fs_initcall(bpf_event_init);
 #endif /* CONFIG_MODULES */
 
-#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
+#ifdef CONFIG_FPROBE
 struct bpf_kprobe_multi_link {
 	struct bpf_link link;
 	struct fprobe fp;
@@ -2524,6 +2524,8 @@ struct user_syms {
 	char *buf;
 };
 
+static DEFINE_PER_CPU(struct pt_regs, bpf_kprobe_multi_pt_regs);
+
 static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, u32 cnt)
 {
 	unsigned long __user usymbol;
@@ -2700,13 +2702,14 @@ static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
 
 static int
 kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
-			   unsigned long entry_ip, struct pt_regs *regs)
+			   unsigned long entry_ip, struct ftrace_regs *fregs)
 {
 	struct bpf_kprobe_multi_run_ctx run_ctx = {
 		.link = link,
 		.entry_ip = entry_ip,
 	};
 	struct bpf_run_ctx *old_run_ctx;
+	struct pt_regs *regs;
 	int err;
 
 	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
@@ -2716,6 +2719,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
 
 	migrate_disable();
 	rcu_read_lock();
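+	/* Build a pt_regs view for BPF from fregs, using a per-CPU scratch buffer */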
+	regs = ftrace_partial_regs(fregs, this_cpu_ptr(&bpf_kprobe_multi_pt_regs));
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	err = bpf_prog_run(link->link.prog, regs);
 	bpf_reset_run_ctx(old_run_ctx);
@@ -2733,13 +2737,9 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
 			  void *data)
 {
 	struct bpf_kprobe_multi_link *link;
-	struct pt_regs *regs = ftrace_get_regs(fregs);
-
-	if (!regs)
-		return 0;
 
 	link = container_of(fp, struct bpf_kprobe_multi_link, fp);
-	kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
+	kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
 	return 0;
 }
 
@@ -2749,13 +2749,9 @@ kprobe_multi_link_exit_handler(struct fprobe *fp, unsigned long fentry_ip,
 			       void *data)
 {
 	struct bpf_kprobe_multi_link *link;
-	struct pt_regs *regs = ftrace_get_regs(fregs);
-
-	if (!regs)
-		return;
 
 	link = container_of(fp, struct bpf_kprobe_multi_link, fp);
-	kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
+	kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
 }
 
 static int symbols_cmp_r(const void *a, const void *b, const void *priv)
@@ -3012,7 +3008,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 	kvfree(cookies);
 	return err;
 }
-#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+#else /* !CONFIG_FPROBE */
 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	return -EOPNOTSUPP;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 32/32] Documentation: probes: Update fprobe on function-graph tracer
  2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
                   ` (30 preceding siblings ...)
  2023-11-05 16:12 ` [RFC PATCH 31/32] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled Masami Hiramatsu (Google)
@ 2023-11-05 16:12 ` Masami Hiramatsu (Google)
  31 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-11-05 16:12 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt, Florent Revest
  Cc: linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Update fprobe documentation for the new fprobe on function-graph
tracer. This includes some behavior changes and the pt_regs to
ftrace_regs interface change.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Documentation/trace/fprobe.rst |   42 ++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst
index 196f52386aaa..fb0446f68bcb 100644
--- a/Documentation/trace/fprobe.rst
+++ b/Documentation/trace/fprobe.rst
@@ -9,9 +9,10 @@ Fprobe - Function entry/exit probe
 Introduction
 ============
 
-Fprobe is a function entry/exit probe mechanism based on ftrace.
-Instead of using ftrace full feature, if you only want to attach callbacks
-on function entry and exit, similar to the kprobes and kretprobes, you can
+Fprobe is a function entry/exit probe mechanism based on the function-graph
+tracer.
+Instead of tracing all functions, if you want to attach callbacks on specific
+function entry and exit, similar to the kprobes and kretprobes, you can
 use fprobe. Compared with kprobes and kretprobes, fprobe gives faster
 instrumentation for multiple functions with single handler. This document
 describes how to use fprobe.
@@ -91,12 +92,14 @@ The prototype of the entry/exit callback function are as follows:
 
 .. code-block:: c
 
- int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct pt_regs *regs, void *entry_data);
+ int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data);
 
- void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct pt_regs *regs, void *entry_data);
+ void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data);
 
-Note that the @entry_ip is saved at function entry and passed to exit handler.
-If the entry callback function returns !0, the corresponding exit callback will be cancelled.
+Note that the @entry_ip is saved at function entry and passed to the exit
+handler.
+If the entry callback function returns !0, the corresponding exit callback
+will be cancelled.
 
 @fp
         This is the address of `fprobe` data structure related to this handler.
@@ -112,12 +115,10 @@ If the entry callback function returns !0, the corresponding exit callback will
         This is the return address that the traced function will return to,
         somewhere in the caller. This can be used at both entry and exit.
 
-@regs
-        This is the `pt_regs` data structure at the entry and exit. Note that
-        the instruction pointer of @regs may be different from the @entry_ip
-        in the entry_handler. If you need traced instruction pointer, you need
-        to use @entry_ip. On the other hand, in the exit_handler, the instruction
-        pointer of @regs is set to the current return address.
+@fregs
+        This is the `ftrace_regs` data structure at the entry and exit. Note that
+        the instruction pointer of @fregs may be incorrect in the entry and
+        exit handlers, so you have to use @entry_ip and @ret_ip instead.
 
 @entry_data
         This is a local storage to share the data between entry and exit handlers.
@@ -125,6 +126,17 @@ If the entry callback function returns !0, the corresponding exit callback will
         and `entry_data_size` field when registering the fprobe, the storage is
         allocated and passed to both `entry_handler` and `exit_handler`.
 
+Entry data size and exit handlers on the same function
+======================================================
+
+Since the entry data is passed via the per-task stack, which has a limited
+size, the entry data size per probe is limited to `15 * sizeof(long)`. Note
+that when different fprobes probe the same function, this limit becomes
+smaller. The entry data size is aligned to `sizeof(long)`, and each fprobe
+which has an exit handler also consumes a `sizeof(long)` space on the stack,
+so you should keep the number of fprobes on the same function as small as
+possible.
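+
+For example, a pair of handlers sharing a timestamp via the entry data
+could look like this (a minimal sketch; the handler and probe names are
+illustrative):
+
+.. code-block:: c
+
+ /* Entry handler: record the entry time in the shared entry data. */
+ static int my_entry(struct fprobe *fp, unsigned long entry_ip,
+                     unsigned long ret_ip, struct ftrace_regs *fregs,
+                     void *entry_data)
+ {
+         *(u64 *)entry_data = ktime_get_ns();
+         return 0;
+ }
+
+ /* Exit handler: report how long the traced function took. */
+ static void my_exit(struct fprobe *fp, unsigned long entry_ip,
+                     unsigned long ret_ip, struct ftrace_regs *fregs,
+                     void *entry_data)
+ {
+         pr_info("%ps took %llu ns\n", (void *)entry_ip,
+                 ktime_get_ns() - *(u64 *)entry_data);
+ }
+
+ static struct fprobe my_fprobe = {
+         .entry_handler = my_entry,
+         .exit_handler = my_exit,
+         .entry_data_size = sizeof(u64),
+ };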
+
 Share the callbacks with kprobes
 ================================
 
@@ -165,8 +177,8 @@ This counter counts up when;
  - fprobe fails to take ftrace_recursion lock. This usually means that a function
    which is traced by other ftrace users is called from the entry_handler.
 
- - fprobe fails to setup the function exit because of the shortage of rethook
-   (the shadow stack for hooking the function return.)
+ - fprobe fails to set up the function exit because it fails to allocate the
+   data buffer from the per-task shadow stack.
 
 The `fprobe::nmissed` field counts up in both cases. Therefore, the former
 skips both of entry and exit callback and the latter skips the exit


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-05 16:11 ` [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS Masami Hiramatsu (Google)
@ 2023-11-05 17:25   ` Peter Zijlstra
  2023-11-05 19:11     ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2023-11-05 17:25 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Alexei Starovoitov, Steven Rostedt, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Thomas Gleixner,
	Guo Ren

On Mon, Nov 06, 2023 at 01:11:21AM +0900, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Support HAVE_FUNCTION_GRAPH_FREGS on x86-64, which saves ftrace_regs
> on the stack in ftrace_graph return trampoline so that the callbacks
> can access registers via ftrace_regs APIs.

What is ftrace_regs ? If I look at arch/x86/include/asm/ftrace.h it's a
pointless wrapper around pt_regs.

Can we please remove the pointless wrappery and call it what it is?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-05 17:25   ` Peter Zijlstra
@ 2023-11-05 19:11     ` Steven Rostedt
  2023-11-05 21:28       ` Steven Rostedt
  2023-11-05 23:17       ` Peter Zijlstra
  0 siblings, 2 replies; 54+ messages in thread
From: Steven Rostedt @ 2023-11-05 19:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu (Google),
	Alexei Starovoitov, Florent Revest, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Thomas Gleixner, Guo Ren

On Sun, 5 Nov 2023 18:25:36 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Nov 06, 2023 at 01:11:21AM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> > Support HAVE_FUNCTION_GRAPH_FREGS on x86-64, which saves ftrace_regs
> > on the stack in ftrace_graph return trampoline so that the callbacks
> > can access registers via ftrace_regs APIs.  
> 
> What is ftrace_regs ? If I look at arch/x86/include/asm/ftrace.h it's a
> pointless wrapper around pt_regs.
> 
> Can we please remove the pointless wrappery and call it what it is?

A while back, when I introduced FTRACE_WITH_ARGS, it would have all
ftrace callbacks get a pt_regs, but it would be partially filled for
those that did not specify the "REGS" flag when registering the
callback. You and Thomas complained that it would be a bug to return
a pt_regs that was not full, because something might read the unfilled
registers and think they were valid.

To solve this, I came up with ftrace_regs to only hold the registers
that were required for function parameters (including the stack
pointer). You could then call arch_ftrace_get_regs(ftrace_regs) and if
this "wrapper" had all valid pt_regs registers, then it would return
the pt_regs, otherwise it would return NULL, and you would need to use
the ftrace_regs accessor calls to get the function registers. You and
Thomas agreed with this.

You even Acked the patch:

commit 02a474ca266a47ea8f4d5a11f4ffa120f83730ad
Author: Steven Rostedt (VMware) <rostedt@goodmis.org>
Date:   Tue Oct 27 10:55:55 2020 -0400

    ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
    
    Currently, the only way to get access to the registers of a function via a
    ftrace callback is to set the "FL_SAVE_REGS" bit in the ftrace_ops. But as this
    saves all regs as if a breakpoint were to trigger (for use with kprobes), it
    is expensive.
    
    The regs are already saved on the stack for the default ftrace callbacks, as
    that is required otherwise a function being traced will get the wrong
    arguments and possibly crash. And on x86, the arguments are already stored
    where they would be on a pt_regs structure to use that code for both the
    regs version of a callback, it makes sense to pass that information always
    to all functions.
    
    If an architecture does this (as x86_64 now does), it is to set
    HAVE_DYNAMIC_FTRACE_WITH_ARGS, and this will let the generic code that it
    could have access to arguments without having to set the flags.
    
    This also includes having the stack pointer being saved, which could be used
    for accessing arguments on the stack, as well as having the function graph
    tracer not require its own trampoline!
    
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-05 19:11     ` Steven Rostedt
@ 2023-11-05 21:28       ` Steven Rostedt
  2023-11-05 23:17       ` Peter Zijlstra
  1 sibling, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2023-11-05 21:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu (Google),
	Alexei Starovoitov, Florent Revest, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Thomas Gleixner, Guo Ren

On Sun, 5 Nov 2023 14:11:30 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> You even Acked the patch:

More specifically, you acked the series and stressed the ftrace_regs
wrapper part when doing so:

  https://lore.kernel.org/all/20201113080733.GZ2628@hirez.programming.kicks-ass.net/

> On Thu, Nov 12, 2020 at 09:01:42PM -0500, Steven Rostedt wrote:
> > Steven Rostedt (VMware) (3):
> >       ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
> >       ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
> >       livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-05 19:11     ` Steven Rostedt
  2023-11-05 21:28       ` Steven Rostedt
@ 2023-11-05 23:17       ` Peter Zijlstra
  2023-11-05 23:33         ` Steven Rostedt
  1 sibling, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2023-11-05 23:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu (Google),
	Alexei Starovoitov, Florent Revest, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Thomas Gleixner, Guo Ren

On Sun, Nov 05, 2023 at 02:11:30PM -0500, Steven Rostedt wrote:
> On Sun, 5 Nov 2023 18:25:36 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, Nov 06, 2023 at 01:11:21AM +0900, Masami Hiramatsu (Google) wrote:
> > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > 
> > > Support HAVE_FUNCTION_GRAPH_FREGS on x86-64, which saves ftrace_regs
> > > on the stack in ftrace_graph return trampoline so that the callbacks
> > > can access registers via ftrace_regs APIs.  
> > 
> > What is ftrace_regs ? If I look at arch/x86/include/asm/ftrace.h it's a
> > pointless wrapper around pt_regs.
> > 
> > Can we please remove the pointless wrappery and call it what it is?
> 
> A while back, when I introduced FTRACE_WITH_ARGS, it would have all
> ftrace callbacks get a pt_regs, but it would be partially filled for
> those that did not specify the "REGS" flag when registering the
> callback. You and Thomas complained that it would be a bug to return
> a pt_regs that was not full, because something might read the unfilled
> registers and think they were valid.
> 
> To solve this, I came up with ftrace_regs to only hold the registers
> that were required for function parameters (including the stack
> pointer). You could then call arch_ftrace_get_regs(ftrace_regs) and if
> this "wrapper" had all valid pt_regs registers, then it would return
> the pt_regs, otherwise it would return NULL, and you would need to use
> the ftrace_regs accessor calls to get the function registers. You and
> Thomas agreed with this.

Changelog nor code made it clear this was partial anything. So this is
still the partial thing?

Can we then pretty clearly clarify all that, and make it clear which regs
are in there? Because when I do 'vim -t ftrace_regs' it just gets me a
seemingly pointless wrapper struct, no elucidating comments nothingses.

> You even Acked the patch:
>
> commit 02a474ca266a47ea8f4d5a11f4ffa120f83730ad
> Author: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Date:   Tue Oct 27 10:55:55 2020 -0400

You expect me to remember things from 3 years ago?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-05 23:17       ` Peter Zijlstra
@ 2023-11-05 23:33         ` Steven Rostedt
  2023-11-05 23:34           ` Steven Rostedt
  2023-11-06  1:05           ` Masami Hiramatsu
  0 siblings, 2 replies; 54+ messages in thread
From: Steven Rostedt @ 2023-11-05 23:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu (Google),
	Alexei Starovoitov, Florent Revest, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Thomas Gleixner, Guo Ren

On Mon, 6 Nov 2023 00:17:34 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Changelog nor code made it clear this was partial anything. So this is
> still the partial thing?
> 
> Can we then pretty clearly clarify all that, and make it clear which regs
> are in there? Because when I do 'vim -t ftrace_regs' it just gets me a
> seemingly pointless wrapper struct, no elucidating comments nothingses.

I agree it should be better documented (like everything else). The
ftrace_regs must have all the registers needed to produce a function's
arguments. For x86_64, that would be:

  rdi, rsi, rdx, r8, r9, rsp

Basically anything that is needed to call mcount/fentry.

But yes, it's still partial registers but for archs that support
FTRACE_WITH_REGS, it can also hold all pt_regs which can be retrieved
by the arch_ftrace_get_regs(), which is why there's a pt_regs struct in
the x86 version. But that's not the case for arm64, as
arch_ftrace_get_regs() will always return NULL.
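
To make that concrete, a portable callback can probe for the full set
first; a minimal sketch using the existing accessor helpers:

	struct pt_regs *regs = arch_ftrace_get_regs(fregs);

	if (regs) {
		/* full pt_regs is available (e.g. x86 FTRACE_WITH_REGS) */
	} else {
		/* partial registers only: go through the ftrace_regs accessors */
		unsigned long arg0 = ftrace_regs_get_argument(fregs, 0);
		unsigned long sp = ftrace_regs_get_stack_pointer(fregs);
	}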

> 
> > You even Acked the patch:
> >
> > commit 02a474ca266a47ea8f4d5a11f4ffa120f83730ad
> > Author: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > Date:   Tue Oct 27 10:55:55 2020 -0400  
> 
> You expect me to remember things from 3 years ago?

Heh, of course not. I just thought it amusing that I created
ftrace_regs because of you and then 3 years later you ask to get rid of
it. But the real issue is that it's not documented clearly why it
exists, and that should be rectified.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-05 23:33         ` Steven Rostedt
@ 2023-11-05 23:34           ` Steven Rostedt
  2023-11-06  0:38             ` Masami Hiramatsu
  2023-11-06  1:05           ` Masami Hiramatsu
  1 sibling, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2023-11-05 23:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu (Google),
	Alexei Starovoitov, Florent Revest, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Thomas Gleixner, Guo Ren

On Sun, 5 Nov 2023 18:33:01 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> For x86_64, that would be:
> 
>   rdi, rsi, rdx, r8, r9, rsp

I missed rcx.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-05 23:34           ` Steven Rostedt
@ 2023-11-06  0:38             ` Masami Hiramatsu
  2023-11-06 10:19               ` Peter Zijlstra
  0 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2023-11-06  0:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Masami Hiramatsu (Google),
	Alexei Starovoitov, Florent Revest, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Thomas Gleixner, Guo Ren

On Sun, 5 Nov 2023 18:34:09 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Sun, 5 Nov 2023 18:33:01 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > For x86_64, that would be:
> > 
> >   rdi, rsi, rdx, r8, r9, rsp
> 
> I missed rcx.

I would like to add rax to the list so that it can handle the return value too. :)

Thanks,

> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-05 23:33         ` Steven Rostedt
  2023-11-05 23:34           ` Steven Rostedt
@ 2023-11-06  1:05           ` Masami Hiramatsu
  2023-11-06 16:37             ` Steven Rostedt
  1 sibling, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2023-11-06  1:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Masami Hiramatsu (Google),
	Alexei Starovoitov, Florent Revest, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Thomas Gleixner, Guo Ren

On Sun, 5 Nov 2023 18:33:01 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 6 Nov 2023 00:17:34 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Changelog nor code made it clear this was partial anything. So this is
> > still the partial thing?
> > 
> > Can we then pretty clearly clarify all that, and make it clear which regs
> > are in there? Because when I do 'vim -t ftrace_regs' it just gets me a
> > seemingly pointless wrapper struct, no elucidating comments nothingses.
> 
> I agree it should be better documented (like everything else). The
> ftrace_regs must have all the registers needed to produce a function's
> arguments. For x86_64, that would be:
> 
>   rdi, rsi, rdx, r8, r9, rsp
> 
> Basically anything that is needed to call mcount/fentry.

Oops, I found that I missed saving rsp. Let me update it.

Anyway, this will be defined clearly. ftrace_regs needs to be a partial
set of registers related to the (kernel) function call:

 - registers which are used for passing the function parameters in
   integer registers, and the stack pointer (for parameters in memory).

 - registers which are used for passing the return values.

 - the call-frame-pointer register, if it exists.

So for x86-64,

 - rdi, rsi, rcx, rdx, r8, r9, and rsp
 - rax and rdx
 - rbp

(BTW, why orig_rax is cleared?)

> But yes, it's still partial registers but for archs that support
> FTRACE_WITH_REGS, it can also hold all pt_regs which can be retrieved
> by the arch_ftrace_get_regs(), which is why there's a pt_regs struct in
> the x86 version. But that's not the case for arm64, as
> arch_ftrace_get_regs() will always return NULL.

The major reason for DYNAMIC_FTRACE_WITH_REGS is livepatch and
kprobe on ftrace (if a kprobe puts its probe on the ftrace address, it
uses ftrace instead of a breakpoint).

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 18/32] function_graph: Fix to initalize ftrace_ops for fgraph with ftrace_graph_func
  2023-11-05 16:10 ` [RFC PATCH 18/32] function_graph: Fix to initalize ftrace_ops for fgraph with ftrace_graph_func Masami Hiramatsu (Google)
@ 2023-11-06 10:04   ` Masami Hiramatsu
  2023-11-07  1:49     ` Masami Hiramatsu
  0 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2023-11-06 10:04 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Alexei Starovoitov, Steven Rostedt, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

On Mon,  6 Nov 2023 01:10:10 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Fix to initialize the ftrace_ops of fgraph_ops with ftrace_graph_func
> instead of ftrace_stub.

I've changed this because the fprobe entry handler is not called via
fgraph without it. But maybe I have to set the correct gops->ops.func
after init?

Thank you,

> 
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
>  kernel/trace/fgraph.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index 597250bd30dc..858fb73440ec 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -872,7 +872,7 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
>  void fgraph_init_ops(struct ftrace_ops *dst_ops,
>  		     struct ftrace_ops *src_ops)
>  {
> -	dst_ops->func = ftrace_stub;
> +	dst_ops->func = ftrace_graph_func;
>  	dst_ops->flags = FTRACE_OPS_FL_PID | FTRACE_OPS_FL_STUB;
>  
>  #ifdef FTRACE_GRAPH_TRAMP_ADDR
> @@ -1120,7 +1120,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
>  
>  	if (!gops->ops.func) {
>  		gops->ops.flags |= FTRACE_OPS_FL_STUB;
> -		gops->ops.func = ftrace_stub;
> +		gops->ops.func = ftrace_graph_func;
>  #ifdef FTRACE_GRAPH_TRAMP_ADDR
>  		gops->ops.trampoline = FTRACE_GRAPH_TRAMP_ADDR;
>  #endif
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-06  0:38             ` Masami Hiramatsu
@ 2023-11-06 10:19               ` Peter Zijlstra
  2023-11-06 12:47                 ` Masami Hiramatsu
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2023-11-06 10:19 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Alexei Starovoitov, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Thomas Gleixner,
	Guo Ren

On Mon, Nov 06, 2023 at 09:38:50AM +0900, Masami Hiramatsu wrote:
> On Sun, 5 Nov 2023 18:34:09 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Sun, 5 Nov 2023 18:33:01 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > For x86_64, that would be:
> > > 
> > >   rdi, rsi, rdx, r8, r9, rsp
> > 
> > I missed rcx.
> 
> I would like to add rax to the list so that it can handle the return value too. :)

So something like so?


diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 897cf02c20b1..71bfe27594a5 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -36,6 +36,10 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
 
 #ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
 struct ftrace_regs {
+	/*
+	 * Partial, filled with:
+	 *  rax, rcx, rdx, rdi, rsi, r8, r9, rsp
+	 */
 	struct pt_regs		regs;
 };

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-06 10:19               ` Peter Zijlstra
@ 2023-11-06 12:47                 ` Masami Hiramatsu
  2023-11-06 12:52                   ` Peter Zijlstra
  0 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2023-11-06 12:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Alexei Starovoitov, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Thomas Gleixner,
	Guo Ren

On Mon, 6 Nov 2023 11:19:32 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Nov 06, 2023 at 09:38:50AM +0900, Masami Hiramatsu wrote:
> > On Sun, 5 Nov 2023 18:34:09 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Sun, 5 Nov 2023 18:33:01 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > For x86_64, that would be:
> > > > 
> > > >   rdi, rsi, rdx, r8, r9, rsp
> > > 
> > > I missed rcx.
> > 
> > I would like to add rax to the list so that it can handle the return value too. :)
> 
> So something like so?
> 
> 
> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index 897cf02c20b1..71bfe27594a5 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -36,6 +36,10 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
>  
>  #ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
>  struct ftrace_regs {
> +	/*
> +	 * Partial, filled with:
> +	 *  rax, rcx, rdx, rdi, rsi, r8, r9, rsp

Don't we need rbp too? (for frame pointer)


> +	 */
>  	struct pt_regs		regs;
>  };


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-06 12:47                 ` Masami Hiramatsu
@ 2023-11-06 12:52                   ` Peter Zijlstra
  0 siblings, 0 replies; 54+ messages in thread
From: Peter Zijlstra @ 2023-11-06 12:52 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Alexei Starovoitov, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Thomas Gleixner,
	Guo Ren

On Mon, Nov 06, 2023 at 09:47:08PM +0900, Masami Hiramatsu wrote:
> On Mon, 6 Nov 2023 11:19:32 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, Nov 06, 2023 at 09:38:50AM +0900, Masami Hiramatsu wrote:
> > > On Sun, 5 Nov 2023 18:34:09 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > On Sun, 5 Nov 2023 18:33:01 -0500
> > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > 
> > > > > For x86_64, that would be:
> > > > > 
> > > > >   rdi, rsi, rdx, r8, r9, rsp
> > > > 
> > > > I missed rcx.
> > > 
> > > I would like to add rax to the list so that it can handle the return value too. :)
> > 
> > So something like so?
> > 
> > 
> > diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> > index 897cf02c20b1..71bfe27594a5 100644
> > --- a/arch/x86/include/asm/ftrace.h
> > +++ b/arch/x86/include/asm/ftrace.h
> > @@ -36,6 +36,10 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
> >  
> >  #ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
> >  struct ftrace_regs {
> > +	/*
> > +	 * Partial, filled with:
> > +	 *  rax, rcx, rdx, rdi, rsi, r8, r9, rsp
> 
> Don't we need rbp too? (for frame pointer)

/me goes stare at ftrace_64.S, and yes it appears it fills out rbp too.



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-06  1:05           ` Masami Hiramatsu
@ 2023-11-06 16:37             ` Steven Rostedt
  2023-11-07  0:42               ` Masami Hiramatsu
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2023-11-06 16:37 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Peter Zijlstra, Alexei Starovoitov, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Thomas Gleixner,
	Guo Ren

On Mon, 6 Nov 2023 10:05:49 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> So for x86-64,
> 
>  - rdi, rsi, rcx, rdx, r8, r9, and rsp
>  - rax and rdx
>  - rbp
> 
> (BTW, why orig_rax is cleared?)

You mean from ftrace_caller?

That's a "hack" to determine if we need to call the direct trampoline or
not. When you have both a direct trampoline and ftrace functions on the
same function, it will call ftrace_ops_list_func() to iterate all the
registered ftrace callbacks. The direct callback helper will set "orig_rax"
to let the return of the ftrace trampoline call the direct callback.

Remember if a direct callback is by itself, the fentry will call that
direct trampoline without going through the ftrace trampoline. This is used
to tell the ftrace trampoline that it's attached to a direct caller and
needs to call that and not return back to the function it is tracing.

See later down in that file we have:

	/*
	 * If ORIG_RAX is anything but zero, make this a call to that.
	 * See arch_ftrace_set_direct_caller().
	 */
	testq	%rax, %rax

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-06 16:37             ` Steven Rostedt
@ 2023-11-07  0:42               ` Masami Hiramatsu
  2023-11-07  3:06                 ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2023-11-07  0:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Alexei Starovoitov, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Thomas Gleixner,
	Guo Ren

On Mon, 6 Nov 2023 11:37:10 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 6 Nov 2023 10:05:49 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > So for x86-64,
> > 
> >  - rdi, rsi, rcx, rdx, r8, r9, and rsp
> >  - rax and rdx
> >  - rbp
> > 
> > (BTW, why orig_rax is cleared?)
> 
> You mean from ftrace_caller?
> 
> That's a "hack" to determine if we need to call the direct trampoline or
> not. When you have both a direct trampoline and ftrace functions on the
> same function, it will call ftrace_ops_list_func() to iterate all the
> registered ftrace callbacks. The direct callback helper will set "orig_rax"
> to let the return of the ftrace trampoline call the direct callback.

Got it. So does ftrace_regs need a placeholder for the direct trampoline?
(Or can we use a register to pass it?)
I think we don't need to clear it for return_to_handler(), but if the
`ftrace_regs` spec requires it, it is better to do so.

Thank you,

> 
> Remember if a direct callback is by itself, the fentry will call that
> direct trampoline without going through the ftrace trampoline. This is used
> to tell the ftrace trampoline that it's attached to a direct caller and
> needs to call that and not return back to the function it is tracing.
> 
> See later down in that file we have:
> 
> 	/*
> 	 * If ORIG_RAX is anything but zero, make this a call to that.
> 	 * See arch_ftrace_set_direct_caller().
> 	 */
> 	testq	%rax, %rax
> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 10/32] function_graph: Have the instances use their own ftrace_ops for filtering
  2023-11-05 16:08 ` [RFC PATCH 10/32] function_graph: Have the instances use their own ftrace_ops for filtering Masami Hiramatsu (Google)
@ 2023-11-07  1:47   ` Masami Hiramatsu
  0 siblings, 0 replies; 54+ messages in thread
From: Masami Hiramatsu @ 2023-11-07  1:47 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Alexei Starovoitov, Steven Rostedt, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

On Mon,  6 Nov 2023 01:08:32 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> Allow for instances to have their own ftrace_ops part of the fgraph_ops that
> makes the funtion_graph tracer filter on the set_ftrace_filter file of the
> instance and not the top instance.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
>  include/linux/ftrace.h               |    1 +
>  kernel/trace/fgraph.c                |   60 +++++++++++++++++++++-------------
>  kernel/trace/ftrace.c                |    6 ++-
>  kernel/trace/trace.h                 |   16 +++++----
>  kernel/trace/trace_functions.c       |    2 +
>  kernel/trace/trace_functions_graph.c |    8 +++--
>  6 files changed, 58 insertions(+), 35 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 7fd044ae3da5..9dab365c6023 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -1044,6 +1044,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph
>  struct fgraph_ops {
>  	trace_func_graph_ent_t		entryfunc;
>  	trace_func_graph_ret_t		retfunc;
> +	struct ftrace_ops		ops; /* for the hash lists */
>  	void				*private;
>  };
>  
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index 1e8c17f70b84..0642f3281b64 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -17,14 +17,6 @@
>  #include "ftrace_internal.h"
>  #include "trace.h"
>  
> -#ifdef CONFIG_DYNAMIC_FTRACE
> -#define ASSIGN_OPS_HASH(opsname, val) \
> -	.func_hash		= val, \
> -	.local_hash.regex_lock	= __MUTEX_INITIALIZER(opsname.local_hash.regex_lock),
> -#else
> -#define ASSIGN_OPS_HASH(opsname, val)
> -#endif
> -
>  #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
>  #define FGRAPH_RET_INDEX (FGRAPH_RET_SIZE / sizeof(long))
>  
> @@ -338,9 +330,6 @@ int function_graph_enter(unsigned long ret, unsigned long func,
>  		return -EBUSY;
>  #endif
>  
> -	if (!ftrace_ops_test(&global_ops, func, NULL))
> -		return -EBUSY;
> -
>  	trace.func = func;
>  	trace.depth = ++current->curr_ret_depth;
>  
> @@ -361,7 +350,8 @@ int function_graph_enter(unsigned long ret, unsigned long func,
>  			atomic_inc(&current->trace_overrun);
>  			break;
>  		}
> -		if (fgraph_array[i]->entryfunc(&trace, fgraph_array[i])) {
> +		if (ftrace_ops_test(&gops->ops, func, NULL) &&
> +		    gops->entryfunc(&trace, gops)) {
>  			offset = current->curr_ret_stack;
>  			/* Check the top level stored word */
>  			type = get_fgraph_type(current, offset - 1);
> @@ -656,17 +646,25 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
>  }
>  #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
>  
> -static struct ftrace_ops graph_ops = {
> -	.func			= ftrace_graph_func,
> -	.flags			= FTRACE_OPS_FL_INITIALIZED |
> -				   FTRACE_OPS_FL_PID |
> -				   FTRACE_OPS_GRAPH_STUB,
> +void fgraph_init_ops(struct ftrace_ops *dst_ops,
> +		     struct ftrace_ops *src_ops)
> +{
> +	dst_ops->func = ftrace_stub;
> +	dst_ops->flags = FTRACE_OPS_FL_PID | FTRACE_OPS_FL_STUB;

This needs to use FTRACE_OPS_GRAPH_STUB instead of FTRACE_OPS_FL_STUB,
because commit 0c0593b45c9b ("x86/ftrace: Make function graph use ftrace
directly") introduced this flag to switch the mode (fgraph on ftrace).

> +
>  #ifdef FTRACE_GRAPH_TRAMP_ADDR
> -	.trampoline		= FTRACE_GRAPH_TRAMP_ADDR,
> +	dst_ops->trampoline = FTRACE_GRAPH_TRAMP_ADDR;
>  	/* trampoline_size is only needed for dynamically allocated tramps */
>  #endif
> -	ASSIGN_OPS_HASH(graph_ops, &global_ops.local_hash)
> -};
> +
> +#ifdef CONFIG_DYNAMIC_FTRACE
> +	if (src_ops) {
> +		dst_ops->func_hash = &src_ops->local_hash;
> +		mutex_init(&dst_ops->local_hash.regex_lock);
> +		dst_ops->flags |= FTRACE_OPS_FL_INITIALIZED;
> +	}
> +#endif
> +}
>  
>  void ftrace_graph_sleep_time_control(bool enable)
>  {
> @@ -871,11 +869,20 @@ static int start_graph_tracing(void)
>  
>  int register_ftrace_graph(struct fgraph_ops *gops)
>  {
> +	int command = 0;
>  	int ret = 0;
>  	int i;
>  
>  	mutex_lock(&ftrace_lock);
>  
> +	if (!gops->ops.func) {
> +		gops->ops.flags |= FTRACE_OPS_FL_STUB;

Ditto.

Thanks,


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [RFC PATCH 18/32] function_graph: Fix to initialize ftrace_ops for fgraph with ftrace_graph_func
  2023-11-06 10:04   ` Masami Hiramatsu
@ 2023-11-07  1:49     ` Masami Hiramatsu
  2023-11-07  3:07       ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2023-11-07  1:49 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Alexei Starovoitov, Steven Rostedt, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Guo Ren

On Mon, 6 Nov 2023 19:04:16 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Mon,  6 Nov 2023 01:10:10 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> 
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> > Fix to initialize the ftrace_ops of fgraph_ops with ftrace_graph_func
> > instead of ftrace_stub.
> 
> I've changed this because the fprobe entry handler is not called via
> fgraph without it. But maybe I have to set the correct gops->ops.func
> after init?

I confirmed that this is right, because it was introduced by
0c0593b45c9b ("x86/ftrace: Make function graph use ftrace directly"),
which replaces ftrace_stub with ftrace_graph_func (which is switched
automatically by the architecture).

Thanks,

> 
> Thank you,
> 
> > 
> > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > ---
> >  kernel/trace/fgraph.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> > index 597250bd30dc..858fb73440ec 100644
> > --- a/kernel/trace/fgraph.c
> > +++ b/kernel/trace/fgraph.c
> > @@ -872,7 +872,7 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
> >  void fgraph_init_ops(struct ftrace_ops *dst_ops,
> >  		     struct ftrace_ops *src_ops)
> >  {
> > -	dst_ops->func = ftrace_stub;
> > +	dst_ops->func = ftrace_graph_func;
> >  	dst_ops->flags = FTRACE_OPS_FL_PID | FTRACE_OPS_FL_STUB;
> >  
> >  #ifdef FTRACE_GRAPH_TRAMP_ADDR
> > @@ -1120,7 +1120,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
> >  
> >  	if (!gops->ops.func) {
> >  		gops->ops.flags |= FTRACE_OPS_FL_STUB;
> > -		gops->ops.func = ftrace_stub;
> > +		gops->ops.func = ftrace_graph_func;
> >  #ifdef FTRACE_GRAPH_TRAMP_ADDR
> >  		gops->ops.trampoline = FTRACE_GRAPH_TRAMP_ADDR;
> >  #endif
> > 
> 
> 
> -- 
> Masami Hiramatsu (Google) <mhiramat@kernel.org>


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-07  0:42               ` Masami Hiramatsu
@ 2023-11-07  3:06                 ` Steven Rostedt
  2023-11-07  5:43                   ` Masami Hiramatsu
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2023-11-07  3:06 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Peter Zijlstra, Alexei Starovoitov, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Thomas Gleixner,
	Guo Ren

On Tue, 7 Nov 2023 09:42:58 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> Got it. So does ftrace_regs need a placeholder for the direct trampoline?
> (Or can we use a register to pass it?)
> I think we don't need to clear it for return_to_handler(), but if the
> `ftrace_regs` spec requires it, it would be better to do so.

It's defined per arch. I think I wrote somewhere that it just needs to pass
back something that can tell whether the handler is to return to a direct
trampoline or not. It could be an unused register, or something else.

It's only needed if an architecture supports direct trampolines.
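
For reference, a direct-call user's ops->func ends up doing roughly the
following (a simplified, untested sketch from my memory of mainline's
call_direct_funcs(); ops->direct_call holds the direct trampoline address):

static void call_direct_funcs(unsigned long ip, unsigned long pip,
			      struct ftrace_ops *ops,
			      struct ftrace_regs *fregs)
{
	unsigned long addr = READ_ONCE(ops->direct_call);

	if (!addr)
		return;

	/* Hand the direct trampoline address back to the arch trampoline */
	arch_ftrace_set_direct_caller(fregs, addr);
}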

-- Steve


* Re: [RFC PATCH 18/32] function_graph: Fix to initialize ftrace_ops for fgraph with ftrace_graph_func
  2023-11-07  1:49     ` Masami Hiramatsu
@ 2023-11-07  3:07       ` Steven Rostedt
  0 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2023-11-07  3:07 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Alexei Starovoitov, Florent Revest, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Guo Ren

On Tue, 7 Nov 2023 10:49:24 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > I've changed this because the fprobe entry handler is not called via
> > fgraph without it. But maybe I have to set the correct gops->ops.func
> > after init?
> 
> I confirmed that this is right, because it was introduced by
> 0c0593b45c9b ("x86/ftrace: Make function graph use ftrace directly"),
> which replaces ftrace_stub with ftrace_graph_func (which is switched
> automatically by the architecture).

Agreed.

-- Steve


* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-07  3:06                 ` Steven Rostedt
@ 2023-11-07  5:43                   ` Masami Hiramatsu
  2023-11-07 13:48                     ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Masami Hiramatsu @ 2023-11-07  5:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Alexei Starovoitov, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Thomas Gleixner,
	Guo Ren

On Mon, 6 Nov 2023 22:06:17 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 7 Nov 2023 09:42:58 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > Got it. So does ftrace_regs need a placeholder for the direct trampoline?
> > (Or can we use a register to pass it?)
> > I think we don't need to clear it for return_to_handler(), but if the
> > `ftrace_regs` spec requires it, it would be better to do so.
> 
> It's defined per arch. I think I wrote somewhere that it just needs to pass
> back something that can tell whether the handler is to return to a direct
> trampoline or not. It could be an unused register, or something else.

Oh, I meant the flag (address) for the "return" trampoline. If we had a
direct "return" trampoline we might use it, but currently we don't.

> 
> It's only needed if an architecture supports direct trampolines.

I see, and x86_64 needs it.
OK, then maybe it is better to keep it cleared on x86-64 even in the
return handler.

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-07  5:43                   ` Masami Hiramatsu
@ 2023-11-07 13:48                     ` Steven Rostedt
  2023-11-07 14:09                       ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2023-11-07 13:48 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Peter Zijlstra, Alexei Starovoitov, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Thomas Gleixner,
	Guo Ren

On Tue, 7 Nov 2023 14:43:28 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > 
> > It's only needed if an architecture supports direct trampolines.  
> 
> I see, and x86_64 needs it.
> OK, then maybe it is better to keep it cleared on x86-64 even in the
> return handler.

As it is arch specific, I'm not sure it matters for the return handler,
since the return should never call a direct trampoline.

-- Steve


* Re: [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS
  2023-11-07 13:48                     ` Steven Rostedt
@ 2023-11-07 14:09                       ` Steven Rostedt
  0 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2023-11-07 14:09 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Peter Zijlstra, Alexei Starovoitov, Florent Revest,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Thomas Gleixner,
	Guo Ren

On Tue, 7 Nov 2023 08:48:44 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 7 Nov 2023 14:43:28 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > > 
> > > It's only needed if an architecture supports direct trampolines.    
> > 
> > I see, and x86_64 needs it.
> > OK, then maybe it is better to keep it cleared on x86-64 even in the
> > return handler.
> 
> As it is arch specific, I'm not sure it matters for the return handler,
> since the return should never call a direct trampoline.

Just to clarify, the return trampoline should not bother touching that
register. The register is cleared in the fentry trampoline before calling
all the callbacks, because arch_ftrace_set_direct_caller() may set it.
Then, on return from the function callbacks, the trampoline tests whether
something set it or not.

If the return trampoline does not test it after the return from the
callbacks, there's no reason to clear it. The fentry trampoline uses it to
communicate with itself:

	orig_rax = 0;
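	/*
	 * Cleared above so we can detect below whether a callback
	 * called arch_ftrace_set_direct_caller(), which stores the
	 * direct trampoline address in orig_rax.
	 */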

	call ftrace_ops_list_func()

	/* Did something set orig_rax? */
	if (orig_rax != 0)
		return orig_rax;

It is not set to communicate with the callbacks. That is, the
callbacks do not expect it to be set.

-- Steve




Thread overview: 54+ messages
2023-11-05 16:06 [RFC PATCH 00/32] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
2023-11-05 16:06 ` [RFC PATCH 01/32] seq_buf: Export seq_buf_puts() Masami Hiramatsu (Google)
2023-11-05 16:06 ` [RFC PATCH 02/32] function_graph: Convert ret_stack to a series of longs Masami Hiramatsu (Google)
2023-11-05 16:07 ` [RFC PATCH 03/32] fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long Masami Hiramatsu (Google)
2023-11-05 16:07 ` [RFC PATCH 04/32] function_graph: Add an array structure that will allow multiple callbacks Masami Hiramatsu (Google)
2023-11-05 16:07 ` [RFC PATCH 05/32] function_graph: Allow multiple users to attach to function graph Masami Hiramatsu (Google)
2023-11-05 16:07 ` [RFC PATCH 06/32] function_graph: Remove logic around ftrace_graph_entry and return Masami Hiramatsu (Google)
2023-11-05 16:07 ` [RFC PATCH 07/32] ftrace/function_graph: Pass fgraph_ops to function graph callbacks Masami Hiramatsu (Google)
2023-11-05 16:08 ` [RFC PATCH 08/32] ftrace: Allow function_graph tracer to be enabled in instances Masami Hiramatsu (Google)
2023-11-05 16:08 ` [RFC PATCH 09/32] ftrace: Allow ftrace startup flags exist without dynamic ftrace Masami Hiramatsu (Google)
2023-11-05 16:08 ` [RFC PATCH 10/32] function_graph: Have the instances use their own ftrace_ops for filtering Masami Hiramatsu (Google)
2023-11-07  1:47   ` Masami Hiramatsu
2023-11-05 16:08 ` [RFC PATCH 11/32] function_graph: Add "task variables" per task for fgraph_ops Masami Hiramatsu (Google)
2023-11-05 16:08 ` [RFC PATCH 12/32] function_graph: Move set_graph_function tests to shadow stack global var Masami Hiramatsu (Google)
2023-11-05 16:09 ` [RFC PATCH 13/32] function_graph: Move graph depth stored data " Masami Hiramatsu (Google)
2023-11-05 16:09 ` [RFC PATCH 14/32] function_graph: Move graph notrace bit " Masami Hiramatsu (Google)
2023-11-05 16:09 ` [RFC PATCH 15/32] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data() Masami Hiramatsu (Google)
2023-11-05 16:09 ` [RFC PATCH 16/32] function_graph: Add selftest for passing local variables Masami Hiramatsu (Google)
2023-11-05 16:09 ` [RFC PATCH 17/32] function_graph: Fix to update curr_ret_stack with ARRAY entry size Masami Hiramatsu (Google)
2023-11-05 16:10 ` [RFC PATCH 18/32] function_graph: Fix to initialize ftrace_ops for fgraph with ftrace_graph_func Masami Hiramatsu (Google)
2023-11-06 10:04   ` Masami Hiramatsu
2023-11-07  1:49     ` Masami Hiramatsu
2023-11-07  3:07       ` Steven Rostedt
2023-11-05 16:10 ` [RFC PATCH 19/32] function_graph: Fix to check the return value of ftrace_pop_return_trace() Masami Hiramatsu (Google)
2023-11-05 16:10 ` [RFC PATCH 20/32] function_graph: Pass the size of reserved data when retrieving it Masami Hiramatsu (Google)
2023-11-05 16:10 ` [RFC PATCH 21/32] function_graph: Expand the max reserved data size Masami Hiramatsu (Google)
2023-11-05 16:10 ` [RFC PATCH 22/32] function_graph: Add a new entry handler with parent_ip and ftrace_regs Masami Hiramatsu (Google)
2023-11-05 16:11 ` [RFC PATCH 23/32] function_graph: Add a new exit " Masami Hiramatsu (Google)
2023-11-05 16:11 ` [RFC PATCH 24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS Masami Hiramatsu (Google)
2023-11-05 17:25   ` Peter Zijlstra
2023-11-05 19:11     ` Steven Rostedt
2023-11-05 21:28       ` Steven Rostedt
2023-11-05 23:17       ` Peter Zijlstra
2023-11-05 23:33         ` Steven Rostedt
2023-11-05 23:34           ` Steven Rostedt
2023-11-06  0:38             ` Masami Hiramatsu
2023-11-06 10:19               ` Peter Zijlstra
2023-11-06 12:47                 ` Masami Hiramatsu
2023-11-06 12:52                   ` Peter Zijlstra
2023-11-06  1:05           ` Masami Hiramatsu
2023-11-06 16:37             ` Steven Rostedt
2023-11-07  0:42               ` Masami Hiramatsu
2023-11-07  3:06                 ` Steven Rostedt
2023-11-07  5:43                   ` Masami Hiramatsu
2023-11-07 13:48                     ` Steven Rostedt
2023-11-07 14:09                       ` Steven Rostedt
2023-11-05 16:11 ` [RFC PATCH 25/32] fprobe: Use ftrace_regs in fprobe entry handler Masami Hiramatsu (Google)
2023-11-05 16:11 ` [RFC PATCH 26/32] fprobe: Use ftrace_regs in fprobe exit handler Masami Hiramatsu (Google)
2023-11-05 16:11 ` [RFC PATCH 27/32] tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs Masami Hiramatsu (Google)
2023-11-05 16:12 ` [RFC PATCH 28/32] tracing: Add ftrace_fill_perf_regs() for perf event Masami Hiramatsu (Google)
2023-11-05 16:12 ` [RFC PATCH 29/32] fprobe: Rewrite fprobe on function-graph tracer Masami Hiramatsu (Google)
2023-11-05 16:12 ` [RFC PATCH 30/32] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS Masami Hiramatsu (Google)
2023-11-05 16:12 ` [RFC PATCH 31/32] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled Masami Hiramatsu (Google)
2023-11-05 16:12 ` [RFC PATCH 32/32] Documentation: probes: Update fprobe on function-graph tracer Masami Hiramatsu (Google)
