* [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
@ 2014-07-03 20:07 Steven Rostedt
  2014-07-03 20:07 ` [RFC][PATCH 1/3] ftrace/x86: Add dynamic allocated trampoline for ftrace_ops Steven Rostedt
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Steven Rostedt @ 2014-07-03 20:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Paul E. McKenney,
	Masami Hiramatsu, Namhyung Kim, H. Peter Anvin, Oleg Nesterov,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

[ NOT READY FOR INCLUSION! ]

Note, this is based off of my remove ftrace_start/stop() patch set.

I've been wanting to do this for years, and just never gotten around to it.
But with all this talk of kpatch and kgraft live kernel patching using
the ftrace infrastructure, it seems like a good time to do it.

The way the function callback mechanism works in ftrace is that if there's
only one function callback registered, it will set the mcount/fentry
trampoline to call that function directly. But as soon as you register
another callback, the mcount trampoline calls a loop function that iterates
over all the registered callbacks (ftrace_ops) checking their hash tables
to see if the called function matches the ops before calling its callback.
This happens even if the two registered functions are not even tracing
the same function!

This really sucks if you are tracing all functions, and then add a kprobe
or perf event that traces a single function. That will cause all the
other functions being traced to perform the loop test.

Ideally, if only a single callback (ftrace_ops) is registered to a
function, then that function should call a trampoline that will only
call that one callback without doing any other tests.
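
Roughly, the difference looks like this (a simplified sketch, not the
exact kernel source; ftrace_ops_test() is the per-ops hash lookup):

  /* With more than one ftrace_ops registered, every traced function
   * ends up going through the list loop: */
  static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
				   struct ftrace_ops *ignored, struct pt_regs *regs)
  {
	struct ftrace_ops *op;

	for (op = ftrace_ops_list; op != &ftrace_list_end; op = op->next) {
		if (ftrace_ops_test(op, ip, regs))
			op->func(ip, parent_ip, op, regs);
	}
  }

  /* With a per-ops trampoline, the traced function's fentry call jumps
   * straight to that ops' own trampoline, which calls op->func() and
   * nothing else. */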

This patch set adds this functionality to x86_64. If a callback is
registered to a function and there's no other callback registered to
that function, that ftrace_ops will get its own trampoline allocated
for it that will call the callback directly.

Note, for dynamically allocated ftrace_ops (kprobes, perf, instance
directory function tracing), the dynamic trampoline will only be created
if CONFIG_PREEMPT is not set. That's because, until Paul finishes his
rcu_call_task() code, there's no safe way to know if a task was preempted
while on the trampoline and is waiting to run on it some more.

I need to write up a bunch of tests for this code, but currently it works
on the few tests I did manually. I didn't even run this code yet under
my full test suite, so it may very well have bugs in it that might be
easily triggered. But I wanted to get the code out for review to see
if anyone has any other idea to help enhance this feature.

If you want a git repo to play with this, you can get it from below.
That repo will rebase often, so do not build against it.

Enjoy,

-- Steve


  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
rfc/trampoline

Head SHA1: 4d781e010842a56f8e7c1bbe309e38075c277c45


Steven Rostedt (Red Hat) (3):
      ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
      ftrace/x86: Show trampoline call function in enabled_functions
      ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines

----
 arch/x86/kernel/ftrace.c    | 240 ++++++++++++++++++++++++++++++++++++++++++--
 arch/x86/kernel/mcount_64.S |  26 ++++-
 include/linux/ftrace.h      |   8 ++
 kernel/trace/ftrace.c       |  86 +++++++++++++++-
 4 files changed, 344 insertions(+), 16 deletions(-)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC][PATCH 1/3] ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
  2014-07-03 20:07 [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines Steven Rostedt
@ 2014-07-03 20:07 ` Steven Rostedt
  2014-07-04 13:32   ` Masami Hiramatsu
  2014-07-14  2:34   ` Masami Hiramatsu
  2014-07-03 20:07 ` [RFC][PATCH 2/3] ftrace/x86: Show trampoline call function in enabled_functions Steven Rostedt
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 30+ messages in thread
From: Steven Rostedt @ 2014-07-03 20:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Paul E. McKenney,
	Masami Hiramatsu, Namhyung Kim, H. Peter Anvin, Oleg Nesterov,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

[-- Attachment #1: 0001-ftrace-x86-Add-dynamic-allocated-trampoline-for-ftra.patch --]
[-- Type: text/plain, Size: 13148 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

The current method of handling multiple function callbacks is to register
a list function callback that iterates over all the other callbacks,
checking each one's hash table against the function that triggered the
call before invoking it. But this is very inefficient.

For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced, the hashes of both the
function tracing and the kprobes ftrace_ops are checked, even though the
kprobe is only set on a single function. The kprobes ftrace_ops is checked
for every function being traced!

Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself,
but it is not dynamically allocated. Its trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.

For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it cannot be used. The dynamically allocated
ftrace_ops may, however, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.
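
Schematically, the trampoline that create_trampoline() below produces is
laid out as follows (a sketch for illustration; the exact layout is in
the patch):

   ops->trampoline  --> +------------------------------------------+
                        | copy of ftrace_caller (or                |
                        | ftrace_regs_caller) up to the new        |
                        | *_caller_end marker, including:          |
                        |   movq <ops>(%rip), %rdx                 |
                        |     (offset repointed at the ops pointer |
                        |      stored at the end below)            |
                        |   call ftrace_stub                       |
                        |     (patched to call ops->func by        |
                        |      update_ftrace_func())               |
   trampoline + size -> +------------------------------------------+
                        | jmp ftrace_return   (MCOUNT_INSN_SIZE)   |
                        +------------------------------------------+
                        | address of this ftrace_ops               |
                        +------------------------------------------+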

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kernel/ftrace.c    | 157 ++++++++++++++++++++++++++++++++++++++++++--
 arch/x86/kernel/mcount_64.S |  26 ++++++--
 include/linux/ftrace.h      |   8 +++
 kernel/trace/ftrace.c       |  46 ++++++++++++-
 4 files changed, 224 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 3386dc9aa333..fcc256a33c1d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -17,9 +17,11 @@
 #include <linux/ftrace.h>
 #include <linux/percpu.h>
 #include <linux/sched.h>
+#include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/module.h>
+#include <linux/moduleloader.h>
 
 #include <trace/syscall.h>
 
@@ -644,12 +646,6 @@ int __init ftrace_dyn_arch_init(void)
 {
 	return 0;
 }
-#endif
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-
-#ifdef CONFIG_DYNAMIC_FTRACE
-extern void ftrace_graph_call(void);
 
 static unsigned char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
 {
@@ -665,6 +661,155 @@ static unsigned char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
 	return calc.code;
 }
 
+/* Currently only x86_64 supports dynamic trampolines */
+#ifdef CONFIG_X86_64
+
+/* Defined as markers to the end of the ftrace default trampolines */
+extern void ftrace_caller_end(void);
+extern void ftrace_regs_caller_end(void);
+extern void ftrace_return(void);
+extern void ftrace_caller_op_ptr(void);
+extern void ftrace_regs_caller_op_ptr(void);
+
+/* movq function_trace_op(%rip), %rdx */
+/* 0x48 0x8b 0x15 <offset-to-ftrace_trace_op (4 bytes)> */
+#define OP_REF_SIZE	7
+
+/*
+ * The ftrace_ops is passed to the function. We can pass
+ * in the ops directly, as this trampoline will only call
+ * the function for a single ops.
+ */
+union ftrace_op_code_union {
+	char code[OP_REF_SIZE];
+	struct {
+		char op[3];
+		int offset;
+	} __attribute__((packed));
+};
+
+static unsigned long create_trampoline(struct ftrace_ops *ops)
+{
+	unsigned const char *jmp;
+	unsigned long start_offset;
+	unsigned long end_offset;
+	unsigned long op_offset;
+	unsigned long offset;
+	unsigned long size;
+	unsigned long ip;
+	unsigned long *ptr;
+	void *trampoline;
+	unsigned const char op_ref[] = { 0x48, 0x8b, 0x15 };
+	union ftrace_op_code_union op_ptr;
+	int ret;
+
+	if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
+		start_offset = (unsigned long)ftrace_regs_caller;
+		end_offset = (unsigned long)ftrace_regs_caller_end;
+		op_offset = (unsigned long)ftrace_regs_caller_op_ptr;
+	} else {
+		start_offset = (unsigned long)ftrace_caller;
+		end_offset = (unsigned long)ftrace_caller_end;
+		op_offset = (unsigned long)ftrace_caller_op_ptr;
+	}
+
+	size = end_offset - start_offset;
+
+	trampoline = module_alloc(size + MCOUNT_INSN_SIZE + sizeof(void *));
+	if (!trampoline)
+		return 0;
+
+	ret = probe_kernel_read(trampoline, (void *)start_offset, size);
+	if (WARN_ON(ret < 0)) {
+		module_free(NULL, trampoline);
+		return 0;
+	}
+
+	ip = (unsigned long)trampoline + size;
+
+	jmp = ftrace_jmp_replace(ip, (unsigned long)ftrace_return);
+	memcpy(trampoline + size, jmp, MCOUNT_INSN_SIZE);
+
+	/*
+	 * Make the op pointer point directly to this ops.
+	 * Copy the ops address to the end of the trampoline.
+	 */
+
+	ptr = (unsigned long *)(trampoline + size + MCOUNT_INSN_SIZE);
+	*ptr = (unsigned long)ops;
+
+	op_offset -= start_offset;
+	memcpy(&op_ptr, trampoline + op_offset, OP_REF_SIZE);
+
+	/* Are we pointing to the reference? */
+	if (WARN_ON(memcmp(op_ptr.op, op_ref, 3) != 0)) {
+		module_free(NULL, trampoline);
+		return 0;
+	}
+
+	/* Load the contents of ptr into the callback parameter */
+	offset = (unsigned long)ptr;
+	offset -= (unsigned long)trampoline + op_offset + OP_REF_SIZE;
+
+	op_ptr.offset = offset;
+
+	/* put in the new offset to the ftrace_ops */
+	memcpy(trampoline + op_offset, &op_ptr, OP_REF_SIZE);
+
+	/* The ALLOC_TRAMP flag lets us know we created it */
+	ops->flags |= FTRACE_OPS_FL_ALLOC_TRAMP;
+
+	return (unsigned long)trampoline;
+}
+
+void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
+{
+	unsigned char *new;
+	unsigned long start_offset;
+	unsigned long call_offset;
+	unsigned long offset;
+	unsigned long ip;
+	int ret;
+
+	if (ops->trampoline) {
+		/*
+		 * The ftrace_ops caller may set up its own trampoline.
+		 * In such a case, this code must not modify it.
+		 */
+		if (!(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
+			return;
+	} else {
+		ops->trampoline = create_trampoline(ops);
+		if (!ops->trampoline)
+			return;
+	}
+
+	if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
+		start_offset = (unsigned long)ftrace_regs_caller;
+		call_offset = (unsigned long)ftrace_regs_call;
+	} else {
+		start_offset = (unsigned long)ftrace_caller;
+		call_offset = (unsigned long)ftrace_call;
+	}
+
+	offset = call_offset - start_offset;
+	ip = ops->trampoline + offset;
+
+	/* Do a safe modify in case the trampoline is executing */
+	new = ftrace_call_replace(ip, (unsigned long)ops->func);
+	ret = update_ftrace_func(ip, new);
+
+	/* The update should never fail */
+	WARN_ON(ret);
+}
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_DYNAMIC_FTRACE */
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+extern void ftrace_graph_call(void);
+
 static int ftrace_mod_jmp(unsigned long ip, void *func)
 {
 	unsigned char *new;
diff --git a/arch/x86/kernel/mcount_64.S b/arch/x86/kernel/mcount_64.S
index c73aecf10d34..39121b594a91 100644
--- a/arch/x86/kernel/mcount_64.S
+++ b/arch/x86/kernel/mcount_64.S
@@ -28,9 +28,11 @@ ENTRY(function_hook)
 END(function_hook)
 
 /* skip is set if stack has been adjusted */
-.macro ftrace_caller_setup skip=0
+.macro ftrace_caller_setup trace_label skip=0
 	MCOUNT_SAVE_FRAME \skip
 
+	/* Save this location */
+GLOBAL(\trace_label)
 	/* Load the ftrace_ops into the 3rd parameter */
 	movq function_trace_op(%rip), %rdx
 
@@ -45,8 +47,9 @@ END(function_hook)
 #endif
 .endm
 
+
 ENTRY(ftrace_caller)
-	ftrace_caller_setup
+	ftrace_caller_setup ftrace_caller_op_ptr
 	/* regs go into 4th parameter (but make it NULL) */
 	movq $0, %rcx
 
@@ -54,7 +57,14 @@ GLOBAL(ftrace_call)
 	call ftrace_stub
 
 	MCOUNT_RESTORE_FRAME
-ftrace_return:
+
+	/*
+	 * The copied trampoline must call ftrace_return as it
+	 * still may need to call the function graph tracer.
+	 */
+GLOBAL(ftrace_caller_end)
+
+GLOBAL(ftrace_return)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 GLOBAL(ftrace_graph_call)
@@ -70,7 +80,7 @@ ENTRY(ftrace_regs_caller)
 	pushfq
 
 	/* skip=8 to skip flags saved in SS */
-	ftrace_caller_setup 8
+	ftrace_caller_setup ftrace_regs_caller_op_ptr 8
 
 	/* Save the rest of pt_regs */
 	movq %r15, R15(%rsp)
@@ -122,6 +132,14 @@ GLOBAL(ftrace_regs_call)
 	/* Restore flags */
 	popfq
 
+	/*
+	 * As the jmp to return can be a short jump,
+	 * it must not be copied into the trampoline.
+	 * The trampoline will add the code to jump
+	 * to the return.
+	 */
+GLOBAL(ftrace_regs_caller_end)
+
 	jmp ftrace_return
 
 	popfq
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index ff860dbff75a..4e1a87a3341d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -89,6 +89,13 @@ typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
  * INITIALIZED - The ftrace_ops has already been initialized (first use time
  *            register_ftrace_function() is called, it will initialized the ops)
  * DELETED - The ops are being deleted, do not let them be registered again.
+ * ALLOC_TRAMP - A dynamic trampoline was allocated by the core code.
+ *            The arch specific code sets this flag when it allocated a
+ *            trampoline. This lets the arch know that it can update the
+ *            trampoline in case the callback function changes.
+ *            The ftrace_ops trampoline can be set by the ftrace users, and
+ *            in such cases the arch must not modify it. Only the arch ftrace
+ *            core code should set this flag.
  */
 enum {
 	FTRACE_OPS_FL_ENABLED			= 1 << 0,
@@ -100,6 +107,7 @@ enum {
 	FTRACE_OPS_FL_STUB			= 1 << 6,
 	FTRACE_OPS_FL_INITIALIZED		= 1 << 7,
 	FTRACE_OPS_FL_DELETED			= 1 << 8,
+	FTRACE_OPS_FL_ALLOC_TRAMP		= 1 << 9,
 };
 
 /*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e9f831f4e929..c3683d06f0b2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -371,6 +371,8 @@ static int remove_ftrace_list_ops(struct ftrace_ops **list,
 	return ret;
 }
 
+static void ftrace_update_trampoline(struct ftrace_ops *ops);
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ops->flags & FTRACE_OPS_FL_DELETED)
@@ -403,6 +405,8 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
+	ftrace_update_trampoline(ops);
+
 	if (ftrace_enabled)
 		update_ftrace_function();
 
@@ -3882,6 +3886,9 @@ static char ftrace_graph_buf[FTRACE_FILTER_SIZE] __initdata;
 static char ftrace_graph_notrace_buf[FTRACE_FILTER_SIZE] __initdata;
 static int ftrace_set_func(unsigned long *array, int *idx, int size, char *buffer);
 
+static unsigned long save_global_trampoline;
+static unsigned long save_global_flags;
+
 static int __init set_graph_function(char *str)
 {
 	strlcpy(ftrace_graph_buf, str, FTRACE_FILTER_SIZE);
@@ -4600,6 +4607,20 @@ void __init ftrace_init(void)
 	ftrace_disabled = 1;
 }
 
+/* Do nothing if arch does not support this */
+void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)
+{
+}
+
+static void ftrace_update_trampoline(struct ftrace_ops *ops)
+{
+	/* Currently, only non dynamic ops can have a trampoline */
+	if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
+		return;
+
+	arch_ftrace_update_trampoline(ops);
+}
+
 #else
 
 static struct ftrace_ops global_ops = {
@@ -4642,6 +4663,10 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
 	return 1;
 }
 
+static void ftrace_update_trampoline(struct ftrace_ops *ops)
+{
+}
+
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 __init void ftrace_init_global_array_ops(struct trace_array *tr)
@@ -5351,6 +5376,13 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
 	global_ops.flags |= FTRACE_OPS_FL_STUB;
 
 #ifdef CONFIG_DYNAMIC_FTRACE
+	/* Save off the trampoline if there was one. */
+	save_global_trampoline = global_ops.trampoline;
+	save_global_flags = global_ops.flags;
+
+	/* Function graph does not allocate the trampoline */
+	global_ops.flags &= ~FTRACE_OPS_FL_ALLOC_TRAMP;
+
 	/* Optimize function graph calling (if implemented by arch) */
 	global_ops.trampoline = FTRACE_GRAPH_ADDR;
 #endif
@@ -5375,12 +5407,20 @@ void unregister_ftrace_graph(void)
 	__ftrace_graph_entry = ftrace_graph_entry_stub;
 	ftrace_shutdown(&global_ops, FTRACE_STOP_FUNC_RET);
 	global_ops.flags &= ~FTRACE_OPS_FL_STUB;
-#ifdef CONFIG_DYNAMIC_FTRACE
-	global_ops.trampoline = 0;
-#endif
 	unregister_pm_notifier(&ftrace_suspend_notifier);
 	unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+	/*
+	 * Function graph does not allocate the trampoline, but
+	 * other global_ops do. We need to reset the ALLOC_TRAMP flag
+	 * if one was used.
+	 */
+	global_ops.trampoline = save_global_trampoline;
+	if (save_global_flags & FTRACE_OPS_FL_ALLOC_TRAMP)
+		global_ops.flags |= FTRACE_OPS_FL_ALLOC_TRAMP;
+#endif
+
  out:
 	mutex_unlock(&ftrace_lock);
 }
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC][PATCH 2/3] ftrace/x86: Show trampoline call function in enabled_functions
  2014-07-03 20:07 [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines Steven Rostedt
  2014-07-03 20:07 ` [RFC][PATCH 1/3] ftrace/x86: Add dynamic allocated trampoline for ftrace_ops Steven Rostedt
@ 2014-07-03 20:07 ` Steven Rostedt
  2014-07-03 20:07 ` [RFC][PATCH 3/3] ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines Steven Rostedt
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2014-07-03 20:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Paul E. McKenney,
	Masami Hiramatsu, Namhyung Kim, H. Peter Anvin, Oleg Nesterov,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby,
	H. Peter Anvin

[-- Attachment #1: 0002-ftrace-x86-Show-trampoline-call-function-in-enabled_.patch --]
[-- Type: text/plain, Size: 5518 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

The file /sys/kernel/debug/tracing/enabled_functions is used to debug
ftrace function hooks. Add to the output what function is being called
by the trampoline if the arch supports it.

Add support for this feature in x86_64.
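
For illustration only, an enabled_functions entry with this change would
look something like the following (the function names, counts and the
trampoline address here are made up; the " ->" suffix is what this patch
adds):

  schedule (1) R	tramp: 0xffffffffa0006000 ->my_tracer_func+0x0/0xa0
  vfs_read (1)  	 ->ftrace_ops_list_func+0x0/0x110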

Cc: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kernel/ftrace.c | 97 ++++++++++++++++++++++++++++++++++++++++++------
 kernel/trace/ftrace.c    | 22 ++++++++++-
 2 files changed, 105 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index fcc256a33c1d..50de611d44b4 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -49,7 +49,7 @@ int ftrace_arch_code_modify_post_process(void)
 union ftrace_code_union {
 	char code[MCOUNT_INSN_SIZE];
 	struct {
-		char e8;
+		unsigned char e8;
 		int offset;
 	} __attribute__((packed));
 };
@@ -762,11 +762,25 @@ static unsigned long create_trampoline(struct ftrace_ops *ops)
 	return (unsigned long)trampoline;
 }
 
-void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
+static unsigned long calc_trampoline_call_offset(bool save_regs)
 {
-	unsigned char *new;
 	unsigned long start_offset;
 	unsigned long call_offset;
+
+	if (save_regs) {
+		start_offset = (unsigned long)ftrace_regs_caller;
+		call_offset = (unsigned long)ftrace_regs_call;
+	} else {
+		start_offset = (unsigned long)ftrace_caller;
+		call_offset = (unsigned long)ftrace_call;
+	}
+
+	return call_offset - start_offset;
+}
+
+void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
+{
+	unsigned char *new;
 	unsigned long offset;
 	unsigned long ip;
 	int ret;
@@ -784,15 +798,7 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 			return;
 	}
 
-	if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
-		start_offset = (unsigned long)ftrace_regs_caller;
-		call_offset = (unsigned long)ftrace_regs_call;
-	} else {
-		start_offset = (unsigned long)ftrace_caller;
-		call_offset = (unsigned long)ftrace_call;
-	}
-
-	offset = call_offset - start_offset;
+	offset = calc_trampoline_call_offset(ops->flags & FTRACE_OPS_FL_SAVE_REGS);
 	ip = ops->trampoline + offset;
 
 	/* Do a safe modify in case the trampoline is executing */
@@ -802,6 +808,73 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 	/* The update should never fail */
 	WARN_ON(ret);
 }
+
+/* Return the address of the function the trampoline calls */
+static void *addr_from_call(void *ptr)
+{
+	union ftrace_code_union calc;
+	int ret;
+
+	ret = probe_kernel_read(&calc, ptr, MCOUNT_INSN_SIZE);
+	if (WARN_ON_ONCE(ret < 0))
+		return NULL;
+
+	/* Make sure this is a call */
+	if (WARN_ON_ONCE(calc.e8 != 0xe8)) {
+		pr_warn("Expected e8, got %x\n", calc.e8);
+		return NULL;
+	}
+
+	return ptr + MCOUNT_INSN_SIZE + calc.offset;
+}
+
+void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
+			   unsigned long frame_pointer);
+
+/*
+ * If the ops->trampoline was not allocated, then it probably
+ * has a static trampoline func, or is the ftrace caller itself.
+ */
+static void *static_tramp_func(struct ftrace_ops *ops, struct dyn_ftrace *rec)
+{
+	unsigned long offset;
+	bool save_regs = rec->flags & FTRACE_FL_REGS_EN;
+	void *ptr;
+
+	if (ops && ops->trampoline) {
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+		/*
+		 * We only know about function graph tracer setting as static
+		 * trampoline.
+		 */
+		if (ops->trampoline == FTRACE_GRAPH_ADDR)
+			return (void *)prepare_ftrace_return;
+#endif
+		return NULL;
+	}
+
+	offset = calc_trampoline_call_offset(save_regs);
+
+	if (save_regs)
+		ptr = (void *)FTRACE_REGS_ADDR + offset;
+	else
+		ptr = (void *)FTRACE_ADDR + offset;
+
+	return addr_from_call(ptr);
+}
+
+void *arch_ftrace_trampoline_func(struct ftrace_ops *ops, struct dyn_ftrace *rec)
+{
+	unsigned long offset;
+
+	if (!ops || !(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
+		return static_tramp_func(ops, rec);
+
+	offset = calc_trampoline_call_offset(ops->flags & FTRACE_OPS_FL_SAVE_REGS);
+	return addr_from_call((void *)ops->trampoline + offset);
+}
+
+
 #endif /* CONFIG_X86_64 */
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c3683d06f0b2..66cbb3b7251d 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2868,6 +2868,22 @@ static void t_stop(struct seq_file *m, void *p)
 	mutex_unlock(&ftrace_lock);
 }
 
+void * __weak
+arch_ftrace_trampoline_func(struct ftrace_ops *ops, struct dyn_ftrace *rec)
+{
+	return NULL;
+}
+
+static void add_trampoline_func(struct seq_file *m, struct ftrace_ops *ops,
+				struct dyn_ftrace *rec)
+{
+	void *ptr;
+
+	ptr = arch_ftrace_trampoline_func(ops, rec);
+	if (ptr)
+		seq_printf(m, " ->%pS", ptr);
+}
+
 static int t_show(struct seq_file *m, void *v)
 {
 	struct ftrace_iterator *iter = m->private;
@@ -2891,19 +2907,21 @@ static int t_show(struct seq_file *m, void *v)
 
 	seq_printf(m, "%ps", (void *)rec->ip);
 	if (iter->flags & FTRACE_ITER_ENABLED) {
+		struct ftrace_ops *ops = NULL;
+
 		seq_printf(m, " (%ld)%s",
 			   ftrace_rec_count(rec),
 			   rec->flags & FTRACE_FL_REGS ? " R" : "  ");
 		if (rec->flags & FTRACE_FL_TRAMP_EN) {
-			struct ftrace_ops *ops;
-
 			ops = ftrace_find_tramp_ops_curr(rec);
 			if (ops && ops->trampoline)
 				seq_printf(m, "\ttramp: %pS",
 					   (void *)ops->trampoline);
 			else
 				seq_printf(m, "\ttramp: ERROR!");
+
 		}
+		add_trampoline_func(m, ops, rec);
 	}	
 
 	seq_printf(m, "\n");
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC][PATCH 3/3] ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines
  2014-07-03 20:07 [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines Steven Rostedt
  2014-07-03 20:07 ` [RFC][PATCH 1/3] ftrace/x86: Add dynamic allocated trampoline for ftrace_ops Steven Rostedt
  2014-07-03 20:07 ` [RFC][PATCH 2/3] ftrace/x86: Show trampoline call function in enabled_functions Steven Rostedt
@ 2014-07-03 20:07 ` Steven Rostedt
  2014-07-03 20:32 ` [RFC][PATCH 0/3] ftrace: Add dynamically " Steven Rostedt
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2014-07-03 20:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Paul E. McKenney,
	Masami Hiramatsu, Namhyung Kim, H. Peter Anvin, Oleg Nesterov,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby,
	H. Peter Anvin

[-- Attachment #1: 0003-ftrace-x86-Allow-CONFIG_PREEMPT-dynamic-ops-to-use-a.patch --]
[-- Type: text/plain, Size: 3490 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

When a static ftrace_ops (like function tracer) enables tracing, and it
is the only callback referencing a function, a trampoline is dynamically
allocated for that function which calls the callback directly, instead
of calling a loop function that iterates over all the registered
ftrace_ops (as is done when more than one ops is registered).

But when it comes to dynamically allocated ftrace_ops, where they may be
freed, on a CONFIG_PREEMPT kernel there's no way to know when it is safe
to free the trampoline. If a task was preempted while executing on the
trampoline, there's currently no way to know when it will be off that
trampoline.

But this is not true when it comes to !CONFIG_PREEMPT. The current method
of calling schedule_on_each_cpu() will force tasks off the trampoline,
because they cannot schedule while on it (kernel preemption is not
configured). That means it is safe to free a dynamically allocated
ftrace ops trampoline when CONFIG_PREEMPT is not configured.
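
In other words, the teardown ordering this patch relies on is roughly
(condensed from the ftrace_shutdown() hunk below):

	if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_CONTROL)) {
		/* every CPU must schedule; with !CONFIG_PREEMPT no task
		 * can still be running on the trampoline after this */
		schedule_on_each_cpu(ftrace_sync);

		/* now it is safe to hand the trampoline back */
		arch_ftrace_trampoline_free(ops);
	}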

Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kernel/ftrace.c |  8 ++++++++
 kernel/trace/ftrace.c    | 18 ++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 50de611d44b4..b718fe6f881a 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -874,6 +874,14 @@ void *arch_ftrace_trampoline_func(struct ftrace_ops *ops, struct dyn_ftrace *rec
 	return addr_from_call((void *)ops->trampoline + offset);
 }
 
+void arch_ftrace_trampoline_free(struct ftrace_ops *ops)
+{
+	if (!ops || !(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
+		return;
+
+	module_free(NULL, ops->trampoline);
+	ops->trampoline = NULL;
+}
 
 #endif /* CONFIG_X86_64 */
 #endif /* CONFIG_DYNAMIC_FTRACE */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 66cbb3b7251d..a1202c6d57d2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2281,6 +2281,10 @@ static void ftrace_run_update_code(int command)
 static ftrace_func_t saved_ftrace_func;
 static int ftrace_start_up;
 
+void __weak arch_ftrace_trampoline_free(struct ftrace_ops *ops)
+{
+}
+
 static void control_ops_free(struct ftrace_ops *ops)
 {
 	free_percpu(ops->disabled);
@@ -2391,6 +2395,8 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command)
 	if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_CONTROL)) {
 		schedule_on_each_cpu(ftrace_sync);
 
+		arch_ftrace_trampoline_free(ops);
+
 		if (ops->flags & FTRACE_OPS_FL_CONTROL)
 			control_ops_free(ops);
 	}
@@ -4632,9 +4638,21 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 
 static void ftrace_update_trampoline(struct ftrace_ops *ops)
 {
+
+/*
+ * Currently there's no safe way to free a trampoline when the kernel
+ * is configured with PREEMPT. That is because a task could be preempted
+ * when it jumped to the trampoline, it may be preempted for a long time
+ * depending on the system load, and currently there's no way to know
+ * when it will be off the trampoline. If the trampoline is freed
+ * too early, when the task runs again, it will be executing on freed
+ * memory and crash.
+ */
+#ifdef CONFIG_PREEMPT
 	/* Currently, only non dynamic ops can have a trampoline */
 	if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
 		return;
+#endif
 
 	arch_ftrace_update_trampoline(ops);
 }
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-03 20:07 [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines Steven Rostedt
                   ` (2 preceding siblings ...)
  2014-07-03 20:07 ` [RFC][PATCH 3/3] ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines Steven Rostedt
@ 2014-07-03 20:32 ` Steven Rostedt
  2014-07-04 13:20 ` Masami Hiramatsu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2014-07-03 20:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Masami Hiramatsu, Namhyung Kim, H. Peter Anvin,
	Oleg Nesterov, Josh Poimboeuf, Jiri Kosina, Seth Jennings,
	Jiri Slaby

On Thu, 03 Jul 2014 16:07:50 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> [ NOT READY FOR INCLUSION! ]

One more thing to note. This will not go in before 3.18 (and depending
on issues, maybe later than that).

-- Steve

> 
> Note, this is based off of my remove ftrace_start/stop() patch set.
> 
> I've been wanting to do this for years, and just never gotten around to it.
> But with all this talk of kpatch and kgraft live kernel patching using
> the ftrace infrastructure, it seems like a good time to do it.
> 
> The way the function callback mechanism works in ftrace is that if there's
> only one function callback registered, it will set the mcount/fentry
> trampoline to call that function directly. But as soon as you register
> another callback, the mcount trampoline calls a loop function that iterates
> over all the registered callbacks (ftrace_ops) checking their hash tables
> to see if the called function matches the ops before calling its callback.
> This happens even if the two registered functions are not even tracing
> the same function!
> 
> This really sucks if you are tracing all functions, and then add a kprobe
> or perf event that traces a single function. That will cause all the
> other functions being traced to perform the loop test.
> 
> Ideally, if only a single callback (ftrace_ops) is registered to a
> function, then that function should call a trampoline that will only
> call that one callback without doing any other tests.
> 
> This patch set adds this functionality to x86_64. If a callback is
> registered to a function and there's no other callback registered to
> that function, that ftrace_ops will get its own trampoline allocated
> for it that will call the callback directly.
> 
> Note, for dynamically allocated ftrace_ops (kprobes, perf, instance
> directory function tracing), the dynamic trampoline will only be created
> if CONFIG_PREEMPT is not set. That's because, until Paul finishes his
> rcu_call_task() code, there's no safe way to know if a task was preempted
> while on the trampoline and is waiting to run on it some more.
> 
> I need to write up a bunch of tests for this code, but currently it works
> on the few tests I did manually. I didn't even run this code yet under
> my full test suite, so it may very well have bugs in it that might be
> easily triggered. But I wanted to get the code out for review to see
> if anyone has any other idea to help enhance this feature.
> 
> If you want a git repo to play with this, you can get it from below.
> That repo will rebase often, so do not build against it.
> 
> Enjoy,
> 
> -- Steve
> 
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> rfc/trampoline
> 
> Head SHA1: 4d781e010842a56f8e7c1bbe309e38075c277c45
> 
> 
> Steven Rostedt (Red Hat) (3):
>       ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
>       ftrace/x86: Show trampoline call function in enabled_functions
>       ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines
> 
> ----
>  arch/x86/kernel/ftrace.c    | 240 ++++++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kernel/mcount_64.S |  26 ++++-
>  include/linux/ftrace.h      |   8 ++
>  kernel/trace/ftrace.c       |  86 +++++++++++++++-
>  4 files changed, 344 insertions(+), 16 deletions(-)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-03 20:07 [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines Steven Rostedt
                   ` (3 preceding siblings ...)
  2014-07-03 20:32 ` [RFC][PATCH 0/3] ftrace: Add dynamically " Steven Rostedt
@ 2014-07-04 13:20 ` Masami Hiramatsu
  2014-07-04 14:21   ` Steven Rostedt
  2014-07-07 13:58 ` Jiri Kosina
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Masami Hiramatsu @ 2014-07-04 13:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Namhyung Kim, H. Peter Anvin, Oleg Nesterov,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

(2014/07/04 5:07), Steven Rostedt wrote:
> [ NOT READY FOR INCLUSION! ]
> 
> Note, this is based off of my remove ftrace_start/stop() patch set.
> 
> I've been wanting to do this for years, and just never gotten around to it.
> But with all this talk of kpatch and kgraft live kernel patching using
> the ftrace infrastructure, it seems like a good time to do it.
> 
> The way the function callback mechanism works in ftrace is that if there's
> only one function callback registered, it will set the mcount/fentry
> trampoline to call that function directly. But as soon as you register
> another callback, the mcount trampoline calls a loop function that iterates
> over all the registered callbacks (ftrace_ops) checking their hash tables
> to see if the called function matches the ops before calling its callback.
> This happens even if the two registered functions are not even tracing
> the same function!
> 
> This really sucks if you are tracing all functions, and then add a kprobe
> or perf event that traces a single function. That will cause all the
> other functions being traced to perform the loop test.

Ah, I had thought that ftrace already had different trampolines for the
loop and single-callback cases and replaced each mcount-call instruction
with the appropriate one. But this series is what actually does that,
doesn't it? :)

> Ideally, if only a single callback (ftrace_ops) is registered to a
> function, then that function should call a trampoline that will only
> call that one callback without doing any other tests.
> 
> This patch set adds this functionality to x86_64. If a callback is
> registered to a function and there's no other callback registered to
> that function, that ftrace_ops will get its own trampoline allocated
> for it that will call the callback directly.
> 
> Note, for dynamically allocated ftrace_ops (kprobes, perf, instance
> directory function tracing), the dynamic trampoline will only be created
> if CONFIG_PREEMPT is not set. That's because, until Paul finishes his
> rcu_call_task() code, there's no safe way to know if a task was preempted
> while on the trampoline and is waiting to run on it some more.

Hmm, if we can declare "this ftrace_ops is permanent" (like finalizing it),
then we could allocate a trampoline for such a dynamic ops too. Since
kprobes doesn't actually need to free (or unregister) its ftrace_ops, I
could use that.


> I need to write up a bunch of tests for this code, but currently it works
> on the few tests I did manually. I didn't even run this code yet under
> my full test suite, so it may very well have bugs in it that might be
> easily triggered. But I wanted to get the code out for review to see
> if anyone has any other idea to help enhance this feature.

Yeah, I'll review it.


Thank you,


> 
> If you want a git repo to play with this, you can get it from below.
> That repo will rebase often, so do not build against it.
> 
> Enjoy,
> 
> -- Steve
> 
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> rfc/trampoline
> 
> Head SHA1: 4d781e010842a56f8e7c1bbe309e38075c277c45
> 
> 
> Steven Rostedt (Red Hat) (3):
>       ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
>       ftrace/x86: Show trampoline call function in enabled_functions
>       ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines
> 
> ----
>  arch/x86/kernel/ftrace.c    | 240 ++++++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kernel/mcount_64.S |  26 ++++-
>  include/linux/ftrace.h      |   8 ++
>  kernel/trace/ftrace.c       |  86 +++++++++++++++-
>  4 files changed, 344 insertions(+), 16 deletions(-)
> 
> 
> 


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 1/3] ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
  2014-07-03 20:07 ` [RFC][PATCH 1/3] ftrace/x86: Add dynamic allocated trampoline for ftrace_ops Steven Rostedt
@ 2014-07-04 13:32   ` Masami Hiramatsu
  2014-07-04 14:25     ` Steven Rostedt
  2014-07-14  2:34   ` Masami Hiramatsu
  1 sibling, 1 reply; 30+ messages in thread
From: Masami Hiramatsu @ 2014-07-04 13:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Namhyung Kim, H. Peter Anvin, Oleg Nesterov,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

(2014/07/04 5:07), Steven Rostedt wrote:
> +
> +void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
> +{
> +	unsigned char *new;
> +	unsigned long start_offset;
> +	unsigned long call_offset;
> +	unsigned long offset;
> +	unsigned long ip;
> +	int ret;
> +
> +	if (ops->trampoline) {
> +		/*
> +		 * The ftrace_ops caller may set up its own trampoline.
> +		 * In such a case, this code must not modify it.
> +		 */
> +		if (!(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
> +			return;

Just a question: what happens if the ftrace_ops caller sets up a trampoline which is
not compatible with ftrace's trampoline, and that ftrace_ops conflicts on an IP with
another ftrace_ops? I guess in that case ftrace will use the loop callback on that IP,
but since the trampoline is not compatible, the result will not be the same, is that right? :)

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-04 13:20 ` Masami Hiramatsu
@ 2014-07-04 14:21   ` Steven Rostedt
  2014-07-07 13:22     ` Jiri Kosina
  0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2014-07-04 14:21 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Namhyung Kim, H. Peter Anvin, Oleg Nesterov,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

On Fri, 04 Jul 2014 22:20:12 +0900
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> (2014/07/04 5:07), Steven Rostedt wrote:
> > [ NOT READY FOR INCLUSION! ]
> > 
> > Note, this is based off of my remove ftrace_start/stop() patch set.
> > 
> > I've been wanting to do this for years, and just never gotten around to it.
> > But with all this talk of kpatch and kgraft live kernel patching using
> > the ftrace infrastructure, it seems like a good time to do it.
> > 
> > The way the function callback mechanism works in ftrace is that if there's
> > only one function callback registered, it will set the mcount/fentry
> > trampoline to call that function directly. But as soon as you register
> > another callback, the mcount trampoline calls a loop function that iterates
> > over all the registered callbacks (ftrace_ops) checking their hash tables
> > to see if the called function matches the ops before calling its callback.
> > This happens even if the two registered functions are not even tracing
> > the same function!
> > 
> > This really sucks if you are tracing all functions, and then add a kprobe
> > or perf event that traces a single function. That will cause all the
> > other functions being traced to perform the loop test.
> 
> Ah, I had thought that ftrace already had different trampolines for the
> loop and single-callback cases and replaced each mcount-call instruction
> with the appropriate one. But this series is what actually does that,
> doesn't it? :)

Well, I guess the answer to that is what do you consider the
trampoline? I'm currently considering it to be the assembly code that
the mcount/fentry call jumps to. We only have two trampolines (three if
you count the function graph code that will be called directly come
3.17). Those two are the normal ftrace_caller and the
ftrace_regs_caller. Now what they call can be different. When only a
single ftrace_ops is registered, they call the ftrace_ops->func
directly. If there are more than one ftrace_ops registered, then they
call the loop function directly.
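
The choice of what they call boils down to something like this (heavily
simplified from update_ftrace_function() in kernel/trace/ftrace.c):

	if (ftrace_ops_list == &ftrace_list_end ||
	    (ftrace_ops_list->next == &ftrace_list_end &&
	     !(ftrace_ops_list->flags & FTRACE_OPS_FL_DYNAMIC)))
		/* zero or one ops registered: have the trampoline
		 * call its func directly */
		func = ftrace_ops_list->func;
	else
		/* more than one ops: fall back to the loop function */
		func = ftrace_ops_list_func;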


> 
> > Ideally, if only a single callback (ftrace_ops) is registered to a
> > function, then that function should call a trampoline that will only
> > call that one callback without doing any other tests.
> > 
> > This patch set adds this functionality to x86_64. If a callback is
> > registered to a function and there's no other callback registered to
> > that function, that ftrace_ops will get its own trampoline allocated
> > for it that will call the callback directly.
> > 
> > Note, for dynamically allocated ftrace_ops (kprobes, perf, instance
> > directory function tracing), the dynamic trampoline will only be created
> > if CONFIG_PREEMPT is not set. That's because, until Paul finishes his
> > rcu_call_task() code, there's no safe way to know if a task was preempted
> > while on the trampoline and is waiting to run on it some more.
> 
> Hmm, if we can declare "this ftrace_ops is permanent" (like finalizing it),
> then we could allocate a trampoline for such a dynamic ops too. Since
> kprobes doesn't actually need to free (or unregister) its ftrace_ops, I
> could use that.

Yeah, if we add a ftrace_ops PERMANENT flag, then we could allow them
too.

> 
> 
> > I need to write up a bunch of tests for this code, but currently it works
> > on the few tests I did manually. I didn't even run this code yet under
> > my full test suite, so it may very well have bugs in it that might be
> > easily triggered. But I wanted to get the code out for review to see
> > if anyone has any other idea to help enhance this feature.
> 
> Yeah, I'll review it.

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 1/3] ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
  2014-07-04 13:32   ` Masami Hiramatsu
@ 2014-07-04 14:25     ` Steven Rostedt
  0 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2014-07-04 14:25 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Namhyung Kim, H. Peter Anvin, Oleg Nesterov,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

On Fri, 04 Jul 2014 22:32:44 +0900
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> (2014/07/04 5:07), Steven Rostedt wrote:
> > +
> > +void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
> > +{
> > +	unsigned char *new;
> > +	unsigned long start_offset;
> > +	unsigned long call_offset;
> > +	unsigned long offset;
> > +	unsigned long ip;
> > +	int ret;
> > +
> > +	if (ops->trampoline) {
> > +		/*
> > +		 * The ftrace_ops caller may set up its own trampoline.
> > +		 * In such a case, this code must not modify it.
> > +		 */
> > +		if (!(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
> > +			return;
> 
> Just a question: what happens if the ftrace_ops caller sets up a trampoline which is
> not compatible with ftrace's trampoline, and that ftrace_ops conflicts on an IP with
> another ftrace_ops? I guess in that case ftrace will use the loop callback on that IP,
> but since the trampoline is not compatible, the result will not be the same, is that right? :)

If the caller sets up a trampoline, it must not set the ALLOC_TRAMP
flag. If you look at the comment about that flag it states this:

+ * ALLOC_TRAMP - A dynamic trampoline was allocated by the core code.
+ *            The arch specific code sets this flag when it allocated a
+ *            trampoline. This lets the arch know that it can update the
+ *            trampoline in case the callback function changes.
+ *            The ftrace_ops trampoline can be set by the ftrace users, and
+ *            in such cases the arch must not modify it. Only the arch ftrace
+ *            core code should set this flag.


That last line is important. Only the arch ftrace code (the one that
may modify it with arch_ftrace_update_trampoline should set the
ALLOC_TRAMP flag. That's how it knows if it can modify it or not.

The function_graph tracer sets up its own trampoline, although it needs
to go through some hoops there because it shares the ftrace_ops with
the function tracer. Thus, it has to store the trampoline and this flag
before registering ftrace ops, and then it has to restore it when it
unregisters.
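
Condensed, the save/restore that patch 1 does around the function graph
registration looks like this:

	/* register_ftrace_graph(): */
	save_global_trampoline = global_ops.trampoline;
	save_global_flags = global_ops.flags;
	/* function graph does not allocate its trampoline */
	global_ops.flags &= ~FTRACE_OPS_FL_ALLOC_TRAMP;
	global_ops.trampoline = FTRACE_GRAPH_ADDR;

	/* unregister_ftrace_graph(): */
	global_ops.trampoline = save_global_trampoline;
	if (save_global_flags & FTRACE_OPS_FL_ALLOC_TRAMP)
		global_ops.flags |= FTRACE_OPS_FL_ALLOC_TRAMP;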

-- Steve


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-04 14:21   ` Steven Rostedt
@ 2014-07-07 13:22     ` Jiri Kosina
  2014-07-08 14:24       ` Steven Rostedt
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Kosina @ 2014-07-07 13:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Paul E. McKenney, Namhyung Kim, H. Peter Anvin,
	Oleg Nesterov, Josh Poimboeuf, Seth Jennings, Jiri Slaby

On Fri, 4 Jul 2014, Steven Rostedt wrote:

> Well, I guess the answer to that is what do you consider the trampoline? 
> I'm currently considering it to be the assembly code that the 
> mcount/fentry call jumps to. We only have two trampolines (three if you 
> count the function graph code that will be called directly come 3.17). 
> Those two are the normal ftrace_caller and the ftrace_regs_caller. 

BTW, on those archs that already support regs saving, is there really a 
reason not to kill ftrace_caller and keep just ftrace_regs_caller?

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-03 20:07 [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines Steven Rostedt
                   ` (4 preceding siblings ...)
  2014-07-04 13:20 ` Masami Hiramatsu
@ 2014-07-07 13:58 ` Jiri Kosina
  2014-07-10 21:36 ` Josh Poimboeuf
  2014-07-22 16:47 ` Oleg Nesterov
  7 siblings, 0 replies; 30+ messages in thread
From: Jiri Kosina @ 2014-07-07 13:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Masami Hiramatsu, Namhyung Kim, H. Peter Anvin,
	Oleg Nesterov, Josh Poimboeuf, Seth Jennings, Jiri Slaby

On Thu, 3 Jul 2014, Steven Rostedt wrote:

> [ NOT READY FOR INCLUSION! ]
> 
> Note, this is based off of my remove ftrace_start/stop() patch set.
> 
> I've been wanting to do this for years, and just never gotten around to it.
> But with all this talk of kpatch and kgraft live kernel patching using
> the ftrace infrastructure, it seems like a good time to do it.
> 
> The way the function callback mechanism works in ftrace is that if there's
> only one function callback registered, it will set the mcount/fentry
> trampoline to call that function directly. But as soon as you register
> another callback, the mcount trampoline calls a loop function that iterates
> over all the registered callbacks (ftrace_ops) checking their hash tables
> to see if the called function matches the ops before calling its callback.
> This happens even if the two registered functions are not even tracing
> the same function!
> 
> This really sucks if you are tracing all functions, and then add a kprobe
> or perf event that traces a single function. That will cause all the
> other functions being traced to perform the loop test.
[ ... snip ... ]

For the record -- I just did a quick test, and kGraft works nicely on top 
of this patchset.

Will be looking into the patches more closely later this week.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-07 13:22     ` Jiri Kosina
@ 2014-07-08 14:24       ` Steven Rostedt
  0 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2014-07-08 14:24 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Masami Hiramatsu, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Paul E. McKenney, Namhyung Kim, H. Peter Anvin,
	Oleg Nesterov, Josh Poimboeuf, Seth Jennings, Jiri Slaby

On Mon, 7 Jul 2014 15:22:27 +0200 (CEST)
Jiri Kosina <jkosina@suse.cz> wrote:

> On Fri, 4 Jul 2014, Steven Rostedt wrote:
> 
> > Well, I guess the answer to that is what do you consider the trampoline? 
> > I'm currently considering it to be the assembly code that the 
> > mcount/fentry call jumps to. We only have two trampolines (three if you 
> > count the function graph code that will be called directly come 3.17). 
> > Those two are the normal ftrace_caller and the ftrace_regs_caller. 
> 
> BTW, on those archs that already support regs saving, is there really a 
> reason not to kill ftrace_caller and keep just ftrace_regs_caller?
> 

Consistency. Perhaps the two can be the same trampoline, which would be
trivial to implement, but I wouldn't kill one, as that would make the
generic code more complex.

-- Steve

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-03 20:07 [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines Steven Rostedt
                   ` (5 preceding siblings ...)
  2014-07-07 13:58 ` Jiri Kosina
@ 2014-07-10 21:36 ` Josh Poimboeuf
  2014-07-10 21:44   ` Jiri Kosina
  2014-07-22 16:47 ` Oleg Nesterov
  7 siblings, 1 reply; 30+ messages in thread
From: Josh Poimboeuf @ 2014-07-10 21:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Masami Hiramatsu, Namhyung Kim, H. Peter Anvin,
	Oleg Nesterov, Jiri Kosina, Seth Jennings, Jiri Slaby

On Thu, Jul 03, 2014 at 04:07:50PM -0400, Steven Rostedt wrote:
> [ NOT READY FOR INCLUSION! ]
> 
> Note, this is based off of my remove ftrace_start/stop() patch set.
> 
> I've been wanting to do this for years, and just never gotten around to it.
> But with all this talk of kpatch and kgraft live kernel patching using
> the ftrace infrastructure, it seems like a good time to do it.
> 
> The way the function callback mechanism works in ftrace is that if there's
> only one function callback registered, it will set the mcount/fentry
> trampoline to call that function directly. But as soon as you register
> another callback, the mcount trampoline calls a loop function that iterates
> over all the registered callbacks (ftrace_ops) checking their hash tables
> to see if the called function matches the ops before calling its callback.
> This happens even if the two registered functions are not even tracing
> the same function!
> 
> This really sucks if you are tracing all functions, and then add a kprobe
> or perf event that traces a single function. That will cause all the
> other functions being traced to perform the loop test.
> 
> Ideally, if only a single callback (ftrace_ops) is registered to a
> function, then that function should call a trampoline that will only
> call that one callback without doing any other tests.

Hi Steven,

I did some testing with kpatch and I found one minor issue.  The dynamically
allocated trampoline seems to confuse dump_stack() somewhat.

I added a dump_stack() call in my ftrace_ops callback function
(kpatch_ftrace_handler) which had a filter on meminfo_proc_show().

Without this patch set (3.14):

  [ 1533.256720]  [<ffffffff816ef90e>] dump_stack+0x45/0x56
  [ 1533.256725]  [<ffffffff8125cc80>] ? meminfo_proc_open+0x30/0x30
  [ 1533.256731]  [<ffffffffa07d1494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
  [ 1533.256734]  [<ffffffff8125cc80>] ? meminfo_proc_open+0x30/0x30
  [ 1533.256739]  [<ffffffff81137a76>] ftrace_ops_list_func+0xf6/0x110
  [ 1533.256744]  [<ffffffff816ff9c2>] ftrace_regs_call+0x5/0x77
  [ 1533.256748]  [<ffffffff8120d4ce>] ? seq_read+0x2de/0x3b0
  [ 1533.256751]  [<ffffffff8120d4ce>] ? seq_read+0x2de/0x3b0
  [ 1533.256755]  [<ffffffff8125cc85>] ? meminfo_proc_show+0x5/0x5e0
  [ 1533.256757]  [<ffffffff8125cc85>] ? meminfo_proc_show+0x5/0x5e0
  [ 1533.256760]  [<ffffffff8120d35a>] ? seq_read+0x16a/0x3b0
  [ 1533.256764]  [<ffffffff81253f1d>] proc_reg_read+0x3d/0x80
  [ 1533.256769]  [<ffffffff811e93cb>] vfs_read+0x9b/0x160
  [ 1533.256772]  [<ffffffff811e9ed5>] SyS_read+0x55/0xd0
  [ 1533.256776]  [<ffffffff816ffc29>] system_call_fastpath+0x16/0x1b

With this patch set:

  [  934.546013]  [<ffffffff81700312>] dump_stack+0x45/0x56
  [  934.546020]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
  [  934.546027]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
  [  934.546058]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
  [  934.546062]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
  [  934.546067]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
  [  934.546071]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
  [  934.546075]  [<ffffffff8121423a>] ? seq_read+0x16a/0x3b0
  [  934.546081]  [<ffffffff8125768d>] ? proc_reg_read+0x3d/0x80
  [  934.546088]  [<ffffffff811f0668>] ? vfs_read+0x98/0x170
  [  934.546093]  [<ffffffff811f1345>] ? SyS_read+0x55/0xd0
  [  934.546099]  [<ffffffff81707969>] ? system_call_fastpath+0x16/0x1b

It correctly shows the traced function's callers, but they're shown as
unreliable (question marks).  I'm not sure why that happens, but if
something goes wrong in a callback, this could make the reported stack
traces confusing.  It could also cause trouble for code which relies on
reliable stack traces (e.g. kpatch).

> 
> This patch set adds this functionality to x86_64. If a callback is
> registered to a function and there's no other callback registered to
> that function, that ftrace_ops will get its own trampoline allocated
> for it that will call the callback directly.
> 
> Note, for dynamically allocated ftrace_ops (kprobes, perf, instance
> directory function tracing), the dynamic trampoline will only be created
> if CONFIG_PREEMPT is not set. That's because, until Paul finishes his
> rcu_call_task() code, there's no safe way to know if a task was preempted
> while on the trampoline and is waiting to run on it some more.
> 
> I need to write up a bunch of tests for this code, but currently it works
> on the few tests I did manually. I didn't even run this code yet under
> my full test suite, so it may very well have bugs in it that might be
> easily triggered. But I wanted to get the code out for review to see
> if anyone has any other idea to help enhance this feature.
> 
> If you want a git repo to play with this, you can get it from below.
> That repo will rebase often, so do not build against it.
> 
> Enjoy,
> 
> -- Steve
> 
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> rfc/trampoline
> 
> Head SHA1: 4d781e010842a56f8e7c1bbe309e38075c277c45
> 
> 
> Steven Rostedt (Red Hat) (3):
>       ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
>       ftrace/x86: Show trampoline call function in enabled_functions
>       ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines
> 
> ----
>  arch/x86/kernel/ftrace.c    | 240 ++++++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kernel/mcount_64.S |  26 ++++-
>  include/linux/ftrace.h      |   8 ++
>  kernel/trace/ftrace.c       |  86 +++++++++++++++-
>  4 files changed, 344 insertions(+), 16 deletions(-)
> 

-- 
Josh

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-10 21:36 ` Josh Poimboeuf
@ 2014-07-10 21:44   ` Jiri Kosina
  2014-07-10 22:01     ` Josh Poimboeuf
  2014-07-11  2:26     ` Masami Hiramatsu
  0 siblings, 2 replies; 30+ messages in thread
From: Jiri Kosina @ 2014-07-10 21:44 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Paul E. McKenney, Masami Hiramatsu,
	Namhyung Kim, H. Peter Anvin, Oleg Nesterov, Seth Jennings,
	Jiri Slaby

On Thu, 10 Jul 2014, Josh Poimboeuf wrote:

> I did some testing with kpatch and I found one minor issue.  The dynamically
> allocated trampoline seems to confuse dump_stack() somewhat.
> 
> I added a dump_stack() call in my ftrace_ops callback function
> (kpatch_ftrace_handler) which had a filter on meminfo_proc_show().

Interesting. Are you using dwarf2 unwinder for stack dumping by any 
chance? It seems to get things right here. Will look into it more 
tomorrow.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-10 21:44   ` Jiri Kosina
@ 2014-07-10 22:01     ` Josh Poimboeuf
  2014-07-11  2:26     ` Masami Hiramatsu
  1 sibling, 0 replies; 30+ messages in thread
From: Josh Poimboeuf @ 2014-07-10 22:01 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Paul E. McKenney, Masami Hiramatsu,
	Namhyung Kim, H. Peter Anvin, Oleg Nesterov, Seth Jennings,
	Jiri Slaby

On Thu, Jul 10, 2014 at 11:44:43PM +0200, Jiri Kosina wrote:
> On Thu, 10 Jul 2014, Josh Poimboeuf wrote:
> 
> > I did some testing with kpatch and I found one minor issue.  The dynamically
> > allocated trampoline seems to confuse dump_stack() somewhat.
> > 
> > I added a dump_stack() call in my ftrace_ops callback function
> > (kpatch_ftrace_handler) which had a filter on meminfo_proc_show().
> 
> Interesting. Are you using dwarf2 unwinder for stack dumping by any 
> chance?

I think so (but not sure).  How can I check?  I do see the following
macros being defined in my kernel build:

  -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1

> It seems to get things right here. Will look into it more tomorrow.

Thanks, please let me know what you find...

-- 
Josh

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-10 21:44   ` Jiri Kosina
  2014-07-10 22:01     ` Josh Poimboeuf
@ 2014-07-11  2:26     ` Masami Hiramatsu
  2014-07-11 13:24       ` Jiri Kosina
  1 sibling, 1 reply; 30+ messages in thread
From: Masami Hiramatsu @ 2014-07-11  2:26 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Josh Poimboeuf, Steven Rostedt, linux-kernel, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Paul E. McKenney, Namhyung Kim,
	H. Peter Anvin, Oleg Nesterov, Seth Jennings, Jiri Slaby

(2014/07/11 6:44), Jiri Kosina wrote:
> On Thu, 10 Jul 2014, Josh Poimboeuf wrote:
> 
>> I did some testing with kpatch and I found one minor issue.  The dynamically
>> allocated trampoline seems to confuse dump_stack() somewhat.
>>
>> I added a dump_stack() call in my ftrace_ops callback function
>> (kpatch_ftrace_handler) which had a filter on meminfo_proc_show().
> 
> Interesting. Are you using dwarf2 unwinder for stack dumping by any 
> chance? It seems to get things right here. Will look into it more 
> tomorrow.

Hmm, can the dwarf2 unwinder work with the trampoline method?
Since the trampoline is just a copy of instructions which
will not have CFI (that is stored in the dwarf section),
I guess it may not work...
A frame pointer (push bp and save sp to bp on entry) can
work anyway.

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-11  2:26     ` Masami Hiramatsu
@ 2014-07-11 13:24       ` Jiri Kosina
  2014-07-11 14:29         ` Josh Poimboeuf
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Kosina @ 2014-07-11 13:24 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Josh Poimboeuf, Steven Rostedt, linux-kernel, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Paul E. McKenney, Namhyung Kim,
	H. Peter Anvin, Oleg Nesterov, Seth Jennings, Jiri Slaby

On Fri, 11 Jul 2014, Masami Hiramatsu wrote:

> >> I did some testing with kpatch and I found one minor issue.  The dynamically
> >> allocated trampoline seems to confuse dump_stack() somewhat.
> >>
> >> I added a dump_stack() call in my ftrace_ops callback function
> >> (kpatch_ftrace_handler) which had a filter on meminfo_proc_show().
> > 
> > Interesting. Are you using dwarf2 unwinder for stack dumping by any 
> > chance? It seems to get things right here. Will look into it more 
> > tomorrow.
> 
> Hmm, can dwarf2 unwinder work on the trampoline method? Since the 
> trampoline just a copy of instructions which will not have CFI(which is 
> stored in dwarf section), I guess it may not work... Frame pointer (push 
> bp and save sp to bp on the entry) can work anyway.

That was exactly my idea and that's why I asked, thanks for confirming.

I am afraid we'll have to declare dynamic trampolines incompatible with
dwarf2 stack dumping.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-11 13:24       ` Jiri Kosina
@ 2014-07-11 14:29         ` Josh Poimboeuf
  2014-07-14  1:35           ` Masami Hiramatsu
  0 siblings, 1 reply; 30+ messages in thread
From: Josh Poimboeuf @ 2014-07-11 14:29 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Masami Hiramatsu, Steven Rostedt, linux-kernel, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Paul E. McKenney, Namhyung Kim,
	H. Peter Anvin, Oleg Nesterov, Seth Jennings, Jiri Slaby

On Fri, Jul 11, 2014 at 03:24:28PM +0200, Jiri Kosina wrote:
> On Fri, 11 Jul 2014, Masami Hiramatsu wrote:
> 
> > >> I did some testing with kpatch and I found one minor issue.  The dynamically
> > >> allocated trampoline seems to confuse dump_stack() somewhat.
> > >>
> > >> I added a dump_stack() call in my ftrace_ops callback function
> > >> (kpatch_ftrace_handler) which had a filter on meminfo_proc_show().
> > > 
> > > Interesting. Are you using dwarf2 unwinder for stack dumping by any 
> > > chance? It seems to get things right here. Will look into it more 
> > > tomorrow.
> > 
> > Hmm, can dwarf2 unwinder work on the trampoline method? Since the 
> > trampoline just a copy of instructions which will not have CFI(which is 
> > stored in dwarf section), I guess it may not work... Frame pointer (push 
> > bp and save sp to bp on the entry) can work anyway.
> 
> That was exactly my idea and that's why I asked, thanks for confirming.
> 
> I am afraid we'll have to declare dynamic trampolines incompatible with 
> drawf2 stack dumping.

In this case, the problem wasn't related to DWARF, because dump_stack()
uses the frame pointer to unwind the stack.  I was able to fix the
problem with the following patch.

---

From 951d2aec17885a62905df6b910dc705d99c63993 Mon Sep 17 00:00:00 2001
From: Josh Poimboeuf <jpoimboe@redhat.com>
Date: Fri, 11 Jul 2014 08:58:33 -0500
Subject: [PATCH] x86/dumpstack: fix stack traces for generated code

If a function in the stack trace is dynamically generated, for example an
ftrace dynamically generated trampoline, print_context_stack() gets confused
and ends up showing all the following addresses as unreliable:

  [  934.546013]  [<ffffffff81700312>] dump_stack+0x45/0x56
  [  934.546020]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
  [  934.546027]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
  [  934.546058]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
  [  934.546062]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
  [  934.546067]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
  [  934.546071]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
  [  934.546075]  [<ffffffff8121423a>] ? seq_read+0x16a/0x3b0
  [  934.546081]  [<ffffffff8125768d>] ? proc_reg_read+0x3d/0x80
  [  934.546088]  [<ffffffff811f0668>] ? vfs_read+0x98/0x170
  [  934.546093]  [<ffffffff811f1345>] ? SyS_read+0x55/0xd0
  [  934.546099]  [<ffffffff81707969>] ? system_call_fastpath+0x16/0x1b

Once it encounters an address which is not in the kernel's text area, it gets
confused and stops updating the frame pointer.

The __kernel_text_address() check isn't needed when determining whether an
address is reliable.  It's only needed when deciding whether to print an
unreliable address.

Here's the same stack trace with this patch:

  [ 1314.612287]  [<ffffffff81700312>] dump_stack+0x45/0x56
  [ 1314.612290]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
  [ 1314.612293]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
  [ 1314.612306]  [<ffffffffa00160c4>] 0xffffffffa00160c3
  [ 1314.612309]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
  [ 1314.612311]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
  [ 1314.612312]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
  [ 1314.612314]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
  [ 1314.612315]  [<ffffffff8121423a>] ? seq_read+0x16a/0x3b0
  [ 1314.612318]  [<ffffffff8125768d>] proc_reg_read+0x3d/0x80
  [ 1314.612320]  [<ffffffff811f0668>] vfs_read+0x98/0x170
  [ 1314.612322]  [<ffffffff811f1345>] SyS_read+0x55/0xd0
  [ 1314.612324]  [<ffffffff81707969>] system_call_fastpath+0x16/0x1b
---
 arch/x86/kernel/dumpstack.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index b74ebc7..db0a33c 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -102,14 +102,13 @@ print_context_stack(struct thread_info *tinfo,
 		unsigned long addr;
 
 		addr = *stack;
-		if (__kernel_text_address(addr)) {
-			if ((unsigned long) stack == bp + sizeof(long)) {
-				ops->address(data, addr, 1);
-				frame = frame->next_frame;
-				bp = (unsigned long) frame;
-			} else {
-				ops->address(data, addr, 0);
-			}
+		if ((unsigned long) stack == bp + sizeof(long)) {
+			ops->address(data, addr, 1);
+			frame = frame->next_frame;
+			bp = (unsigned long) frame;
+			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
+		} else if (__kernel_text_address(addr)) {
+			ops->address(data, addr, 0);
 			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
 		}
 		stack++;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-11 14:29         ` Josh Poimboeuf
@ 2014-07-14  1:35           ` Masami Hiramatsu
  2014-07-14  7:16             ` Namhyung Kim
  0 siblings, 1 reply; 30+ messages in thread
From: Masami Hiramatsu @ 2014-07-14  1:35 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Jiri Kosina, Steven Rostedt, linux-kernel, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Paul E. McKenney, Namhyung Kim,
	H. Peter Anvin, Oleg Nesterov, Seth Jennings, Jiri Slaby

(2014/07/11 23:29), Josh Poimboeuf wrote:
[...]
> 
>>From 951d2aec17885a62905df6b910dc705d99c63993 Mon Sep 17 00:00:00 2001
> From: Josh Poimboeuf <jpoimboe@redhat.com>
> Date: Fri, 11 Jul 2014 08:58:33 -0500
> Subject: [PATCH] x86/dumpstack: fix stack traces for generated code
> 
> If a function in the stack trace is dynamically generated, for example an
> ftrace dynamically generated trampoline, print_context_stack() gets confused
> and ends up showing all the following addresses as unreliable:
> 
>   [  934.546013]  [<ffffffff81700312>] dump_stack+0x45/0x56
>   [  934.546020]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
>   [  934.546027]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
>   [  934.546058]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>   [  934.546062]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>   [  934.546067]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>   [  934.546071]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>   [  934.546075]  [<ffffffff8121423a>] ? seq_read+0x16a/0x3b0
>   [  934.546081]  [<ffffffff8125768d>] ? proc_reg_read+0x3d/0x80
>   [  934.546088]  [<ffffffff811f0668>] ? vfs_read+0x98/0x170
>   [  934.546093]  [<ffffffff811f1345>] ? SyS_read+0x55/0xd0
>   [  934.546099]  [<ffffffff81707969>] ? system_call_fastpath+0x16/0x1b
> 
> Once it encounters an address which is not in the kernel's text area, it gets
> confused and stops updating the frame pointer.

Right, this uses module_alloc() to get memory for the trampoline, but
that just allocates a page in the executable vmalloc area. We need a hack
in __kernel_text_address() if we really want to use that.

> The __kernel_text_address() check isn't needed when determining whether an
> address is reliable.  It's only needed when deciding whether to print an
> unreliable address.

Yeah, I guess that is for the case where the frame pointer is broken.

> 
> Here's the same stack trace with this patch:
> 
>   [ 1314.612287]  [<ffffffff81700312>] dump_stack+0x45/0x56
>   [ 1314.612290]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
>   [ 1314.612293]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
>   [ 1314.612306]  [<ffffffffa00160c4>] 0xffffffffa00160c3

Here, this still has a wrong entry. Maybe the trampoline doesn't set up
the frame pointer (bp) correctly.

Thank you,

>   [ 1314.612309]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>   [ 1314.612311]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>   [ 1314.612312]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>   [ 1314.612314]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>   [ 1314.612315]  [<ffffffff8121423a>] ? seq_read+0x16a/0x3b0
>   [ 1314.612318]  [<ffffffff8125768d>] proc_reg_read+0x3d/0x80
>   [ 1314.612320]  [<ffffffff811f0668>] vfs_read+0x98/0x170
>   [ 1314.612322]  [<ffffffff811f1345>] SyS_read+0x55/0xd0
>   [ 1314.612324]  [<ffffffff81707969>] system_call_fastpath+0x16/0x1b
> ---
>  arch/x86/kernel/dumpstack.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index b74ebc7..db0a33c 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -102,14 +102,13 @@ print_context_stack(struct thread_info *tinfo,
>  		unsigned long addr;
>  
>  		addr = *stack;
> -		if (__kernel_text_address(addr)) {
> -			if ((unsigned long) stack == bp + sizeof(long)) {
> -				ops->address(data, addr, 1);
> -				frame = frame->next_frame;
> -				bp = (unsigned long) frame;
> -			} else {
> -				ops->address(data, addr, 0);
> -			}
> +		if ((unsigned long) stack == bp + sizeof(long)) {
> +			ops->address(data, addr, 1);
> +			frame = frame->next_frame;
> +			bp = (unsigned long) frame;
> +			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
> +		} else if (__kernel_text_address(addr)) {
> +			ops->address(data, addr, 0);
>  			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
>  		}
>  		stack++;
> 


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 1/3] ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
  2014-07-03 20:07 ` [RFC][PATCH 1/3] ftrace/x86: Add dynamic allocated trampoline for ftrace_ops Steven Rostedt
  2014-07-04 13:32   ` Masami Hiramatsu
@ 2014-07-14  2:34   ` Masami Hiramatsu
  1 sibling, 0 replies; 30+ messages in thread
From: Masami Hiramatsu @ 2014-07-14  2:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Namhyung Kim, H. Peter Anvin, Oleg Nesterov,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

(2014/07/04 5:07), Steven Rostedt wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> The current method of handling multiple function callbacks is to register
> a list function callback that calls all the other callbacks based on
> their hash tables and compare it to the function that the callback was
> called on. But this is very inefficient.
> 
> For example, if you are tracing all functions in the kernel and then
> add a kprobe to a function such that the kprobe uses ftrace, the
> mcount trampoline will switch from calling the function trace callback
> to calling the list callback that will iterate over all registered
> ftrace_ops (in this case, the function tracer and the kprobes callback).
> That means for every function being traced it checks the hash of the
> ftrace_ops for function tracing and kprobes, even though the kprobes
> is only set at a single function. The kprobes ftrace_ops is checked
> for every function being traced!
> 
> Instead of calling the list function for functions that are only being
> traced by a single callback, we can call a dynamically allocated
> trampoline that calls the callback directly. The function graph tracer
> already uses a direct call trampoline when it is being traced by itself
> but it is not dynamically allocated. It's trampoline is static in the
> kernel core. The infrastructure that called the function graph trampoline
> can also be used to call a dynamically allocated one.
> 
> For now, only ftrace_ops that are not dynamically allocated can have
> a trampoline. That is, users such as function tracer or stack tracer.
> kprobes and perf allocate their ftrace_ops, and until there's a safe
> way to free the trampoline, it can not be used. The dynamically allocated
> ftrace_ops may, although, use the trampoline if the kernel is not
> compiled with CONFIG_PREEMPT. But that will come later.
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  arch/x86/kernel/ftrace.c    | 157 ++++++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kernel/mcount_64.S |  26 ++++++--
>  include/linux/ftrace.h      |   8 +++
>  kernel/trace/ftrace.c       |  46 ++++++++++++-
>  4 files changed, 224 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 3386dc9aa333..fcc256a33c1d 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -17,9 +17,11 @@
>  #include <linux/ftrace.h>
>  #include <linux/percpu.h>
>  #include <linux/sched.h>
> +#include <linux/slab.h>
>  #include <linux/init.h>
>  #include <linux/list.h>
>  #include <linux/module.h>
> +#include <linux/moduleloader.h>
>  
>  #include <trace/syscall.h>
>  
> @@ -644,12 +646,6 @@ int __init ftrace_dyn_arch_init(void)
>  {
>  	return 0;
>  }
> -#endif
> -
> -#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> -
> -#ifdef CONFIG_DYNAMIC_FTRACE
> -extern void ftrace_graph_call(void);
>  
>  static unsigned char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
>  {
> @@ -665,6 +661,155 @@ static unsigned char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
>  	return calc.code;
>  }
>  
> +/* Currently only x86_64 supports dynamic trampolines */
> +#ifdef CONFIG_X86_64
> +
> +/* Defined as markers to the end of the ftrace default trampolines */
> +extern void ftrace_caller_end(void);
> +extern void ftrace_regs_caller_end(void);
> +extern void ftrace_return(void);
> +extern void ftrace_caller_op_ptr(void);
> +extern void ftrace_regs_caller_op_ptr(void);
> +
> +/* movq function_trace_op(%rip), %rdx */
> +/* 0x48 0x8b 0x15 <offset-to-ftrace_trace_op (4 bytes)> */
> +#define OP_REF_SIZE	7
> +
> +/*
> + * The ftrace_ops is passed to the function, we can pass
> + * in the ops directly as this trampoline will only call
> + * a function for a single ops.
> + */
> +union ftrace_op_code_union {
> +	char code[OP_REF_SIZE];
> +	struct {
> +		char op[3];
> +		int offset;
> +	} __attribute__((packed));
> +};
> +
> +static unsigned long create_trampoline(struct ftrace_ops *ops)
> +{
> +	unsigned const char *jmp;
> +	unsigned long start_offset;
> +	unsigned long end_offset;
> +	unsigned long op_offset;
> +	unsigned long offset;
> +	unsigned long size;
> +	unsigned long ip;
> +	unsigned long *ptr;
> +	void *trampoline;
> +	unsigned const char op_ref[] = { 0x48, 0x8b, 0x15 };
> +	union ftrace_op_code_union op_ptr;
> +	int ret;
> +
> +	if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
> +		start_offset = (unsigned long)ftrace_regs_caller;
> +		end_offset = (unsigned long)ftrace_regs_caller_end;
> +		op_offset = (unsigned long)ftrace_regs_caller_op_ptr;
> +	} else {
> +		start_offset = (unsigned long)ftrace_caller;
> +		end_offset = (unsigned long)ftrace_caller_end;
> +		op_offset = (unsigned long)ftrace_caller_op_ptr;
> +	}
> +
> +	size = end_offset - start_offset;
> +
> +	trampoline = module_alloc(size + MCOUNT_INSN_SIZE + sizeof(void *));

Here, since module_alloc() always allocates whole pages like vmalloc, this
wastes most of the memory in the page (e.g. ftrace_regs_caller needs less
than 0x150 bytes on x86_64, as shown below).

ffffffff8156ec00 T ftrace_regs_caller
ffffffff8156eccd T ftrace_regs_call
ffffffff8156ed44 t ftrace_restore_flags
ffffffff8156ed50 T ftrace_graph_caller

kprobes has its own insn_slot mechanism which allocates a small amount of
executable memory for each kprobe. Perhaps we can make a generic trampoline
mechanism for both, or just share the insn_slot with ftrace.
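
Just to illustrate the idea, a hypothetical slot interface could look
like this (the names are made up, this is not an existing kernel API):

	/* Hypothetical sub-page allocator for trampolines (names invented) */
	void *tramp_alloc_slot(size_t size);	/* carve size bytes from a shared RX page */
	void tramp_free_slot(void *slot);	/* release the slot, free the page when empty */

	trampoline = tramp_alloc_slot(size + MCOUNT_INSN_SIZE + sizeof(void *));
	if (!trampoline)
		return 0;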

Thank you,


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-14  1:35           ` Masami Hiramatsu
@ 2014-07-14  7:16             ` Namhyung Kim
  2014-07-14  8:18               ` Masami Hiramatsu
  0 siblings, 1 reply; 30+ messages in thread
From: Namhyung Kim @ 2014-07-14  7:16 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Josh Poimboeuf, Jiri Kosina, Steven Rostedt, linux-kernel,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Paul E. McKenney,
	H. Peter Anvin, Oleg Nesterov, Seth Jennings, Jiri Slaby

Hi Masami,

On Mon, 14 Jul 2014 10:35:21 +0900, Masami Hiramatsu wrote:
> (2014/07/11 23:29), Josh Poimboeuf wrote:
> [...]
>> 
>>>From 951d2aec17885a62905df6b910dc705d99c63993 Mon Sep 17 00:00:00 2001
>> From: Josh Poimboeuf <jpoimboe@redhat.com>
>> Date: Fri, 11 Jul 2014 08:58:33 -0500
>> Subject: [PATCH] x86/dumpstack: fix stack traces for generated code
>> 
>> If a function in the stack trace is dynamically generated, for example an
>> ftrace dynamically generated trampoline, print_context_stack() gets confused
>> and ends up showing all the following addresses as unreliable:
>> 
>>   [  934.546013]  [<ffffffff81700312>] dump_stack+0x45/0x56
>>   [  934.546020]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
>>   [  934.546027]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
>>   [  934.546058]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>>   [  934.546062]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>>   [  934.546067]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>>   [  934.546071]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>>   [  934.546075]  [<ffffffff8121423a>] ? seq_read+0x16a/0x3b0
>>   [  934.546081]  [<ffffffff8125768d>] ? proc_reg_read+0x3d/0x80
>>   [  934.546088]  [<ffffffff811f0668>] ? vfs_read+0x98/0x170
>>   [  934.546093]  [<ffffffff811f1345>] ? SyS_read+0x55/0xd0
>>   [  934.546099]  [<ffffffff81707969>] ? system_call_fastpath+0x16/0x1b
>> 
>> Once it encounters an address which is not in the kernel's text area, it gets
>> confused and stops updating the frame pointer.
>
> Right, this uses a module_alloc to get a memory for trampline, but
> it just allocates a page in executable vmalloc area. We need a hack
> in __kernel_text_address if we really want to use that.
>
>> The __kernel_text_address() check isn't needed when determining whether an
>> address is reliable.  It's only needed when deciding whether to print an
>> unreliable address.
>
> Yeah, I guess that is for the case that the frame pointer is broken.
>
>> 
>> Here's the same stack trace with this patch:
>> 
>>   [ 1314.612287]  [<ffffffff81700312>] dump_stack+0x45/0x56
>>   [ 1314.612290]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
>>   [ 1314.612293]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
>>   [ 1314.612306]  [<ffffffffa00160c4>] 0xffffffffa00160c3
>
> Here, this still has a wrong entry. Maybe the trampline doesn't setup
> frame pointer (bp) correctly.

Hmm.. are you talking about the hex address above?  I guess it's a valid
entry in the (dynamic) trampoline, but it has no symbol..


>
>>   [ 1314.612309]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>>   [ 1314.612311]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>>   [ 1314.612312]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>>   [ 1314.612314]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>>   [ 1314.612315]  [<ffffffff8121423a>] ? seq_read+0x16a/0x3b0

But these seem to be wrong - there are duplicate entries and at least
some of these functions should be shown correctly IMHO.  I guess it's
because the trampoline didn't save rbp on the stack right below the
return address, as dumpstack requires.
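
For reference, this is the frame layout the frame-pointer walker
expects at each call (a minimal sketch; struct stack_frame is what
print_context_stack() follows via bp):

	/* what %rbp is expected to point at in every frame */
	struct stack_frame {
		struct stack_frame *next_frame;	/* saved caller %rbp */
		unsigned long return_address;	/* pushed by the call */
	};

	/*
	 * i.e. code that wants reliable unwinding starts with
	 *	push %rbp
	 *	mov  %rsp, %rbp
	 * so the return address sits right above the saved %rbp.
	 */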

Thanks,
Namhyung


>>   [ 1314.612318]  [<ffffffff8125768d>] proc_reg_read+0x3d/0x80
>>   [ 1314.612320]  [<ffffffff811f0668>] vfs_read+0x98/0x170
>>   [ 1314.612322]  [<ffffffff811f1345>] SyS_read+0x55/0xd0
>>   [ 1314.612324]  [<ffffffff81707969>] system_call_fastpath+0x16/0x1b
>> ---
>>  arch/x86/kernel/dumpstack.c | 15 +++++++--------
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>> 
>> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
>> index b74ebc7..db0a33c 100644
>> --- a/arch/x86/kernel/dumpstack.c
>> +++ b/arch/x86/kernel/dumpstack.c
>> @@ -102,14 +102,13 @@ print_context_stack(struct thread_info *tinfo,
>>  		unsigned long addr;
>>  
>>  		addr = *stack;
>> -		if (__kernel_text_address(addr)) {
>> -			if ((unsigned long) stack == bp + sizeof(long)) {
>> -				ops->address(data, addr, 1);
>> -				frame = frame->next_frame;
>> -				bp = (unsigned long) frame;
>> -			} else {
>> -				ops->address(data, addr, 0);
>> -			}
>> +		if ((unsigned long) stack == bp + sizeof(long)) {
>> +			ops->address(data, addr, 1);
>> +			frame = frame->next_frame;
>> +			bp = (unsigned long) frame;
>> +			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
>> +		} else if (__kernel_text_address(addr)) {
>> +			ops->address(data, addr, 0);
>>  			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
>>  		}
>>  		stack++;
>> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-14  7:16             ` Namhyung Kim
@ 2014-07-14  8:18               ` Masami Hiramatsu
  2014-07-14 14:18                 ` Namhyung Kim
  0 siblings, 1 reply; 30+ messages in thread
From: Masami Hiramatsu @ 2014-07-14  8:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Josh Poimboeuf, Jiri Kosina, Steven Rostedt, linux-kernel,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Paul E. McKenney,
	H. Peter Anvin, Oleg Nesterov, Seth Jennings, Jiri Slaby

(2014/07/14 16:16), Namhyung Kim wrote:
> Hi Masami,
> 
> On Mon, 14 Jul 2014 10:35:21 +0900, Masami Hiramatsu wrote:
>> (2014/07/11 23:29), Josh Poimboeuf wrote:
>> [...]
>>>
>>> >From 951d2aec17885a62905df6b910dc705d99c63993 Mon Sep 17 00:00:00 2001
>>> From: Josh Poimboeuf <jpoimboe@redhat.com>
>>> Date: Fri, 11 Jul 2014 08:58:33 -0500
>>> Subject: [PATCH] x86/dumpstack: fix stack traces for generated code
>>>
>>> If a function in the stack trace is dynamically generated, for example an
>>> ftrace dynamically generated trampoline, print_context_stack() gets confused
>>> and ends up showing all the following addresses as unreliable:
>>>
>>>   [  934.546013]  [<ffffffff81700312>] dump_stack+0x45/0x56
>>>   [  934.546020]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
>>>   [  934.546027]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
>>>   [  934.546058]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>>>   [  934.546062]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>>>   [  934.546067]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>>>   [  934.546071]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>>>   [  934.546075]  [<ffffffff8121423a>] ? seq_read+0x16a/0x3b0
>>>   [  934.546081]  [<ffffffff8125768d>] ? proc_reg_read+0x3d/0x80
>>>   [  934.546088]  [<ffffffff811f0668>] ? vfs_read+0x98/0x170
>>>   [  934.546093]  [<ffffffff811f1345>] ? SyS_read+0x55/0xd0
>>>   [  934.546099]  [<ffffffff81707969>] ? system_call_fastpath+0x16/0x1b
>>>
>>> Once it encounters an address which is not in the kernel's text area, it gets
>>> confused and stops updating the frame pointer.
>>
>> Right, this uses a module_alloc to get a memory for trampline, but
>> it just allocates a page in executable vmalloc area. We need a hack
>> in __kernel_text_address if we really want to use that.
>>
>>> The __kernel_text_address() check isn't needed when determining whether an
>>> address is reliable.  It's only needed when deciding whether to print an
>>> unreliable address.
>>
>> Yeah, I guess that is for the case that the frame pointer is broken.
>>
>>>
>>> Here's the same stack trace with this patch:
>>>
>>>   [ 1314.612287]  [<ffffffff81700312>] dump_stack+0x45/0x56
>>>   [ 1314.612290]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
>>>   [ 1314.612293]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
>>>   [ 1314.612306]  [<ffffffffa00160c4>] 0xffffffffa00160c3
>>
>> Here, this still has a wrong entry. Maybe the trampline doesn't setup
>> frame pointer (bp) correctly.
> 
> Hmm.. are you saying about the hex address above?  I guess it's a valid
> entry in the (dynamic) trampoline, but has no symbol..

Ah, indeed. (BTW, why is it one less than the address ...? printk's spec?)

>>>   [ 1314.612309]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>>>   [ 1314.612311]  [<ffffffff812143ae>] ? seq_read+0x2de/0x3b0
>>>   [ 1314.612312]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>>>   [ 1314.612314]  [<ffffffff8125f5b5>] ? meminfo_proc_show+0x5/0x5e0
>>>   [ 1314.612315]  [<ffffffff8121423a>] ? seq_read+0x16a/0x3b0
> 
> But these seem to be wrong - there're duplicate entries and they should
> show some of these functions (at least) correctly IMHO.  I guess it's
> because the trampoline didn't save rbp to the stack right below the
> return address as dumpstack requires.

Right, the last seq_read should be reliable. Thank you for pointing that out.

Thanks!


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-14  8:18               ` Masami Hiramatsu
@ 2014-07-14 14:18                 ` Namhyung Kim
  2014-07-15  1:20                   ` Masami Hiramatsu
  0 siblings, 1 reply; 30+ messages in thread
From: Namhyung Kim @ 2014-07-14 14:18 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Josh Poimboeuf, Jiri Kosina, Steven Rostedt, linux-kernel,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Paul E. McKenney,
	H. Peter Anvin, Oleg Nesterov, Seth Jennings, Jiri Slaby

2014-07-14 (Mon), 17:18 +0900, Masami Hiramatsu:
> (2014/07/14 16:16), Namhyung Kim wrote:
> > Hi Masami,
> > 
> > On Mon, 14 Jul 2014 10:35:21 +0900, Masami Hiramatsu wrote:
> >> (2014/07/11 23:29), Josh Poimboeuf wrote:
> >>> Here's the same stack trace with this patch:
> >>>
> >>>   [ 1314.612287]  [<ffffffff81700312>] dump_stack+0x45/0x56
> >>>   [ 1314.612290]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
> >>>   [ 1314.612293]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
> >>>   [ 1314.612306]  [<ffffffffa00160c4>] 0xffffffffa00160c3
> >>
> >> Here, this still has a wrong entry. Maybe the trampline doesn't setup
> >> frame pointer (bp) correctly.
> > 
> > Hmm.. are you saying about the hex address above?  I guess it's a valid
> > entry in the (dynamic) trampoline, but has no symbol..
> 
> Ah, indeed. (BTW, why is it one less than the address ...? printk's spec?)

Argh, it seems like a bug in printk's %pB format processing.. :-/

It subtracts 1 from the address for the stacktrace.  Please see the
commits 0f77a8d37825 ("vsprintf: Introduce %pB format specifier") and
71f9e59800e5 ("x86, dumpstack: Use %pB format specifier for stack
trace") for details.  But it should restore the original address when it
fails to find a symbol for that address.
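
In other words (a minimal illustration only, not the fix itself):

	/*
	 * A stack trace entry is a return address, which points to the
	 * instruction after the call, so %pB looks up addr - 1 to name
	 * the call site, while %pS looks up addr as-is.
	 */
	printk("%pS\n", (void *)addr);	/* symbol containing addr     */
	printk("%pB\n", (void *)addr);	/* symbol containing addr - 1 */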

I'll send a fix soon.

Thanks,
Namhyung



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-14 14:18                 ` Namhyung Kim
@ 2014-07-15  1:20                   ` Masami Hiramatsu
  0 siblings, 0 replies; 30+ messages in thread
From: Masami Hiramatsu @ 2014-07-15  1:20 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Josh Poimboeuf, Jiri Kosina, Steven Rostedt, linux-kernel,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Paul E. McKenney,
	H. Peter Anvin, Oleg Nesterov, Seth Jennings, Jiri Slaby

(2014/07/14 23:18), Namhyung Kim wrote:
> 2014-07-14 (월), 17:18 +0900, Masami Hiramatsu:
>> (2014/07/14 16:16), Namhyung Kim wrote:
>>> Hi Masami,
>>>
>>> On Mon, 14 Jul 2014 10:35:21 +0900, Masami Hiramatsu wrote:
>>>> (2014/07/11 23:29), Josh Poimboeuf wrote:
>>>>> Here's the same stack trace with this patch:
>>>>>
>>>>>   [ 1314.612287]  [<ffffffff81700312>] dump_stack+0x45/0x56
>>>>>   [ 1314.612290]  [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
>>>>>   [ 1314.612293]  [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
>>>>>   [ 1314.612306]  [<ffffffffa00160c4>] 0xffffffffa00160c3
>>>>
>>>> Here, this still has a wrong entry. Maybe the trampline doesn't setup
>>>> frame pointer (bp) correctly.
>>>
>>> Hmm.. are you saying about the hex address above?  I guess it's a valid
>>> entry in the (dynamic) trampoline, but has no symbol..
>>
>> Ah, indeed. (BTW, why is it one less than the address ...? printk's spec?)
> 
> Argh, it seems like a bug in printk's %pB format processing.. :-/
> 
> It subtract 1 from the address for stacktrace.  Please see the commit
> 0f77a8d37825 ("vsprintf: Introduce %pB format specifier") and
> 71f9e59800e5 ("x86, dumpstack: Use %pB format specifier for stack
> trace") for details.  But it should restore the original address when it
> failed to find a symbol for that address.

Ah I see.

> 
> I'll send a fix soon.

Thanks! :)

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-03 20:07 [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines Steven Rostedt
                   ` (6 preceding siblings ...)
  2014-07-10 21:36 ` Josh Poimboeuf
@ 2014-07-22 16:47 ` Oleg Nesterov
  2014-07-22 19:02   ` Steven Rostedt
  7 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2014-07-22 16:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Masami Hiramatsu, Namhyung Kim, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

On 07/03, Steven Rostedt wrote:
>
> [ NOT READY FOR INCLUSION! ]
>
> Note, this is based off of my remove ftrace_start/stop() patch set.

So I simply pulled your tree. I can't really comment on these changes simply
because I do not understand this code. But I am hunting for a RHEL bug in
(probably) this area, so I decided to take a look in the hope that maybe
this can help me to understand the current code ;)

> The way the function callback mechanism works in ftrace is that if there's
> only one function callback registered, it will set the mcount/fentry
> trampoline to call that function directly. But as soon as you register
> another callback, the mcount trampoline calls a loop function that iterates
> over all the registered callbacks (ftrace_ops) checking their hash tables
> to see if the called function matches the ops before calling its callback.
> This happens even if the two registered functions are not even tracing
> the same function!
>
> This really sucks if you are tracing all functions, and then add a kprobe
> or perf event that traces a single function. That will cause all the
> other functions being traced to perform the loop test.

But isn't this even worse, or did I miss something? I mean, currently even
if you trace nothing and then add a KPROBE_FLAG_FTRACE kprobe, then
kprobe_ftrace_handler() is called by ftrace_ops_list_func()?

After these changes it seems that the kprobe will use a trampoline.

And I can't understand even the basic code. Say, __ftrace_hash_rec_update:

		if (inc) {
			rec->flags++;
			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
				return;

			/*
			 * If there's only a single callback registered to a
			 * function, and the ops has a trampoline registered
			 * for it, then we can call it directly.
			 */
			if (ftrace_rec_count(rec) == 1 && ops->trampoline) {
				rec->flags |= FTRACE_FL_TRAMP;
				ops->trampolines++;
			} else {
				/*
				 * If we are adding another function callback
				 * to this function, and the previous had a
				 * trampoline used, then we need to go back to
				 * the default trampoline.
				 */
				rec->flags &= ~FTRACE_FL_TRAMP;

				/* remove trampolines from any ops for this rec */
				ftrace_clear_tramps(rec);
			}

It seems that "else if (ftrace_rec_count(rec) == 2)" can avoid the unnecessary
ftrace_clear_tramps() ? And not only unnecessary, ftrace_clear_tramps() decrements
->trampolines, can't this break the accounting?
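
IOW, something like this (just a sketch to show what I mean, untested):

		if (ftrace_rec_count(rec) == 1 && ops->trampoline) {
			rec->flags |= FTRACE_FL_TRAMP;
			ops->trampolines++;
		} else if (ftrace_rec_count(rec) == 2) {
			/* only the 1 -> 2 transition needs to drop the trampoline */
			rec->flags &= ~FTRACE_FL_TRAMP;
			ftrace_clear_tramps(rec);
		}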

		} else {
			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
				return;
			rec->flags--;

			if (ops->trampoline && !ftrace_rec_count(rec))
				ftrace_remove_tramp(ops, rec);

I am wondering what we should do if ftrace_rec_count() becomes 1 again...

ftrace_save_ops_tramp_hash():

	do_for_each_ftrace_rec(pg, rec) {
		if (ftrace_rec_count(rec) == 1 &&
		    ftrace_ops_test(ops, rec->ip, rec)) {

			/* This record had better have a trampoline */
			if (FTRACE_WARN_ON(!(rec->flags & FTRACE_FL_TRAMP_EN)))
				return -1;

Yes, but I can't understand how this can work.

This ops can have  ->trampolines > 0, but FTRACE_FL_TRAMP_EN can be cleared
by another ftrace_ops?

Suppose there is a single tracer of this function, rec->flags = TRAMP | TRAMP_EN.
Suppose also that it traces more than 1 function, so ->trampolines > 1.

Another tracer comes, __ftrace_hash_rec_update() clears TRAMP. But it should
also do ftrace_check_record() and this should clear TRAMP_EN?

And yes, I can trigger this bug if I simply do "echo function > current_tracer"
and then add/remove a KPROBE_FLAG_FTRACE kprobe.


And you know, when I try to read this code I can't distinguish ->trampoline
from ->trampolines ;)

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-22 16:47 ` Oleg Nesterov
@ 2014-07-22 19:02   ` Steven Rostedt
  2014-07-23 12:08     ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2014-07-22 19:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Masami Hiramatsu, Namhyung Kim, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

On Tue, 22 Jul 2014 18:47:07 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 07/03, Steven Rostedt wrote:
> >
> > [ NOT READY FOR INCLUSION! ]
> >
> > Note, this is based off of my remove ftrace_start/stop() patch set.
> 
> So I simply pulled your tree. I can't really comment these changes simply
> because I do not understand this code. But I am hunting for RHEL bug in
> (probably) this area, so I decided to take a look in a hope that may be
> this can help me to understand the current code ;)

Send me a ping on my RH email and I'll take a look at that BZ.

Note, I've played a little with this recently. I should make sure that
I push the latest. Warning, that trampoline branch will rebase.

> 
> > The way the function callback mechanism works in ftrace is that if there's
> > only one function callback registered, it will set the mcount/fentry
> > trampoline to call that function directly. But as soon as you register
> > another callback, the mcount trampoline calls a loop function that iterates
> > over all the registered callbacks (ftrace_ops) checking their hash tables
> > to see if the called function matches the ops before calling its callback.
> > This happens even if the two registered functions are not even tracing
> > the same function!
> >
> > This really sucks if you are tracing all functions, and then add a kprobe
> > or perf event that traces a single function. That will cause all the
> > other functions being traced to perform the loop test.
> 
> But this is even worse or I missed something? I mean, currently even
> if you trace nothing and then add a KPROBE_FLAG_FTRACE kprobe, then
> kprobe_ftrace_handler() is called by ftrace_ops_list_func() ?

It shouldn't be. It should get called directly from the trampoline. The
allocated trampoline should never call the list op. Well, it might
during the conversion for safety, but after that, trampolines should
only call the registered ftrace_ops->func directly.

> 
> After these changes it seems that kprobe will use a trampoline.
> 
> And I can't understand even the basic code. Say, __ftrace_hash_rec_update:

Well, to make you feel better, __ftrace_hash_rec_update() happens to
be one of the most complex functions in ftrace.

> 
> 		if (inc) {
> 			rec->flags++;
> 			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
> 				return;
> 
> 			/*
> 			 * If there's only a single callback registered to a
> 			 * function, and the ops has a trampoline registered
> 			 * for it, then we can call it directly.
> 			 */
> 			if (ftrace_rec_count(rec) == 1 && ops->trampoline) {
> 				rec->flags |= FTRACE_FL_TRAMP;
> 				ops->trampolines++;
> 			} else {
> 				/*
> 				 * If we are adding another function callback
> 				 * to this function, and the previous had a
> 				 * trampoline used, then we need to go back to
> 				 * the default trampoline.
> 				 */
> 				rec->flags &= ~FTRACE_FL_TRAMP;
> 
> 				/* remove trampolines from any ops for this rec */
> 				ftrace_clear_tramps(rec);
> 			}
> 
> It seems that "else if (ftrace_rec_count(rec) == 2)" can avoid the unnecessary
> ftrace_clear_tramps() ? And not only unnecessary, ftrace_clear_tramps() decrements
> ->trampolines, can't this break the accounting?

Ug, you're right.

But I think it should be "else if (rec->flags & FTRACE_FL_TRAMP)"


> 
> 		} else {
> 			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
> 				return;
> 			rec->flags--;
> 
> 			if (ops->trampoline && !ftrace_rec_count(rec))
> 				ftrace_remove_tramp(ops, rec);
> 
> I am wondering what should we do if ftrace_rec_count() becomes 1 again...
> 
> ftrace_save_ops_tramp_hash():
> 
> 	do_for_each_ftrace_rec(pg, rec) {
> 		if (ftrace_rec_count(rec) == 1 &&
> 		    ftrace_ops_test(ops, rec->ip, rec)) {
> 
> 			/* This record had better have a trampoline */
> 			if (FTRACE_WARN_ON(!(rec->flags & FTRACE_FL_TRAMP_EN)))
> 				return -1;
> 
> Yes, but I can't understand how this can work.

I wanted to handle the back-to-1 case after we get the up-to-one case
working. That is, I don't want to worry about it now ;-)  As you can
see, this code has enough things to keep straight without adding
more complexity to the mix.


> 
> This ops can have  ->trampolines > 0, but FTRACE_FL_TRAMP_EN can be cleared
> by another ftrace_ops?
> 
> Suppose there is a single tracer of this function, rec->flags = TRAMP | TRAMP_EN.
> Suppose also that it traces more than 1 function, so ->trampolines > 1.
> 
> Another tracer comes, __ftrace_hash_rec_update() clears TRAMP. But it should
> also do ftrace_check_record() and this should clear TRAMP_EN?
> 
> And yes, I can trigger this bug if I simply do "echo function > current_tracer"
> and then add/remove a KPROBE_FLAG_FTRACE kprobe.
> 
> 
> And you know, when I try to read this code I can't distinguish ->trampoline
> from ->trampolines ;)

Good point. I'll rename trampolines to nr_trampolines. At least that
way it will stick out that it's a counter.


BTW, for someone who says they can't understand the code, you give
pretty good feedback.

-- Steve

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-22 19:02   ` Steven Rostedt
@ 2014-07-23 12:08     ` Oleg Nesterov
  2014-07-23 15:48       ` Steven Rostedt
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2014-07-23 12:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Masami Hiramatsu, Namhyung Kim, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

On 07/22, Steven Rostedt wrote:
>
> On Tue, 22 Jul 2014 18:47:07 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > On 07/03, Steven Rostedt wrote:
> >
> > > The way the function callback mechanism works in ftrace is that if there's
> > > only one function callback registered, it will set the mcount/fentry
> > > trampoline to call that function directly. But as soon as you register
> > > another callback, the mcount trampoline calls a loop function that iterates
> > > over all the registered callbacks (ftrace_ops) checking their hash tables
> > > to see if the called function matches the ops before calling its callback.
> > > This happens even if the two registered functions are not even tracing
> > > the same function!
> > >
> > > This really sucks if you are tracing all functions, and then add a kprobe
> > > or perf event that traces a single function. That will cause all the
> > > other functions being traced to perform the loop test.
> >
> > But this is even worse or I missed something? I mean, currently even
> > if you trace nothing and then add a KPROBE_FLAG_FTRACE kprobe, then
> > kprobe_ftrace_handler() is called by ftrace_ops_list_func() ?
>
> It shouldn't be. It should get called directly from the trampoline. The
> allocated trampoline should never call the list op. Well, it might
> during the conversion for safety, but after that, trampolines should
> only call the registered ftrace_ops->func directly.

I meant the current code (I am reading 3.16-rc2). Even if we have a single
KPROBE_FLAG_FTRACE kprobe, kprobe_ftrace_handler() won't be called directly.

Or I misunderstood your reply? Just in case, let me check...

With this stupid patch

	--- a/kernel/trace/ftrace.c
	+++ b/kernel/trace/ftrace.c
	@@ -4464,6 +4464,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
					printk("op=%p %pS\n", op, op);
					goto out;
				}
	+			pr_crit("LIST_FUNC -> %pf()\n",  op->func);
				op->func(ip, parent_ip, op, regs);
			}
		} while_for_each_ftrace_op(op);

I do
	# cd /sys/kernel/debug/tracing/
	# echo "p:xx SyS_prctl+0x1c" >| kprobe_events
	# cat ../kprobes/list
	ffffffff81056c4c  k  SyS_prctl+0x1c    [DISABLED][FTRACE]
	# echo 1 >| events/kprobes/xx/enable
	#
	# perl -e 'syscall 157,-1'
	# dmesg
	LIST_FUNC -> kprobe_ftrace_handler()

so it is really called by the loop test code.

And I guess that after your patches kprobe_ftrace_handler() should be called
from the trampoline in this case.

> > ftrace_save_ops_tramp_hash():
> >
> > 	do_for_each_ftrace_rec(pg, rec) {
> > 		if (ftrace_rec_count(rec) == 1 &&
> > 		    ftrace_ops_test(ops, rec->ip, rec)) {
> >
> > 			/* This record had better have a trampoline */
> > 			if (FTRACE_WARN_ON(!(rec->flags & FTRACE_FL_TRAMP_EN)))
> > 				return -1;
> >
> > Yes, but I can't understand how this can work.
>
> I wanted the back to 1 case to happen after we get the up to one case
> working. That is, I don't want to worry about it now ;-)  As you can
> see, this code has enough things to try to keep straight without adding
> more complexity to the mix.

Yes, I see... but note that this WARN_ON() looks wrong in any case. At
least currently.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-23 12:08     ` Oleg Nesterov
@ 2014-07-23 15:48       ` Steven Rostedt
  2014-07-23 17:05         ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2014-07-23 15:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Masami Hiramatsu, Namhyung Kim, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

On Wed, 23 Jul 2014 14:08:05 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 07/22, Steven Rostedt wrote:
> >
> > On Tue, 22 Jul 2014 18:47:07 +0200
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > On 07/03, Steven Rostedt wrote:
> > >
> > > > The way the function callback mechanism works in ftrace is that if there's
> > > > only one function callback registered, it will set the mcount/fentry
> > > > trampoline to call that function directly. But as soon as you register
> > > > another callback, the mcount trampoline calls a loop function that iterates
> > > > over all the registered callbacks (ftrace_ops) checking their hash tables
> > > > to see if the called function matches the ops before calling its callback.
> > > > This happens even if the two registered functions are not even tracing
> > > > the same function!
> > > >
> > > > This really sucks if you are tracing all functions, and then add a kprobe
> > > > or perf event that traces a single function. That will cause all the
> > > > other functions being traced to perform the loop test.
> > >
> > > But this is even worse or I missed something? I mean, currently even
> > > if you trace nothing and then add a KPROBE_FLAG_FTRACE kprobe, then
> > > kprobe_ftrace_handler() is called by ftrace_ops_list_func() ?
> >
> > It shouldn't be. It should get called directly from the trampoline. The
> > allocated trampoline should never call the list op. Well, it might
> > during the conversion for safety, but after that, trampolines should
> > only call the registered ftrace_ops->func directly.
> 
> I meant the current code (I am reading 3.16-rc2). Even if we have a single
> KPROBE_FLAG_FTRACE kprobe, kprobe_ftrace_handler() won't be called directly.
> 

Oh, I thought we were still talking about the trampolines; those are not
in 3.16-rc2.

> Or I misunderstood your reply? Just in case, let me check...
> 
> With this stupid patch
> 
> 	--- a/kernel/trace/ftrace.c
> 	+++ b/kernel/trace/ftrace.c
> 	@@ -4464,6 +4464,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> 					printk("op=%p %pS\n", op, op);
> 					goto out;
> 				}
> 	+			pr_crit("LIST_FUNC -> %pf()\n",  op->func);
> 				op->func(ip, parent_ip, op, regs);
> 			}
> 		} while_for_each_ftrace_op(op);
> 
> I do
> 	# cd /sys/kernel/debug/tracing/
> 	# echo "p:xx SyS_prctl+0x1c" >| kprobe_events
> 	# cat ../kprobes/list
> 	ffffffff81056c4c  k  SyS_prctl+0x1c    [DISABLED][FTRACE]
> 	# echo 1 >| events/kprobes/xx/enable
> 	#
> 	# perl -e 'syscall 157,-1'
> 	# dmesg
> 	LIST_FUNC -> kprobe_ftrace_handler()
> 
> so it is really called by the loop test code.
> 
> And I guess that after your patches kprobe_ftrace_handler() should be called
> from the trampoline in this case.

No, it won't be. Not until we have Paul McKenney's task_rcu code that
will return after all tasks have either gone through userspace or a
schedule. Hmm, maybe on !CONFIG_PREEMPT it will be OK. Oh, I can make
that be OK now on !CONFIG_PREEMPT. Maybe I'll do that too.

kprobe ftrace_ops are allocated, which sets the FTRACE_OPS_FL_DYNAMIC
flag. You'll see that flag checked in update_ftrace_function(), and if
it is set, it forces ftrace_ops_list_func() to be used.
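
Schematically, that check amounts to something like this (simplified,
not the exact code; "only_one_ops_registered" is just a stand-in for
the real list checks):

	if (only_one_ops_registered &&
	    !(ops->flags & FTRACE_OPS_FL_DYNAMIC))
		func = ops->func;		/* call the callback directly */
	else
		func = ftrace_ops_list_func;	/* fall back to the loop */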

Why?

Because we can not know if a task has just jumped to the function with
the ops that has been freed. Here's what I mean:


	foo()
	  [mcount called --> ftrace_caller trampoline]
           ftrace_caller
             load ftrace_ops into parameter
             <interrupt>
             preempt_schedule()
       [new task]
       kfree(kprobe ftrace_ops);
       [schedule back]
             call kprobes ftrace ops func
               uses ops, but it was freed!
               [BOOM!]


A trampoline cannot access anything that can be freed. In this case,
the kprobes ftrace_ops can be freed, and that forces the list function
to be used. Why is the list function OK? Because it does:

	preempt_disable();
	loop all ops
		ops->func()
	preempt_enable();

And after we disconnect an ops from the list, we call
schedule_on_each_cpu() before freeing it.
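
So the teardown for a dynamic ops is, schematically (the no-op work
function name here is just a placeholder):

	unregister_ftrace_function(ops);  /* unlink the ops from the list */
	schedule_on_each_cpu(noop_work);  /* every CPU must schedule once, so
					     nothing is still inside the
					     preempt disabled loop above */
	kfree(ops);                       /* now it is safe to free */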

Why not use synchronize_sched()?

Because ftrace can be called outside of areas that rcu is aware of,
including the rcu infrastructure itself. That means there's an extremely
rare case where synchronize_sched() can return before that loop over the
ops is finished.

Although it's extremely rare, and possibly even impossible to hit, the
fact that it theoretically can happen means we must use the even slower
but guaranteed safe schedule_on_each_cpu() to make sure that nothing was
in that preempt disabled section before we free it.

> 
> > > ftrace_save_ops_tramp_hash():
> > >
> > > 	do_for_each_ftrace_rec(pg, rec) {
> > > 		if (ftrace_rec_count(rec) == 1 &&
> > > 		    ftrace_ops_test(ops, rec->ip, rec)) {
> > >
> > > 			/* This record had better have a trampoline */
> > > 			if (FTRACE_WARN_ON(!(rec->flags & FTRACE_FL_TRAMP_EN)))
> > > 				return -1;
> > >
> > > Yes, but I can't understand how this can work.
> >
> > I wanted the back to 1 case to happen after we get the up to one case
> > working. That is, I don't want to worry about it now ;-)  As you can
> > see, this code has enough things to try to keep straight without adding
> > more complexity to the mix.
> 
> Yes, I see... but note that this WARN_ON() looks wrong in any case. At
> least currently.

Yeah, in my linux-next repo it quite possibly can be. I'm looking into
fixing that now.

Thanks!

-- Steve


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-23 15:48       ` Steven Rostedt
@ 2014-07-23 17:05         ` Oleg Nesterov
  2014-07-23 17:20           ` Steven Rostedt
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2014-07-23 17:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Masami Hiramatsu, Namhyung Kim, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

On 07/23, Steven Rostedt wrote:
>
> On Wed, 23 Jul 2014 14:08:05 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > With this stupid patch
> >
> > 	--- a/kernel/trace/ftrace.c
> > 	+++ b/kernel/trace/ftrace.c
> > 	@@ -4464,6 +4464,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> > 					printk("op=%p %pS\n", op, op);
> > 					goto out;
> > 				}
> > 	+			pr_crit("LIST_FUNC -> %pf()\n",  op->func);
> > 				op->func(ip, parent_ip, op, regs);
> > 			}
> > 		} while_for_each_ftrace_op(op);
> >
> > I do
> > 	# cd /sys/kernel/debug/tracing/
> > 	# echo "p:xx SyS_prctl+0x1c" >| kprobe_events
> > 	# cat ../kprobes/list
> > 	ffffffff81056c4c  k  SyS_prctl+0x1c    [DISABLED][FTRACE]
> > 	# echo 1 >| events/kprobes/xx/enable
> > 	#
> > 	# perl -e 'syscall 157,-1'
> > 	# dmesg
> > 	LIST_FUNC -> kprobe_ftrace_handler()
> >
> > so it is really called by the loop test code.
> >
> > And I guess that after your patches kprobe_ftrace_handler() should be called
> > from the trampoline in this case.
>
> No it wont be. Not until we have Paul McKenney's task_rcu code that
> will return after all tasks have either gone through userspace or a
> schedule. Hmm, maybe on a !CONFIG_PREEMPT it will be OK. Oh, I can have
> that be OK now on !CONFIG_PREEMPT. Maybe I'll do that too.
>
> kprobe ftrace_ops are allocated which sets the FTRACE_OPS_FL_DYNAMIC
> flag. You'll see that flag checked in update_ftrace_function(), and if
> it is set, it forces the ftrace_ops_list_func() to be used.

No? __register_ftrace_function() only sets that flag if
!core_kernel_data(ops), and kprobe_ftrace_ops is not dynamically allocated?
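
(For reference, the check in question has roughly this shape in
__register_ftrace_function() in kernel/trace/ftrace.c -- trimmed down to
just the relevant part, so treat it as a sketch rather than the full
function:)

	static int __register_ftrace_function(struct ftrace_ops *ops)
	{
		/*
		 * Only an ops that lives outside of core kernel data
		 * (i.e. one that was kmalloc'ed) gets FTRACE_OPS_FL_DYNAMIC.
		 * kprobe_ftrace_ops is static kernel data, so it does not.
		 */
		if (!core_kernel_data((unsigned long)ops))
			ops->flags |= FTRACE_OPS_FL_DYNAMIC;

		/* ... rest of the registration elided ... */
		return 0;
	}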

> Why?
>
> [...snip..]

Yes, thanks, I understand why, at least to some degree.

> 	foo()
> 	  [mcount called --> ftrace_caller trampoline]
>            ftrace_caller
>              load ftrace_ops into parameter
>              <interrupt>
>              preempt_schedule()
>        [new task]
>        kfree(kprobe ftrace_ops);

see above.

And to be sure, I compiled your rfc/trampoline kernel which I pulled
yesterday with the same patch and did the same test. __ftrace_ops_list_func()
prints nothing.

So I also added WARN_ON(1) into kprobe_ftrace_handler() to ensure that
it is actually called, and yes, dmesg reports

	WARNING: ... kprobe_ftrace_handler+0x38/0x140()
	...
	Call Trace:
	 [<ffffffff8136a3eb>] dump_stack+0x5b/0xa8
	 [<ffffffff810423ec>] warn_slowpath_common+0x8c/0xc0
	 [<ffffffff8105772c>] ? SyS_prctl+0x1c/0x730
	 [<ffffffff8104243a>] warn_slowpath_null+0x1a/0x20
	 [<ffffffff810325c8>] kprobe_ftrace_handler+0x38/0x140
	 [<ffffffff8137148a>] ? retint_swapgs+0xe/0x13
	 [<ffffffff81057731>] ? SyS_prctl+0x21/0x730
	 [<ffffffff81057731>] ? SyS_prctl+0x21/0x730
	 [<ffffffff8122424e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
	 [<ffffffff81370912>] ? system_call_fastpath+0x16/0x1b

after "perl -e 'syscall 157,-1'".

and, as expected, if I do "echo SyS_prctl >| set_ftrace_filter" and
"echo function >| current_tracer", then the command above also triggers
two printks in __ftrace_ops_list_func():

	LIST_FUNC -> function_trace_call()
	LIST_FUNC -> kprobe_ftrace_handler()

so it seems that your patches can potentially buy more than you think ;)

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines
  2014-07-23 17:05         ` Oleg Nesterov
@ 2014-07-23 17:20           ` Steven Rostedt
  0 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2014-07-23 17:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Paul E. McKenney, Masami Hiramatsu, Namhyung Kim, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Seth Jennings, Jiri Slaby

On Wed, 23 Jul 2014 19:05:52 +0200
Oleg Nesterov <oleg@redhat.com> wrote:


> > kprobe ftrace_ops are allocated which sets the FTRACE_OPS_FL_DYNAMIC
> > flag. You'll see that flag checked in update_ftrace_function(), and if
> > it is set, it forces the ftrace_ops_list_func() to be used.
> 
> No? __register_ftrace_function() only sets that flag if
> !core_kernel_data(ops), and kprobe_ftrace_ops is not dynamically allocated?

Oh, you're right. I thought it was allocated.

What it is missing is the FTRACE_OPS_FL_RECURSION_SAFE flag. Although,
I'm working on a patch that adds a non-loop func which does the recursion
checks for just a single ftrace_ops->func when only one is registered.
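
Something along these lines, perhaps (purely an illustrative sketch of
the idea, not the actual patch; the recursion helpers and the
TRACE_LIST_* values are the ones from kernel/trace/trace.h as I recall
them, so treat those names as assumptions):

	/*
	 * Wrapper used when a single ftrace_ops that is not RECURSION_SAFE
	 * is registered: do only the recursion protection, then call the
	 * one callback directly, instead of falling back to the full list
	 * loop.
	 */
	static void ftrace_ops_recurs_func(unsigned long ip,
					   unsigned long parent_ip,
					   struct ftrace_ops *op,
					   struct pt_regs *regs)
	{
		int bit;

		/* Bail if we are already inside this callback. */
		bit = trace_test_and_set_recursion(TRACE_LIST_START,
						   TRACE_LIST_MAX);
		if (bit < 0)
			return;

		op->func(ip, parent_ip, op, regs);

		trace_clear_recursion(bit);
	}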

> 
> > Why?
> >
> > [...snip..]
> 
> Yes, thanks, I understand why, at least to some degree.
> 
> > 	foo()
> > 	  [mcount called --> ftrace_caller trampoline]
> >            ftrace_caller
> >              load ftrace_ops into parameter
> >              <interrupt>
> >              preempt_schedule()
> >        [new task]
> >        kfree(kprobe ftrace_ops);
> 
> see above.
> 
> And to be sure, I compiled your rfc/trampoline kernel which I pulled
> yesterday with the same patch and did the same test. __ftrace_ops_list_func()
> prints nothing.

Note, I'm still working on fixes to that branch ;-)

> 
> So I also added WARN_ON(1) into kprobe_ftrace_handler() to ensure that
> it is actually called, and yes, dmesg reports
> 
> 	WARNING: ... kprobe_ftrace_handler+0x38/0x140()
> 	...
> 	Call Trace:
> 	 [<ffffffff8136a3eb>] dump_stack+0x5b/0xa8
> 	 [<ffffffff810423ec>] warn_slowpath_common+0x8c/0xc0
> 	 [<ffffffff8105772c>] ? SyS_prctl+0x1c/0x730
> 	 [<ffffffff8104243a>] warn_slowpath_null+0x1a/0x20
> 	 [<ffffffff810325c8>] kprobe_ftrace_handler+0x38/0x140
> 	 [<ffffffff8137148a>] ? retint_swapgs+0xe/0x13
> 	 [<ffffffff81057731>] ? SyS_prctl+0x21/0x730
> 	 [<ffffffff81057731>] ? SyS_prctl+0x21/0x730
> 	 [<ffffffff8122424e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> 	 [<ffffffff81370912>] ? system_call_fastpath+0x16/0x1b
> 

BTW, you may want to look at

/sys/kernel/debug/tracing/enabled_functions

as that has a lot of debug info for the trampolines in that branch.
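
For example, while your kprobe test from above is still enabled (I won't
guess at the exact output format here, it is still changing in that
branch):

	# cat /sys/kernel/debug/tracing/enabled_functions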

> after "perl -e 'syscall 157,-1'".
> 
> and, as expected, if I do "echo SyS_prctl >| set_ftrace_filter" and
> "echo function >| current_tracer", then the command above also triggers
> two printks in __ftrace_ops_list_func():
> 
> 	LIST_FUNC -> function_trace_call()
> 	LIST_FUNC -> kprobe_ftrace_handler()
> 
> so it seems that your patches can potentially buy more than you think ;)
> 
> Oleg.


I'll play with this some more to understand everything you are stating.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-07-23 17:20 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-03 20:07 [RFC][PATCH 0/3] ftrace: Add dynamically allocated trampolines Steven Rostedt
2014-07-03 20:07 ` [RFC][PATCH 1/3] ftrace/x86: Add dynamic allocated trampoline for ftrace_ops Steven Rostedt
2014-07-04 13:32   ` Masami Hiramatsu
2014-07-04 14:25     ` Steven Rostedt
2014-07-14  2:34   ` Masami Hiramatsu
2014-07-03 20:07 ` [RFC][PATCH 2/3] ftrace/x86: Show trampoline call function in enabled_functions Steven Rostedt
2014-07-03 20:07 ` [RFC][PATCH 3/3] ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines Steven Rostedt
2014-07-03 20:32 ` [RFC][PATCH 0/3] ftrace: Add dynamically " Steven Rostedt
2014-07-04 13:20 ` Masami Hiramatsu
2014-07-04 14:21   ` Steven Rostedt
2014-07-07 13:22     ` Jiri Kosina
2014-07-08 14:24       ` Steven Rostedt
2014-07-07 13:58 ` Jiri Kosina
2014-07-10 21:36 ` Josh Poimboeuf
2014-07-10 21:44   ` Jiri Kosina
2014-07-10 22:01     ` Josh Poimboeuf
2014-07-11  2:26     ` Masami Hiramatsu
2014-07-11 13:24       ` Jiri Kosina
2014-07-11 14:29         ` Josh Poimboeuf
2014-07-14  1:35           ` Masami Hiramatsu
2014-07-14  7:16             ` Namhyung Kim
2014-07-14  8:18               ` Masami Hiramatsu
2014-07-14 14:18                 ` Namhyung Kim
2014-07-15  1:20                   ` Masami Hiramatsu
2014-07-22 16:47 ` Oleg Nesterov
2014-07-22 19:02   ` Steven Rostedt
2014-07-23 12:08     ` Oleg Nesterov
2014-07-23 15:48       ` Steven Rostedt
2014-07-23 17:05         ` Oleg Nesterov
2014-07-23 17:20           ` Steven Rostedt
