* [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
@ 2011-08-10 16:22 Steven Rostedt
  2011-08-10 16:22 ` [PATCH 1/5][RFC] tracing: Clean up tb_fmt to not give faulty compile warning Steven Rostedt
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Steven Rostedt @ 2011-08-10 16:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Jason Baron

Hi All,

I started working on adding the -mfentry switch to ftrace, which
allows us to remove the frame pointer requirement from function tracing
and makes mcount (fentry) work better.

But when I did this in another branch, I noticed that I broke kprobes
in its most common usage: attaching a probe at the beginning of
a function in order to get at its parameters.
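
For reference, a minimal sketch of that common usage (only an
illustration, not part of this series; the probed symbol and the names
are made up, and it assumes x86_64 where the first argument lives in
regs->di):

#include <linux/module.h>
#include <linux/kprobes.h>

static int example_pre(struct kprobe *p, struct pt_regs *regs)
{
	/* At function entry on x86_64 the first argument is in %rdi */
	pr_info("%s: first arg = %lx\n", p->symbol_name, regs->di);
	return 0;
}

static struct kprobe example_kp = {
	.symbol_name	= "do_fork",	/* example target only */
	.pre_handler	= example_pre,
};

static int __init example_init(void)
{
	return register_kprobe(&example_kp);
}

static void __exit example_exit(void)
{
	unregister_kprobe(&example_kp);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");

With -mfentry the ftrace nop sits right at that entry address, which is
why this usage would break without the rest of this series.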

So I started this branch. This branch is to have kprobes use ftrace
directly when a probe is attached to a ftrace nop. Currently, kprobes
will just error when that happens. With this patch set, it will hook
into the ftrace infrastructure and use ftrace instead. This is more
like an optimized probe as no breakpoints need to be set. A call to
the function is done directly via the mcount trampoline. If ftrace
pt_regs is implemented for an arch, kprobes gets this feature for free.

The first patch is just a clean up that I need to push out to get rid
of the annoying compile warning about possibly uninitialized variables
that gcc can't tell have actually been initialized.

The next two patches have ftrace pass both the ftrace_ops structure
and the pt_regs to the callback function that is registered with ftrace.

The last two patches have kprobes interact with ftrace and use the
ftrace infrastructure instead.

I only did this for x86_64, and will do it for x86_32 and PPC64 if everyone
agrees with this approach. Then I could find people to do it for other
archs :)

Thanks!

-- Steve

This patch set can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
rfc/kprobes/ftrace

Head SHA1: e704df971bc958c789e9dcf0b453e4c02e27887b


Steven Rostedt (5):
      tracing: Clean up tb_fmt to not give faulty compile warning
      ftrace: Pass ftrace_ops as third parameter to function trace callback
      ftrace: Return pt_regs to function trace callback (x86_64 only so far)
      kprobes: Inverse taking of module_mutex with kprobe_mutex
      kprobes: Use ftrace hooks when probing ftrace nops

----
 arch/x86/include/asm/ftrace.h     |   42 ++++++----
 arch/x86/kernel/entry_64.S        |   24 +++++-
 include/linux/ftrace.h            |   27 ++++++-
 include/linux/kprobes.h           |    6 ++
 kernel/kprobes.c                  |  167 ++++++++++++++++++++++++++++++++++---
 kernel/trace/ftrace.c             |   74 +++++++++++------
 kernel/trace/trace_events.c       |    3 +-
 kernel/trace/trace_functions.c    |   10 ++-
 kernel/trace/trace_irqsoff.c      |    3 +-
 kernel/trace/trace_printk.c       |   19 ++--
 kernel/trace/trace_sched_wakeup.c |    3 +-
 kernel/trace/trace_selftest.c     |   20 ++++-
 kernel/trace/trace_stack.c        |    3 +-
 13 files changed, 323 insertions(+), 78 deletions(-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/5][RFC] tracing: Clean up tb_fmt to not give faulty compile warning
  2011-08-10 16:22 [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops Steven Rostedt
@ 2011-08-10 16:22 ` Steven Rostedt
  2011-08-10 16:22 ` [PATCH 2/5][RFC] ftrace: Pass ftrace_ops as third parameter to function trace Steven Rostedt
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2011-08-10 16:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Jason Baron

[-- Attachment #1: 0001-tracing-Clean-up-tb_fmt-to-not-give-faulty-compile-w.patch --]
[-- Type: text/plain, Size: 1347 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

gcc incorrectly states that the variable "fmt" may be used uninitialized
when CC_OPTIMIZE_FOR_SIZE is set.

Instead of just blindly setting fmt to NULL, the code is cleaned up
a little to be a bit easier for humans to follow, and to let gcc
see that the variables really are initialized.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_printk.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
index 1f06468..6fd4ffd 100644
--- a/kernel/trace/trace_printk.c
+++ b/kernel/trace/trace_printk.c
@@ -59,18 +59,19 @@ void hold_module_trace_bprintk_format(const char **start, const char **end)
 			continue;
 		}
 
+		fmt = NULL;
 		tb_fmt = kmalloc(sizeof(*tb_fmt), GFP_KERNEL);
-		if (tb_fmt)
+		if (tb_fmt) {
 			fmt = kmalloc(strlen(*iter) + 1, GFP_KERNEL);
-		if (tb_fmt && fmt) {
-			list_add_tail(&tb_fmt->list, &trace_bprintk_fmt_list);
-			strcpy(fmt, *iter);
-			tb_fmt->fmt = fmt;
-			*iter = tb_fmt->fmt;
-		} else {
-			kfree(tb_fmt);
-			*iter = NULL;
+			if (fmt) {
+				list_add_tail(&tb_fmt->list, &trace_bprintk_fmt_list);
+				strcpy(fmt, *iter);
+				tb_fmt->fmt = fmt;
+			} else
+				kfree(tb_fmt);
 		}
+		*iter = fmt;
+
 	}
 	mutex_unlock(&btrace_mutex);
 }
-- 
1.7.5.4



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/5][RFC] ftrace: Pass ftrace_ops as third parameter to function trace
  2011-08-10 16:22 [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops Steven Rostedt
  2011-08-10 16:22 ` [PATCH 1/5][RFC] tracing: Clean up tb_fmt to not give faulty compile warning Steven Rostedt
@ 2011-08-10 16:22 ` Steven Rostedt
  2011-08-10 16:22 ` [PATCH 3/5][RFC] ftrace: Return pt_regs to function trace callback (x86_64 only so Steven Rostedt
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2011-08-10 16:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Jason Baron

[-- Attachment #1: 0002-ftrace-Pass-ftrace_ops-as-third-parameter-to-functio.patch --]
[-- Type: text/plain, Size: 13115 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Currently the function trace callback receives only the ip and parent_ip
of the function that it traced. It would be more powerful to also pass in
the ftrace_ops that registered the callback. This allows the same callback
function to act differently depending on which ftrace_ops registered it.
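
For example (an illustrative sketch only; the structure and names below
are made up, not part of this patch), a callback can now find its own
per-user data by embedding the ftrace_ops in a larger structure:

#include <linux/kernel.h>
#include <linux/atomic.h>
#include <linux/ftrace.h>

struct my_tracer {
	struct ftrace_ops	ops;
	atomic_t		hits;
};

static void my_callback(unsigned long ip, unsigned long parent_ip,
			struct ftrace_ops *op)
{
	/* op is the ftrace_ops this callback was registered with */
	struct my_tracer *t = container_of(op, struct my_tracer, ops);

	atomic_inc(&t->hits);
}

static struct my_tracer my_tracer = {
	.ops.func = my_callback,
};

/* register_ftrace_function(&my_tracer.ops) as usual */

Without the third parameter, a single callback implementation has no way
to tell which registration it is being called for.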

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/include/asm/ftrace.h     |    4 ++
 arch/x86/kernel/entry_64.S        |    1 +
 include/linux/ftrace.h            |   16 ++++++++-
 kernel/trace/ftrace.c             |   65 ++++++++++++++++++++++---------------
 kernel/trace/trace_events.c       |    3 +-
 kernel/trace/trace_functions.c    |    9 +++--
 kernel/trace/trace_irqsoff.c      |    3 +-
 kernel/trace/trace_sched_wakeup.c |    2 +-
 kernel/trace/trace_selftest.c     |   15 ++++++---
 kernel/trace/trace_stack.c        |    2 +-
 10 files changed, 80 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 268c783..b3fcf16 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -32,6 +32,10 @@
 #define MCOUNT_ADDR		((long)(mcount))
 #define MCOUNT_INSN_SIZE	5 /* sizeof mcount call */
 
+#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_X86_64)
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#endif
+
 #ifndef __ASSEMBLY__
 extern void mcount(void);
 
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e13329d..27adc2b 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -77,6 +77,7 @@ ENTRY(ftrace_caller)
 
 	MCOUNT_SAVE_FRAME
 
+	leaq function_trace_op, %rdx
 	movq 0x38(%rsp), %rdi
 	movq 8(%rbp), %rsi
 	subq $MCOUNT_INSN_SIZE, %rdi
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index f0c0e8a..e1fe5be 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -19,6 +19,15 @@
 
 #include <asm/ftrace.h>
 
+/*
+ * If the arch supports passing the variable contents of
+ * function_trace_op as the third parameter back from the
+ * mcount call, then the arch should define this as 1.
+ */
+#ifndef ARCH_SUPPORTS_FTRACE_OPS
+#define ARCH_SUPPORTS_FTRACE_OPS 0
+#endif
+
 struct ftrace_hash;
 
 #ifdef CONFIG_FUNCTION_TRACER
@@ -29,7 +38,10 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 		     void __user *buffer, size_t *lenp,
 		     loff_t *ppos);
 
-typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
+struct ftrace_ops;
+
+typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
+			      struct ftrace_ops *op);
 
 enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
@@ -97,7 +109,7 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
-extern void ftrace_stub(unsigned long a0, unsigned long a1);
+extern void ftrace_stub(unsigned long a0, unsigned long a1, struct ftrace_ops *op);
 
 #else /* !CONFIG_FUNCTION_TRACER */
 /*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c3e4575..1d720cb 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -59,12 +59,20 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+static struct ftrace_ops ftrace_list_end __read_mostly =
+{
+	.func		= ftrace_stub,
+};
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
 
 /* Quick disabling of function tracer. */
-int function_trace_stop;
+int function_trace_stop __read_mostly;
+
+/* Current function tracing op */
+struct ftrace_ops *function_trace_op __read_mostly = &ftrace_list_end;
 
 /* List for set_ftrace_pid's pids. */
 LIST_HEAD(ftrace_pids);
@@ -81,10 +89,6 @@ static int ftrace_disabled __read_mostly;
 
 static DEFINE_MUTEX(ftrace_lock);
 
-static struct ftrace_ops ftrace_list_end __read_mostly = {
-	.func		= ftrace_stub,
-};
-
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
@@ -93,8 +97,8 @@ ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
 
-static void
-ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
+static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
+				 struct ftrace_ops *op);
 
 /*
  * Traverse the ftrace_global_list, invoking all entries.  The reason that we
@@ -105,29 +109,29 @@ ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
  *
  * Silly Alpha and silly pointer-speculation compiler optimizations!
  */
-static void ftrace_global_list_func(unsigned long ip,
-				    unsigned long parent_ip)
+static void
+ftrace_global_list_func(unsigned long ip, unsigned long parent_ip,
+			struct ftrace_ops *op)
 {
-	struct ftrace_ops *op;
-
 	if (unlikely(trace_recursion_test(TRACE_GLOBAL_BIT)))
 		return;
 
 	trace_recursion_set(TRACE_GLOBAL_BIT);
 	op = rcu_dereference_raw(ftrace_global_list); /*see above*/
 	while (op != &ftrace_list_end) {
-		op->func(ip, parent_ip);
+		op->func(ip, parent_ip, op);
 		op = rcu_dereference_raw(op->next); /*see above*/
 	};
 	trace_recursion_clear(TRACE_GLOBAL_BIT);
 }
 
-static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
+static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip,
+			    struct ftrace_ops *op)
 {
 	if (!test_tsk_trace_trace(current))
 		return;
 
-	ftrace_pid_function(ip, parent_ip);
+	ftrace_pid_function(ip, parent_ip, op);
 }
 
 static void set_ftrace_pid_function(ftrace_func_t func)
@@ -157,12 +161,13 @@ void clear_ftrace_function(void)
  * For those archs that do not test ftrace_trace_stop in their
  * mcount call site, we need to do it from C.
  */
-static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
+static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip,
+				  struct ftrace_ops *op)
 {
 	if (function_trace_stop)
 		return;
 
-	__ftrace_trace_function(ip, parent_ip);
+	__ftrace_trace_function(ip, parent_ip, op);
 }
 #endif
 
@@ -198,15 +203,24 @@ static void update_ftrace_function(void)
 
 	/*
 	 * If we are at the end of the list and this ops is
-	 * not dynamic, then have the mcount trampoline call
-	 * the function directly
+	 * not dynamic and the arch supports passing ops, then have the
+	 * mcount trampoline call the function directly.
 	 */
 	if (ftrace_ops_list == &ftrace_list_end ||
 	    (ftrace_ops_list->next == &ftrace_list_end &&
-	     !(ftrace_ops_list->flags & FTRACE_OPS_FL_DYNAMIC)))
+	     !(ftrace_ops_list->flags & FTRACE_OPS_FL_DYNAMIC) &&
+	     ARCH_SUPPORTS_FTRACE_OPS)) {
+		/* Set the ftrace_ops that the arch callback uses */
+		if (ftrace_ops_list == &global_ops)
+			function_trace_op = ftrace_global_list;
+		else
+			function_trace_op = ftrace_ops_list;
 		func = ftrace_ops_list->func;
-	else
+	} else {
+		/* Just use the default ftrace_ops */
+		function_trace_op = &ftrace_list_end;
 		func = ftrace_ops_list_func;
+	}
 
 #ifdef CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST
 	ftrace_trace_function = func;
@@ -2512,8 +2526,8 @@ static int __init ftrace_mod_cmd_init(void)
 }
 device_initcall(ftrace_mod_cmd_init);
 
-static void
-function_trace_probe_call(unsigned long ip, unsigned long parent_ip)
+static void function_trace_probe_call(unsigned long ip, unsigned long parent_ip,
+				      struct ftrace_ops *op)
 {
 	struct ftrace_func_probe *entry;
 	struct hlist_head *hhd;
@@ -3560,10 +3574,9 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
-ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
+ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
+		     struct ftrace_ops *op)
 {
-	struct ftrace_ops *op;
-
 	if (unlikely(trace_recursion_test(TRACE_INTERNAL_BIT)))
 		return;
 
@@ -3576,7 +3589,7 @@ ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 	op = rcu_dereference_raw(ftrace_ops_list);
 	while (op != &ftrace_list_end) {
 		if (ftrace_ops_test(op, ip))
-			op->func(ip, parent_ip);
+			op->func(ip, parent_ip, op);
 		op = rcu_dereference_raw(op->next);
 	};
 	preempt_enable_notrace();
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 581876f..d2a6f94 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1673,7 +1673,8 @@ static __init void event_trace_self_tests(void)
 static DEFINE_PER_CPU(atomic_t, ftrace_test_event_disable);
 
 static void
-function_test_events_call(unsigned long ip, unsigned long parent_ip)
+function_test_events_call(unsigned long ip, unsigned long parent_ip,
+			  struct ftrace_ops *op)
 {
 	struct ring_buffer_event *event;
 	struct ring_buffer *buffer;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index c7b0c6a..fceb7a9 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -48,7 +48,8 @@ static void function_trace_start(struct trace_array *tr)
 }
 
 static void
-function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip)
+function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip,
+				 struct ftrace_ops *op)
 {
 	struct trace_array *tr = func_trace;
 	struct trace_array_cpu *data;
@@ -75,7 +76,8 @@ function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip)
 }
 
 static void
-function_trace_call(unsigned long ip, unsigned long parent_ip)
+function_trace_call(unsigned long ip, unsigned long parent_ip,
+		    struct ftrace_ops *op)
 {
 	struct trace_array *tr = func_trace;
 	struct trace_array_cpu *data;
@@ -106,7 +108,8 @@ function_trace_call(unsigned long ip, unsigned long parent_ip)
 }
 
 static void
-function_stack_trace_call(unsigned long ip, unsigned long parent_ip)
+function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
+			  struct ftrace_ops *op)
 {
 	struct trace_array *tr = func_trace;
 	struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 667aa8c..7649609 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -136,7 +136,8 @@ static int func_prolog_dec(struct trace_array *tr,
  * irqsoff uses its own tracer function to keep the overhead down:
  */
 static void
-irqsoff_tracer_call(unsigned long ip, unsigned long parent_ip)
+irqsoff_tracer_call(unsigned long ip, unsigned long parent_ip,
+		    struct ftrace_ops *op)
 {
 	struct trace_array *tr = irqsoff_trace;
 	struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index e4a70c0..a311bfd 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -108,7 +108,7 @@ out_enable:
  * wakeup uses its own tracer function to keep the overhead down:
  */
 static void
-wakeup_tracer_call(unsigned long ip, unsigned long parent_ip)
+wakeup_tracer_call(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *op)
 {
 	struct trace_array *tr = wakeup_trace;
 	struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 288541f..9ae40c8 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -103,35 +103,40 @@ static inline void warn_failed_init_tracer(struct tracer *trace, int init_ret)
 
 static int trace_selftest_test_probe1_cnt;
 static void trace_selftest_test_probe1_func(unsigned long ip,
-					    unsigned long pip)
+					    unsigned long pip,
+					    struct ftrace_ops *op)
 {
 	trace_selftest_test_probe1_cnt++;
 }
 
 static int trace_selftest_test_probe2_cnt;
 static void trace_selftest_test_probe2_func(unsigned long ip,
-					    unsigned long pip)
+					    unsigned long pip,
+					    struct ftrace_ops *op)
 {
 	trace_selftest_test_probe2_cnt++;
 }
 
 static int trace_selftest_test_probe3_cnt;
 static void trace_selftest_test_probe3_func(unsigned long ip,
-					    unsigned long pip)
+					    unsigned long pip,
+					    struct ftrace_ops *op)
 {
 	trace_selftest_test_probe3_cnt++;
 }
 
 static int trace_selftest_test_global_cnt;
 static void trace_selftest_test_global_func(unsigned long ip,
-					    unsigned long pip)
+					    unsigned long pip,
+					    struct ftrace_ops *op)
 {
 	trace_selftest_test_global_cnt++;
 }
 
 static int trace_selftest_test_dyn_cnt;
 static void trace_selftest_test_dyn_func(unsigned long ip,
-					 unsigned long pip)
+					 unsigned long pip,
+					 struct ftrace_ops *op)
 {
 	trace_selftest_test_dyn_cnt++;
 }
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 77575b3..456f262 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -108,7 +108,7 @@ static inline void check_stack(void)
 }
 
 static void
-stack_trace_call(unsigned long ip, unsigned long parent_ip)
+stack_trace_call(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *op)
 {
 	int cpu;
 
-- 
1.7.5.4



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/5][RFC] ftrace: Return pt_regs to function trace callback (x86_64 only so
  2011-08-10 16:22 [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops Steven Rostedt
  2011-08-10 16:22 ` [PATCH 1/5][RFC] tracing: Clean up tb_fmt to not give faulty compile warning Steven Rostedt
  2011-08-10 16:22 ` [PATCH 2/5][RFC] ftrace: Pass ftrace_ops as third parameter to function trace Steven Rostedt
@ 2011-08-10 16:22 ` Steven Rostedt
  2011-08-11  5:55   ` Masami Hiramatsu
  2011-08-10 16:22 ` [PATCH 4/5][RFC] kprobes: Inverse taking of module_mutex with kprobe_mutex Steven Rostedt
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2011-08-10 16:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Jason Baron

[-- Attachment #1: 0003-ftrace-Return-pt_regs-to-function-trace-callback-x86.patch --]
[-- Type: text/plain, Size: 14058 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Return the pt_regs as the 4th parameter to the function tracer callback.

So far this is only supported by x86_64. The ftrace_ops flag
FTRACE_OPS_FL_SAVE_REGS is added to tell the arch to save all regs
to the pt_regs; otherwise only a minimal set is passed back (the same
regs that are saved by mcount itself).
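
As an illustration (a sketch only, not part of this patch; the names are
made up and it assumes x86_64, where the first argument is in regs->di),
a user that wants the full register set would set the new flag on its
ops and look at the pt_regs in its callback:

#include <linux/kernel.h>
#include <linux/ftrace.h>
#include <linux/ptrace.h>

static void my_regs_callback(unsigned long ip, unsigned long parent_ip,
			     struct ftrace_ops *op, struct pt_regs *regs)
{
	/*
	 * With FTRACE_OPS_FL_SAVE_REGS the full pt_regs of the traced
	 * function's entry is filled in; without the flag only the
	 * registers that mcount saves anyway are valid.
	 */
	pr_info("traced %pS, first arg %lx\n", (void *)ip, regs->di);
}

static struct ftrace_ops my_regs_ops = {
	.func	= my_regs_callback,
	.flags	= FTRACE_OPS_FL_SAVE_REGS,
};

/*
 * ftrace_set_filter(&my_regs_ops, "some_function", strlen("some_function"), 1);
 * register_ftrace_function(&my_regs_ops);
 */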

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/include/asm/ftrace.h     |   38 ++++++++++++++++++++----------------
 arch/x86/kernel/entry_64.S        |   23 +++++++++++++++++++++-
 include/linux/ftrace.h            |   15 ++++++++++++-
 kernel/trace/ftrace.c             |   29 ++++++++++++++++++---------
 kernel/trace/trace_events.c       |    2 +-
 kernel/trace/trace_functions.c    |    7 +++--
 kernel/trace/trace_irqsoff.c      |    2 +-
 kernel/trace/trace_sched_wakeup.c |    3 +-
 kernel/trace/trace_selftest.c     |   15 +++++++++----
 kernel/trace/trace_stack.c        |    3 +-
 10 files changed, 95 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index b3fcf16..0750c2a 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -4,26 +4,29 @@
 #ifdef __ASSEMBLY__
 
 	.macro MCOUNT_SAVE_FRAME
-	/* taken from glibc */
-	subq $0x38, %rsp
-	movq %rax, (%rsp)
-	movq %rcx, 8(%rsp)
-	movq %rdx, 16(%rsp)
-	movq %rsi, 24(%rsp)
-	movq %rdi, 32(%rsp)
-	movq %r8, 40(%rsp)
-	movq %r9, 48(%rsp)
+	 /*
+	  * We add enough stack to save all regs,
+	  * and we place what we need in the location of pt_regs.
+	  */
+	subq $ORIG_RAX, %rsp
+	movq %rax, RAX(%rsp)
+	movq %rcx, RCX(%rsp)
+	movq %rdx, RDX(%rsp)
+	movq %rsi, RSI(%rsp)
+	movq %rdi, RDI(%rsp)
+	movq %r8, R8(%rsp)
+	movq %r9, R9(%rsp)
 	.endm
 
 	.macro MCOUNT_RESTORE_FRAME
-	movq 48(%rsp), %r9
-	movq 40(%rsp), %r8
-	movq 32(%rsp), %rdi
-	movq 24(%rsp), %rsi
-	movq 16(%rsp), %rdx
-	movq 8(%rsp), %rcx
-	movq (%rsp), %rax
-	addq $0x38, %rsp
+	movq R9(%rsp), %r9
+	movq R8(%rsp), %r8
+	movq RDI(%rsp), %rdi
+	movq RSI(%rsp), %rsi
+	movq RDX(%rsp), %rdx
+	movq RCX(%rsp), %rcx
+	movq RAX(%rsp), %rax
+	addq $ORIG_RAX, %rsp
 	.endm
 
 #endif
@@ -34,6 +37,7 @@
 
 #if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_X86_64)
 #define ARCH_SUPPORTS_FTRACE_OPS 1
+#define ARCH_SUPPORTS_FTRACE_SAVE_REGS 1
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 27adc2b..b77f297 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -78,7 +78,16 @@ ENTRY(ftrace_caller)
 	MCOUNT_SAVE_FRAME
 
 	leaq function_trace_op, %rdx
-	movq 0x38(%rsp), %rdi
+
+	cmpl $0, ftrace_save_regs
+	jne  save_all_regs
+
+call_func:
+
+	/* regs go into 4th parameter */
+	leaq (%rsp), %rcx
+
+	movq ORIG_RAX(%rsp), %rdi
 	movq 8(%rbp), %rsi
 	subq $MCOUNT_INSN_SIZE, %rdi
 
@@ -96,6 +105,18 @@ GLOBAL(ftrace_stub)
 	retq
 END(ftrace_caller)
 
+save_all_regs:
+	/* Save the rest of pt_regs */
+	movq %r15, R15(%rsp)
+	movq %r14, R14(%rsp)
+	movq %r13, R13(%rsp)
+	movq %r12, R12(%rsp)
+	movq %r10, R10(%rsp)
+	movq %rbp, RBP(%rsp)
+	movq %rbx, RBX(%rsp)
+	jmp call_func
+
+
 #else /* ! CONFIG_DYNAMIC_FTRACE */
 ENTRY(mcount)
 	cmpl $0, function_trace_stop
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index e1fe5be..14dcc98 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -11,6 +11,7 @@
 #include <linux/linkage.h>
 #include <linux/bitops.h>
 #include <linux/module.h>
+#include <linux/ptrace.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
 #include <linux/types.h>
@@ -28,6 +29,14 @@
 #define ARCH_SUPPORTS_FTRACE_OPS 0
 #endif
 
+/*
+ * If the arch supports saving the regs to the ftrace_ops
+ * then it should set this to 1.
+ */
+#ifndef ARCH_SUPPORTS_FTRACE_SAVE_REGS
+#define ARCH_SUPPORTS_FTRACE_SAVE_REGS 0
+#endif
+
 struct ftrace_hash;
 
 #ifdef CONFIG_FUNCTION_TRACER
@@ -41,12 +50,13 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 struct ftrace_ops;
 
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
-			      struct ftrace_ops *op);
+			      struct ftrace_ops *op, struct pt_regs *pt_regs);
 
 enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_SAVE_REGS		= 1 << 3,
 };
 
 struct ftrace_ops {
@@ -109,7 +119,8 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
-extern void ftrace_stub(unsigned long a0, unsigned long a1, struct ftrace_ops *op);
+extern void ftrace_stub(unsigned long a0, unsigned long a1,
+			struct ftrace_ops *op, struct pt_regs *pt_regs);
 
 #else /* !CONFIG_FUNCTION_TRACER */
 /*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 1d720cb..c92f433 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -74,6 +74,9 @@ int function_trace_stop __read_mostly;
 /* Current function tracing op */
 struct ftrace_ops *function_trace_op __read_mostly = &ftrace_list_end;
 
+/* If the arch supports it, return pt_regs to caller */
+int ftrace_save_regs __read_mostly;
+
 /* List for set_ftrace_pid's pids. */
 LIST_HEAD(ftrace_pids);
 struct ftrace_pid {
@@ -98,7 +101,7 @@ ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
 
 static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
-				 struct ftrace_ops *op);
+				 struct ftrace_ops *op, struct pt_regs *pt_regs);
 
 /*
  * Traverse the ftrace_global_list, invoking all entries.  The reason that we
@@ -111,7 +114,7 @@ static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
  */
 static void
 ftrace_global_list_func(unsigned long ip, unsigned long parent_ip,
-			struct ftrace_ops *op)
+			struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	if (unlikely(trace_recursion_test(TRACE_GLOBAL_BIT)))
 		return;
@@ -119,19 +122,19 @@ ftrace_global_list_func(unsigned long ip, unsigned long parent_ip,
 	trace_recursion_set(TRACE_GLOBAL_BIT);
 	op = rcu_dereference_raw(ftrace_global_list); /*see above*/
 	while (op != &ftrace_list_end) {
-		op->func(ip, parent_ip, op);
+		op->func(ip, parent_ip, op, pt_regs);
 		op = rcu_dereference_raw(op->next); /*see above*/
 	};
 	trace_recursion_clear(TRACE_GLOBAL_BIT);
 }
 
 static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip,
-			    struct ftrace_ops *op)
+			    struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	if (!test_tsk_trace_trace(current))
 		return;
 
-	ftrace_pid_function(ip, parent_ip, op);
+	ftrace_pid_function(ip, parent_ip, op, pt_regs);
 }
 
 static void set_ftrace_pid_function(ftrace_func_t func)
@@ -162,12 +165,12 @@ void clear_ftrace_function(void)
  * mcount call site, we need to do it from C.
  */
 static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip,
-				  struct ftrace_ops *op)
+				  struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	if (function_trace_stop)
 		return;
 
-	__ftrace_trace_function(ip, parent_ip, op);
+	__ftrace_trace_function(ip, parent_ip, op, pt_regs);
 }
 #endif
 
@@ -285,6 +288,9 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
+	if (ops->flags & FTRACE_OPS_FL_SAVE_REGS)
+		ftrace_save_regs++;
+
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
 		int first = ftrace_global_list == &ftrace_list_end;
 		add_ftrace_ops(&ftrace_global_list, ops);
@@ -313,6 +319,9 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 	if (FTRACE_WARN_ON(ops == &global_ops))
 		return -EINVAL;
 
+	if (ops->flags & FTRACE_OPS_FL_SAVE_REGS)
+		ftrace_save_regs--;
+
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
 		ret = remove_ftrace_ops(&ftrace_global_list, ops);
 		if (!ret && ftrace_global_list == &ftrace_list_end)
@@ -2527,7 +2536,7 @@ static int __init ftrace_mod_cmd_init(void)
 device_initcall(ftrace_mod_cmd_init);
 
 static void function_trace_probe_call(unsigned long ip, unsigned long parent_ip,
-				      struct ftrace_ops *op)
+				      struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	struct ftrace_func_probe *entry;
 	struct hlist_head *hhd;
@@ -3575,7 +3584,7 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
-		     struct ftrace_ops *op)
+		     struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	if (unlikely(trace_recursion_test(TRACE_INTERNAL_BIT)))
 		return;
@@ -3589,7 +3598,7 @@ ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 	op = rcu_dereference_raw(ftrace_ops_list);
 	while (op != &ftrace_list_end) {
 		if (ftrace_ops_test(op, ip))
-			op->func(ip, parent_ip, op);
+			op->func(ip, parent_ip, op, pt_regs);
 		op = rcu_dereference_raw(op->next);
 	};
 	preempt_enable_notrace();
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index d2a6f94..fe2c1ac 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1674,7 +1674,7 @@ static DEFINE_PER_CPU(atomic_t, ftrace_test_event_disable);
 
 static void
 function_test_events_call(unsigned long ip, unsigned long parent_ip,
-			  struct ftrace_ops *op)
+			  struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	struct ring_buffer_event *event;
 	struct ring_buffer *buffer;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index fceb7a9..5675ebd 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -49,7 +49,7 @@ static void function_trace_start(struct trace_array *tr)
 
 static void
 function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip,
-				 struct ftrace_ops *op)
+				 struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	struct trace_array *tr = func_trace;
 	struct trace_array_cpu *data;
@@ -77,7 +77,8 @@ function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip,
 
 static void
 function_trace_call(unsigned long ip, unsigned long parent_ip,
-		    struct ftrace_ops *op)
+		    struct ftrace_ops *op, struct pt_regs *pt_regs)
+
 {
 	struct trace_array *tr = func_trace;
 	struct trace_array_cpu *data;
@@ -109,7 +110,7 @@ function_trace_call(unsigned long ip, unsigned long parent_ip,
 
 static void
 function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
-			  struct ftrace_ops *op)
+			  struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	struct trace_array *tr = func_trace;
 	struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 7649609..009b3c4 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -137,7 +137,7 @@ static int func_prolog_dec(struct trace_array *tr,
  */
 static void
 irqsoff_tracer_call(unsigned long ip, unsigned long parent_ip,
-		    struct ftrace_ops *op)
+		    struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	struct trace_array *tr = irqsoff_trace;
 	struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index a311bfd..fdafb79 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -108,7 +108,8 @@ out_enable:
  * wakeup uses its own tracer function to keep the overhead down:
  */
 static void
-wakeup_tracer_call(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *op)
+wakeup_tracer_call(unsigned long ip, unsigned long parent_ip,
+		   struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	struct trace_array *tr = wakeup_trace;
 	struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 9ae40c8..add37e0 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -104,7 +104,8 @@ static inline void warn_failed_init_tracer(struct tracer *trace, int init_ret)
 static int trace_selftest_test_probe1_cnt;
 static void trace_selftest_test_probe1_func(unsigned long ip,
 					    unsigned long pip,
-					    struct ftrace_ops *op)
+					    struct ftrace_ops *op,
+					    struct pt_regs *pt_regs)
 {
 	trace_selftest_test_probe1_cnt++;
 }
@@ -112,7 +113,8 @@ static void trace_selftest_test_probe1_func(unsigned long ip,
 static int trace_selftest_test_probe2_cnt;
 static void trace_selftest_test_probe2_func(unsigned long ip,
 					    unsigned long pip,
-					    struct ftrace_ops *op)
+					    struct ftrace_ops *op,
+					    struct pt_regs *pt_regs)
 {
 	trace_selftest_test_probe2_cnt++;
 }
@@ -120,7 +122,8 @@ static void trace_selftest_test_probe2_func(unsigned long ip,
 static int trace_selftest_test_probe3_cnt;
 static void trace_selftest_test_probe3_func(unsigned long ip,
 					    unsigned long pip,
-					    struct ftrace_ops *op)
+					    struct ftrace_ops *op,
+					    struct pt_regs *pt_regs)
 {
 	trace_selftest_test_probe3_cnt++;
 }
@@ -128,7 +131,8 @@ static void trace_selftest_test_probe3_func(unsigned long ip,
 static int trace_selftest_test_global_cnt;
 static void trace_selftest_test_global_func(unsigned long ip,
 					    unsigned long pip,
-					    struct ftrace_ops *op)
+					    struct ftrace_ops *op,
+					    struct pt_regs *pt_regs)
 {
 	trace_selftest_test_global_cnt++;
 }
@@ -136,7 +140,8 @@ static void trace_selftest_test_global_func(unsigned long ip,
 static int trace_selftest_test_dyn_cnt;
 static void trace_selftest_test_dyn_func(unsigned long ip,
 					 unsigned long pip,
-					 struct ftrace_ops *op)
+					 struct ftrace_ops *op,
+					 struct pt_regs *pt_regs)
 {
 	trace_selftest_test_dyn_cnt++;
 }
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 456f262..d22ba11 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -108,7 +108,8 @@ static inline void check_stack(void)
 }
 
 static void
-stack_trace_call(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *op)
+stack_trace_call(unsigned long ip, unsigned long parent_ip,
+		 struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
 	int cpu;
 
-- 
1.7.5.4



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/5][RFC] kprobes: Inverse taking of module_mutex with kprobe_mutex
  2011-08-10 16:22 [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops Steven Rostedt
                   ` (2 preceding siblings ...)
  2011-08-10 16:22 ` [PATCH 3/5][RFC] ftrace: Return pt_regs to function trace callback (x86_64 only so Steven Rostedt
@ 2011-08-10 16:22 ` Steven Rostedt
  2011-08-10 16:22 ` [PATCH 5/5][RFC] kprobes: Use ftrace hooks when probing ftrace nops Steven Rostedt
  2011-08-11  0:21 ` [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on " Masami Hiramatsu
  5 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2011-08-10 16:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Jason Baron

[-- Attachment #1: 0004-kprobes-Inverse-taking-of-module_mutex-with-kprobe_m.patch --]
[-- Type: text/plain, Size: 1529 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Currently module_mutex is taken before kprobe_mutex, but this
can cause issues when we have kprobes register ftrace callbacks, as the
ftrace mutex is taken before enabling a tracepoint, which currently takes
the module mutex.

If module_mutex is taken before kprobe_mutex, then we can not
have kprobes use the ftrace infrastructure.

There seems to be no reason that kprobe_mutex can't be taken
before module_mutex. Running lockdep shows that it is safe
on the kernels I've run.
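
In other words, any path that needs both mutexes must now nest them in
this order (a sketch of the resulting locking rule, not actual code from
the patch):

	mutex_lock(&kprobe_mutex);
	mutex_lock(&module_mutex);

	/* ... work that needs both kprobe and module state ... */

	mutex_unlock(&module_mutex);
	mutex_unlock(&kprobe_mutex);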

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/kprobes.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index b30fd54..e6c25eb 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -561,9 +561,9 @@ static __kprobes void kprobe_optimizer(struct work_struct *work)
 {
 	LIST_HEAD(free_list);
 
+	mutex_lock(&kprobe_mutex);
 	/* Lock modules while optimizing kprobes */
 	mutex_lock(&module_mutex);
-	mutex_lock(&kprobe_mutex);
 
 	/*
 	 * Step 1: Unoptimize kprobes and collect cleaned (unused and disarmed)
@@ -586,8 +586,8 @@ static __kprobes void kprobe_optimizer(struct work_struct *work)
 	/* Step 4: Free cleaned kprobes after quiesence period */
 	do_free_cleaned_kprobes(&free_list);
 
-	mutex_unlock(&kprobe_mutex);
 	mutex_unlock(&module_mutex);
+	mutex_unlock(&kprobe_mutex);
 
 	/* Step 5: Kick optimizer again if needed */
 	if (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list))
-- 
1.7.5.4



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/5][RFC] kprobes: Use ftrace hooks when probing ftrace nops
  2011-08-10 16:22 [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops Steven Rostedt
                   ` (3 preceding siblings ...)
  2011-08-10 16:22 ` [PATCH 4/5][RFC] kprobes: Inverse taking of module_mutex with kprobe_mutex Steven Rostedt
@ 2011-08-10 16:22 ` Steven Rostedt
  2011-08-11  7:41   ` Masami Hiramatsu
  2011-08-11  0:21 ` [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on " Masami Hiramatsu
  5 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2011-08-10 16:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Jason Baron

[-- Attachment #1: 0005-kprobes-Use-ftrace-hooks-when-probing-ftrace-nops.patch --]
[-- Type: text/plain, Size: 9200 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Currently, if a probe is set at an ftrace nop, the probe will fail.

When -mfentry is used by ftrace (in the near future), the nop will be
at the beginning of the function. This will prevent kprobes from hooking
into the most common place where kprobes are attached.

Now that ftrace supports individual users as well as pt_regs passing,
kprobes can use the ftrace infrastructure when a probe is placed on
an ftrace nop.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/kprobes.h |    6 ++
 kernel/kprobes.c        |  163 ++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 160 insertions(+), 9 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index dd7c12e..5742071 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -37,6 +37,7 @@
 #include <linux/spinlock.h>
 #include <linux/rcupdate.h>
 #include <linux/mutex.h>
+#include <linux/ftrace.h>
 
 #ifdef CONFIG_KPROBES
 #include <asm/kprobes.h>
@@ -112,6 +113,11 @@ struct kprobe {
 	/* copy of the original instruction */
 	struct arch_specific_insn ainsn;
 
+#ifdef CONFIG_FUNCTION_TRACER
+	/* If it is possible to use ftrace to probe */
+	struct ftrace_ops fops;
+#endif
+
 	/*
 	 * Indicates various status flags.
 	 * Protected by kprobe_mutex after this kprobe is registered.
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index e6c25eb..2160768 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -102,6 +102,31 @@ static struct kprobe_blackpoint kprobe_blacklist[] = {
 	{NULL}    /* Terminator */
 };
 
+#ifdef CONFIG_FUNCTION_TRACER
+/* kprobe stub function for use when probe is not connected to ftrace */
+static void
+kprobe_ftrace_stub(unsigned long ip, unsigned long pip,
+		   struct ftrace_ops *op, struct pt_regs *pt_regs)
+{
+}
+
+static bool ftrace_kprobe(struct kprobe *p)
+{
+	return p->fops.func && p->fops.func != kprobe_ftrace_stub;
+}
+
+static void init_non_ftrace_kprobe(struct kprobe *p)
+{
+	p->fops.func = kprobe_ftrace_stub;
+}
+
+#else
+static bool ftrace_kprobe(struct kprobe *p)
+{
+	return false;
+}
+#endif
+
 #ifdef __ARCH_WANT_KPROBES_INSN_SLOT
 /*
  * kprobe->ainsn.insn points to the copy of the instruction to be
@@ -396,6 +421,9 @@ static __kprobes void free_aggr_kprobe(struct kprobe *p)
 {
 	struct optimized_kprobe *op;
 
+	if (ftrace_kprobe(p))
+		return;
+
 	op = container_of(p, struct optimized_kprobe, kp);
 	arch_remove_optimized_kprobe(op);
 	arch_remove_kprobe(p);
@@ -759,6 +787,9 @@ static __kprobes void try_to_optimize_kprobe(struct kprobe *p)
 	struct kprobe *ap;
 	struct optimized_kprobe *op;
 
+	if (ftrace_kprobe(p))
+		return;
+
 	ap = alloc_aggr_kprobe(p);
 	if (!ap)
 		return;
@@ -849,6 +880,10 @@ static void __kprobes __arm_kprobe(struct kprobe *p)
 {
 	struct kprobe *_p;
 
+	/* Only arm non-ftrace probes */
+	if (ftrace_kprobe(p))
+		return;
+
 	/* Check collision with other optimized kprobes */
 	_p = get_optimized_kprobe((unsigned long)p->addr);
 	if (unlikely(_p))
@@ -864,6 +899,10 @@ static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
 {
 	struct kprobe *_p;
 
+	/* Only disarm non-ftrace probes */
+	if (ftrace_kprobe(p))
+		return;
+
 	unoptimize_kprobe(p, false);	/* Try to unoptimize */
 
 	if (!kprobe_queued(p)) {
@@ -878,13 +917,26 @@ static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
 
 #else /* !CONFIG_OPTPROBES */
 
+static void __kprobes __arm_kprobe(struct kprobe *p)
+{
+	/* Only arm non-ftrace probes */
+	if (!ftrace_kprobe(p))
+		arch_arm_kprobe(p);
+}
+
+/* Remove the breakpoint of a probe. Must be called with text_mutex locked */
+static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
+{
+	/* Only disarm non-ftrace probes */
+	if (!ftrace_kprobe(p))
+		arch_disarm_kprobe(p);
+}
+
 #define optimize_kprobe(p)			do {} while (0)
 #define unoptimize_kprobe(p, f)			do {} while (0)
 #define kill_optimized_kprobe(p)		do {} while (0)
 #define prepare_optimized_kprobe(p)		do {} while (0)
 #define try_to_optimize_kprobe(p)		do {} while (0)
-#define __arm_kprobe(p)				arch_arm_kprobe(p)
-#define __disarm_kprobe(p, o)			arch_disarm_kprobe(p)
 #define kprobe_disarmed(p)			kprobe_disabled(p)
 #define wait_for_kprobe_optimizer()		do {} while (0)
 
@@ -897,7 +949,9 @@ static void reuse_unused_kprobe(struct kprobe *ap)
 
 static __kprobes void free_aggr_kprobe(struct kprobe *p)
 {
-	arch_remove_kprobe(p);
+
+	if (!ftrace_kprobe(p))
+		arch_remove_kprobe(p);
 	kfree(p);
 }
 
@@ -910,6 +964,12 @@ static __kprobes struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
 /* Arm a kprobe with text_mutex */
 static void __kprobes arm_kprobe(struct kprobe *kp)
 {
+	/* ftrace probes can skip arch calls */
+	if (ftrace_kprobe(kp)) {
+		register_ftrace_function(&kp->fops);
+		return;
+	}
+
 	/*
 	 * Here, since __arm_kprobe() doesn't use stop_machine(),
 	 * this doesn't cause deadlock on text_mutex. So, we don't
@@ -924,6 +984,12 @@ static void __kprobes arm_kprobe(struct kprobe *kp)
 static void __kprobes disarm_kprobe(struct kprobe *kp)
 {
 	/* Ditto */
+
+	if (ftrace_kprobe(kp)) {
+		unregister_ftrace_function(&kp->fops);
+		return;
+	}
+
 	mutex_lock(&text_mutex);
 	__disarm_kprobe(kp, true);
 	mutex_unlock(&text_mutex);
@@ -1313,6 +1379,56 @@ static inline int check_kprobe_rereg(struct kprobe *p)
 	return ret;
 }
 
+#ifdef CONFIG_FUNCTION_TRACER
+static notrace void
+kprobe_ftrace_callback(unsigned long ip, unsigned long parent_ip,
+		       struct ftrace_ops *op, struct pt_regs *pt_regs)
+{
+	struct kprobe *p = container_of(op, struct kprobe, fops);
+
+	/* A nop has been trapped, just run both handlers back to back */
+	if (p->pre_handler)
+		p->pre_handler(p, pt_regs);
+	if (p->post_handler)
+		p->post_handler(p, pt_regs, 0);
+}
+
+static int use_ftrace_hook(struct kprobe *p)
+{
+	char str[KSYM_SYMBOL_LEN];
+
+	/* see if this is an ftrace function */
+	if (!ftrace_text_reserved(p->addr, p->addr)) {
+		/* make sure fops->func is nop */
+		init_non_ftrace_kprobe(p);
+		return 0;
+	}
+
+	/* If arch does not support pt_regs passing, bail */
+	if (!ARCH_SUPPORTS_FTRACE_SAVE_REGS)
+		return -EINVAL;
+
+	/* Use ftrace hook instead */
+
+	memset(&p->fops, 0, sizeof(p->fops));
+
+	/* Find the function this is connected with this addr */
+	kallsyms_lookup((unsigned long)p->addr, NULL, NULL, NULL, str);
+
+	p->fops.flags = FTRACE_OPS_FL_SAVE_REGS;
+	p->fops.func = kprobe_ftrace_callback;
+
+	ftrace_set_filter(&p->fops, str, strlen(str), 1);
+
+	return 0;
+}
+#else
+static int use_ftrace_hook(struct kprobe *p)
+{
+	return 0;
+}
+#endif
+
 int __kprobes register_kprobe(struct kprobe *p)
 {
 	int ret = 0;
@@ -1329,11 +1445,14 @@ int __kprobes register_kprobe(struct kprobe *p)
 	if (ret)
 		return ret;
 
+	ret = use_ftrace_hook(p);
+	if (ret)
+		return ret;
+
 	jump_label_lock();
 	preempt_disable();
 	if (!kernel_text_address((unsigned long) p->addr) ||
 	    in_kprobes_functions((unsigned long) p->addr) ||
-	    ftrace_text_reserved(p->addr, p->addr) ||
 	    jump_label_text_reserved(p->addr, p->addr))
 		goto fail_with_jump_label;
 
@@ -1384,15 +1503,17 @@ int __kprobes register_kprobe(struct kprobe *p)
 		goto out;
 	}
 
-	ret = arch_prepare_kprobe(p);
-	if (ret)
-		goto out;
+	if (!ftrace_kprobe(p)) {
+		ret = arch_prepare_kprobe(p);
+		if (ret)
+			goto out;
+	}
 
 	INIT_HLIST_NODE(&p->hlist);
 	hlist_add_head_rcu(&p->hlist,
 		       &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
 
-	if (!kprobes_all_disarmed && !kprobe_disabled(p))
+	if (!ftrace_kprobe(p) && !kprobes_all_disarmed && !kprobe_disabled(p))
 		__arm_kprobe(p);
 
 	/* Try to optimize kprobe */
@@ -1400,6 +1521,12 @@ int __kprobes register_kprobe(struct kprobe *p)
 
 out:
 	mutex_unlock(&text_mutex);
+
+	/* ftrace kprobes must be set outside of text_mutex */
+	if (!ret && ftrace_kprobe(p) &&
+	    !kprobes_all_disarmed && !kprobe_disabled(p))
+		arm_kprobe(p);
+
 	put_online_cpus();
 	jump_label_unlock();
 	mutex_unlock(&kprobe_mutex);
@@ -2134,6 +2261,14 @@ static void __kprobes arm_all_kprobes(void)
 	}
 	mutex_unlock(&text_mutex);
 
+	/* ftrace kprobes are enabled outside of text_mutex */
+	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
+		head = &kprobe_table[i];
+		hlist_for_each_entry_rcu(p, node, head, hlist)
+			if (ftrace_kprobe(p) && !kprobe_disabled(p))
+				arm_kprobe(p);
+	}
+
 	kprobes_all_disarmed = false;
 	printk(KERN_INFO "Kprobes globally enabled\n");
 
@@ -2164,11 +2299,21 @@ static void __kprobes disarm_all_kprobes(void)
 	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
 		head = &kprobe_table[i];
 		hlist_for_each_entry_rcu(p, node, head, hlist) {
-			if (!arch_trampoline_kprobe(p) && !kprobe_disabled(p))
+			if (!ftrace_kprobe(p) &&
+			    !arch_trampoline_kprobe(p) && !kprobe_disabled(p))
 				__disarm_kprobe(p, false);
 		}
 	}
 	mutex_unlock(&text_mutex);
+
+	/* ftrace kprobes are disabled outside of text_mutex */
+	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
+		head = &kprobe_table[i];
+		hlist_for_each_entry_rcu(p, node, head, hlist) {
+			if (ftrace_kprobe(p) && !kprobe_disabled(p))
+				disarm_kprobe(p);
+		}
+	}
 	mutex_unlock(&kprobe_mutex);
 
 	/* Wait for disarming all kprobes by optimizer */
-- 
1.7.5.4



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-10 16:22 [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops Steven Rostedt
                   ` (4 preceding siblings ...)
  2011-08-10 16:22 ` [PATCH 5/5][RFC] kprobes: Use ftrace hooks when probing ftrace nops Steven Rostedt
@ 2011-08-11  0:21 ` Masami Hiramatsu
  2011-08-11  0:34   ` Steven Rostedt
  5 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-11  0:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt

Hi Steven,

Thanks for this nice feature!

(2011/08/11 1:22), Steven Rostedt wrote:
> Hi All,
>
> I started working on adding the -mfentry switch to ftrace, which
> allows us to remove the frame pointer requirement from function tracing
> and makes mcount (fentry) work better.
>
> But when I did this in another branch, I noticed that I broke kprobes
> in its most common usage: attaching a probe at the beginning of
> a function in order to get at its parameters.
>
> So I started this branch. This branch is to have kprobes use ftrace
> directly when a probe is attached to a ftrace nop. Currently, kprobes
> will just error when that happens. With this patch set, it will hook
> into the ftrace infrastructure and use ftrace instead. This is more
> like an optimized probe as no breakpoints need to be set. A call to
> the function is done directly via the mcount trampoline. If ftrace
> pt_regs is implemented for an arch, kprobes gets this feature for free.

I agree with this idea; this looks good to me too :)
With -mfentry, this can improve dynamic trace events very much.

BTW (OT), it seems that the current kprobe data structure has become a
bit fat. Maybe what we need is just a "holder of the hooking handler",
like what ftrace provides, rather than a full data structure that stores
copied instructions. Perhaps we'd better slim down the kprobe structure
for the sake of a transparent hooking infrastructure.

> The first patch is just a clean up that I need to push out to get rid
> of the annoying compile warning about possibly uninitialized variables
> that gcc can't tell have actually been initialized.
>
> The next two patches have ftrace pass both the ftrace_ops structure
> and the pt_regs to the callback function that is registered with ftrace.
>
> The last two patches have kprobes interact with ftrace and use the
> ftrace infrastructure instead.
>
> I only did this for x86_64, and will do it for x86_32 and PPC64 if everyone
> agrees with this approach. Then I could find people to do it for other
> archs :)

OK, I'll review this soon!

Thank you!

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-11  0:21 ` [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on " Masami Hiramatsu
@ 2011-08-11  0:34   ` Steven Rostedt
  2011-08-11  6:28     ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2011-08-11  0:34 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt

On Thu, 2011-08-11 at 09:21 +0900, Masami Hiramatsu wrote:
> Hi Steven,
> 
> Thanks for this nice feature!
> 
> (2011/08/11 1:22), Steven Rostedt wrote:
> > Hi All,
> >
> > I started working on adding the -mfentry switch to ftrace, which
> > allows us to remove the frame pointer requirement from function tracing
> > and makes mcount (fentry) work better.
> >
> > But when I did this in another branch, I noticed that I broke kprobes
> > in its most common usage: attaching a probe at the beginning of
> > a function in order to get at its parameters.
> >
> > So I started this branch. This branch is to have kprobes use ftrace
> > directly when a probe is attached to a ftrace nop. Currently, kprobes
> > will just error when that happens. With this patch set, it will hook
> > into the ftrace infrastructure and use ftrace instead. This is more
> > like an optimized probe as no breakpoints need to be set. A call to
> > the function is done directly via the mcount trampoline. If ftrace
> > pt_regs is implemented for an arch, kprobes gets this feature for free.
> 
> I agree with this idea; this looks good to me too :)
> With -mfentry, this can improve dynamic trace events very much.
> 
> BTW (OT), it seems that the current kprobe data structure has become a
> bit fat. Maybe what we need is just a "holder of the hooking handler",
> like what ftrace provides, rather than a full data structure that stores
> copied instructions. Perhaps we'd better slim down the kprobe structure
> for the sake of a transparent hooking infrastructure.

Sure, I can make the ftrace_ops field in kprobes dynamically allocated
instead. That shouldn't be an issue.
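
Something like this, perhaps (just a sketch of one possible shape, not
tested; the wrapper and field names are made up, and the callback is the
one from patch 5 adapted to the new layout):

/* Allocated only for probes that sit on an ftrace nop */
struct kprobe_ftrace_ops {
	struct ftrace_ops	ops;
	struct kprobe		*kp;
};

static notrace void
kprobe_ftrace_callback(unsigned long ip, unsigned long parent_ip,
		       struct ftrace_ops *op, struct pt_regs *pt_regs)
{
	struct kprobe *p = container_of(op, struct kprobe_ftrace_ops, ops)->kp;

	/* A nop has been trapped, just run both handlers back to back */
	if (p->pre_handler)
		p->pre_handler(p, pt_regs);
	if (p->post_handler)
		p->post_handler(p, pt_regs, 0);
}

/*
 * In use_ftrace_hook():
 *
 *	struct kprobe_ftrace_ops *fops = kzalloc(sizeof(*fops), GFP_KERNEL);
 *
 *	if (!fops)
 *		return -ENOMEM;
 *	fops->kp = p;
 *	fops->ops.flags = FTRACE_OPS_FL_SAVE_REGS;
 *	fops->ops.func  = kprobe_ftrace_callback;
 *	p->fops = fops;		(struct kprobe then only carries a pointer)
 */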

> 
> > The first patch is just a clean up that I need to push out to get rid
> > of the annoying compile warning about possibly uninitialized variables
> > that gcc can't tell have actually been initialized.
> >
> > The next two patches have ftrace pass both the ftrace_ops structure
> > and the pt_regs to the callback function that is registered with ftrace.
> >
> > The last two patches have kprobes interact with ftrace and use the
> > ftrace infrastructure instead.
> >
> > I only did this for x86_64, and will do it for x86_32 and PPC64 if everyone
> > agrees with this approach. Then I could find people to do it for other
> > archs :)
> 
> OK, I'll review this soon!

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5][RFC] ftrace: Return pt_regs to function trace callback (x86_64 only so
  2011-08-10 16:22 ` [PATCH 3/5][RFC] ftrace: Return pt_regs to function trace callback (x86_64 only so Steven Rostedt
@ 2011-08-11  5:55   ` Masami Hiramatsu
  2011-08-11 12:59     ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-11  5:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt

(2011/08/11 1:22), Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
>
> Return as the 4th paramater to the function tracer callback the pt_regs.
>
> So far this is only supported by x86_64. The ftrace_ops flag
> FTRACE_OPS_FL_SAVE_REGS is added to tell the arch to save all regs
> to the pt_regs, otherwise a minimum is just passed back (the same
> regs that is saved by mcount itself).

I guess it will be a bit hard to port this to x86-32, because on
x86-32 the top-of-stack address in pt_regs is the address of the sp
member itself (i.e. &(pt_regs->sp)). I mean that when the mcount entry
calls ftrace_caller, it pushes the address of the next instruction
after the mcount entry onto the top of the stack.
In that case, &(pt_regs->sp) points to the entry which stores that
address, instead of to the return address of the probed function.

e.g. with kprobes (on x86-32):

[  <bx>  ] <- pt_regs
[  ...   ]
[  <cs>  ]
[<flags> ]
[ret-addr] <- &(pt_regs.sp)
[  arg1  ]
[  arg2  ]

with this method:

[  <bx>  ] <- pt_regs
[  ...   ]
[  <cs>  ]
[<flags> ]
[mcount-ret] <- &(pt_regs.sp)
[ret-addr]
[  arg1  ]
[  arg2  ]

I think this is hard to solve without a tricky hack.

For example, on x86-32, MCOUNT_SAVE_FRAME could save the flags to the
entry which will become <cs>, save mcount-ret to the local stack, and
move the flags to the next entry.

<save-frame>
pushf			# save flags on CS(%esp)
subl $12, %esp		# skip ip, orig_ax and gs
pushl %fs
pushl %es
...
pushl %ebx
movl 56(%esp), %ebx	# load mcount-ret address
movl 52(%esp), %ebp	# load flags
movl %ebp, 56(%esp)	# store flags

call function (ebx is callee save)

<restore-frame>
movl 56(%esp), %ebp	# load flags
movl %ebp, 52(%esp)	# store flags
movl %ebx, 56(%esp)	# load mcount-ret address
...
popf
ret

Hmm?

> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  arch/x86/include/asm/ftrace.h     |   38 ++++++++++++++++++++----------------
>  arch/x86/kernel/entry_64.S        |   23 +++++++++++++++++++++-
>  include/linux/ftrace.h            |   15 ++++++++++++-
>  kernel/trace/ftrace.c             |   29 ++++++++++++++++++---------
>  kernel/trace/trace_events.c       |    2 +-
>  kernel/trace/trace_functions.c    |    7 +++--
>  kernel/trace/trace_irqsoff.c      |    2 +-
>  kernel/trace/trace_sched_wakeup.c |    3 +-
>  kernel/trace/trace_selftest.c     |   15 +++++++++----
>  kernel/trace/trace_stack.c        |    3 +-
>  10 files changed, 95 insertions(+), 42 deletions(-)
>
> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index b3fcf16..0750c2a 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -4,26 +4,29 @@
>  #ifdef __ASSEMBLY__
>
>  	.macro MCOUNT_SAVE_FRAME
> -	/* taken from glibc */
> -	subq $0x38, %rsp
> -	movq %rax, (%rsp)
> -	movq %rcx, 8(%rsp)
> -	movq %rdx, 16(%rsp)
> -	movq %rsi, 24(%rsp)
> -	movq %rdi, 32(%rsp)
> -	movq %r8, 40(%rsp)
> -	movq %r9, 48(%rsp)
> +	 /*
> +	  * We add enough stack to save all regs,
> +	  * and we what we need in the location of pt_regs.
> +	  */
> +	subq $ORIG_RAX, %rsp
> +	movq %rax, RAX(%rsp)
> +	movq %rcx, RCX(%rsp)
> +	movq %rdx, RDX(%rsp)
> +	movq %rsi, RSI(%rsp)
> +	movq %rdi, RDI(%rsp)
> +	movq %r8, R8(%rsp)
> +	movq %r9, R9(%rsp)
>  	.endm
>
>  	.macro MCOUNT_RESTORE_FRAME
> -	movq 48(%rsp), %r9
> -	movq 40(%rsp), %r8
> -	movq 32(%rsp), %rdi
> -	movq 24(%rsp), %rsi
> -	movq 16(%rsp), %rdx
> -	movq 8(%rsp), %rcx
> -	movq (%rsp), %rax
> -	addq $0x38, %rsp
> +	movq R9(%rsp), %r9
> +	movq R8(%rsp), %r8
> +	movq RDI(%rsp), %rdi
> +	movq RSI(%rsp), %rsi
> +	movq RDX(%rsp), %rdx
> +	movq RCX(%rsp), %rcx
> +	movq RAX(%rsp), %rax
> +	addq $ORIG_RAX, %rsp
>  	.endm
>
>  #endif
> @@ -34,6 +37,7 @@
>
>  #if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_X86_64)
>  #define ARCH_SUPPORTS_FTRACE_OPS 1
> +#define ARCH_SUPPORTS_FTRACE_SAVE_REGS 1
>  #endif
>
>  #ifndef __ASSEMBLY__
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 27adc2b..b77f297 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -78,7 +78,16 @@ ENTRY(ftrace_caller)

I can see the code below before the frame is saved.

	cmpl $0, function_trace_stop
	jne  ftrace_stub

Please pushf before comparing it. :)
Sometimes the value of eflags is worth watching.
I know that SF/ZF will never be live across a
function call, so it is also OK if the eflags is saved
in MCOUNT_SAVE_FRAME.

>  	MCOUNT_SAVE_FRAME
>
>  	leaq function_trace_op, %rdx
> -	movq 0x38(%rsp), %rdi
> +
> +	cmpl $0, ftrace_save_regs
> +	jne  save_all_regs
> +
> +call_func:
> +
> +	/* regs go into 4th parameter */
> +	leaq (%rsp), %rcx
> +
> +	movq ORIG_RAX(%rsp), %rdi
>  	movq 8(%rbp), %rsi
>  	subq $MCOUNT_INSN_SIZE, %rdi
>
> @@ -96,6 +105,18 @@ GLOBAL(ftrace_stub)
>  	retq
>  END(ftrace_caller)
>
> +save_all_regs:
> +	/* Save the rest of pt_regs */
> +	movq %r15, R15(%rsp)
> +	movq %r14, R14(%rsp)
> +	movq %r13, R13(%rsp)
> +	movq %r12, R12(%rsp)
> +	movq %r10, R10(%rsp)
> +	movq %rbp, RBP(%rsp)
> +	movq %rbx, RBX(%rsp)
> +	jmp call_func

At least pt_regs.sp must be saved so that variables
on the stack can be accessed.

> +
> +
>  #else /* ! CONFIG_DYNAMIC_FTRACE */
>  ENTRY(mcount)
>  	cmpl $0, function_trace_stop

You also need to restore the rest of pt_regs if
ftrace_save_regs is true.

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-11  0:34   ` Steven Rostedt
@ 2011-08-11  6:28     ` Masami Hiramatsu
  2011-08-11 13:01       ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-11  6:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt

(2011/08/11 9:34), Steven Rostedt wrote:
> On Thu, 2011-08-11 at 09:21 +0900, Masami Hiramatsu wrote:
>> Hi Steven,
>>
>> Thanks for this nice feature!
>>
>> (2011/08/11 1:22), Steven Rostedt wrote:
>>> Hi All,
>>>
>>> I started working on adding the -mfentry switch to ftrace, which
>>> allows us to remove the frame pointers requirement from function tracing
>>> as well as makes mcount (fentry) work just better.
>>>
>>> But when I did this in another branch, I noticed that I broke kprobes
>>> in its most common usage. The attaching a probe at the beginning of
>>> a function to use get to its parameters.
>>>
>>> So I started this branch. This branch is to have kprobes use ftrace
>>> directly when a probe is attached to a ftrace nop. Currently, kprobes
>>> will just error when that happens. With this patch set, it will hook
>>> into the ftrace infrastructure and use ftrace instead. This is more
>>> like an optimized probe as no breakpoints need to be set. A call to
>>> the function is done directly via the mcount trampoline. If ftrace
>>> pt_regs is implemented for an arch, kprobes gets this feature for free.
>>
>> I agreed this idea, this looks good to me too :)
>> With -fentry, this can improve dynamic trace events very much.
>>
>> BTW (OT), it seems that current kprobe data structure becomes a bit
>> fat. Maybe what we need is just a "holder of hooking handler" as
>> what ftrace provides, not a full storage data structure of copied
>> instrucutions. Perhaps, we'd better diet the kprobe structure for
>> transparency of hooking infrastructure.
>
> Sure, I can make the ftrace_ops field in kprobes dynamically allocated
> instead. That shouldn't be an issue.

By the way (again), perhaps a much simpler solution is to use ftrace
not in kprobes itself, but in trace_kprobe. Of course, there are
several pros and cons...

The pros:
- Arch-independent solution (anyway, ftrace still needs to pass pt_regs
  to its handlers)
- No need to introduce more complexity into kprobes itself.
- Maybe systemtap can also catch up with this by using the same method.

The cons:
- Native kprobes users will be disappointed... anyway, they just need to
  move their probes to the next instruction (usually addr+=5 is OK).

... are there any other cons? :)

Thank you,


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5][RFC] kprobes: Use ftrace hooks when probing ftrace nops
  2011-08-10 16:22 ` [PATCH 5/5][RFC] kprobes: Use ftrace hooks when probing ftrace nops Steven Rostedt
@ 2011-08-11  7:41   ` Masami Hiramatsu
  2011-08-11 13:22     ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-11  7:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt

Hi Steven,

As I suggested in another reply, I'm now attracted to the idea
of using ftrace in the kprobe-tracer, because of its simplicity.
Anyway, here is my review.

(2011/08/11 1:22), Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
>
> Currently, if a probe is set at an ftrace nop, the probe will fail.
>
> When -mfentry is used by ftrace (in the near future), the nop will be
> at the beginning of the function. This will prevent kprobes from hooking
> to the most common place that kprobes are attached.
>
> Now that ftrace supports individual users as well as pt_regs passing,
> kprobes can use the ftrace infrastructure when a probe is placed on
> a ftrace nop.

My one general concern is the timing of enabling an ftrace-based
probe. A breakpoint-based kprobe (including an optimized kprobe) is
enabled right after registering, and users might expect that.
And AFAIK, a dynamic ftrace handler will only be enabled (activated)
after a while, because it has to wait for an NMI, doesn't it?

And theoretically, this ftrace-based probe can't support jprobes,
because it changes the IP address. Nowadays this may be a minor
problem (because someone who wants to trace function args can
use the kprobe-tracer), but it still exists.


> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  include/linux/kprobes.h |    6 ++
>  kernel/kprobes.c        |  163 ++++++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 160 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index dd7c12e..5742071 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -37,6 +37,7 @@
>  #include <linux/spinlock.h>
>  #include <linux/rcupdate.h>
>  #include <linux/mutex.h>
> +#include <linux/ftrace.h>
>
>  #ifdef CONFIG_KPROBES
>  #include <asm/kprobes.h>
> @@ -112,6 +113,11 @@ struct kprobe {
>  	/* copy of the original instruction */
>  	struct arch_specific_insn ainsn;
>
> +#ifdef CONFIG_FUNCTION_TRACER
> +	/* If it is possible to use ftrace to probe */
> +	struct ftrace_ops fops;
> +#endif
> +
>  	/*
>  	 * Indicates various status flags.
>  	 * Protected by kprobe_mutex after this kprobe is registered.
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index e6c25eb..2160768 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -102,6 +102,31 @@ static struct kprobe_blackpoint kprobe_blacklist[] = {
>  	{NULL}    /* Terminator */
>  };
>
> +#ifdef CONFIG_FUNCTION_TRACER
> +/* kprobe stub function for use when probe is not connected to ftrace */
> +static void
> +kprobe_ftrace_stub(unsigned long ip, unsigned long pip,
> +		   struct ftrace_ops *op, struct pt_regs *pt_regs)
> +{
> +}
> +
> +static bool ftrace_kprobe(struct kprobe *p)
> +{
> +	return p->fops.func && p->fops.func != kprobe_ftrace_stub;
> +}
> +
> +static void init_non_ftrace_kprobe(struct kprobe *p)
> +{
> +	p->fops.func = kprobe_ftrace_stub;
> +}
> +
> +#else
> +static bool ftrace_kprobe(struct kprobe *p)
> +{
> +	return false;
> +}
> +#endif
> +
>  #ifdef __ARCH_WANT_KPROBES_INSN_SLOT
>  /*
>   * kprobe->ainsn.insn points to the copy of the instruction to be
> @@ -396,6 +421,9 @@ static __kprobes void free_aggr_kprobe(struct kprobe *p)
>  {
>  	struct optimized_kprobe *op;
>
> +	if (ftrace_kprobe(p))
> +		return;
> +
>  	op = container_of(p, struct optimized_kprobe, kp);
>  	arch_remove_optimized_kprobe(op);
>  	arch_remove_kprobe(p);
> @@ -759,6 +787,9 @@ static __kprobes void try_to_optimize_kprobe(struct kprobe *p)
>  	struct kprobe *ap;
>  	struct optimized_kprobe *op;
>
> +	if (ftrace_kprobe(p))
> +		return;
> +
>  	ap = alloc_aggr_kprobe(p);
>  	if (!ap)
>  		return;
> @@ -849,6 +880,10 @@ static void __kprobes __arm_kprobe(struct kprobe *p)
>  {
>  	struct kprobe *_p;
>
> +	/* Only arm non-ftrace probes */
> +	if (ftrace_kprobe(p))
> +		return;
> +
>  	/* Check collision with other optimized kprobes */
>  	_p = get_optimized_kprobe((unsigned long)p->addr);
>  	if (unlikely(_p))
> @@ -864,6 +899,10 @@ static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
>  {
>  	struct kprobe *_p;
>
> +	/* Only disarm non-ftrace probes */
> +	if (ftrace_kprobe(p))
> +		return;
> +
>  	unoptimize_kprobe(p, false);	/* Try to unoptimize */
>
>  	if (!kprobe_queued(p)) {
> @@ -878,13 +917,26 @@ static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
>
>  #else /* !CONFIG_OPTPROBES */
>
> +static void __kprobes __arm_kprobe(struct kprobe *p)
> +{
> +	/* Only arm non-ftrace probes */
> +	if (!ftrace_kprobe(p))
> +		arch_arm_kprobe(p);
> +}
> +
> +/* Remove the breakpoint of a probe. Must be called with text_mutex locked */
> +static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
> +{
> +	/* Only disarm non-ftrace probes */
> +	if (!ftrace_kprobe(p))
> +		arch_disarm_kprobe(p);
> +}

If this ignores disabling/enabling, then kprobe_ftrace_callback must
check kprobe_disabled(p) and skip the probe itself.
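
I.e. the top of the callback would need something like this (just a
sketch):

	struct kprobe *p = container_of(op, struct kprobe, fops);

	if (kprobe_disabled(p))		/* registered but disabled: do nothing */
		return;

	if (p->pre_handler)
		p->pre_handler(p, pt_regs);
	if (p->post_handler)
		p->post_handler(p, pt_regs, 0);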

> +
>  #define optimize_kprobe(p)			do {} while (0)
>  #define unoptimize_kprobe(p, f)			do {} while (0)
>  #define kill_optimized_kprobe(p)		do {} while (0)
>  #define prepare_optimized_kprobe(p)		do {} while (0)
>  #define try_to_optimize_kprobe(p)		do {} while (0)
> -#define __arm_kprobe(p)				arch_arm_kprobe(p)
> -#define __disarm_kprobe(p, o)			arch_disarm_kprobe(p)
>  #define kprobe_disarmed(p)			kprobe_disabled(p)
>  #define wait_for_kprobe_optimizer()		do {} while (0)
>
> @@ -897,7 +949,9 @@ static void reuse_unused_kprobe(struct kprobe *ap)
>
>  static __kprobes void free_aggr_kprobe(struct kprobe *p)
>  {
> -	arch_remove_kprobe(p);
> +
> +	if (!ftrace_kprobe(p))
> +		arch_remove_kprobe(p);
>  	kfree(p);
>  }
>
> @@ -910,6 +964,12 @@ static __kprobes struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
>  /* Arm a kprobe with text_mutex */
>  static void __kprobes arm_kprobe(struct kprobe *kp)
>  {
> +	/* ftrace probes can skip arch calls */
> +	if (ftrace_kprobe(kp)) {
> +		register_ftrace_function(&kp->fops);
> +		return;
> +	}
> +
>  	/*
>  	 * Here, since __arm_kprobe() doesn't use stop_machine(),
>  	 * this doesn't cause deadlock on text_mutex. So, we don't
> @@ -924,6 +984,12 @@ static void __kprobes arm_kprobe(struct kprobe *kp)
>  static void __kprobes disarm_kprobe(struct kprobe *kp)
>  {
>  	/* Ditto */
> +
> +	if (ftrace_kprobe(kp)) {
> +		unregister_ftrace_function(&kp->fops);
> +		return;
> +	}
> +
>  	mutex_lock(&text_mutex);
>  	__disarm_kprobe(kp, true);
>  	mutex_unlock(&text_mutex);
> @@ -1313,6 +1379,56 @@ static inline int check_kprobe_rereg(struct kprobe *p)
>  	return ret;
>  }
>
> +#ifdef CONFIG_FUNCTION_TRACER
> +static notrace void
> +kprobe_ftrace_callback(unsigned long ip, unsigned long parent_ip,
> +		       struct ftrace_ops *op, struct pt_regs *pt_regs)
> +{
> +	struct kprobe *p = container_of(op, struct kprobe, fops);
> +

Here, we need to set up kprobe_ctlblk and some of the pt_regs members
(ip, cs and orig_ax), as optimized_callback() in
arch/x86/kernel/kprobes.c does.

> +	/* A nop has been trapped, just run both handlers back to back */
> +	if (p->pre_handler)
> +		p->pre_handler(p, pt_regs);

And increment regs->ip here, to step over the NOP.

> +	if (p->post_handler)
> +		p->post_handler(p, pt_regs, 0);
> +}

Anyway, the operations above strongly depend on the arch, so
kprobe_ftrace_callback should be moved to arch/*/kprobes.c.

And I think most of the code can be shared with the optimized-kprobe code.
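
E.g. the arch side could look something like this (just a rough sketch
assuming x86-64; re-entrancy, preemption and locking are all left out):

static notrace void
kprobe_ftrace_callback(unsigned long ip, unsigned long parent_ip,
		       struct ftrace_ops *op, struct pt_regs *regs)
{
	struct kprobe *p = container_of(op, struct kprobe, fops);
	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();

	/* make pt_regs look like an int3 hit on the probed nop */
	regs->cs = __KERNEL_CS;
	regs->ip = ip;
	regs->orig_ax = ~0UL;

	__this_cpu_write(current_kprobe, p);
	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
	if (p->pre_handler)
		p->pre_handler(p, regs);

	/* pretend the nop was single-stepped */
	regs->ip = ip + MCOUNT_INSN_SIZE;
	if (p->post_handler) {
		kcb->kprobe_status = KPROBE_HIT_SSDONE;
		p->post_handler(p, regs, 0);
	}
	__this_cpu_write(current_kprobe, NULL);
}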

> +
> +static int use_ftrace_hook(struct kprobe *p)
> +{
> +	char str[KSYM_SYMBOL_LEN];
> +
> +	/* see if this is an ftrace function */
> +	if (!ftrace_text_reserved(p->addr, p->addr)) {
> +		/* make sure fops->func is nop */
> +		init_non_ftrace_kprobe(p);
> +		return 0;
> +	}
> +
> +	/* If arch does not support pt_regs passing, bail */
> +	if (!ARCH_SUPPORTS_FTRACE_SAVE_REGS)
> +		return -EINVAL;

Hmm, I think this should be checked at build time...

> +
> +	/* Use ftrace hook instead */
> +
> +	memset(&p->fops, 0, sizeof(p->fops));
> +
> +	/* Find the function this is connected with this addr */
> +	kallsyms_lookup((unsigned long)p->addr, NULL, NULL, NULL, str);
> +
> +	p->fops.flags = FTRACE_OPS_FL_SAVE_REGS;
> +	p->fops.func = kprobe_ftrace_callback;
> +
> +	ftrace_set_filter(&p->fops, str, strlen(str), 1);

Hmm, IMHO ftrace_set_filter() should not be called here, because
other kprobes may already be registered on the same address. In that
case, it is natural to use an aggr_kprobe for handling several kprobes
on the same address. Otherwise, the kprobe hash table will have
several different probes on the same address.

> +
> +	return 0;
> +}
> +#else
> +static int use_ftrace_hook(struct kprobe *p)
> +{
> +	return 0;
> +}
> +#endif
> +
>  int __kprobes register_kprobe(struct kprobe *p)
>  {
>  	int ret = 0;
> @@ -1329,11 +1445,14 @@ int __kprobes register_kprobe(struct kprobe *p)
>  	if (ret)
>  		return ret;
>
> +	ret = use_ftrace_hook(p);
> +	if (ret)
> +		return ret;
> +
>  	jump_label_lock();
>  	preempt_disable();
>  	if (!kernel_text_address((unsigned long) p->addr) ||
>  	    in_kprobes_functions((unsigned long) p->addr) ||
> -	    ftrace_text_reserved(p->addr, p->addr) ||
>  	    jump_label_text_reserved(p->addr, p->addr))
>  		goto fail_with_jump_label;
>
> @@ -1384,15 +1503,17 @@ int __kprobes register_kprobe(struct kprobe *p)
>  		goto out;
>  	}
>
> -	ret = arch_prepare_kprobe(p);
> -	if (ret)
> -		goto out;
> +	if (!ftrace_kprobe(p)) {
> +		ret = arch_prepare_kprobe(p);
> +		if (ret)
> +			goto out;
> +	}
>
>  	INIT_HLIST_NODE(&p->hlist);
>  	hlist_add_head_rcu(&p->hlist,
>  		       &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
>
> -	if (!kprobes_all_disarmed && !kprobe_disabled(p))
> +	if (!ftrace_kprobe(p) && !kprobes_all_disarmed && !kprobe_disabled(p))
>  		__arm_kprobe(p);
>
>  	/* Try to optimize kprobe */
> @@ -1400,6 +1521,12 @@ int __kprobes register_kprobe(struct kprobe *p)
>
>  out:
>  	mutex_unlock(&text_mutex);
> +
> +	/* ftrace kprobes must be set outside of text_mutex */
> +	if (!ret && ftrace_kprobe(p) &&
> +	    !kprobes_all_disarmed && !kprobe_disabled(p))
> +		arm_kprobe(p);
> +
>  	put_online_cpus();
>  	jump_label_unlock();
>  	mutex_unlock(&kprobe_mutex);

After this, we must handle some failures which can happen when probing
on a module.


> @@ -2134,6 +2261,14 @@ static void __kprobes arm_all_kprobes(void)
>  	}
>  	mutex_unlock(&text_mutex);
>
> +	/* ftrace kprobes are enabled outside of text_mutex */
> +	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
> +		head = &kprobe_table[i];
> +		hlist_for_each_entry_rcu(p, node, head, hlist)
> +			if (ftrace_kprobe(p) && !kprobe_disabled(p))
> +				arm_kprobe(p);
> +	}
> +
>  	kprobes_all_disarmed = false;
>  	printk(KERN_INFO "Kprobes globally enabled\n");
>
> @@ -2164,11 +2299,21 @@ static void __kprobes disarm_all_kprobes(void)
>  	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
>  		head = &kprobe_table[i];
>  		hlist_for_each_entry_rcu(p, node, head, hlist) {
> -			if (!arch_trampoline_kprobe(p) && !kprobe_disabled(p))
> +			if (!ftrace_kprobe(p) &&
> +			    !arch_trampoline_kprobe(p) && !kprobe_disabled(p))
>  				__disarm_kprobe(p, false);
>  		}
>  	}
>  	mutex_unlock(&text_mutex);
> +
> +	/* ftrace kprobes are disabled outside of text_mutex */
> +	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
> +		head = &kprobe_table[i];
> +		hlist_for_each_entry_rcu(p, node, head, hlist) {
> +			if (ftrace_kprobe(p) && !kprobe_disabled(p))
> +				disarm_kprobe(p);
> +		}
> +	}
>  	mutex_unlock(&kprobe_mutex);
>
>  	/* Wait for disarming all kprobes by optimizer */

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5][RFC] ftrace: Return pt_regs to function trace callback (x86_64 only so
  2011-08-11  5:55   ` Masami Hiramatsu
@ 2011-08-11 12:59     ` Steven Rostedt
  2011-08-12  0:55       ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2011-08-11 12:59 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt

On Thu, 2011-08-11 at 14:55 +0900, Masami Hiramatsu wrote:
> (2011/08/11 1:22), Steven Rostedt wrote:
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > Return as the 4th paramater to the function tracer callback the pt_regs.
> >
> > So far this is only supported by x86_64. The ftrace_ops flag
> > FTRACE_OPS_FL_SAVE_REGS is added to tell the arch to save all regs
> > to the pt_regs, otherwise a minimum is just passed back (the same
> > regs that is saved by mcount itself).
> 
> I guess it will be a bit hard to port this on x86-32, because
> on x86-32, the top of stack address in pt_regs is the address
> of sp member (e.g. &(pt_regs->sp)). I mean that when mcount-entry
> calls ftrace_caller, it pushes an address of the next instruction
> of mcount-entry on the top of stack.
> In that case, &(pt_regs->sp) points the entry which stores the
> address, instead of the return address of probed function.

I'm a little confused here. What is needed by kprobes handlers?

It should be no problem to save the sp member into pt_regs as of the
time mcount was called, and to simulate the probe as if it were put on
the nop.

> 
> e.g. with kprobes (on x86-32):
> 
> [  <bx>  ] <- pt_regs
> [  ...   ]
> [  <cs>  ]
> [<flags> ]
> [ret-addr] <- &(pt_regs.sp)
> [  arg1  ]
> [  arg2  ]
> 
> with this method:
> 
> [  <bx>  ] <- pt_regs
> [  ...   ]
> [  <cs>  ]
> [<flags> ]
> [mcount-ret] <- &(pt_regs.sp)
> [ret-addr]
> [  arg1  ]
> [  arg2  ]
> 
> I think this is hard to solve without a tricky hack.

Is this for handling kretprobes? With mcount, we shouldn't be doing
that, because mcount is not the place where kretprobes should be
hooking. When we add __fentry__, the arch code to handle it will be
different, and it shouldn't be too hard to make that work with
kretprobes.

> 
> For example, on x86-32, MCOUNT_FRAME_SAVE saves
> flags on the entry which will be <cs> and it saves
> mcount-ret to local stack and moves flags to next entry.
> 
> <save-frame>
> pushf			# save flags on CS(%esp)
> subl $12, %esp		# skip ip, orig_ax and gs
> pushl %fs
> pushl %es
> ...
> pushl %ebx
> movl 56(%esp), %ebx	# load mcount-ret address
> movl 52(%esp), %ebp	# load flags
> movl %ebp, 56(%esp)	# store flags

Nope, we don't do anything like this. I've already got x86_32 working
(though maybe it needs more for pt_regs->sp), but why is pushf needed?

-- Steve

> 
> call function (ebx is callee save)
> 
> <restore-frame>
> movl 56(%esp), %ebp	# load flags
> movl %ebp, 52(%esp)	# store flags
> movl %ebx, 56(%esp)	# load mcount-ret address
> ...
> popf
> ret
> 
> Hmm?
> 
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  arch/x86/include/asm/ftrace.h     |   38 ++++++++++++++++++++----------------
> >  arch/x86/kernel/entry_64.S        |   23 +++++++++++++++++++++-
> >  include/linux/ftrace.h            |   15 ++++++++++++-
> >  kernel/trace/ftrace.c             |   29 ++++++++++++++++++---------
> >  kernel/trace/trace_events.c       |    2 +-
> >  kernel/trace/trace_functions.c    |    7 +++--
> >  kernel/trace/trace_irqsoff.c      |    2 +-
> >  kernel/trace/trace_sched_wakeup.c |    3 +-
> >  kernel/trace/trace_selftest.c     |   15 +++++++++----
> >  kernel/trace/trace_stack.c        |    3 +-
> >  10 files changed, 95 insertions(+), 42 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> > index b3fcf16..0750c2a 100644
> > --- a/arch/x86/include/asm/ftrace.h
> > +++ b/arch/x86/include/asm/ftrace.h
> > @@ -4,26 +4,29 @@
> >  #ifdef __ASSEMBLY__
> >
> >  	.macro MCOUNT_SAVE_FRAME
> > -	/* taken from glibc */
> > -	subq $0x38, %rsp
> > -	movq %rax, (%rsp)
> > -	movq %rcx, 8(%rsp)
> > -	movq %rdx, 16(%rsp)
> > -	movq %rsi, 24(%rsp)
> > -	movq %rdi, 32(%rsp)
> > -	movq %r8, 40(%rsp)
> > -	movq %r9, 48(%rsp)
> > +	 /*
> > +	  * We add enough stack to save all regs,
> > +	  * and we what we need in the location of pt_regs.
> > +	  */
> > +	subq $ORIG_RAX, %rsp
> > +	movq %rax, RAX(%rsp)
> > +	movq %rcx, RCX(%rsp)
> > +	movq %rdx, RDX(%rsp)
> > +	movq %rsi, RSI(%rsp)
> > +	movq %rdi, RDI(%rsp)
> > +	movq %r8, R8(%rsp)
> > +	movq %r9, R9(%rsp)
> >  	.endm
> >
> >  	.macro MCOUNT_RESTORE_FRAME
> > -	movq 48(%rsp), %r9
> > -	movq 40(%rsp), %r8
> > -	movq 32(%rsp), %rdi
> > -	movq 24(%rsp), %rsi
> > -	movq 16(%rsp), %rdx
> > -	movq 8(%rsp), %rcx
> > -	movq (%rsp), %rax
> > -	addq $0x38, %rsp
> > +	movq R9(%rsp), %r9
> > +	movq R8(%rsp), %r8
> > +	movq RDI(%rsp), %rdi
> > +	movq RSI(%rsp), %rsi
> > +	movq RDX(%rsp), %rdx
> > +	movq RCX(%rsp), %rcx
> > +	movq RAX(%rsp), %rax
> > +	addq $ORIG_RAX, %rsp
> >  	.endm
> >
> >  #endif
> > @@ -34,6 +37,7 @@
> >
> >  #if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_X86_64)
> >  #define ARCH_SUPPORTS_FTRACE_OPS 1
> > +#define ARCH_SUPPORTS_FTRACE_SAVE_REGS 1
> >  #endif
> >
> >  #ifndef __ASSEMBLY__
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index 27adc2b..b77f297 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -78,7 +78,16 @@ ENTRY(ftrace_caller)
> 
> I can see below code before save frame.
> 
> 	cmpl $0, function_trace_stop
> 	jne  ftrace_stub
> 
> Please pushf before comparing it. :)
> Sometimes, the value of eflags is worth to watch.
> I know that SF/ZF will be never used between
> function call, so it is OK if the eflags is saved
> in MCOUNT_SAVE_FRAME.
> 
> >  	MCOUNT_SAVE_FRAME
> >
> >  	leaq function_trace_op, %rdx
> > -	movq 0x38(%rsp), %rdi
> > +
> > +	cmpl $0, ftrace_save_regs
> > +	jne  save_all_regs
> > +
> > +call_func:
> > +
> > +	/* regs go into 4th parameter */
> > +	leaq (%rsp), %rcx
> > +
> > +	movq ORIG_RAX(%rsp), %rdi
> >  	movq 8(%rbp), %rsi
> >  	subq $MCOUNT_INSN_SIZE, %rdi
> >
> > @@ -96,6 +105,18 @@ GLOBAL(ftrace_stub)
> >  	retq
> >  END(ftrace_caller)
> >
> > +save_all_regs:
> > +	/* Save the rest of pt_regs */
> > +	movq %r15, R15(%rsp)
> > +	movq %r14, R14(%rsp)
> > +	movq %r13, R13(%rsp)
> > +	movq %r12, R12(%rsp)
> > +	movq %r10, R10(%rsp)
> > +	movq %rbp, RBP(%rsp)
> > +	movq %rbx, RBX(%rsp)
> > +	jmp call_func
> 
> At least, pt_regs.sp must be saved for accessing
> vars on stack.
> 
> > +
> > +
> >  #else /* ! CONFIG_DYNAMIC_FTRACE */
> >  ENTRY(mcount)
> >  	cmpl $0, function_trace_stop
> 
> You also need to restore the rest of pt_regs if
> ftrace_save_regs is true.
> 
> Thank you,
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-11  6:28     ` Masami Hiramatsu
@ 2011-08-11 13:01       ` Steven Rostedt
  2011-08-12  2:57         ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2011-08-11 13:01 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt

On Thu, 2011-08-11 at 15:28 +0900, Masami Hiramatsu wrote:
> (2011/08/11 9:34), Steven Rostedt wrote:
> > On Thu, 2011-08-11 at 09:21 +0900, Masami Hiramatsu wrote:
> >> Hi Steven,
> >>
> >> Thanks for this nice feature!
> >>
> >> (2011/08/11 1:22), Steven Rostedt wrote:
> >>> Hi All,
> >>>
> >>> I started working on adding the -mfentry switch to ftrace, which
> >>> allows us to remove the frame pointers requirement from function tracing
> >>> as well as makes mcount (fentry) work just better.
> >>>
> >>> But when I did this in another branch, I noticed that I broke kprobes
> >>> in its most common usage. The attaching a probe at the beginning of
> >>> a function to use get to its parameters.
> >>>
> >>> So I started this branch. This branch is to have kprobes use ftrace
> >>> directly when a probe is attached to a ftrace nop. Currently, kprobes
> >>> will just error when that happens. With this patch set, it will hook
> >>> into the ftrace infrastructure and use ftrace instead. This is more
> >>> like an optimized probe as no breakpoints need to be set. A call to
> >>> the function is done directly via the mcount trampoline. If ftrace
> >>> pt_regs is implemented for an arch, kprobes gets this feature for free.
> >>
> >> I agreed this idea, this looks good to me too :)
> >> With -fentry, this can improve dynamic trace events very much.
> >>
> >> BTW (OT), it seems that current kprobe data structure becomes a bit
> >> fat. Maybe what we need is just a "holder of hooking handler" as
> >> what ftrace provides, not a full storage data structure of copied
> >> instrucutions. Perhaps, we'd better diet the kprobe structure for
> >> transparency of hooking infrastructure.
> >
> > Sure, I can make the ftrace_ops field in kprobes dynamically allocated
> > instead. That shouldn't be an issue.
> 
> By the way (again), perhaps, much simpler solution is using ftrace
> not in kprobe, but in the trace_kprobe. Of course, there are several
> pros and cons...
> 
> The pros:
> - Arch independent solution (anyway, ftrace still needs passing pt_regs
>  to their handler)
> - Don't need to introduce more complexity into kprobes itself.
> - Maybe systemtap also can catch up with this as using same method.

Note that systemtap and others will be hooking into the kprobes
version, not the trace_kprobe one. If we do it in trace_kprobe, then
everyone else needs to reimplement it too. I have bigger ideas for the
future of this, and I really want to get this working. If it doesn't
work for kprobes, then it won't work for anything else.

> 
> The cons:
> - Native kprobes users will be disappointed... anyway, they just need to
>   move their probes to the next instruction (usually addr+=5 is OK).

I've been told that doing the addr += 5 (which is also arch-specific)
can break things for other tools.

-- Steve

> 
> ... are there any other cons? :)
> 
> Thank you,
> 
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5][RFC] kprobes: Use ftrace hooks when probing ftrace nops
  2011-08-11  7:41   ` Masami Hiramatsu
@ 2011-08-11 13:22     ` Steven Rostedt
  2011-08-12  2:41       ` Masami Hiramatsu
  2011-08-12  5:46       ` Ananth N Mavinakayanahalli
  0 siblings, 2 replies; 28+ messages in thread
From: Steven Rostedt @ 2011-08-11 13:22 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt

On Thu, 2011-08-11 at 16:41 +0900, Masami Hiramatsu wrote:
> Hi Steven,
> 
> As I suggested in another reply, I'm now attracted by the idea
> of using ftrace in kprobe-tracer, because of simplicity.
> Anyway, here is my review.

It may be a little simpler, but it defeats the purpose of what I'm
trying to accomplish. I do want the ability to expand this to do other
things besides just trace_kprobe.

> 
> (2011/08/11 1:22), Steven Rostedt wrote:
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > Currently, if a probe is set at an ftrace nop, the probe will fail.
> >
> > When -mfentry is used by ftrace (in the near future), the nop will be
> > at the beginning of the function. This will prevent kprobes from hooking
> > to the most common place that kprobes are attached.
> >
> > Now that ftrace supports individual users as well as pt_regs passing,
> > kprobes can use the ftrace infrastructure when a probe is placed on
> > a ftrace nop.
> 
> My one general concern is the timing of enabling ftrace-based probe.
> Breakpoint-based kprobe (including optimized kprobe) is enabled
> right after registering. Users might expect that.
> And AFAIK, dynamic ftraces handler will be enabled (activated)
> after a while, because it has to wait for an NMI, doesn't it?

There's no NMI needed. Not sure where you got that idea. When
register_ftrace_function() is called, it will start tracing immediately.
Note, stop_machine() is called, but we could fix that in the future too.

> 
> And theoretically, this ftrace-based probe can't support jprobe,
> because it changes IP address. Nowadays, this may becomes minor
> problem (because someone who wants to trace function args can
> use kprobe-tracer), but still exist.

I want to fix that too. :)

We don't need to worry about that until -mfentry exists. But once that
does, then, yes, I want the ftrace version working for jprobe too.

> 
> 
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  include/linux/kprobes.h |    6 ++
> >  kernel/kprobes.c        |  163 ++++++++++++++++++++++++++++++++++++++++++++---
> >  2 files changed, 160 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> > index dd7c12e..5742071 100644
> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -37,6 +37,7 @@
> >  #include <linux/spinlock.h>
> >  #include <linux/rcupdate.h>
> >  #include <linux/mutex.h>
> > +#include <linux/ftrace.h>
> >
> >  #ifdef CONFIG_KPROBES
> >  #include <asm/kprobes.h>
> > @@ -112,6 +113,11 @@ struct kprobe {
> >  	/* copy of the original instruction */
> >  	struct arch_specific_insn ainsn;
> >
> > +#ifdef CONFIG_FUNCTION_TRACER
> > +	/* If it is possible to use ftrace to probe */
> > +	struct ftrace_ops fops;
> > +#endif
> > +
> >  	/*
> >  	 * Indicates various status flags.
> >  	 * Protected by kprobe_mutex after this kprobe is registered.
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index e6c25eb..2160768 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -102,6 +102,31 @@ static struct kprobe_blackpoint kprobe_blacklist[] = {
> >  	{NULL}    /* Terminator */
> >  };
> >
> > +#ifdef CONFIG_FUNCTION_TRACER
> > +/* kprobe stub function for use when probe is not connected to ftrace */
> > +static void
> > +kprobe_ftrace_stub(unsigned long ip, unsigned long pip,
> > +		   struct ftrace_ops *op, struct pt_regs *pt_regs)
> > +{
> > +}
> > +
> > +static bool ftrace_kprobe(struct kprobe *p)
> > +{
> > +	return p->fops.func && p->fops.func != kprobe_ftrace_stub;
> > +}
> > +
> > +static void init_non_ftrace_kprobe(struct kprobe *p)
> > +{
> > +	p->fops.func = kprobe_ftrace_stub;
> > +}
> > +
> > +#else
> > +static bool ftrace_kprobe(struct kprobe *p)
> > +{
> > +	return false;
> > +}
> > +#endif
> > +
> >  #ifdef __ARCH_WANT_KPROBES_INSN_SLOT
> >  /*
> >   * kprobe->ainsn.insn points to the copy of the instruction to be
> > @@ -396,6 +421,9 @@ static __kprobes void free_aggr_kprobe(struct kprobe *p)
> >  {
> >  	struct optimized_kprobe *op;
> >
> > +	if (ftrace_kprobe(p))
> > +		return;
> > +
> >  	op = container_of(p, struct optimized_kprobe, kp);
> >  	arch_remove_optimized_kprobe(op);
> >  	arch_remove_kprobe(p);
> > @@ -759,6 +787,9 @@ static __kprobes void try_to_optimize_kprobe(struct kprobe *p)
> >  	struct kprobe *ap;
> >  	struct optimized_kprobe *op;
> >
> > +	if (ftrace_kprobe(p))
> > +		return;
> > +
> >  	ap = alloc_aggr_kprobe(p);
> >  	if (!ap)
> >  		return;
> > @@ -849,6 +880,10 @@ static void __kprobes __arm_kprobe(struct kprobe *p)
> >  {
> >  	struct kprobe *_p;
> >
> > +	/* Only arm non-ftrace probes */
> > +	if (ftrace_kprobe(p))
> > +		return;
> > +
> >  	/* Check collision with other optimized kprobes */
> >  	_p = get_optimized_kprobe((unsigned long)p->addr);
> >  	if (unlikely(_p))
> > @@ -864,6 +899,10 @@ static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
> >  {
> >  	struct kprobe *_p;
> >
> > +	/* Only disarm non-ftrace probes */
> > +	if (ftrace_kprobe(p))
> > +		return;
> > +
> >  	unoptimize_kprobe(p, false);	/* Try to unoptimize */
> >
> >  	if (!kprobe_queued(p)) {
> > @@ -878,13 +917,26 @@ static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
> >
> >  #else /* !CONFIG_OPTPROBES */
> >
> > +static void __kprobes __arm_kprobe(struct kprobe *p)
> > +{
> > +	/* Only arm non-ftrace probes */
> > +	if (!ftrace_kprobe(p))
> > +		arch_arm_kprobe(p);
> > +}
> > +
> > +/* Remove the breakpoint of a probe. Must be called with text_mutex locked */
> > +static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
> > +{
> > +	/* Only disarm non-ftrace probes */
> > +	if (!ftrace_kprobe(p))
> > +		arch_disarm_kprobe(p);
> > +}
> 
> If it ignores disabling/enabling, kprobe_ftrace_callback must
> check kprobe_disabled(p) and skip it.

No, the callers of __(dis)arm_kprobe() also call (dis)arm_kprobe()
later, which calls (un)register_ftrace_function(). The reason we can't
call the ftrace register functions here is that they call
stop_machine(), and then we would deadlock on text_mutex. The places
that call __(dis)arm_kprobe() call (dis)arm_kprobe() later, after
releasing the text_mutex.


> 
> > +
> >  #define optimize_kprobe(p)			do {} while (0)
> >  #define unoptimize_kprobe(p, f)			do {} while (0)
> >  #define kill_optimized_kprobe(p)		do {} while (0)
> >  #define prepare_optimized_kprobe(p)		do {} while (0)
> >  #define try_to_optimize_kprobe(p)		do {} while (0)
> > -#define __arm_kprobe(p)				arch_arm_kprobe(p)
> > -#define __disarm_kprobe(p, o)			arch_disarm_kprobe(p)
> >  #define kprobe_disarmed(p)			kprobe_disabled(p)
> >  #define wait_for_kprobe_optimizer()		do {} while (0)
> >
> > @@ -897,7 +949,9 @@ static void reuse_unused_kprobe(struct kprobe *ap)
> >
> >  static __kprobes void free_aggr_kprobe(struct kprobe *p)
> >  {
> > -	arch_remove_kprobe(p);
> > +
> > +	if (!ftrace_kprobe(p))
> > +		arch_remove_kprobe(p);
> >  	kfree(p);
> >  }
> >
> > @@ -910,6 +964,12 @@ static __kprobes struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
> >  /* Arm a kprobe with text_mutex */
> >  static void __kprobes arm_kprobe(struct kprobe *kp)
> >  {
> > +	/* ftrace probes can skip arch calls */
> > +	if (ftrace_kprobe(kp)) {
> > +		register_ftrace_function(&kp->fops);
> > +		return;
> > +	}
> > +
> >  	/*
> >  	 * Here, since __arm_kprobe() doesn't use stop_machine(),
> >  	 * this doesn't cause deadlock on text_mutex. So, we don't
> > @@ -924,6 +984,12 @@ static void __kprobes arm_kprobe(struct kprobe *kp)
> >  static void __kprobes disarm_kprobe(struct kprobe *kp)
> >  {
> >  	/* Ditto */
> > +
> > +	if (ftrace_kprobe(kp)) {
> > +		unregister_ftrace_function(&kp->fops);
> > +		return;
> > +	}
> > +
> >  	mutex_lock(&text_mutex);
> >  	__disarm_kprobe(kp, true);
> >  	mutex_unlock(&text_mutex);
> > @@ -1313,6 +1379,56 @@ static inline int check_kprobe_rereg(struct kprobe *p)
> >  	return ret;
> >  }
> >
> > +#ifdef CONFIG_FUNCTION_TRACER
> > +static notrace void
> > +kprobe_ftrace_callback(unsigned long ip, unsigned long parent_ip,
> > +		       struct ftrace_ops *op, struct pt_regs *pt_regs)
> > +{
> > +	struct kprobe *p = container_of(op, struct kprobe, fops);
> > +
> 
> Here, we need to set up kprobe_ctlblk and some of pt_regs members,
> ip, cs and orig_ax as optimized_callback()@arch/x86/kernel/kprobes.c
> does.

I'm curious what this is used for. It doesn't seem to be needed for
generic kprobes. Because we know the probe was on a nop, there's no
need to simulate the operation. IOW, there's no need for singlestep()
or other gdb-like operations.

> 
> > +	/* A nop has been trapped, just run both handlers back to back */
> > +	if (p->pre_handler)
> > +		p->pre_handler(p, pt_regs);
> 
> And increment regs->ip here for NOP.

Does the post_handler() expect the ip to be after the call? Thus a
post_handler() is the same as a pre_handler() on rip + next insn?

> 
> > +	if (p->post_handler)
> > +		p->post_handler(p, pt_regs, 0);
> > +}
> 
> Anyway, above operations strongly depends on arch, so
> kprobe_ftrace_callback should be moved to arch/*/kprobes.c.
> 
> And I think most of the code can be shared with optimized code.

Not sure why, except if regs->ip needs to be incremented. And that can
be done in a per-arch header file. Remember, the ftrace version has no
need for single stepping like the optimized version does. ftrace
replaces a nop, whereas other kprobes replace actual instructions. This
is what makes the ftrace version so much less complex.

> 
> > +
> > +static int use_ftrace_hook(struct kprobe *p)
> > +{
> > +	char str[KSYM_SYMBOL_LEN];
> > +
> > +	/* see if this is an ftrace function */
> > +	if (!ftrace_text_reserved(p->addr, p->addr)) {
> > +		/* make sure fops->func is nop */
> > +		init_non_ftrace_kprobe(p);
> > +		return 0;
> > +	}
> > +
> > +	/* If arch does not support pt_regs passing, bail */
> > +	if (!ARCH_SUPPORTS_FTRACE_SAVE_REGS)
> > +		return -EINVAL;
> 
> Hmm, I think this should be checked at build time...

This is actually keeping the same logic that exists now. See below: we
removed the check of ftrace_text_reserved() from the conditions that
make register_kprobe() return -EINVAL. Now if the arch supports
FTRACE_SAVE_REGS, it can handle the ftrace_text_reserved() case; if
not, it falls back to the original behavior.

The only way to remove the latter code is with #ifdef ugliness, and I
didn't want to add that.
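
To be concrete, the alternative would have been something like this
(just a sketch, not from the patch):

#if ARCH_SUPPORTS_FTRACE_SAVE_REGS
	ret = use_ftrace_hook(p);
	if (ret)
		return ret;
#else
	if (ftrace_text_reserved(p->addr, p->addr))
		return -EINVAL;	/* old behavior */
#endif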

> 
> > +
> > +	/* Use ftrace hook instead */
> > +
> > +	memset(&p->fops, 0, sizeof(p->fops));
> > +
> > +	/* Find the function this is connected with this addr */
> > +	kallsyms_lookup((unsigned long)p->addr, NULL, NULL, NULL, str);
> > +
> > +	p->fops.flags = FTRACE_OPS_FL_SAVE_REGS;
> > +	p->fops.func = kprobe_ftrace_callback;
> > +
> > +	ftrace_set_filter(&p->fops, str, strlen(str), 1);
> 
> Hmm, IMHO, ftrace_set_filter should not be called here, because
> there can be other kprobes are already registered on the same
> address. In that case, it is natural that we use an aggr_kprobe
> for handling several kprobes on same address. Or, kprobe hash table
> will have several different probes on same address.

The ftrace_set_filter() call only updates p->fops with what it will
trace. It's register_ftrace_function() that enables the ftrace tracing,
and that is done with the arm_kprobe() call. If that's where the
aggr_kprobe enables its call, then that will be the only probe that is
called.
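
In other words, the split looks roughly like this (just restating the
patch above as a sketch; str is the symbol name looked up in
use_ftrace_hook()):

	/* register_kprobe() path: only configures which function fops traces */
	ftrace_set_filter(&p->fops, str, strlen(str), 1);

	/* arm_kprobe() path: this is what actually enables the callback */
	register_ftrace_function(&p->fops);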

Now I need to re-examine how aggr_kprobes work. Do they have their own
handler that calls multiple probe handlers?


> 
> > +
> > +	return 0;
> > +}
> > +#else
> > +static int use_ftrace_hook(struct kprobe *p)
> > +{
> > +	return 0;
> > +}
> > +#endif
> > +
> >  int __kprobes register_kprobe(struct kprobe *p)
> >  {
> >  	int ret = 0;
> > @@ -1329,11 +1445,14 @@ int __kprobes register_kprobe(struct kprobe *p)
> >  	if (ret)
> >  		return ret;
> >
> > +	ret = use_ftrace_hook(p);
> > +	if (ret)
> > +		return ret;
> > +
> >  	jump_label_lock();
> >  	preempt_disable();
> >  	if (!kernel_text_address((unsigned long) p->addr) ||
> >  	    in_kprobes_functions((unsigned long) p->addr) ||
> > -	    ftrace_text_reserved(p->addr, p->addr) ||
> >  	    jump_label_text_reserved(p->addr, p->addr))
> >  		goto fail_with_jump_label;
> >
> > @@ -1384,15 +1503,17 @@ int __kprobes register_kprobe(struct kprobe *p)
> >  		goto out;
> >  	}
> >
> > -	ret = arch_prepare_kprobe(p);
> > -	if (ret)
> > -		goto out;
> > +	if (!ftrace_kprobe(p)) {
> > +		ret = arch_prepare_kprobe(p);
> > +		if (ret)
> > +			goto out;
> > +	}
> >
> >  	INIT_HLIST_NODE(&p->hlist);
> >  	hlist_add_head_rcu(&p->hlist,
> >  		       &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
> >
> > -	if (!kprobes_all_disarmed && !kprobe_disabled(p))
> > +	if (!ftrace_kprobe(p) && !kprobes_all_disarmed && !kprobe_disabled(p))
> >  		__arm_kprobe(p);
> >
> >  	/* Try to optimize kprobe */
> > @@ -1400,6 +1521,12 @@ int __kprobes register_kprobe(struct kprobe *p)
> >
> >  out:
> >  	mutex_unlock(&text_mutex);
> > +
> > +	/* ftrace kprobes must be set outside of text_mutex */
> > +	if (!ret && ftrace_kprobe(p) &&
> > +	    !kprobes_all_disarmed && !kprobe_disabled(p))
> > +		arm_kprobe(p);
> > +
> >  	put_online_cpus();
> >  	jump_label_unlock();
> >  	mutex_unlock(&kprobe_mutex);
> 
> After this, we must handle some fails which can happen when probing
> on a module.

What problems are added by ftrace that aren't a problem with normal
probes?

> 
> 
> > @@ -2134,6 +2261,14 @@ static void __kprobes arm_all_kprobes(void)
> >  	}
> >  	mutex_unlock(&text_mutex);
> >
> > +	/* ftrace kprobes are enabled outside of text_mutex */
> > +	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
> > +		head = &kprobe_table[i];
> > +		hlist_for_each_entry_rcu(p, node, head, hlist)
> > +			if (ftrace_kprobe(p) && !kprobe_disabled(p))
> > +				arm_kprobe(p);
> > +	}
> > +
> >  	kprobes_all_disarmed = false;
> >  	printk(KERN_INFO "Kprobes globally enabled\n");
> >
> > @@ -2164,11 +2299,21 @@ static void __kprobes disarm_all_kprobes(void)
> >  	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
> >  		head = &kprobe_table[i];
> >  		hlist_for_each_entry_rcu(p, node, head, hlist) {
> > -			if (!arch_trampoline_kprobe(p) && !kprobe_disabled(p))
> > +			if (!ftrace_kprobe(p) &&
> > +			    !arch_trampoline_kprobe(p) && !kprobe_disabled(p))
> >  				__disarm_kprobe(p, false);
> >  		}
> >  	}
> >  	mutex_unlock(&text_mutex);
> > +
> > +	/* ftrace kprobes are disabled outside of text_mutex */
> > +	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
> > +		head = &kprobe_table[i];
> > +		hlist_for_each_entry_rcu(p, node, head, hlist) {
> > +			if (ftrace_kprobe(p) && !kprobe_disabled(p))
> > +				disarm_kprobe(p);
> > +		}
> > +	}
> >  	mutex_unlock(&kprobe_mutex);
> >
> >  	/* Wait for disarming all kprobes by optimizer */
> 
> Thank you,
> 

Thanks for taking the time for your review!

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5][RFC] ftrace: Return pt_regs to function trace callback (x86_64 only so
  2011-08-11 12:59     ` Steven Rostedt
@ 2011-08-12  0:55       ` Masami Hiramatsu
  2011-08-12 13:05         ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-12  0:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt

(2011/08/11 21:59), Steven Rostedt wrote:
> On Thu, 2011-08-11 at 14:55 +0900, Masami Hiramatsu wrote:
>> (2011/08/11 1:22), Steven Rostedt wrote:
>>> From: Steven Rostedt <srostedt@redhat.com>
>>>
>>> Return as the 4th paramater to the function tracer callback the pt_regs.
>>>
>>> So far this is only supported by x86_64. The ftrace_ops flag
>>> FTRACE_OPS_FL_SAVE_REGS is added to tell the arch to save all regs
>>> to the pt_regs, otherwise a minimum is just passed back (the same
>>> regs that is saved by mcount itself).
>>
>> I guess it will be a bit hard to port this on x86-32, because
>> on x86-32, the top of stack address in pt_regs is the address
>> of sp member (e.g. &(pt_regs->sp)). I mean that when mcount-entry
>> calls ftrace_caller, it pushes an address of the next instruction
>> of mcount-entry on the top of stack.
>> In that case, &(pt_regs->sp) points the entry which stores the
>> address, instead of the return address of probed function.
>
> I'm a little confused here. What is needed by kprobes handlers?

pt_regs, the same as what the breakpoint interrupt generates.
This means that all members of pt_regs should be filled in (except ss),
as the interrupt handler does.

> It should be no problem saving the sp member onto pt_regs of the time
> mcount was called, and simulate the probe as if it was put on the nop.

OK.

>
>>
>> e.g. with kprobes (on x86-32):
>>
>> [  <bx>  ] <- pt_regs
>> [  ...   ]
>> [  <cs>  ]
>> [<flags> ]
>> [ret-addr] <- &(pt_regs.sp)
>> [  arg1  ]
>> [  arg2  ]
>>
>> with this method:
>>
>> [  <bx>  ] <- pt_regs
>> [  ...   ]
>> [  <cs>  ]
>> [<flags> ]
>> [mcount-ret] <- &(pt_regs.sp)
>> [ret-addr]
>> [  arg1  ]
>> [  arg2  ]
>>
>> I think this is hard to solve without a tricky hack.
>
> Is this for handling kretprobes? With mcount, we shouldn't be doing that
> because mcount is not the place that kretprobes should be doing this.

Yeah, I know, but if kprobes work on mcount with -mfentry *transparently*,
they should support kretprobes.
Moreover, if we'd like to access arguments on the stack, sp has to be
handled the same way as with breakpoint-based kprobes, and that has to
be done for dynamic events on x86-32.

> When we add __fentry__, the arch code will be different to handle that
> and it shouldn't be too hard to make that work with kretprobes.
>
>>
>> For example, on x86-32, MCOUNT_FRAME_SAVE saves
>> flags on the entry which will be <cs> and it saves
>> mcount-ret to local stack and moves flags to next entry.
>>
>> <save-frame>
>> pushf			# save flags on CS(%esp)
>> subl $12, %esp		# skip ip, orig_ax and gs
>> pushl %fs
>> pushl %es
>> ...
>> pushl %ebx
>> movl 56(%esp), %ebx	# load mcount-ret address
>> movl 52(%esp), %ebp	# load flags
>> movl %ebp, 56(%esp)	# store flags
>
> Nope we don't do anything like this. I've already got x86_32 working
> (but maybe it needs more for pt_regs->sp) but why is pushf needed?

As I said below (inlined comment), e.g. someone who wants to watch the
IF flag needs eflags.

Please note, kprobes is the facility which basically provides a
breakpoint handler in the kernel, and users expect that. So all
registers should be provided, just as the breakpoint interrupt does.
That's what I meant by "transparently".

>
> -- Steve
>
>>
>> call function (ebx is callee save)
>>
>> <restore-frame>
>> movl 56(%esp), %ebp	# load flags
>> movl %ebp, 52(%esp)	# store flags
>> movl %ebx, 56(%esp)	# load mcount-ret address
>> ...
>> popf
>> ret
>>
>> Hmm?
>>
>>> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>>> ---
>>>  arch/x86/include/asm/ftrace.h     |   38 ++++++++++++++++++++----------------
>>>  arch/x86/kernel/entry_64.S        |   23 +++++++++++++++++++++-
>>>  include/linux/ftrace.h            |   15 ++++++++++++-
>>>  kernel/trace/ftrace.c             |   29 ++++++++++++++++++---------
>>>  kernel/trace/trace_events.c       |    2 +-
>>>  kernel/trace/trace_functions.c    |    7 +++--
>>>  kernel/trace/trace_irqsoff.c      |    2 +-
>>>  kernel/trace/trace_sched_wakeup.c |    3 +-
>>>  kernel/trace/trace_selftest.c     |   15 +++++++++----
>>>  kernel/trace/trace_stack.c        |    3 +-
>>>  10 files changed, 95 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
>>> index b3fcf16..0750c2a 100644
>>> --- a/arch/x86/include/asm/ftrace.h
>>> +++ b/arch/x86/include/asm/ftrace.h
>>> @@ -4,26 +4,29 @@
>>>  #ifdef __ASSEMBLY__
>>>
>>>  	.macro MCOUNT_SAVE_FRAME
>>> -	/* taken from glibc */
>>> -	subq $0x38, %rsp
>>> -	movq %rax, (%rsp)
>>> -	movq %rcx, 8(%rsp)
>>> -	movq %rdx, 16(%rsp)
>>> -	movq %rsi, 24(%rsp)
>>> -	movq %rdi, 32(%rsp)
>>> -	movq %r8, 40(%rsp)
>>> -	movq %r9, 48(%rsp)
>>> +	 /*
>>> +	  * We add enough stack to save all regs,
>>> +	  * and we what we need in the location of pt_regs.
>>> +	  */
>>> +	subq $ORIG_RAX, %rsp
>>> +	movq %rax, RAX(%rsp)
>>> +	movq %rcx, RCX(%rsp)
>>> +	movq %rdx, RDX(%rsp)
>>> +	movq %rsi, RSI(%rsp)
>>> +	movq %rdi, RDI(%rsp)
>>> +	movq %r8, R8(%rsp)
>>> +	movq %r9, R9(%rsp)
>>>  	.endm
>>>
>>>  	.macro MCOUNT_RESTORE_FRAME
>>> -	movq 48(%rsp), %r9
>>> -	movq 40(%rsp), %r8
>>> -	movq 32(%rsp), %rdi
>>> -	movq 24(%rsp), %rsi
>>> -	movq 16(%rsp), %rdx
>>> -	movq 8(%rsp), %rcx
>>> -	movq (%rsp), %rax
>>> -	addq $0x38, %rsp
>>> +	movq R9(%rsp), %r9
>>> +	movq R8(%rsp), %r8
>>> +	movq RDI(%rsp), %rdi
>>> +	movq RSI(%rsp), %rsi
>>> +	movq RDX(%rsp), %rdx
>>> +	movq RCX(%rsp), %rcx
>>> +	movq RAX(%rsp), %rax
>>> +	addq $ORIG_RAX, %rsp
>>>  	.endm
>>>
>>>  #endif
>>> @@ -34,6 +37,7 @@
>>>
>>>  #if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_X86_64)
>>>  #define ARCH_SUPPORTS_FTRACE_OPS 1
>>> +#define ARCH_SUPPORTS_FTRACE_SAVE_REGS 1
>>>  #endif
>>>
>>>  #ifndef __ASSEMBLY__
>>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>>> index 27adc2b..b77f297 100644
>>> --- a/arch/x86/kernel/entry_64.S
>>> +++ b/arch/x86/kernel/entry_64.S
>>> @@ -78,7 +78,16 @@ ENTRY(ftrace_caller)
>>
>> I can see below code before save frame.
>>
>> 	cmpl $0, function_trace_stop
>> 	jne  ftrace_stub
>>
>> Please pushf before comparing it. :)
>> Sometimes, the value of eflags is worth to watch.
>> I know that SF/ZF will be never used between
>> function call, so it is OK if the eflags is saved
>> in MCOUNT_SAVE_FRAME.
>>
>>>  	MCOUNT_SAVE_FRAME
>>>
>>>  	leaq function_trace_op, %rdx
>>> -	movq 0x38(%rsp), %rdi
>>> +
>>> +	cmpl $0, ftrace_save_regs
>>> +	jne  save_all_regs
>>> +
>>> +call_func:
>>> +
>>> +	/* regs go into 4th parameter */
>>> +	leaq (%rsp), %rcx
>>> +
>>> +	movq ORIG_RAX(%rsp), %rdi
>>>  	movq 8(%rbp), %rsi
>>>  	subq $MCOUNT_INSN_SIZE, %rdi
>>>
>>> @@ -96,6 +105,18 @@ GLOBAL(ftrace_stub)
>>>  	retq
>>>  END(ftrace_caller)
>>>
>>> +save_all_regs:
>>> +	/* Save the rest of pt_regs */
>>> +	movq %r15, R15(%rsp)
>>> +	movq %r14, R14(%rsp)
>>> +	movq %r13, R13(%rsp)
>>> +	movq %r12, R12(%rsp)
>>> +	movq %r10, R10(%rsp)
>>> +	movq %rbp, RBP(%rsp)
>>> +	movq %rbx, RBX(%rsp)
>>> +	jmp call_func
>>
>> At least, pt_regs.sp must be saved for accessing
>> vars on stack.
>>
>>> +
>>> +
>>>  #else /* ! CONFIG_DYNAMIC_FTRACE */
>>>  ENTRY(mcount)
>>>  	cmpl $0, function_trace_stop
>>
>> You also need to restore the rest of pt_regs if
>> ftrace_save_regs is true.
>>
>> Thank you,
>>
>


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5][RFC] kprobes: Use ftrace hooks when probing ftrace nops
  2011-08-11 13:22     ` Steven Rostedt
@ 2011-08-12  2:41       ` Masami Hiramatsu
  2011-08-12  5:46       ` Ananth N Mavinakayanahalli
  1 sibling, 0 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-12  2:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt, Ananth N Mavinakayanahalli

(2011/08/11 22:22), Steven Rostedt wrote:
> On Thu, 2011-08-11 at 16:41 +0900, Masami Hiramatsu wrote:
>> Hi Steven,
>>
>> As I suggested in another reply, I'm now attracted by the idea
>> of using ftrace in kprobe-tracer, because of simplicity.
>> Anyway, here is my review.
>
> It may be a little simpler, but it defeats the purpose of what I'm
> trying to accomplish. I do want the ability to expand this to do other
> things than just trace_kprobe.

OK, could you tell me what you would like to accomplish with this?
I thought this was a challenge to trace function arguments with
low overhead, and if so, I think expanding trace_kprobe (a.k.a.
dynamic events) to handle ftrace with pt_regs is a better way.

What is the benefit of expanding kprobes itself instead of using
ftrace directly?

If you think -mfentry reduces the usefulness of kprobes because
it prevents probing the function entry, I think we can put kprobes
just after the mcount code, because the mcount code does not change
anything, so this doesn't affect anything.

E.g. if a user puts a kprobe on function+0, it is automatically moved
to function+5, but kp->addr stays the same as the function address.
With this change, we don't need to change anything else. The user will
see that the IP address in pre_handler() is function+6 (5 for mcount
and 1 for int3), and that should be acceptable because the real probed
IP address must be obtained from kp->addr.
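
Just to illustrate the idea (a very rough sketch, not code from this
series; 'addr' is only a local variable for illustration):

	/* somewhere in register_kprobe(), roughly: */
	unsigned long addr = (unsigned long)p->addr;

	if (ftrace_text_reserved(p->addr, p->addr))
		addr += MCOUNT_INSN_SIZE;	/* place int3 just after the mcount site */

	/* the breakpoint goes at 'addr', but kp->addr stays at function+0 */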


>> (2011/08/11 1:22), Steven Rostedt wrote:
>>> From: Steven Rostedt <srostedt@redhat.com>
>>>
>>> Currently, if a probe is set at an ftrace nop, the probe will fail.
>>>
>>> When -mfentry is used by ftrace (in the near future), the nop will be
>>> at the beginning of the function. This will prevent kprobes from hooking
>>> to the most common place that kprobes are attached.
>>>
>>> Now that ftrace supports individual users as well as pt_regs passing,
>>> kprobes can use the ftrace infrastructure when a probe is placed on
>>> a ftrace nop.
>>
>> My one general concern is the timing of enabling ftrace-based probe.
>> Breakpoint-based kprobe (including optimized kprobe) is enabled
>> right after registering. Users might expect that.
>> And AFAIK, dynamic ftraces handler will be enabled (activated)
>> after a while, because it has to wait for an NMI, doesn't it?
>
> There's no NMI needed. Not sure where you got that idea. When
> register_ftrace_function() is called, it will start tracing immediately.
> Note, stop_machine() is called, but we could fix that in the future too.

Ah, I see. So the timing issue will be minor (even though calling
stop_machine() each time might still need to be solved).

>> And theoretically, this ftrace-based probe can't support jprobe,
>> because it changes IP address. Nowadays, this may becomes minor
>> problem (because someone who wants to trace function args can
>> use kprobe-tracer), but still exist.
>
> I want to fix that too. :)
>
> We don't need to worry about that until -mfentry exists. But once that
> does, then, yes, I want the ftrace version working for jprobe too.

OK, changing IP is not so hard.


>>> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>>> ---
>>>  include/linux/kprobes.h |    6 ++
>>>  kernel/kprobes.c        |  163 ++++++++++++++++++++++++++++++++++++++++++++---
>>>  2 files changed, 160 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
>>> index dd7c12e..5742071 100644
>>> --- a/include/linux/kprobes.h
>>> +++ b/include/linux/kprobes.h
>>> @@ -37,6 +37,7 @@
>>>  #include <linux/spinlock.h>
>>>  #include <linux/rcupdate.h>
>>>  #include <linux/mutex.h>
>>> +#include <linux/ftrace.h>
>>>
>>>  #ifdef CONFIG_KPROBES
>>>  #include <asm/kprobes.h>
>>> @@ -112,6 +113,11 @@ struct kprobe {
>>>  	/* copy of the original instruction */
>>>  	struct arch_specific_insn ainsn;
>>>
>>> +#ifdef CONFIG_FUNCTION_TRACER
>>> +	/* If it is possible to use ftrace to probe */
>>> +	struct ftrace_ops fops;
>>> +#endif
>>> +
>>>  	/*
>>>  	 * Indicates various status flags.
>>>  	 * Protected by kprobe_mutex after this kprobe is registered.
>>> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
>>> index e6c25eb..2160768 100644
>>> --- a/kernel/kprobes.c
>>> +++ b/kernel/kprobes.c
>>> @@ -102,6 +102,31 @@ static struct kprobe_blackpoint kprobe_blacklist[] = {
>>>  	{NULL}    /* Terminator */
>>>  };
>>>
>>> +#ifdef CONFIG_FUNCTION_TRACER
>>> +/* kprobe stub function for use when probe is not connected to ftrace */
>>> +static void
>>> +kprobe_ftrace_stub(unsigned long ip, unsigned long pip,
>>> +		   struct ftrace_ops *op, struct pt_regs *pt_regs)
>>> +{
>>> +}
>>> +
>>> +static bool ftrace_kprobe(struct kprobe *p)
>>> +{
>>> +	return p->fops.func && p->fops.func != kprobe_ftrace_stub;
>>> +}
>>> +
>>> +static void init_non_ftrace_kprobe(struct kprobe *p)
>>> +{
>>> +	p->fops.func = kprobe_ftrace_stub;
>>> +}
>>> +
>>> +#else
>>> +static bool ftrace_kprobe(struct kprobe *p)
>>> +{
>>> +	return false;
>>> +}
>>> +#endif
>>> +
>>>  #ifdef __ARCH_WANT_KPROBES_INSN_SLOT
>>>  /*
>>>   * kprobe->ainsn.insn points to the copy of the instruction to be
>>> @@ -396,6 +421,9 @@ static __kprobes void free_aggr_kprobe(struct kprobe *p)
>>>  {
>>>  	struct optimized_kprobe *op;
>>>
>>> +	if (ftrace_kprobe(p))
>>> +		return;
>>> +
>>>  	op = container_of(p, struct optimized_kprobe, kp);
>>>  	arch_remove_optimized_kprobe(op);
>>>  	arch_remove_kprobe(p);
>>> @@ -759,6 +787,9 @@ static __kprobes void try_to_optimize_kprobe(struct kprobe *p)
>>>  	struct kprobe *ap;
>>>  	struct optimized_kprobe *op;
>>>
>>> +	if (ftrace_kprobe(p))
>>> +		return;
>>> +
>>>  	ap = alloc_aggr_kprobe(p);
>>>  	if (!ap)
>>>  		return;
>>> @@ -849,6 +880,10 @@ static void __kprobes __arm_kprobe(struct kprobe *p)
>>>  {
>>>  	struct kprobe *_p;
>>>
>>> +	/* Only arm non-ftrace probes */
>>> +	if (ftrace_kprobe(p))
>>> +		return;
>>> +
>>>  	/* Check collision with other optimized kprobes */
>>>  	_p = get_optimized_kprobe((unsigned long)p->addr);
>>>  	if (unlikely(_p))
>>> @@ -864,6 +899,10 @@ static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
>>>  {
>>>  	struct kprobe *_p;
>>>
>>> +	/* Only disarm non-ftrace probes */
>>> +	if (ftrace_kprobe(p))
>>> +		return;
>>> +
>>>  	unoptimize_kprobe(p, false);	/* Try to unoptimize */
>>>
>>>  	if (!kprobe_queued(p)) {
>>> @@ -878,13 +917,26 @@ static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
>>>
>>>  #else /* !CONFIG_OPTPROBES */
>>>
>>> +static void __kprobes __arm_kprobe(struct kprobe *p)
>>> +{
>>> +	/* Only arm non-ftrace probes */
>>> +	if (!ftrace_kprobe(p))
>>> +		arch_arm_kprobe(p);
>>> +}
>>> +
>>> +/* Remove the breakpoint of a probe. Must be called with text_mutex locked */
>>> +static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
>>> +{
>>> +	/* Only disarm non-ftrace probes */
>>> +	if (!ftrace_kprobe(p))
>>> +		arch_disarm_kprobe(p);
>>> +}
>>
>> If it ignores disabling/enabling, kprobe_ftrace_callback must
>> check kprobe_disabled(p) and skip it.
>
> No, the callers of __(dis)arm_kprobe() also call the (dis)arm_kprobe()
> later, which calls (un)register_ftrace_function(). The reason we can't
> call the ftrace register functions here is because they call
> stop_machine() and then we get into a deadlock with the text_mutex. The
> places that call __(dis)arm_kprobe() later call (dis)arm_kprobe() after
> releasing the text_mutex.

OK.

>>> +
>>>  #define optimize_kprobe(p)			do {} while (0)
>>>  #define unoptimize_kprobe(p, f)			do {} while (0)
>>>  #define kill_optimized_kprobe(p)		do {} while (0)
>>>  #define prepare_optimized_kprobe(p)		do {} while (0)
>>>  #define try_to_optimize_kprobe(p)		do {} while (0)
>>> -#define __arm_kprobe(p)				arch_arm_kprobe(p)
>>> -#define __disarm_kprobe(p, o)			arch_disarm_kprobe(p)
>>>  #define kprobe_disarmed(p)			kprobe_disabled(p)
>>>  #define wait_for_kprobe_optimizer()		do {} while (0)
>>>
>>> @@ -897,7 +949,9 @@ static void reuse_unused_kprobe(struct kprobe *ap)
>>>
>>>  static __kprobes void free_aggr_kprobe(struct kprobe *p)
>>>  {
>>> -	arch_remove_kprobe(p);
>>> +
>>> +	if (!ftrace_kprobe(p))
>>> +		arch_remove_kprobe(p);
>>>  	kfree(p);
>>>  }
>>>
>>> @@ -910,6 +964,12 @@ static __kprobes struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
>>>  /* Arm a kprobe with text_mutex */
>>>  static void __kprobes arm_kprobe(struct kprobe *kp)
>>>  {
>>> +	/* ftrace probes can skip arch calls */
>>> +	if (ftrace_kprobe(kp)) {
>>> +		register_ftrace_function(&kp->fops);
>>> +		return;
>>> +	}
>>> +
>>>  	/*
>>>  	 * Here, since __arm_kprobe() doesn't use stop_machine(),
>>>  	 * this doesn't cause deadlock on text_mutex. So, we don't
>>> @@ -924,6 +984,12 @@ static void __kprobes arm_kprobe(struct kprobe *kp)
>>>  static void __kprobes disarm_kprobe(struct kprobe *kp)
>>>  {
>>>  	/* Ditto */
>>> +
>>> +	if (ftrace_kprobe(kp)) {
>>> +		unregister_ftrace_function(&kp->fops);
>>> +		return;
>>> +	}
>>> +
>>>  	mutex_lock(&text_mutex);
>>>  	__disarm_kprobe(kp, true);
>>>  	mutex_unlock(&text_mutex);
>>> @@ -1313,6 +1379,56 @@ static inline int check_kprobe_rereg(struct kprobe *p)
>>>  	return ret;
>>>  }
>>>
>>> +#ifdef CONFIG_FUNCTION_TRACER
>>> +static notrace void
>>> +kprobe_ftrace_callback(unsigned long ip, unsigned long parent_ip,
>>> +		       struct ftrace_ops *op, struct pt_regs *pt_regs)
>>> +{
>>> +	struct kprobe *p = container_of(op, struct kprobe, fops);
>>> +
>>
>> Here, we need to set up kprobe_ctlblk and some of pt_regs members,
>> ip, cs and orig_ax as optimized_callback()@arch/x86/kernel/kprobes.c
>> does.
>
> I'm curious to what this is used for? It doesn't seem to be needed for
> the generic kprobes. Because we know the probe was on a nop, there's no
> need to simulate the operation. IOW, there's no need for singlestep() or
> other gdb like operations.

That is not for single-stepping; the handler may expect it, because a
kprobes handler doesn't know whether it was hit via a breakpoint, a jump
or an mcount call.
So we have to simulate the breakpoint context as much as possible. Otherwise,
it isn't really kprobes.


>>
>>> +	/* A nop has been trapped, just run both handlers back to back */
>>> +	if (p->pre_handler)
>>> +		p->pre_handler(p, pt_regs);
>>
>> And increment regs->ip here for NOP.
>
> Does the post_handler() expect the ip to be after the call? Thus a
> post_handler() is the same as a pre_handler() on rip+next_ins?

Usually, post_handler() is for watching the behavior of the probed
instruction. If the kprobe puts a breakpoint on "addl $10, %eax",
post_handler() will see that the value of %eax has been incremented by 10.

>>> +	if (p->post_handler)
>>> +		p->post_handler(p, pt_regs, 0);
>>> +}
>>
>> Anyway, above operations strongly depends on arch, so
>> kprobe_ftrace_callback should be moved to arch/*/kprobes.c.
>>
>> And I think most of the code can be shared with optimized code.
>
> Not sure why. Except if regs->ip needs to be incremented. And that can
> be done in a per-arch header file. Remember, the ftrace version has no need for
> single stepping like the optimized version does. ftrace replaces a nop,
> whereas other kprobes replace actual instructions. This is what makes
> the ftrace version so much less complex.

Yeah, I just meant the preparation code. Hmm, putting that kind of
register fixup in an arch header may be a good idea.
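
Just to make that concrete, the whole callback could look roughly like this
(completely untested sketch; the regs->ip fixup and MCOUNT_INSN_SIZE would
eventually come from an arch header):

	static notrace void
	kprobe_ftrace_callback(unsigned long ip, unsigned long parent_ip,
			       struct ftrace_ops *op, struct pt_regs *pt_regs)
	{
		struct kprobe *p = container_of(op, struct kprobe, fops);

		/* make it look like a trap at the probed address */
		pt_regs->ip = ip;
		if (p->pre_handler)
			p->pre_handler(p, pt_regs);

		/* step over the (nop-ed out) mcount call before the post handler */
		pt_regs->ip = ip + MCOUNT_INSN_SIZE;
		if (p->post_handler)
			p->post_handler(p, pt_regs, 0);
	}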

>>> +
>>> +static int use_ftrace_hook(struct kprobe *p)
>>> +{
>>> +	char str[KSYM_SYMBOL_LEN];
>>> +
>>> +	/* see if this is an ftrace function */
>>> +	if (!ftrace_text_reserved(p->addr, p->addr)) {
>>> +		/* make sure fops->func is nop */
>>> +		init_non_ftrace_kprobe(p);
>>> +		return 0;
>>> +	}
>>> +
>>> +	/* If arch does not support pt_regs passing, bail */
>>> +	if (!ARCH_SUPPORTS_FTRACE_SAVE_REGS)
>>> +		return -EINVAL;
>>
>> Hmm, I think this should be checked at build time...
>
> This is actually keeping the same logic that exists now. See below, we
> removed the check of ftrace_text_reserved() from the conditions that
> make register_kprobe() return -EINVAL. Now if the arch supports
> FTRACE_SAVE_REGS, it can handle the ftrace_text_reserved(), if not, then
> it goes back to its original behavior.
>
> The only way to remove the later code is with #ifdef ugliness, and I
> didn't want to add that.

OK, it's for avoiding ugliness.

>
>>
>>> +
>>> +	/* Use ftrace hook instead */
>>> +
>>> +	memset(&p->fops, 0, sizeof(p->fops));
>>> +
>>> +	/* Find the function this is connected with this addr */
>>> +	kallsyms_lookup((unsigned long)p->addr, NULL, NULL, NULL, str);
>>> +
>>> +	p->fops.flags = FTRACE_OPS_FL_SAVE_REGS;
>>> +	p->fops.func = kprobe_ftrace_callback;
>>> +
>>> +	ftrace_set_filter(&p->fops, str, strlen(str), 1);
>>
>> Hmm, IMHO, ftrace_set_filter should not be called here, because
>> there can be other kprobes are already registered on the same
>> address. In that case, it is natural that we use an aggr_kprobe
>> for handling several kprobes on same address. Or, kprobe hash table
>> will have several different probes on same address.
>
> The ftrace_set_filter() only updates the p->fops to what it will trace.
> It's the register_ftrace_function() that enables the ftrace tracing, and
> that is done with the arm_kprobe() call. If that's where the aggr_kprobe
> enables its call, then that will be the only probe that is called.

Ah, OK, I see. So ftrace_set_filter() just prepares p->fops; it doesn't
register anything with ftrace.
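
So the sequence is (just restating the patch to confirm my understanding):

	/* register_kprobe(): only prepare the ops; nothing is enabled yet */
	p->fops.flags = FTRACE_OPS_FL_SAVE_REGS;
	p->fops.func  = kprobe_ftrace_callback;
	ftrace_set_filter(&p->fops, str, strlen(str), 1);

	/* arm_kprobe(): the probe actually starts firing here */
	register_ftrace_function(&p->fops);

	/* disarm_kprobe(): and stops here */
	unregister_ftrace_function(&p->fops);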


> Now I need to re-examine how the aggr_kprobes work. Does it have its own
> handler to call multiple probe->handlers?

Yes, aggr_pre/fault/post/break_handler() are those.

>>> +
>>> +	return 0;
>>> +}
>>> +#else
>>> +static int use_ftrace_hook(struct kprobe *p)
>>> +{
>>> +	return 0;
>>> +}
>>> +#endif
>>> +
>>>  int __kprobes register_kprobe(struct kprobe *p)
>>>  {
>>>  	int ret = 0;
>>> @@ -1329,11 +1445,14 @@ int __kprobes register_kprobe(struct kprobe *p)
>>>  	if (ret)
>>>  		return ret;
>>>
>>> +	ret = use_ftrace_hook(p);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>>  	jump_label_lock();
>>>  	preempt_disable();
>>>  	if (!kernel_text_address((unsigned long) p->addr) ||
>>>  	    in_kprobes_functions((unsigned long) p->addr) ||
>>> -	    ftrace_text_reserved(p->addr, p->addr) ||
>>>  	    jump_label_text_reserved(p->addr, p->addr))
>>>  		goto fail_with_jump_label;
>>>
>>> @@ -1384,15 +1503,17 @@ int __kprobes register_kprobe(struct kprobe *p)
>>>  		goto out;
>>>  	}
>>>
>>> -	ret = arch_prepare_kprobe(p);
>>> -	if (ret)
>>> -		goto out;
>>> +	if (!ftrace_kprobe(p)) {
>>> +		ret = arch_prepare_kprobe(p);
>>> +		if (ret)
>>> +			goto out;
>>> +	}
>>>
>>>  	INIT_HLIST_NODE(&p->hlist);
>>>  	hlist_add_head_rcu(&p->hlist,
>>>  		       &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
>>>
>>> -	if (!kprobes_all_disarmed && !kprobe_disabled(p))
>>> +	if (!ftrace_kprobe(p) && !kprobes_all_disarmed && !kprobe_disabled(p))
>>>  		__arm_kprobe(p);
>>>
>>>  	/* Try to optimize kprobe */
>>> @@ -1400,6 +1521,12 @@ int __kprobes register_kprobe(struct kprobe *p)
>>>
>>>  out:
>>>  	mutex_unlock(&text_mutex);
>>> +
>>> +	/* ftrace kprobes must be set outside of text_mutex */
>>> +	if (!ret && ftrace_kprobe(p) &&
>>> +	    !kprobes_all_disarmed && !kprobe_disabled(p))
>>> +		arm_kprobe(p);
>>> +
>>>  	put_online_cpus();
>>>  	jump_label_unlock();
>>>  	mutex_unlock(&kprobe_mutex);
>>
>> After this, we must handle some fails which can happen when probing
>> on a module.
>
> What problems were added by ftrace that aren't a problem with normal
> probes?

It was my misunderstanding of ftrace_set_filter(). No need to do anything.

>
>>
>>
>>> @@ -2134,6 +2261,14 @@ static void __kprobes arm_all_kprobes(void)
>>>  	}
>>>  	mutex_unlock(&text_mutex);
>>>
>>> +	/* ftrace kprobes are enabled outside of text_mutex */
>>> +	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
>>> +		head = &kprobe_table[i];
>>> +		hlist_for_each_entry_rcu(p, node, head, hlist)
>>> +			if (ftrace_kprobe(p) && !kprobe_disabled(p))
>>> +				arm_kprobe(p);
>>> +	}
>>> +
>>>  	kprobes_all_disarmed = false;
>>>  	printk(KERN_INFO "Kprobes globally enabled\n");
>>>
>>> @@ -2164,11 +2299,21 @@ static void __kprobes disarm_all_kprobes(void)
>>>  	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
>>>  		head = &kprobe_table[i];
>>>  		hlist_for_each_entry_rcu(p, node, head, hlist) {
>>> -			if (!arch_trampoline_kprobe(p) && !kprobe_disabled(p))
>>> +			if (!ftrace_kprobe(p) &&
>>> +			    !arch_trampoline_kprobe(p) && !kprobe_disabled(p))
>>>  				__disarm_kprobe(p, false);
>>>  		}
>>>  	}
>>>  	mutex_unlock(&text_mutex);
>>> +
>>> +	/* ftrace kprobes are disabled outside of text_mutex */
>>> +	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
>>> +		head = &kprobe_table[i];
>>> +		hlist_for_each_entry_rcu(p, node, head, hlist) {
>>> +			if (ftrace_kprobe(p) && !kprobe_disabled(p))
>>> +				disarm_kprobe(p);
>>> +		}
>>> +	}
>>>  	mutex_unlock(&kprobe_mutex);
>>>
>>>  	/* Wait for disarming all kprobes by optimizer */
>>
>> Thank you,
>>
>
> Thanks for taking the time for your review!
>
> -- Steve
>
>


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-11 13:01       ` Steven Rostedt
@ 2011-08-12  2:57         ` Masami Hiramatsu
  2011-08-12 13:08           ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-12  2:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt, Ananth N Mavinakayanahalli

(2011/08/11 22:01), Steven Rostedt wrote:
> On Thu, 2011-08-11 at 15:28 +0900, Masami Hiramatsu wrote:
>> (2011/08/11 9:34), Steven Rostedt wrote:
>>> On Thu, 2011-08-11 at 09:21 +0900, Masami Hiramatsu wrote:
>>>> Hi Steven,
>>>>
>>>> Thanks for this nice feature!
>>>>
>>>> (2011/08/11 1:22), Steven Rostedt wrote:
>>>>> Hi All,
>>>>>
>>>>> I started working on adding the -mfentry switch to ftrace, which
>>>>> allows us to remove the frame pointers requirement from function tracing
>>>>> as well as makes mcount (fentry) work just better.
>>>>>
>>>>> But when I did this in another branch, I noticed that I broke kprobes
>>>>> in its most common usage. The attaching a probe at the beginning of
>>>>> a function to use get to its parameters.
>>>>>
>>>>> So I started this branch. This branch is to have kprobes use ftrace
>>>>> directly when a probe is attached to a ftrace nop. Currently, kprobes
>>>>> will just error when that happens. With this patch set, it will hook
>>>>> into the ftrace infrastructure and use ftrace instead. This is more
>>>>> like an optimized probe as no breakpoints need to be set. A call to
>>>>> the function is done directly via the mcount trampoline. If ftrace
>>>>> pt_regs is implemented for an arch, kprobes gets this feature for free.
>>>>
>>>> I agreed this idea, this looks good to me too :)
>>>> With -fentry, this can improve dynamic trace events very much.
>>>>
>>>> BTW (OT), it seems that current kprobe data structure becomes a bit
>>>> fat. Maybe what we need is just a "holder of hooking handler" as
>>>> what ftrace provides, not a full storage data structure of copied
>>>> instrucutions. Perhaps, we'd better diet the kprobe structure for
>>>> transparency of hooking infrastructure.
>>>
>>> Sure, I can make the ftrace_ops field in kprobes dynamically allocated
>>> instead. That shouldn't be an issue.
>>
>> By the way (again), perhaps, much simpler solution is using ftrace
>> not in kprobe, but in the trace_kprobe. Of course, there are several
>> pros and cons...
>>
>> The pros:
>> - Arch independent solution (anyway, ftrace still needs passing pt_regs
>>  to their handler)
>> - Don't need to introduce more complexity into kprobes itself.
>> - Maybe systemtap also can catch up with this as using same method.
>
> Note that systemtap and others will be hooking into kprobes version, not
> the trace_kprobe one. If we do it in trace_kprobe, then everyone else
> needs to reimplement it too.  I have bigger ideas for the future of
> this, and I really want to get this working. If it doesn't work for
> kprobes, then it won't work for anything else.

I don't think it won't work. It can work, but it is a long way around.
Could you tell me your "bigger ideas"? Perhaps we are on different
paths but aiming at the same goal.

>> The cons:
>> - Native kprobes users will be disappointed... anyway, they just need to
>>   move their probes to the next instruction (usually addr+=5 is OK).
>
> I've been told that doing the addr+=5 (which is also arch specific) can
> break things for other tools.

As I said in the previous mail, I think kprobes can do that transparently.

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5][RFC] kprobes: Use ftrace hooks when probing ftrace nops
  2011-08-11 13:22     ` Steven Rostedt
  2011-08-12  2:41       ` Masami Hiramatsu
@ 2011-08-12  5:46       ` Ananth N Mavinakayanahalli
  2011-08-12 13:14         ` Steven Rostedt
  1 sibling, 1 reply; 28+ messages in thread
From: Ananth N Mavinakayanahalli @ 2011-08-12  5:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
	Arnaldo Carvalho de Melo, Jason Baron, yrl.pp-manager.tt

On Thu, Aug 11, 2011 at 09:22:36AM -0400, Steven Rostedt wrote:
> On Thu, 2011-08-11 at 16:41 +0900, Masami Hiramatsu wrote:

...

> > > +#ifdef CONFIG_FUNCTION_TRACER
> > > +static notrace void
> > > +kprobe_ftrace_callback(unsigned long ip, unsigned long parent_ip,
> > > +		       struct ftrace_ops *op, struct pt_regs *pt_regs)
> > > +{
> > > +	struct kprobe *p = container_of(op, struct kprobe, fops);
> > > +
> > 
> > Here, we need to set up kprobe_ctlblk and some of pt_regs members,
> > ip, cs and orig_ax as optimized_callback()@arch/x86/kernel/kprobes.c
> > does.
> 
> I'm curious to what this is used for? It doesn't seem to be needed for
> the generic kprobes. Because we know the probe was on a nop, there's no
> need to simulate the operation. IOW, there's no need for singlestep() or
> other gdb like operations.

It is needed to handle recursion, for instance. If, in the rare case, a
kprobe handler calls another routine which also has a kprobe installed
(and so on), we just bypass calling the 2nd kprobe's handlers. To do
this, and to restore the original kprobe context afterwards, we track the
kprobe state in the per_cpu kprobe_ctlblk.
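
In other words, the ftrace callback would need roughly the same dance as the
breakpoint path. A sketch modeled on optimized_callback() in
arch/x86/kernel/kprobes.c, where 'p' is the kprobe and 'regs' is the pt_regs
that ftrace hands to the callback:

	preempt_disable();
	if (kprobe_running()) {
		/* hit from within another kprobe handler: skip, count as missed */
		kprobes_inc_nmissed_count(p);
	} else {
		struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();

		__this_cpu_write(current_kprobe, p);
		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
		if (p->pre_handler)
			p->pre_handler(p, regs);
		__this_cpu_write(current_kprobe, NULL);
	}
	preempt_enable_no_resched();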

Ananth

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5][RFC] ftrace: Return pt_regs to function trace callback (x86_64 only so
  2011-08-12  0:55       ` Masami Hiramatsu
@ 2011-08-12 13:05         ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2011-08-12 13:05 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt

On Fri, 2011-08-12 at 09:55 +0900, Masami Hiramatsu wrote:

> Please note, kprobes is the facility that basically provides a
> breakpoint handler in the kernel. Users expect that. So all registers
> should be provided, just as a breakpoint interrupt does. That's what I
> meant by "transparently".

Well, I think I'm on the right track then, because I'm working to get
the function graph tracer to use this functionality as well.

I believe that I can make this work "transparently" too.

I'm now working on having ftrace use the breakpoint/update method as
well as the stop_machine() method. If an arch has an alternative to
stop_machine, then it can use that method instead.
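
The rough idea of the breakpoint method, with made-up helper names just to
show the sequence:

	/* 1. put an int3 on the first byte of the mcount site */
	text_poke(addr, &int3_insn, 1);
	sync_cores();				/* make every CPU see the int3 */

	/* 2. now the remaining bytes of the call/nop can be rewritten safely */
	text_poke(addr + 1, new_insn + 1, MCOUNT_INSN_SIZE - 1);
	sync_cores();

	/* 3. finally replace the int3 with the first byte of the new instruction */
	text_poke(addr, new_insn, 1);
	sync_cores();

While the switch is in flight, the int3 handler simply skips over the whole
instruction, which is fine since the site is either a nop or an mcount call
anyway.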

-- Steve




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-12  2:57         ` Masami Hiramatsu
@ 2011-08-12 13:08           ` Steven Rostedt
  2011-08-13 10:09             ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2011-08-12 13:08 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt, Ananth N Mavinakayanahalli

On Fri, 2011-08-12 at 11:57 +0900, Masami Hiramatsu wrote:

> I don't think it won't work. It can work but on a long way.
> Could you tell me your "bigger ideas"? Perhaps, we are on the different
> way but aim to same goal.

Part of the bigger ideas is to have things like function graph tracing
use this, as it will simplify the entry.S code. There are other things
that may come out of this too.

> 
> >> The cons:
> >> - Native kprobes users will be disappointed... anyway, they just need to
> >>   move their probes to the next instruction (usually addr+=5 is OK).
> >
> > I've been told that doing the addr+=5 (which is also arch specific) can
> > break things for other tools.
> 
> As I told in previous mail, I think kprobes can do that transparently.

Yeah, it could, but I'm afraid this may need to be done differently for
every arch. If I'm working on getting this code to work for ftrace,
which will require modifying every arch as well, kprobes can get the
changes for free.

-- Steve




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5][RFC] kprobes: Use ftrace hooks when probing ftrace nops
  2011-08-12  5:46       ` Ananth N Mavinakayanahalli
@ 2011-08-12 13:14         ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2011-08-12 13:14 UTC (permalink / raw)
  To: ananth
  Cc: Masami Hiramatsu, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
	Arnaldo Carvalho de Melo, Jason Baron, yrl.pp-manager.tt

On Fri, 2011-08-12 at 11:16 +0530, Ananth N Mavinakayanahalli wrote:
> On Thu, Aug 11, 2011 at 09:22:36AM -0400, Steven Rostedt wrote:
> > On Thu, 2011-08-11 at 16:41 +0900, Masami Hiramatsu wrote:
> 
> ...
> 
> > > > +#ifdef CONFIG_FUNCTION_TRACER
> > > > +static notrace void
> > > > +kprobe_ftrace_callback(unsigned long ip, unsigned long parent_ip,
> > > > +		       struct ftrace_ops *op, struct pt_regs *pt_regs)
> > > > +{
> > > > +	struct kprobe *p = container_of(op, struct kprobe, fops);
> > > > +
> > > 
> > > Here, we need to set up kprobe_ctlblk and some of pt_regs members,
> > > ip, cs and orig_ax as optimized_callback()@arch/x86/kernel/kprobes.c
> > > does.
> > 
> > I'm curious to what this is used for? It doesn't seem to be needed for
> > the generic kprobes. Because we know the probe was on a nop, there's no
> > need to simulate the operation. IOW, there's no need for singlestep() or
> > other gdb like operations.
> 
> It is needed to handle recursion, for instance. If in the rare case a
> kprobe handler calls another routine which also has a kprobe installed
> (and so on), we just bypass calling the 2nd kprobe's handlers. To do
> this and to set back the original kprobe context, we track the kprobe
> state in the per_cpu kprobe_ctlblk.

OK, so this looks a bit more involved. I'll look into this as well. This
is still all under RFC now anyway. But I think it is doable.

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-12 13:08           ` Steven Rostedt
@ 2011-08-13 10:09             ` Masami Hiramatsu
  2011-08-14  2:58               ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-13 10:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt, Ananth N Mavinakayanahalli

(2011/08/12 22:08), Steven Rostedt wrote:
> On Fri, 2011-08-12 at 11:57 +0900, Masami Hiramatsu wrote:
>
>> I don't think it won't work. It can work but on a long way.
>> Could you tell me your "bigger ideas"? Perhaps, we are on the different
>> way but aim to same goal.
>
> Part of the bigger ideas is to have things like function graph tracing
> use this, as it will simplify the entry.S code. There's other things
> that may come out of this too.

Hmm, I think that the current function graph tracing implementation
is more scalable than kretprobes, because a kretprobe requires a
spinlock on every hit. Moreover, you can't probe an NMI handler with
a kprobe, and kprobes in irq handlers can also fail
because of recursive calls.
So I don't recommend using kretprobes for the function-graph tracer :-(

However, even so, some parts of the code can be shared among them,
and that will simplify their implementations.

>>>> The cons:
>>>> - Native kprobes users will be disappointed... anyway, they just need to
>>>>   move their probes to the next instruction (usually addr+=5 is OK).
>>>
>>> I've been told that doing the addr+=5 (which is also arch specific) can
>>> break things for other tools.
>>
>> As I told in previous mail, I think kprobes can do that transparently.
>
> Yeah, it could, but I'm afraid this may need to be done differently for
> every arch. If I'm working on getting this code to work for ftrace,
> which will require modifying ever arch as well, kprobes can get the
> changes for free.

I guess that it should be done (or at least carefully checked)
for every arch anyway.

Thank you,


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-13 10:09             ` Masami Hiramatsu
@ 2011-08-14  2:58               ` Steven Rostedt
  2011-08-14 10:28                 ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2011-08-14  2:58 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt, Ananth N Mavinakayanahalli

On Sat, 2011-08-13 at 19:09 +0900, Masami Hiramatsu wrote:
> (2011/08/12 22:08), Steven Rostedt wrote:
> > On Fri, 2011-08-12 at 11:57 +0900, Masami Hiramatsu wrote:
> >
> >> I don't think it won't work. It can work but on a long way.
> >> Could you tell me your "bigger ideas"? Perhaps, we are on the different
> >> way but aim to same goal.
> >
> > Part of the bigger ideas is to have things like function graph tracing
> > use this, as it will simplify the entry.S code. There's other things
> > that may come out of this too.
> 
> Hmm, I think that the current function graph tracing implementation
> is more scalable than kretprobes, because kretprobe requires
> spinlock on every hit. Moreover, you can't probe NMI handler with
> kprobe, and kprobes on irq-handler are also possible to fail
> because of recursive-call.
> So I don't recommend using kretprobe for function-graph tracer :-(

Sorry for the confusion. My idea is not to use kretprobes with the function
graph tracer, but to use the ftrace hooks with pt_regs and friends
for the function graph tracer instead of what it does today, which is to add
function graph code directly into entry.S.

The point I was making is, if I need to get ftrace function tracing
to be good enough for the function graph tracer, then it should work with
kprobes without any issues. If I need to do the work anyway (for
function graph tracing), then why not use it directly with kprobes
instead of adding more hooks just in the kprobe_trace?

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-14  2:58               ` Steven Rostedt
@ 2011-08-14 10:28                 ` Masami Hiramatsu
  2011-08-15 13:06                   ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-14 10:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt, Ananth N Mavinakayanahalli

(2011/08/14 11:58), Steven Rostedt wrote:
> On Sat, 2011-08-13 at 19:09 +0900, Masami Hiramatsu wrote:
>> (2011/08/12 22:08), Steven Rostedt wrote:
>>> On Fri, 2011-08-12 at 11:57 +0900, Masami Hiramatsu wrote:
>>>
>>>> I don't think it won't work. It can work but on a long way.
>>>> Could you tell me your "bigger ideas"? Perhaps, we are on the different
>>>> way but aim to same goal.
>>>
>>> Part of the bigger ideas is to have things like function graph tracing
>>> use this, as it will simplify the entry.S code. There's other things
>>> that may come out of this too.
>>
>> Hmm, I think that the current function graph tracing implementation
>> is more scalable than kretprobes, because kretprobe requires
>> spinlock on every hit. Moreover, you can't probe NMI handler with
>> kprobe, and kprobes on irq-handler are also possible to fail
>> because of recursive-call.
>> So I don't recommend using kretprobe for function-graph tracer :-(
>
> Sorry for the confusion. My idea is not to use kretprobe with function
> graph tracer, but to use the ftrace hooks with the pt_regs and friends
> for function graph tracer instead of what it does today, which is to add
> function graph code directly into entry.S.

Ah, OK.

> The point I was making is, if I need to get ftrace function tracing
> being good enough for function graph tracer, then it should work with
> kprobes without any issues.

No, I don't think so, because a kprobes user may want to trace the flags
register or segment registers. However, the function graph tracer only needs
the stack register, etc. Thus, if ftrace function tracing is good enough for
kprobes handlers, it is enough for the function graph tracer too.

> If I need to do the work anyway (for
> function graph tracing) then why not use it directly with kprobes
> instead of doing more hooks just in the kprobe_trace?

From the kprobe-tracer point of view, I don't mind. I just care
about complexity, and compatibility of kprobe handlers.

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-14 10:28                 ` Masami Hiramatsu
@ 2011-08-15 13:06                   ` Steven Rostedt
  2011-08-17 12:12                     ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2011-08-15 13:06 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt, Ananth N Mavinakayanahalli

On Sun, 2011-08-14 at 19:28 +0900, Masami Hiramatsu wrote:

> > The point I was making is, if I need to get ftrace function tracing
> > being good enough for function graph tracer, then it should work with
> > kprobes without any issues.
> 
> No, I don't think so, because kprobes user may trace a flags register or
> segment registers. However, function graph tracer only needs stack
> register etc. Thus, if ftrace function tracing is good enough for
> kprobes handlers, it is enough for function graph tracer too.

The added code needed for ftrace is not much more. We just need to save
all of pt_regs and that should work (I already save most of it). The
flags can be saved, but things like the compare flags will be useless, as
those have already been clobbered. But the compare flags are undefined when
calling a function anyway.


> 
> > If I need to do the work anyway (for
> > function graph tracing) then why not use it directly with kprobes
> > instead of doing more hooks just in the kprobe_trace?
> 
> >From the kprobe-tracer point of view, I don't mind. I just care
> about complexity, and compatibility of kprobe handlers.

Right, so far there's not much changes to the kprobe side. I'll see if I
can make ftrace do all the hard work to get the proper registers. I want
to make as much as possible stay in the generic kprobe code, and not
touch the arch code. But I may still need to do something like
'increment rip' or whatever.

Note, we will need to touch kprobes one way or another. Either we have
kprobes hook to ftrace, or we need to do the nop skip trick, which will
probably be as intrusive in the kprobe code.

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-15 13:06                   ` Steven Rostedt
@ 2011-08-17 12:12                     ` Masami Hiramatsu
  2011-08-18 20:06                       ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-17 12:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt, Ananth N Mavinakayanahalli

(2011/08/15 22:06), Steven Rostedt wrote:
> On Sun, 2011-08-14 at 19:28 +0900, Masami Hiramatsu wrote:
>
>>> The point I was making is, if I need to get ftrace function tracing
>>> being good enough for function graph tracer, then it should work with
>>> kprobes without any issues.
>>
>> No, I don't think so, because kprobes user may trace a flags register or
>> segment registers. However, function graph tracer only needs stack
>> register etc. Thus, if ftrace function tracing is good enough for
>> kprobes handlers, it is enough for function graph tracer too.
>
> The added code needed for ftrace is not much more. We just need to save
> all of pt_regs and that should work (I already save most of it). The
> flags can be saved, but things like compare flags will be useless, as
> those are used. But compare flags are undefined when calling a function
> anyway.

Agreed. I think users may just want to see other status flags.

>>> If I need to do the work anyway (for
>>> function graph tracing) then why not use it directly with kprobes
>>> instead of doing more hooks just in the kprobe_trace?
>>
>> From the kprobe-tracer point of view, I don't mind. I just care
>> about complexity, and compatibility of kprobe handlers.
>
Right, so far there aren't many changes on the kprobe side. I'll see if I
can make ftrace do all the hard work to get the proper registers. I want
as much as possible to stay in the generic kprobe code, and not to
touch the arch code. But I may still need to do something like
'increment rip' or whatever.

The kprobes arch-side code is not so complex; it has fewer states.
The generic kprobes code has many more states to support module
probes, module init, aggregated probes, and so on. So it is OK with me
to touch the arch code :).

> Note, we will need to touch kprobes one way or another. Either we have
> kprobes hook to ftrace, or we need to do the nop skip trick, which will
> probably be as intrusive in the kprobe code.

I see. OK, I'm now looking forward to your work ;)

Thank you!


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-17 12:12                     ` Masami Hiramatsu
@ 2011-08-18 20:06                       ` Steven Rostedt
  2011-08-19  2:41                         ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2011-08-18 20:06 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt, Ananth N Mavinakayanahalli

On Wed, 2011-08-17 at 21:12 +0900, Masami Hiramatsu wrote:

> I see. OK, I'm now looking forward to your work ;)

Hi Masami,

BTW, do you have any test suites that I could use to test the changes I
make? Something that I may be able to catch regressions with?

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops
  2011-08-18 20:06                       ` Steven Rostedt
@ 2011-08-19  2:41                         ` Masami Hiramatsu
  0 siblings, 0 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2011-08-19  2:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Jason Baron, yrl.pp-manager.tt, Ananth N Mavinakayanahalli

(2011/08/19 5:06), Steven Rostedt wrote:
> On Wed, 2011-08-17 at 21:12 +0900, Masami Hiramatsu wrote:
> 
>> I see. OK, I'm now looking forward to your work ;)
> 
> Hi Masami,
> 
> BTW, do you have any test suits that I could use to test the changes I
> make. Something that I may be able to catch regressions with?

Yes, here are the tests that ensure probes can be put on module/module init code.

http://sourceforge.net/mailarchive/forum.php?thread_name=65795E11DBF1E645A09CEC7EAEE94B9C39F68E90%40USINDEVS02.corp.hds.com&forum_name=dle-develop

http://sourceforge.net/mailarchive/forum.php?thread_name=65795E11DBF1E645A09CEC7EAEE94B9C39F68E8F%40USINDEVS02.corp.hds.com&forum_name=dle-develop

Also, a basic smoke test is already in the kernel.

BTW, I think we may need more testcases for checking registers etc.

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2011-08-19  2:42 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-10 16:22 [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on ftrace nops Steven Rostedt
2011-08-10 16:22 ` [PATCH 1/5][RFC] tracing: Clean up tb_fmt to not give faulty compile warning Steven Rostedt
2011-08-10 16:22 ` [PATCH 2/5][RFC] ftrace: Pass ftrace_ops as third parameter to function trace Steven Rostedt
2011-08-10 16:22 ` [PATCH 3/5][RFC] ftrace: Return pt_regs to function trace callback (x86_64 only so Steven Rostedt
2011-08-11  5:55   ` Masami Hiramatsu
2011-08-11 12:59     ` Steven Rostedt
2011-08-12  0:55       ` Masami Hiramatsu
2011-08-12 13:05         ` Steven Rostedt
2011-08-10 16:22 ` [PATCH 4/5][RFC] kprobes: Inverse taking of module_mutex with kprobe_mutex Steven Rostedt
2011-08-10 16:22 ` [PATCH 5/5][RFC] kprobes: Use ftrace hooks when probing ftrace nops Steven Rostedt
2011-08-11  7:41   ` Masami Hiramatsu
2011-08-11 13:22     ` Steven Rostedt
2011-08-12  2:41       ` Masami Hiramatsu
2011-08-12  5:46       ` Ananth N Mavinakayanahalli
2011-08-12 13:14         ` Steven Rostedt
2011-08-11  0:21 ` [PATCH 0/5][RFC] kprobes/ftrace: Have kprobes use ftrace on " Masami Hiramatsu
2011-08-11  0:34   ` Steven Rostedt
2011-08-11  6:28     ` Masami Hiramatsu
2011-08-11 13:01       ` Steven Rostedt
2011-08-12  2:57         ` Masami Hiramatsu
2011-08-12 13:08           ` Steven Rostedt
2011-08-13 10:09             ` Masami Hiramatsu
2011-08-14  2:58               ` Steven Rostedt
2011-08-14 10:28                 ` Masami Hiramatsu
2011-08-15 13:06                   ` Steven Rostedt
2011-08-17 12:12                     ` Masami Hiramatsu
2011-08-18 20:06                       ` Steven Rostedt
2011-08-19  2:41                         ` Masami Hiramatsu
