linux-kernel.vger.kernel.org archive mirror
* [RFC] ftrace, perf: Adding support to use function trace
@ 2011-11-27 18:04 Jiri Olsa
  2011-11-27 18:04 ` [PATCH 1/9] trace: Fix uninitialized variable compiler warning Jiri Olsa
                   ` (9 more replies)
  0 siblings, 10 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-11-27 18:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme; +Cc: linux-kernel

hi,
here's another version of the perf support for function trace
with filter. The changeset is working and hopefully does not
introduce more bugs.. ;) still testing.

Patches 1 through 3 could probably be taken separately, as they
fix independent issues. The rest is specific to the perf
function tracing.

I needed a way to temporarily enable/disable ftrace_ops based
on the event in/out scheduling; patch 4/9 introduces that.
Not sure this is the best way though..

Also adding open/close and add/del registration actions for
tracepoints (patches 5 and 6) to have a suitable place to control
the function trace.

Patch 7 enables the function event perf registration,
and patches 8 and 9 add filter support.

attached patches:
- 1/9 trace: Fix uninitialized variable compiler warning
- 2/9 ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update
- 3/9 ftrace: Fix shutdown to disable calls properly
- 4/9 ftrace: Add enable/disable ftrace_ops control interface
- 5/9 ftrace, perf: Add open/close tracepoint perf registration actions
- 6/9 ftrace, perf: Add add/del tracepoint perf registration actions
- 7/9 ftrace, perf: Add support to use function tracepoint in perf
- 8/9 ftrace, perf: Add FILTER_TRACE_FN event field type
- 9/9 ftrace, perf: Add filter support for function trace event

thanks for comments,
jirka
---
 include/linux/ftrace.h             |   12 ++
 include/linux/ftrace_event.h       |    9 ++-
 include/linux/perf_event.h         |    3 +
 kernel/trace/ftrace.c              |   45 ++++++---
 kernel/trace/trace.c               |    3 +-
 kernel/trace/trace.h               |    9 ++-
 kernel/trace/trace_event_perf.c    |  208 ++++++++++++++++++++++++++++-------
 kernel/trace/trace_events.c        |   12 ++-
 kernel/trace/trace_events_filter.c |  114 ++++++++++++++++++--
 kernel/trace/trace_export.c        |   53 ++++++++-
 kernel/trace/trace_kprobe.c        |    8 ++-
 kernel/trace/trace_syscalls.c      |   18 +++-
 12 files changed, 413 insertions(+), 81 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCH 1/9] trace: Fix uninitialized variable compiler warning
  2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
@ 2011-11-27 18:04 ` Jiri Olsa
  2011-11-28 16:19   ` Steven Rostedt
  2011-11-27 18:04 ` [PATCH 2/9] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-11-27 18:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme; +Cc: linux-kernel, Jiri Olsa

Initialize page2 variable to make compiler happy.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/trace.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 9e158cc..4a06862 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3655,8 +3655,7 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
 	struct page *pages[2];
 	int nr_pages = 1;
 	ssize_t written;
-	void *page1;
-	void *page2;
+	void *page1, *page2 = NULL;
 	int offset;
 	int size;
 	int len;
-- 
1.7.1



* [PATCH 2/9] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update
  2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
  2011-11-27 18:04 ` [PATCH 1/9] trace: Fix uninitialized variable compiler warning Jiri Olsa
@ 2011-11-27 18:04 ` Jiri Olsa
  2011-11-28 16:24   ` Steven Rostedt
  2011-11-27 18:04 ` [PATCH 3/9] ftrace: Fix shutdown to disable calls properly Jiri Olsa
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-11-27 18:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme; +Cc: linux-kernel, Jiri Olsa

We need to check the existence of the other_hash before
we touch its count variable.

This issue is hit only when a non-global ftrace_ops is used.
The global ftrace_ops is initialized with empty hashes.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/ftrace.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b1e8943..c6d0293 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1372,7 +1372,8 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
 			if (filter_hash && in_hash && !in_other_hash)
 				match = 1;
 			else if (!filter_hash && in_hash &&
-				 (in_other_hash || !other_hash->count))
+				 (in_other_hash ||
+				  !other_hash || !other_hash->count))
 				match = 1;
 		}
 		if (!match)
-- 
1.7.1



* [PATCH 3/9] ftrace: Fix shutdown to disable calls properly
  2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
  2011-11-27 18:04 ` [PATCH 1/9] trace: Fix uninitialized variable compiler warning Jiri Olsa
  2011-11-27 18:04 ` [PATCH 2/9] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
@ 2011-11-27 18:04 ` Jiri Olsa
  2011-11-28 19:18   ` Steven Rostedt
  2011-11-27 18:04 ` [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-11-27 18:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme; +Cc: linux-kernel, Jiri Olsa

Each ftrace_startup call increases the call record's flag,
so we always need to decrease it when shutting down the
ftrace_ops.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/ftrace.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c6d0293..0ca0c0d 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1744,8 +1744,7 @@ static void ftrace_shutdown(struct ftrace_ops *ops, int command)
 	if (ops != &global_ops || !global_start_up)
 		ops->flags &= ~FTRACE_OPS_FL_ENABLED;
 
-	if (!ftrace_start_up)
-		command |= FTRACE_DISABLE_CALLS;
+	command |= FTRACE_DISABLE_CALLS;
 
 	if (saved_ftrace_func != ftrace_trace_function) {
 		saved_ftrace_func = ftrace_trace_function;
-- 
1.7.1



* [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
                   ` (2 preceding siblings ...)
  2011-11-27 18:04 ` [PATCH 3/9] ftrace: Fix shutdown to disable calls properly Jiri Olsa
@ 2011-11-27 18:04 ` Jiri Olsa
  2011-11-28 19:26   ` Steven Rostedt
  2011-11-28 20:21   ` Steven Rostedt
  2011-11-27 18:04 ` [PATCH 5/9] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-11-27 18:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme; +Cc: linux-kernel, Jiri Olsa

Adding a way to temporarily enable/disable ftrace_ops.

When an ftrace_ops with the FTRACE_OPS_FL_CONTROL flag is
registered, the ftrace_ops_list_func processing function
is used as the ftrace function, so that all registered
ftrace_ops stay under control.

A jump label is also used so that no overhead is introduced
into the current ftrace_ops_list_func processing.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |   12 ++++++++++++
 kernel/trace/ftrace.c  |   39 +++++++++++++++++++++++++++++----------
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 26eafce..28b59f1 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -35,12 +35,14 @@ enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	atomic_t			disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +99,16 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+static inline void enable_ftrace_function(struct ftrace_ops *ops)
+{
+	atomic_dec(&ops->disabled);
+}
+
+static inline void disable_ftrace_function(struct ftrace_ops *ops)
+{
+	atomic_inc(&ops->disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 0ca0c0d..e5a9498 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -30,6 +30,7 @@
 #include <linux/list.h>
 #include <linux/hash.h>
 #include <linux/rcupdate.h>
+#include <linux/jump_label.h>
 
 #include <trace/events/sched.h>
 
@@ -94,6 +95,8 @@ ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
 
+static struct jump_label_key ftrace_ops_control;
+
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
 
@@ -196,17 +199,21 @@ static void update_ftrace_function(void)
 
 	update_global_ops();
 
-	/*
-	 * If we are at the end of the list and this ops is
-	 * not dynamic, then have the mcount trampoline call
-	 * the function directly
-	 */
-	if (ftrace_ops_list == &ftrace_list_end ||
-	    (ftrace_ops_list->next == &ftrace_list_end &&
-	     !(ftrace_ops_list->flags & FTRACE_OPS_FL_DYNAMIC)))
-		func = ftrace_ops_list->func;
-	else
+	if (jump_label_enabled(&ftrace_ops_control))
 		func = ftrace_ops_list_func;
+	else {
+		/*
+		 * If we are at the end of the list and this ops is
+		 * not dynamic, then have the mcount trampoline call
+		 * the function directly
+		 */
+		if (ftrace_ops_list == &ftrace_list_end ||
+		    (ftrace_ops_list->next == &ftrace_list_end &&
+		     !(ftrace_ops_list->flags & FTRACE_OPS_FL_DYNAMIC)))
+			func = ftrace_ops_list->func;
+		else
+			func = ftrace_ops_list_func;
+	}
 
 #ifdef CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST
 	ftrace_trace_function = func;
@@ -280,6 +287,9 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
+	if (ops->flags & FTRACE_OPS_FL_CONTROL)
+		jump_label_inc(&ftrace_ops_control);
+
 	if (ftrace_enabled)
 		update_ftrace_function();
 
@@ -311,6 +321,9 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 	if (ret < 0)
 		return ret;
 
+	if (ops->flags & FTRACE_OPS_FL_CONTROL)
+		jump_label_dec(&ftrace_ops_control);
+
 	if (ftrace_enabled)
 		update_ftrace_function();
 
@@ -3577,8 +3590,14 @@ ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 	preempt_disable_notrace();
 	op = rcu_dereference_raw(ftrace_ops_list);
 	while (op != &ftrace_list_end) {
+		if (static_branch(&ftrace_ops_control))
+			if ((op->flags & FTRACE_OPS_FL_CONTROL) &&
+			    atomic_read(&op->disabled))
+				goto next;
+
 		if (ftrace_ops_test(op, ip))
 			op->func(ip, parent_ip);
+ next:
 		op = rcu_dereference_raw(op->next);
 	};
 	preempt_enable_notrace();
-- 
1.7.1



* [PATCH 5/9] ftrace, perf: Add open/close tracepoint perf registration actions
  2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
                   ` (3 preceding siblings ...)
  2011-11-27 18:04 ` [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2011-11-27 18:04 ` Jiri Olsa
  2011-11-27 18:04 ` [PATCH 6/9] ftrace, perf: Add add/del " Jiri Olsa
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-11-27 18:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme; +Cc: linux-kernel, Jiri Olsa

Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.

The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.

The open/close actions are invoked for each tracepoint user when
opening/closing the event.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    6 +-
 kernel/trace/trace.h            |    5 ++
 kernel/trace/trace_event_perf.c |  116 +++++++++++++++++++++++++--------------
 kernel/trace/trace_events.c     |   10 ++-
 kernel/trace/trace_kprobe.c     |    6 ++-
 kernel/trace/trace_syscalls.c   |   14 +++-
 6 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c3da42d..195e360 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -146,6 +146,8 @@ enum trace_reg {
 	TRACE_REG_UNREGISTER,
 	TRACE_REG_PERF_REGISTER,
 	TRACE_REG_PERF_UNREGISTER,
+	TRACE_REG_PERF_OPEN,
+	TRACE_REG_PERF_CLOSE,
 };
 
 struct ftrace_event_call;
@@ -157,7 +159,7 @@ struct ftrace_event_class {
 	void			*perf_probe;
 #endif
 	int			(*reg)(struct ftrace_event_call *event,
-				       enum trace_reg type);
+				       enum trace_reg type, void *data);
 	int			(*define_fields)(struct ftrace_event_call *);
 	struct list_head	*(*get_fields)(struct ftrace_event_call *);
 	struct list_head	fields;
@@ -165,7 +167,7 @@ struct ftrace_event_class {
 };
 
 extern int ftrace_event_reg(struct ftrace_event_call *event,
-			    enum trace_reg type);
+			    enum trace_reg type, void *data);
 
 enum {
 	TRACE_EVENT_FL_ENABLED_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f8ec229..c4330dc 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -825,4 +825,9 @@ extern const char *__stop___trace_bprintk_fmt[];
 	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #include "trace_entries.h"
 
+#ifdef CONFIG_PERF_EVENTS
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data);
+#endif
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 19a359d..0cfcc37 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -44,23 +44,17 @@ static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 	return 0;
 }
 
-static int perf_trace_event_init(struct ftrace_event_call *tp_event,
-				 struct perf_event *p_event)
+static int perf_trace_event_reg(struct ftrace_event_call *tp_event,
+				struct perf_event *p_event)
 {
 	struct hlist_head __percpu *list;
-	int ret;
+	int ret = -ENOMEM;
 	int cpu;
 
-	ret = perf_trace_event_perm(tp_event, p_event);
-	if (ret)
-		return ret;
-
 	p_event->tp_event = tp_event;
 	if (tp_event->perf_refcount++ > 0)
 		return 0;
 
-	ret = -ENOMEM;
-
 	list = alloc_percpu(struct hlist_head);
 	if (!list)
 		goto fail;
@@ -83,7 +77,7 @@ static int perf_trace_event_init(struct ftrace_event_call *tp_event,
 		}
 	}
 
-	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER);
+	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER, NULL);
 	if (ret)
 		goto fail;
 
@@ -108,6 +102,69 @@ fail:
 	return ret;
 }
 
+static void perf_trace_event_unreg(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	int i;
+
+	if (--tp_event->perf_refcount > 0)
+		goto out;
+
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER, NULL);
+
+	/*
+	 * Ensure our callback won't be called anymore. The buffers
+	 * will be freed after that.
+	 */
+	tracepoint_synchronize_unregister();
+
+	free_percpu(tp_event->perf_events);
+	tp_event->perf_events = NULL;
+
+	if (!--total_ref_count) {
+		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
+			free_percpu(perf_trace_buf[i]);
+			perf_trace_buf[i] = NULL;
+		}
+	}
+out:
+	module_put(tp_event->mod);
+}
+
+static int perf_trace_event_open(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_OPEN, p_event);
+}
+
+static void perf_trace_event_close(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
+}
+
+static int perf_trace_event_init(struct ftrace_event_call *tp_event,
+				 struct perf_event *p_event)
+{
+	int ret;
+
+	ret = perf_trace_event_perm(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_reg(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_open(p_event);
+	if (ret) {
+		perf_trace_event_unreg(p_event);
+		return ret;
+	}
+
+	return 0;
+}
+
 int perf_trace_init(struct perf_event *p_event)
 {
 	struct ftrace_event_call *tp_event;
@@ -130,6 +187,14 @@ int perf_trace_init(struct perf_event *p_event)
 	return ret;
 }
 
+void perf_trace_destroy(struct perf_event *p_event)
+{
+	mutex_lock(&event_mutex);
+	perf_trace_event_close(p_event);
+	perf_trace_event_unreg(p_event);
+	mutex_unlock(&event_mutex);
+}
+
 int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct ftrace_event_call *tp_event = p_event->tp_event;
@@ -154,37 +219,6 @@ void perf_trace_del(struct perf_event *p_event, int flags)
 	hlist_del_rcu(&p_event->hlist_entry);
 }
 
-void perf_trace_destroy(struct perf_event *p_event)
-{
-	struct ftrace_event_call *tp_event = p_event->tp_event;
-	int i;
-
-	mutex_lock(&event_mutex);
-	if (--tp_event->perf_refcount > 0)
-		goto out;
-
-	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER);
-
-	/*
-	 * Ensure our callback won't be called anymore. The buffers
-	 * will be freed after that.
-	 */
-	tracepoint_synchronize_unregister();
-
-	free_percpu(tp_event->perf_events);
-	tp_event->perf_events = NULL;
-
-	if (!--total_ref_count) {
-		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
-			free_percpu(perf_trace_buf[i]);
-			perf_trace_buf[i] = NULL;
-		}
-	}
-out:
-	module_put(tp_event->mod);
-	mutex_unlock(&event_mutex);
-}
-
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 				       struct pt_regs *regs, int *rctxp)
 {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c212a7f..5138fea 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -147,7 +147,8 @@ int trace_event_raw_init(struct ftrace_event_call *call)
 }
 EXPORT_SYMBOL_GPL(trace_event_raw_init);
 
-int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
+int ftrace_event_reg(struct ftrace_event_call *call,
+		     enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -170,6 +171,9 @@ int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
 					    call->class->perf_probe,
 					    call);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
@@ -209,7 +213,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_stop_cmdline_record();
 				call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			call->class->reg(call, TRACE_REG_UNREGISTER);
+			call->class->reg(call, TRACE_REG_UNREGISTER, NULL);
 		}
 		break;
 	case 1:
@@ -218,7 +222,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_start_cmdline_record();
 				call->flags |= TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			ret = call->class->reg(call, TRACE_REG_REGISTER);
+			ret = call->class->reg(call, TRACE_REG_REGISTER, NULL);
 			if (ret) {
 				tracing_stop_cmdline_record();
 				pr_info("event trace: Could not enable event "
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 00d527c..5667f89 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1892,7 +1892,8 @@ static __kprobes void kretprobe_perf_func(struct kretprobe_instance *ri,
 #endif	/* CONFIG_PERF_EVENTS */
 
 static __kprobes
-int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
+int kprobe_register(struct ftrace_event_call *event,
+		    enum trace_reg type, void *data)
 {
 	struct trace_probe *tp = (struct trace_probe *)event->data;
 
@@ -1909,6 +1910,9 @@ int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
 	case TRACE_REG_PERF_UNREGISTER:
 		disable_trace_probe(tp, TP_FLAG_PROFILE);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index cb65454..6916b0d 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -17,9 +17,9 @@ static DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
 static DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 
 static int syscall_enter_define_fields(struct ftrace_event_call *call);
 static int syscall_exit_define_fields(struct ftrace_event_call *call);
@@ -649,7 +649,7 @@ void perf_sysexit_disable(struct ftrace_event_call *call)
 #endif /* CONFIG_PERF_EVENTS */
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -664,13 +664,16 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysenter_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
 }
 
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -685,6 +688,9 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysexit_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
-- 
1.7.1



* [PATCH 6/9] ftrace, perf: Add add/del tracepoint perf registration actions
  2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
                   ` (4 preceding siblings ...)
  2011-11-27 18:04 ` [PATCH 5/9] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
@ 2011-11-27 18:04 ` Jiri Olsa
  2011-11-27 18:04 ` [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-11-27 18:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme; +Cc: linux-kernel, Jiri Olsa

Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
perf event schedule in/out actions.

The add action is invoked when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    2 ++
 kernel/trace/trace_event_perf.c |    4 +++-
 kernel/trace/trace_events.c     |    2 ++
 kernel/trace/trace_kprobe.c     |    2 ++
 kernel/trace/trace_syscalls.c   |    4 ++++
 5 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 195e360..2bf677c 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -148,6 +148,8 @@ enum trace_reg {
 	TRACE_REG_PERF_UNREGISTER,
 	TRACE_REG_PERF_OPEN,
 	TRACE_REG_PERF_CLOSE,
+	TRACE_REG_PERF_ADD,
+	TRACE_REG_PERF_DEL,
 };
 
 struct ftrace_event_call;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0cfcc37..d72af0b 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -211,12 +211,14 @@ int perf_trace_add(struct perf_event *p_event, int flags)
 	list = this_cpu_ptr(pcpu_list);
 	hlist_add_head_rcu(&p_event->hlist_entry, list);
 
-	return 0;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_ADD, p_event);
 }
 
 void perf_trace_del(struct perf_event *p_event, int flags)
 {
+	struct ftrace_event_call *tp_event = p_event->tp_event;
 	hlist_del_rcu(&p_event->hlist_entry);
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_DEL, p_event);
 }
 
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5138fea..079a93a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -173,6 +173,8 @@ int ftrace_event_reg(struct ftrace_event_call *call,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5667f89..580a05e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1912,6 +1912,8 @@ int kprobe_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 6916b0d..dbdd804 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -666,6 +666,8 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
@@ -690,6 +692,8 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
-- 
1.7.1



* [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf
  2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
                   ` (5 preceding siblings ...)
  2011-11-27 18:04 ` [PATCH 6/9] ftrace, perf: Add add/del " Jiri Olsa
@ 2011-11-27 18:04 ` Jiri Olsa
  2011-11-28 19:58   ` Steven Rostedt
  2011-11-27 18:04 ` [PATCH 8/9] ftrace, perf: Add FILTER_TRACE_FN event field type Jiri Olsa
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-11-27 18:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme; +Cc: linux-kernel, Jiri Olsa

Adding perf registration support for the ftrace function event,
so it is now possible to register it via the perf interface.

The perf_event struct statically contains an ftrace_ops as a handle
for the function tracer. The function tracer is registered/unregistered
in the open/close actions, and enabled/disabled in the add/del actions.

It is now possible to use function trace within perf commands
like:

  perf record -e ftrace:function ls
  perf stat -e ftrace:function ls

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h      |    3 +
 kernel/trace/trace_event_perf.c |   88 +++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_export.c     |   23 ++++++++++
 3 files changed, 114 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1e9ebe5..6071995 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -847,6 +847,9 @@ struct perf_event {
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
+#ifdef CONFIG_FUNCTION_TRACER
+	struct ftrace_ops               ftrace_ops;
+#endif
 #endif
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index d72af0b..4be0f73 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -250,3 +250,91 @@ __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 	return raw_data;
 }
 EXPORT_SYMBOL_GPL(perf_trace_buf_prepare);
+
+
+static void
+perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_entry *entry;
+	struct hlist_head *head;
+	struct pt_regs regs;
+	int rctx;
+
+#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
+		    sizeof(u64)) - sizeof(u32))
+
+	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
+
+	perf_fetch_caller_regs(&regs);
+
+	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
+	if (!entry)
+		return;
+
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	head = this_cpu_ptr(event_function.perf_events);
+	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
+			      1, &regs, head);
+
+#undef ENTRY_SIZE
+}
+
+static int perf_ftrace_function_register(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+
+	ops->flags |= FTRACE_OPS_FL_CONTROL;
+	atomic_set(&ops->disabled, 1);
+	ops->func = perf_ftrace_function_call;
+	return register_ftrace_function(ops);
+}
+
+static int perf_ftrace_function_unregister(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	return unregister_ftrace_function(ops);
+}
+
+static void perf_ftrace_function_enable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	enable_ftrace_function(ops);
+}
+
+static void perf_ftrace_function_disable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	disable_ftrace_function(ops);
+}
+
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data)
+{
+	int etype = call->event.type;
+
+	if (etype != TRACE_FN)
+		return -EINVAL;
+
+	switch (type) {
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+		return perf_ftrace_function_register(data);
+	case TRACE_REG_PERF_CLOSE:
+		return perf_ftrace_function_unregister(data);
+	case TRACE_REG_PERF_ADD:
+		perf_ftrace_function_enable(data);
+		return 0;
+	case TRACE_REG_PERF_DEL:
+		perf_ftrace_function_disable(data);
+		return 0;
+	}
+
+	return -EINVAL;
+}
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index bbeec31..62e86a5 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 
 #include "trace_entries.h"
 
+static int ftrace_event_class_register(struct ftrace_event_call *call,
+				       enum trace_reg type, void *data)
+{
+	switch (type) {
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
+#ifdef CONFIG_PERF_EVENTS
+		return perf_ftrace_event_register(call, type, data);
+#endif
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	}
+
+	return -EINVAL;
+}
+
 #undef __entry
 #define __entry REC
 
@@ -159,6 +181,7 @@ struct ftrace_event_class event_class_ftrace_##call = {			\
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
+	.reg			= ftrace_event_class_register,		\
 };									\
 									\
 struct ftrace_event_call __used event_##call = {			\
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 8/9] ftrace, perf: Add FILTER_TRACE_FN event field type
  2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
                   ` (6 preceding siblings ...)
  2011-11-27 18:04 ` [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
@ 2011-11-27 18:04 ` Jiri Olsa
  2011-11-28 20:01   ` Steven Rostedt
  2011-11-27 18:04 ` [PATCH 9/9] ftrace, perf: Add filter support for function trace event Jiri Olsa
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
  9 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-11-27 18:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme; +Cc: linux-kernel, Jiri Olsa

Adding the FILTER_TRACE_FN event field type for the function
tracepoint event, so it can be properly recognized within the
filtering code.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h       |    1 +
 kernel/trace/trace_events_filter.c |    7 ++++++-
 kernel/trace/trace_export.c        |   25 ++++++++++++++++++++-----
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 2bf677c..dd478fc 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -245,6 +245,7 @@ enum {
 	FILTER_STATIC_STRING,
 	FILTER_DYN_STRING,
 	FILTER_PTR_STRING,
+	FILTER_TRACE_FN,
 };
 
 #define EVENT_STORAGE_SIZE 128
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index fdc6d22..7b0b04c 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
 	return FILTER_OTHER;
 }
 
+static bool is_function_field(struct ftrace_event_field *field)
+{
+	return field->filter_type == FILTER_TRACE_FN;
+}
+
 static bool is_string_field(struct ftrace_event_field *field)
 {
 	return field->filter_type == FILTER_DYN_STRING ||
@@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else {
+	} else if (!is_function_field(field)) {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 62e86a5..7b035ab 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -67,7 +67,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -77,7 +77,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -91,7 +91,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 		ret = trace_define_field(event_call, event_storage, #item, \
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 		mutex_unlock(&event_storage_mutex);			\
 		if (ret)						\
 			return ret;					\
@@ -104,7 +104,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -112,10 +112,24 @@ static void __always_unused ____ftrace_check_##name(void)	\
 #define __dynamic_array(type, item)					\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), FILTER_OTHER);\
+				 0, is_signed_type(type), filter_type);\
 	if (ret)							\
 		return ret;
 
+#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
+#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
+#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
+#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
+#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
+#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
+#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
+#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
+#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
+#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
+#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
+
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 int									\
@@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 {									\
 	struct struct_name field;					\
 	int ret;							\
+	int filter_type = FILTER_TYPE(id);				\
 									\
 	tstruct;							\
 									\
-- 
1.7.1



* [PATCH 9/9] ftrace, perf: Add filter support for function trace event
  2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
                   ` (7 preceding siblings ...)
  2011-11-27 18:04 ` [PATCH 8/9] ftrace, perf: Add FILTER_TRACE_FN event field type Jiri Olsa
@ 2011-11-27 18:04 ` Jiri Olsa
  2011-11-28 20:07   ` Steven Rostedt
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
  9 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-11-27 18:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme; +Cc: linux-kernel, Jiri Olsa

Adding support to filter the function trace event via the perf
interface. It is now possible to use the filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only, and
the only accepted operators are '==', '!=' and '&&', ending
up with filter strings like:

  "ip == f1 f2 ..." && "ip != f3 f4 ..." ...

The '==' operator adds a trace filter with the same effect as
adding it via the set_ftrace_filter file.

The '!=' operator adds a trace filter with the same effect as
adding it via the set_ftrace_notrace file.

The right side of the '==' and '!=' operators is a space-separated
list of functions or regexps to be added to the filter. The same
syntax is supported/required as for the set_ftrace_filter and
set_ftrace_notrace files.

The '&&' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operator within one filter string.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/trace.h               |    4 +-
 kernel/trace/trace_events_filter.c |  111 +++++++++++++++++++++++++++++++++---
 kernel/trace/trace_export.c        |    5 ++
 3 files changed, 110 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c4330dc..fde4d2a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -589,6 +589,8 @@ static inline int ftrace_trace_task(struct task_struct *task)
 static inline int ftrace_is_dead(void) { return 0; }
 #endif
 
+int ftrace_event_is_function(struct ftrace_event_call *call);
+
 /*
  * struct trace_parser - servers for reading the user input separated by spaces
  * @cont: set if the input is not complete - no final space char was found
@@ -765,9 +767,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 7b0b04c..7434f50 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -54,6 +54,13 @@ struct filter_op {
 	int precedence;
 };
 
+static struct filter_op filter_ftrace_ops[] = {
+	{ OP_AND,	"&&",		1 },
+	{ OP_NE,	"!=",		2 },
+	{ OP_EQ,	"==",		2 },
+	{ OP_NONE,	"OP_NONE",	0 },
+};
+
 static struct filter_op filter_ops[] = {
 	{ OP_OR,	"||",		1 },
 	{ OP_AND,	"&&",		2 },
@@ -81,6 +88,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +104,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +1001,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1353,8 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
+
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1894,6 +1906,81 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int *reset;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	if (filter)
+		ftrace_set_filter(data->ops, buf, len, *reset);
+	else
+		ftrace_set_notrace(data->ops, buf, len, *reset);
+
+	if (*reset)
+		*reset = 0;
+
+	return WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	/*
+	  Check the predicate for function trace, verify:
+	   - only '==' and '!=' is used
+	   - the 'ip' field is used
+	*/
+	if (WARN((pred->op != OP_EQ) && (pred->op != OP_NE),
+		 "wrong operator for function filter: %d\n", pred->op))
+		return -EINVAL;
+
+	if (strcmp(field->name, "ip"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID))
+		return WALK_PRED_DEFAULT;
+
+	/* Double checking the predicate is valid for function trace. */
+	*err = ftrace_function_check_pred(pred);
+	if (*err)
+		return WALK_PRED_ABORT;
+
+	return __ftrace_function_set_filter(pred->op == OP_EQ,
+					    pred->regex.pattern,
+					    pred->regex.len,
+					    data);
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1901,6 +1988,7 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	struct event_filter *filter;
 	struct filter_parse_state *ps;
 	struct ftrace_event_call *call;
+	struct filter_op *fops = filter_ops;
 
 	mutex_lock(&event_mutex);
 
@@ -1925,14 +2013,21 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	if (!ps)
 		goto free_filter;
 
-	parse_init(ps, filter_ops, filter_str);
+	if (ftrace_event_is_function(call))
+		fops = filter_ftrace_ops;
+
+	parse_init(ps, fops, filter_str);
 	err = filter_parse(ps);
 	if (err)
 		goto free_ps;
 
 	err = replace_preds(call, filter, ps, filter_str, false);
-	if (!err)
-		event->filter = filter;
+	if (!err) {
+		if (ftrace_event_is_function(call))
+			err = ftrace_function_set_filter(event, filter);
+		else
+			event->filter = filter;
+	}
 
 free_ps:
 	filter_opstack_clear(ps);
@@ -1940,7 +2035,7 @@ free_ps:
 	kfree(ps);
 
 free_filter:
-	if (err)
+	if (err || ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 7b035ab..46c35e2 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -208,4 +208,9 @@ struct ftrace_event_call __used event_##call = {			\
 struct ftrace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
+int ftrace_event_is_function(struct ftrace_event_call *call)
+{
+	return call == &event_function;
+}
+
 #include "trace_entries.h"
-- 
1.7.1



* Re: [PATCH 1/9] trace: Fix uninitialized variable compiler warning
  2011-11-27 18:04 ` [PATCH 1/9] trace: Fix uninitialized variable compiler warning Jiri Olsa
@ 2011-11-28 16:19   ` Steven Rostedt
  2011-11-28 16:25     ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 16:19 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: fweisbec, mingo, paulus, acme, linux-kernel

On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> Initialize page2 variable to make compiler happy.

What compiler is this? Because this is a compiler bug. In fact, there's
no check for page2 being NULL, so if it is used uninitialized it will
crash the kernel. I don't like these "make the compiler shut up" fixes,
because honestly, changes like this hide bugs.

Nacked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  kernel/trace/trace.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 9e158cc..4a06862 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -3655,8 +3655,7 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
>  	struct page *pages[2];
>  	int nr_pages = 1;
>  	ssize_t written;
> -	void *page1;
> -	void *page2;
> +	void *page1, *page2 = NULL;
>  	int offset;
>  	int size;
>  	int len;




* Re: [PATCH 2/9] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update
  2011-11-27 18:04 ` [PATCH 2/9] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
@ 2011-11-28 16:24   ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 16:24 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: fweisbec, mingo, paulus, acme, linux-kernel

On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> We need to check the existence of the other_hash before
> we touch its count variable.
> 
> This issue is hit only when non global ftrace_ops is used.
> The global ftrace_ops is initialized with empty hashes.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  kernel/trace/ftrace.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index b1e8943..c6d0293 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1372,7 +1372,8 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
>  			if (filter_hash && in_hash && !in_other_hash)
>  				match = 1;
>  			else if (!filter_hash && in_hash &&
> -				 (in_other_hash || !other_hash->count))
> +				 (in_other_hash ||
> +				  !other_hash || !other_hash->count))

Thanks! I hit this bug in too many places. I need to make a helper
routine, which I think I will, that is:

	hash_has_contents(hash)

that does the check for us.
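One way that helper could look, sketched against a stub of the hash type (hypothetical; the real struct ftrace_hash carries buckets, size_bits and more, and only the helper's name comes from the message above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stub for illustration: the real struct ftrace_hash has more fields. */
struct ftrace_hash {
	unsigned long count;
};

/* Proposed helper: true only when the hash exists and holds entries,
 * so callers no longer need the explicit NULL check. */
static bool hash_has_contents(struct ftrace_hash *hash)
{
	return hash && hash->count;
}
```

With such a helper, the condition in __ftrace_hash_rec_update could read `(in_other_hash || !hash_has_contents(other_hash))` instead of checking for NULL and count separately.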

-- Steve

>  				match = 1;
>  		}
>  		if (!match)




* Re: [PATCH 1/9] trace: Fix uninitialized variable compiler warning
  2011-11-28 16:19   ` Steven Rostedt
@ 2011-11-28 16:25     ` Jiri Olsa
  2011-11-28 19:34       ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-11-28 16:25 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: fweisbec, mingo, paulus, acme, linux-kernel



On Mon, Nov 28, 2011 at 11:19:18AM -0500, Steven Rostedt wrote:
> On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> > Initialize page2 variable to make compiler happy.
> 
> What compiler is this? Because this is a compiler bug. In fact, there's
> no check for page2 being NULL, so if it is used uninitialized it will
> crash the kernel. I don't like these "make the compiler shut up" fixes,
> because honestly, changes like this hide bugs.

[jolsa@krava1 ~]$ gcc --version
gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6)

> 
> Nacked-by: Steven Rostedt <rostedt@goodmis.org>

understood, np

jirka
> 
> -- Steve
> 
> > 
> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> > ---
> >  kernel/trace/trace.c |    3 +--
> >  1 files changed, 1 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 9e158cc..4a06862 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -3655,8 +3655,7 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
> >  	struct page *pages[2];
> >  	int nr_pages = 1;
> >  	ssize_t written;
> > -	void *page1;
> > -	void *page2;
> > +	void *page1, *page2 = NULL;
> >  	int offset;
> >  	int size;
> >  	int len;
> 
> 


* Re: [PATCH 3/9] ftrace: Fix shutdown to disable calls properly
  2011-11-27 18:04 ` [PATCH 3/9] ftrace: Fix shutdown to disable calls properly Jiri Olsa
@ 2011-11-28 19:18   ` Steven Rostedt
  2011-11-29 11:21     ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 19:18 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: fweisbec, mingo, paulus, acme, linux-kernel

On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> Each ftrace_startup call increases the call record's flag,
> so we always need to decrease it when shutting down the
> ftrace_ops.

No, that's not how this works. I probably should comment this code
better, because it caused me to reread it too ;)

> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  kernel/trace/ftrace.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index c6d0293..0ca0c0d 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1744,8 +1744,7 @@ static void ftrace_shutdown(struct ftrace_ops *ops, int command)
>  	if (ops != &global_ops || !global_start_up)
>  		ops->flags &= ~FTRACE_OPS_FL_ENABLED;
>  
> -	if (!ftrace_start_up)
> -		command |= FTRACE_DISABLE_CALLS;
> +	command |= FTRACE_DISABLE_CALLS;

FTRACE_DISABLE_CALLS will disable *all* functions for all tracers. If
you are tracing with both ftrace and perf, and one of them calls this with
FTRACE_DISABLE_CALLS, then both will no longer be tracing anything.

When you call unregister_ftrace_function() it will disable the functions
that you have enabled by the ops.

Nacked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve


>  
>  	if (saved_ftrace_func != ftrace_trace_function) {
>  		saved_ftrace_func = ftrace_trace_function;




* Re: [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-27 18:04 ` [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2011-11-28 19:26   ` Steven Rostedt
  2011-11-28 20:02     ` Peter Zijlstra
  2011-11-28 20:21   ` Steven Rostedt
  1 sibling, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 19:26 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: fweisbec, mingo, paulus, acme, linux-kernel, Peter Zijlstra

[ Added Peter Z ]

On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> Adding a way to temporarily enable/disable ftrace_ops.
> 
> When there is a ftrace_ops with FTRACE_OPS_FL_CONTROL flag
> registered, the ftrace_ops_list_func processing function
> is used as ftrace function in order to have all registered
> ftrace_ops under control.
> 
> Also using jump label not to introduce overhead to current
> ftrace_ops_list_func processing.
> 

Are jump labels safe in NMI context yet? If not, this will need to wait
till we make it so.

-- Steve

> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/ftrace.h |   12 ++++++++++++
>  kernel/trace/ftrace.c  |   39 +++++++++++++++++++++++++++++----------
>  2 files changed, 41 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 26eafce..28b59f1 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -35,12 +35,14 @@ enum {
>  	FTRACE_OPS_FL_ENABLED		= 1 << 0,
>  	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
>  	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
> +	FTRACE_OPS_FL_CONTROL		= 1 << 3,
>  };
>  
>  struct ftrace_ops {
>  	ftrace_func_t			func;
>  	struct ftrace_ops		*next;
>  	unsigned long			flags;
> +	atomic_t			disabled;
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  	struct ftrace_hash		*notrace_hash;
>  	struct ftrace_hash		*filter_hash;
> @@ -97,6 +99,16 @@ int register_ftrace_function(struct ftrace_ops *ops);
>  int unregister_ftrace_function(struct ftrace_ops *ops);
>  void clear_ftrace_function(void);
>  
> +static inline void enable_ftrace_function(struct ftrace_ops *ops)
> +{
> +	atomic_dec(&ops->disabled);
> +}
> +
> +static inline void disable_ftrace_function(struct ftrace_ops *ops)
> +{
> +	atomic_inc(&ops->disabled);
> +}
> +
>  extern void ftrace_stub(unsigned long a0, unsigned long a1);
>  
>  #else /* !CONFIG_FUNCTION_TRACER */
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 0ca0c0d..e5a9498 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -30,6 +30,7 @@
>  #include <linux/list.h>
>  #include <linux/hash.h>
>  #include <linux/rcupdate.h>
> +#include <linux/jump_label.h>
>  
>  #include <trace/events/sched.h>
>  
> @@ -94,6 +95,8 @@ ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
>  ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
>  static struct ftrace_ops global_ops;
>  
> +static struct jump_label_key ftrace_ops_control;
> +
>  static void
>  ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
>  
> @@ -196,17 +199,21 @@ static void update_ftrace_function(void)
>  
>  	update_global_ops();
>  
> -	/*
> -	 * If we are at the end of the list and this ops is
> -	 * not dynamic, then have the mcount trampoline call
> -	 * the function directly
> -	 */
> -	if (ftrace_ops_list == &ftrace_list_end ||
> -	    (ftrace_ops_list->next == &ftrace_list_end &&
> -	     !(ftrace_ops_list->flags & FTRACE_OPS_FL_DYNAMIC)))
> -		func = ftrace_ops_list->func;
> -	else
> +	if (jump_label_enabled(&ftrace_ops_control))
>  		func = ftrace_ops_list_func;
> +	else {
> +		/*
> +		 * If we are at the end of the list and this ops is
> +		 * not dynamic, then have the mcount trampoline call
> +		 * the function directly
> +		 */
> +		if (ftrace_ops_list == &ftrace_list_end ||
> +		    (ftrace_ops_list->next == &ftrace_list_end &&
> +		     !(ftrace_ops_list->flags & FTRACE_OPS_FL_DYNAMIC)))
> +			func = ftrace_ops_list->func;
> +		else
> +			func = ftrace_ops_list_func;
> +	}
>  
>  #ifdef CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST
>  	ftrace_trace_function = func;
> @@ -280,6 +287,9 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
>  	} else
>  		add_ftrace_ops(&ftrace_ops_list, ops);
>  
> +	if (ops->flags & FTRACE_OPS_FL_CONTROL)
> +		jump_label_inc(&ftrace_ops_control);
> +
>  	if (ftrace_enabled)
>  		update_ftrace_function();
>  
> @@ -311,6 +321,9 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
>  	if (ret < 0)
>  		return ret;
>  
> +	if (ops->flags & FTRACE_OPS_FL_CONTROL)
> +		jump_label_dec(&ftrace_ops_control);
> +
>  	if (ftrace_enabled)
>  		update_ftrace_function();
>  
> @@ -3577,8 +3590,14 @@ ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
>  	preempt_disable_notrace();
>  	op = rcu_dereference_raw(ftrace_ops_list);
>  	while (op != &ftrace_list_end) {
> +		if (static_branch(&ftrace_ops_control))
> +			if ((op->flags & FTRACE_OPS_FL_CONTROL) &&
> +			    atomic_read(&op->disabled))
> +				goto next;
> +
>  		if (ftrace_ops_test(op, ip))
>  			op->func(ip, parent_ip);
> + next:
>  		op = rcu_dereference_raw(op->next);
>  	};
>  	preempt_enable_notrace();




* Re: [PATCH 1/9] trace: Fix uninitialized variable compiler warning
  2011-11-28 16:25     ` Jiri Olsa
@ 2011-11-28 19:34       ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 19:34 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 17:25 +0100, Jiri Olsa wrote:
> 
> On Mon, Nov 28, 2011 at 11:19:18AM -0500, Steven Rostedt wrote:
> > On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> > > Initialize page2 variable to make compiler happy.
> > 
> > What compiler is this? Because this is a compiler bug. In fact, there's
> > no check for page2 being NULL, so if it is used uninitialized it will
> > crash the kernel. I don't like these "make the compiler shut up" fixes,
> > because honestly, changes like this hide bugs.
> 
> [jolsa@krava1 ~]$ gcc --version
> gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6)

Yep, that's an old compiler. The newer ones don't show this as an error.

-- Steve




* Re: [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf
  2011-11-27 18:04 ` [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
@ 2011-11-28 19:58   ` Steven Rostedt
  2011-11-28 20:03     ` Peter Zijlstra
  2011-11-28 20:08     ` Peter Zijlstra
  0 siblings, 2 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 19:58 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: fweisbec, mingo, paulus, acme, linux-kernel, Peter Zijlstra

On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> Adding perf registration support for the ftrace function event,
> so it is now possible to register it via perf interface.
> 
> The perf_event struct statically contains ftrace_ops as a handle
> for function tracer. The function tracer is registered/unregistered
> in open/close actions, and enabled/disabled in add/del actions.
> 
> It is now possible to use function trace within perf commands
> like:
> 
>   perf record -e ftrace:function ls
>   perf stat -e ftrace:function ls

Question. This is a root-only command, correct? Otherwise, we are
allowing any user to create a large performance impact on the system.

> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/perf_event.h      |    3 +
>  kernel/trace/trace_event_perf.c |   88 +++++++++++++++++++++++++++++++++++++++
>  kernel/trace/trace_export.c     |   23 ++++++++++
>  3 files changed, 114 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 1e9ebe5..6071995 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -847,6 +847,9 @@ struct perf_event {
>  #ifdef CONFIG_EVENT_TRACING
>  	struct ftrace_event_call	*tp_event;
>  	struct event_filter		*filter;
> +#ifdef CONFIG_FUNCTION_TRACER
> +	struct ftrace_ops               ftrace_ops;
> +#endif
>  #endif
>  
>  #ifdef CONFIG_CGROUP_PERF
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index d72af0b..4be0f73 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -250,3 +250,91 @@ __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
>  	return raw_data;
>  }
>  EXPORT_SYMBOL_GPL(perf_trace_buf_prepare);
> +
> +
> +static void
> +perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip)
> +{
> +	struct ftrace_entry *entry;
> +	struct hlist_head *head;
> +	struct pt_regs regs;
> +	int rctx;
> +
> +#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
> +		    sizeof(u64)) - sizeof(u32))
> +
> +	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
> +
> +	perf_fetch_caller_regs(&regs);
> +
> +	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
> +	if (!entry)
> +		return;
> +
> +	entry->ip = ip;
> +	entry->parent_ip = parent_ip;
> +
> +	head = this_cpu_ptr(event_function.perf_events);
> +	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
> +			      1, &regs, head);
> +
> +#undef ENTRY_SIZE
> +}
> +
> +static int perf_ftrace_function_register(struct perf_event *event)
> +{
> +	struct ftrace_ops *ops = &event->ftrace_ops;
> +
> +	ops->flags |= FTRACE_OPS_FL_CONTROL;
> +	atomic_set(&ops->disabled, 1);
> +	ops->func = perf_ftrace_function_call;
> +	return register_ftrace_function(ops);

When is ADD called? Because as soon as you register this function, even
though you have it "disabled" the system takes about a 13% impact on
performance just by calling this.

> +}
> +
> +static int perf_ftrace_function_unregister(struct perf_event *event)
> +{
> +	struct ftrace_ops *ops = &event->ftrace_ops;
> +	return unregister_ftrace_function(ops);
> +}
> +
> +static void perf_ftrace_function_enable(struct perf_event *event)
> +{
> +	struct ftrace_ops *ops = &event->ftrace_ops;
> +	enable_ftrace_function(ops);

Is it really an issue that we shouldn't call the full blown register
instead? I'm not really understanding why this is a problem. Note, one
of the improvements to ftrace in the near future is to enable ftrace
without stop_machine.

-- Steve

> +}
> +
> +static void perf_ftrace_function_disable(struct perf_event *event)
> +{
> +	struct ftrace_ops *ops = &event->ftrace_ops;
> +	disable_ftrace_function(ops);
> +}
> +
> +int perf_ftrace_event_register(struct ftrace_event_call *call,
> +			       enum trace_reg type, void *data)
> +{
> +	int etype = call->event.type;
> +
> +	if (etype != TRACE_FN)
> +		return -EINVAL;
> +
> +	switch (type) {
> +	case TRACE_REG_REGISTER:
> +	case TRACE_REG_UNREGISTER:
> +		break;
> +	case TRACE_REG_PERF_REGISTER:
> +	case TRACE_REG_PERF_UNREGISTER:
> +		return 0;
> +	case TRACE_REG_PERF_OPEN:
> +		return perf_ftrace_function_register(data);
> +	case TRACE_REG_PERF_CLOSE:
> +		return perf_ftrace_function_unregister(data);
> +	case TRACE_REG_PERF_ADD:
> +		perf_ftrace_function_enable(data);
> +		return 0;
> +	case TRACE_REG_PERF_DEL:
> +		perf_ftrace_function_disable(data);
> +		return 0;
> +	}
> +
> +	return -EINVAL;
> +}
> diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> index bbeec31..62e86a5 100644
> --- a/kernel/trace/trace_export.c
> +++ b/kernel/trace/trace_export.c
> @@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
>  
>  #include "trace_entries.h"
>  
> +static int ftrace_event_class_register(struct ftrace_event_call *call,
> +				       enum trace_reg type, void *data)
> +{
> +	switch (type) {
> +	case TRACE_REG_PERF_REGISTER:
> +	case TRACE_REG_PERF_UNREGISTER:
> +		return 0;
> +	case TRACE_REG_PERF_OPEN:
> +	case TRACE_REG_PERF_CLOSE:
> +	case TRACE_REG_PERF_ADD:
> +	case TRACE_REG_PERF_DEL:
> +#ifdef CONFIG_PERF_EVENTS
> +		return perf_ftrace_event_register(call, type, data);
> +#endif
> +	case TRACE_REG_REGISTER:
> +	case TRACE_REG_UNREGISTER:
> +		break;
> +	}
> +
> +	return -EINVAL;
> +}
> +
>  #undef __entry
>  #define __entry REC
>  
> @@ -159,6 +181,7 @@ struct ftrace_event_class event_class_ftrace_##call = {			\
>  	.system			= __stringify(TRACE_SYSTEM),		\
>  	.define_fields		= ftrace_define_fields_##call,		\
>  	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
> +	.reg			= ftrace_event_class_register,		\
>  };									\
>  									\
>  struct ftrace_event_call __used event_##call = {			\



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 8/9] ftrace, perf: Add FILTER_TRACE_FN event field type
  2011-11-27 18:04 ` [PATCH 8/9] ftrace, perf: Add FILTER_TRACE_FN event field type Jiri Olsa
@ 2011-11-28 20:01   ` Steven Rostedt
  2011-11-29 10:14     ` Jiri Olsa
  2011-11-29 11:22     ` Jiri Olsa
  0 siblings, 2 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 20:01 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: fweisbec, mingo, paulus, acme, linux-kernel, Peter Zijlstra

BTW, Please Cc Peter Zijlstra too, as he maintains perf inside the
kernel.

On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> Adding FILTER_TRACE_FN event field type for function tracepoint
> event, so it can be properly recognized within filtering code.

-ECHANGELOGTOOSHORT

I'm not sure what this is for.

-- Steve

> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/ftrace_event.h       |    1 +
>  kernel/trace/trace_events_filter.c |    7 ++++++-
>  kernel/trace/trace_export.c        |   25 ++++++++++++++++++++-----
>  3 files changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index 2bf677c..dd478fc 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -245,6 +245,7 @@ enum {
>  	FILTER_STATIC_STRING,
>  	FILTER_DYN_STRING,
>  	FILTER_PTR_STRING,
> +	FILTER_TRACE_FN,
>  };
>  
>  #define EVENT_STORAGE_SIZE 128
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index fdc6d22..7b0b04c 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
>  	return FILTER_OTHER;
>  }
>  
> +static bool is_function_field(struct ftrace_event_field *field)
> +{
> +	return field->filter_type == FILTER_TRACE_FN;
> +}
> +
>  static bool is_string_field(struct ftrace_event_field *field)
>  {
>  	return field->filter_type == FILTER_DYN_STRING ||
> @@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
>  			fn = filter_pred_strloc;
>  		else
>  			fn = filter_pred_pchar;
> -	} else {
> +	} else if (!is_function_field(field)) {
>  		if (field->is_signed)
>  			ret = strict_strtoll(pred->regex.pattern, 0, &val);
>  		else
> diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> index 62e86a5..7b035ab 100644
> --- a/kernel/trace/trace_export.c
> +++ b/kernel/trace/trace_export.c
> @@ -67,7 +67,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
>  	ret = trace_define_field(event_call, #type, #item,		\
>  				 offsetof(typeof(field), item),		\
>  				 sizeof(field.item),			\
> -				 is_signed_type(type), FILTER_OTHER);	\
> +				 is_signed_type(type), filter_type);	\
>  	if (ret)							\
>  		return ret;
>  
> @@ -77,7 +77,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
>  				 offsetof(typeof(field),		\
>  					  container.item),		\
>  				 sizeof(field.container.item),		\
> -				 is_signed_type(type), FILTER_OTHER);	\
> +				 is_signed_type(type), filter_type);	\
>  	if (ret)							\
>  		return ret;
>  
> @@ -91,7 +91,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
>  		ret = trace_define_field(event_call, event_storage, #item, \
>  				 offsetof(typeof(field), item),		\
>  				 sizeof(field.item),			\
> -				 is_signed_type(type), FILTER_OTHER);	\
> +				 is_signed_type(type), filter_type);	\
>  		mutex_unlock(&event_storage_mutex);			\
>  		if (ret)						\
>  			return ret;					\
> @@ -104,7 +104,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
>  				 offsetof(typeof(field),		\
>  					  container.item),		\
>  				 sizeof(field.container.item),		\
> -				 is_signed_type(type), FILTER_OTHER);	\
> +				 is_signed_type(type), filter_type);	\
>  	if (ret)							\
>  		return ret;
>  
> @@ -112,10 +112,24 @@ static void __always_unused ____ftrace_check_##name(void)	\
>  #define __dynamic_array(type, item)					\
>  	ret = trace_define_field(event_call, #type, #item,		\
>  				 offsetof(typeof(field), item),		\
> -				 0, is_signed_type(type), FILTER_OTHER);\
> +				 0, is_signed_type(type), filter_type);\
>  	if (ret)							\
>  		return ret;
>  
> +#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
> +#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
> +#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
> +#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
> +#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
> +#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
> +#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
> +#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
> +#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
> +#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
> +#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
> +#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
> +#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
> +
>  #undef FTRACE_ENTRY
>  #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
>  int									\
> @@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
>  {									\
>  	struct struct_name field;					\
>  	int ret;							\
> +	int filter_type = FILTER_TYPE(id);				\
>  									\
>  	tstruct;							\
>  									\




* Re: [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-28 19:26   ` Steven Rostedt
@ 2011-11-28 20:02     ` Peter Zijlstra
  2011-11-28 20:05       ` Peter Zijlstra
  2011-11-28 20:12       ` Steven Rostedt
  0 siblings, 2 replies; 186+ messages in thread
From: Peter Zijlstra @ 2011-11-28 20:02 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 14:26 -0500, Steven Rostedt wrote:
> 
> Are jump labels safe in NMI context yet? If not, this will need to wait
> till we make it so. 

I don't think they are, we currently very much rely on that
stop_machine() crap. NMIs can go straight through that.

I think you can make it work with the stop_machine()-less approach,
because then the NMI will trap on the INT3 which will wait for
completion, sync and resume the NMI.

That of course relies on the NMI vs IRET crap getting sorted.

But even then, that's highly arch specific and I'm not sure we can make
all archs that support both jump_label and NMIs work.




* Re: [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf
  2011-11-28 19:58   ` Steven Rostedt
@ 2011-11-28 20:03     ` Peter Zijlstra
  2011-11-28 20:13       ` Steven Rostedt
  2011-11-28 20:08     ` Peter Zijlstra
  1 sibling, 1 reply; 186+ messages in thread
From: Peter Zijlstra @ 2011-11-28 20:03 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 14:58 -0500, Steven Rostedt wrote:
> >   perf stat -e ftrace:function ls
> 
> Question. This is a root only command, correct? Otherwise, we are
> allowing any user to create a large performance impact to the system. 

Typically not, although I haven't looked at Jiri's implementation of the
function tracepoint.




* Re: [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-28 20:02     ` Peter Zijlstra
@ 2011-11-28 20:05       ` Peter Zijlstra
  2011-11-28 20:14         ` Steven Rostedt
  2011-11-28 20:12       ` Steven Rostedt
  1 sibling, 1 reply; 186+ messages in thread
From: Peter Zijlstra @ 2011-11-28 20:05 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 21:02 +0100, Peter Zijlstra wrote:
> On Mon, 2011-11-28 at 14:26 -0500, Steven Rostedt wrote:
> > 
> > Are jump labels safe in NMI context yet? If not, this will need to wait
> > till we make it so. 
> 
> I don't think they are, we currently very much rely on that
> stop_machine() crap. NMIs can go straight through that.
> 
> I think you can make it work with the stop_machine()-less approach,
> because then the NMI will trap on the INT3 which will wait for
> completion, sync and resume the NMI.

Note that this of course precludes ever using jump_labels from do_int3
handlers ;-)


* Re: [PATCH 9/9] ftrace, perf: Add filter support for function trace event
  2011-11-27 18:04 ` [PATCH 9/9] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2011-11-28 20:07   ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 20:07 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: fweisbec, mingo, paulus, acme, linux-kernel

On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> Adding support to filter function trace event via perf
> interface. It is now possible to use filter interface
> in the perf tool like:
> 
>   perf record -e ftrace:function --filter="(ip == mm_*)" ls
> 
> The filter syntax is restricted to the the 'ip' field only,
> and following operators are accepted '==' '!=' '&&', ending
> up with the filter strings like:
> 
>   "ip == f1 f2 ..." && "ip != f3 f4 ..." ...
> 
> The '==' operator adds trace filter with same efect as would

effect

> be added via set_ftrace_filter file.
> 
> The '!=' operator adds trace filter with same efect as would

effect

> be added via set_ftrace_notrace file.
> 
> The right side of the '!=', '==' operators is list of functions
> or regexp. to be added to filter separated by space. Same syntax
> is supported/required as for the set_ftrace_filter and
> set_ftrace_notrace files.
> 
> The '&&' operator is used for connecting multiple filter definitions
> together. It is possible to have more than one '==' and '!='
> opearators within one filter string.

operators


Interesting way to handle it. I'll have to test it out. I wonder if we
could also make this work the same way with ftrace too.

-- Steve

> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  kernel/trace/trace.h               |    4 +-
>  kernel/trace/trace_events_filter.c |  111 +++++++++++++++++++++++++++++++++---
>  kernel/trace/trace_export.c        |    5 ++
>  3 files changed, 110 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index c4330dc..fde4d2a 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -589,6 +589,8 @@ static inline int ftrace_trace_task(struct task_struct *task)
>  static inline int ftrace_is_dead(void) { return 0; }
>  #endif
>  
> +int ftrace_event_is_function(struct ftrace_event_call *call);
> +
>  /*
>   * struct trace_parser - servers for reading the user input separated by spaces
>   * @cont: set if the input is not complete - no final space char was found
> @@ -765,9 +767,7 @@ struct filter_pred {
>  	u64 			val;
>  	struct regex		regex;
>  	unsigned short		*ops;
> -#ifdef CONFIG_FTRACE_STARTUP_TEST
>  	struct ftrace_event_field *field;
> -#endif
>  	int 			offset;
>  	int 			not;
>  	int 			op;
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index 7b0b04c..7434f50 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -54,6 +54,13 @@ struct filter_op {
>  	int precedence;
>  };
>  
> +static struct filter_op filter_ftrace_ops[] = {
> +	{ OP_AND,	"&&",		1 },
> +	{ OP_NE,	"!=",		2 },
> +	{ OP_EQ,	"==",		2 },
> +	{ OP_NONE,	"OP_NONE",	0 },
> +};
> +
>  static struct filter_op filter_ops[] = {
>  	{ OP_OR,	"||",		1 },
>  	{ OP_AND,	"&&",		2 },
> @@ -81,6 +88,7 @@ enum {
>  	FILT_ERR_TOO_MANY_PREDS,
>  	FILT_ERR_MISSING_FIELD,
>  	FILT_ERR_INVALID_FILTER,
> +	FILT_ERR_IP_FIELD_ONLY,
>  };
>  
>  static char *err_text[] = {
> @@ -96,6 +104,7 @@ static char *err_text[] = {
>  	"Too many terms in predicate expression",
>  	"Missing field name and/or value",
>  	"Meaningless filter expression",
> +	"Only 'ip' field is supported for function trace",
>  };
>  
>  struct opstack_op {
> @@ -992,7 +1001,12 @@ static int init_pred(struct filter_parse_state *ps,
>  			fn = filter_pred_strloc;
>  		else
>  			fn = filter_pred_pchar;
> -	} else if (!is_function_field(field)) {
> +	} else if (is_function_field(field)) {
> +		if (strcmp(field->name, "ip")) {
> +			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
> +			return -EINVAL;
> +		}
> +	} else {
>  		if (field->is_signed)
>  			ret = strict_strtoll(pred->regex.pattern, 0, &val);
>  		else
> @@ -1339,10 +1353,8 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
>  
>  	strcpy(pred.regex.pattern, operand2);
>  	pred.regex.len = strlen(pred.regex.pattern);
> -
> -#ifdef CONFIG_FTRACE_STARTUP_TEST
>  	pred.field = field;
> -#endif
> +
>  	return init_pred(ps, field, &pred) ? NULL : &pred;
>  }
>  
> @@ -1894,6 +1906,81 @@ void ftrace_profile_free_filter(struct perf_event *event)
>  	__free_filter(filter);
>  }
>  
> +struct function_filter_data {
> +	struct ftrace_ops *ops;
> +	int first_filter;
> +	int first_notrace;
> +};
> +
> +static int __ftrace_function_set_filter(int filter, char *buf, int len,
> +					struct function_filter_data *data)
> +{
> +	int *reset;
> +
> +	reset = filter ? &data->first_filter : &data->first_notrace;
> +
> +	if (filter)
> +		ftrace_set_filter(data->ops, buf, len, *reset);
> +	else
> +		ftrace_set_notrace(data->ops, buf, len, *reset);
> +
> +	if (*reset)
> +		*reset = 0;
> +
> +	return WALK_PRED_DEFAULT;
> +}
> +
> +static int ftrace_function_check_pred(struct filter_pred *pred)
> +{
> +	struct ftrace_event_field *field = pred->field;
> +
> +	/*
> +	  Check the predicate for function trace, verify:
> +	   - only '==' and '!=' is used
> +	   - the 'ip' field is used
> +	*/
> +	if (WARN((pred->op != OP_EQ) && (pred->op != OP_NE),
> +		 "wrong operator for function filter: %d\n", pred->op))
> +		return -EINVAL;
> +
> +	if (strcmp(field->name, "ip"))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +static int ftrace_function_set_filter_cb(enum move_type move,
> +					 struct filter_pred *pred,
> +					 int *err, void *data)
> +{
> +	if ((move != MOVE_DOWN) ||
> +	    (pred->left != FILTER_PRED_INVALID))
> +		return WALK_PRED_DEFAULT;
> +
> +	/* Double checking the predicate is valid for function trace. */
> +	*err = ftrace_function_check_pred(pred);
> +	if (*err)
> +		return WALK_PRED_ABORT;
> +
> +	return __ftrace_function_set_filter(pred->op == OP_EQ,
> +					    pred->regex.pattern,
> +					    pred->regex.len,
> +					    data);
> +}
> +
> +static int ftrace_function_set_filter(struct perf_event *event,
> +				      struct event_filter *filter)
> +{
> +	struct function_filter_data data = {
> +		.first_filter  = 1,
> +		.first_notrace = 1,
> +		.ops           = &event->ftrace_ops,
> +	};
> +
> +	return walk_pred_tree(filter->preds, filter->root,
> +			      ftrace_function_set_filter_cb, &data);
> +}
> +
>  int ftrace_profile_set_filter(struct perf_event *event, int event_id,
>  			      char *filter_str)
>  {
> @@ -1901,6 +1988,7 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
>  	struct event_filter *filter;
>  	struct filter_parse_state *ps;
>  	struct ftrace_event_call *call;
> +	struct filter_op *fops = filter_ops;
>  
>  	mutex_lock(&event_mutex);
>  
> @@ -1925,14 +2013,21 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
>  	if (!ps)
>  		goto free_filter;
>  
> -	parse_init(ps, filter_ops, filter_str);
> +	if (ftrace_event_is_function(call))
> +		fops = filter_ftrace_ops;
> +
> +	parse_init(ps, fops, filter_str);
>  	err = filter_parse(ps);
>  	if (err)
>  		goto free_ps;
>  
>  	err = replace_preds(call, filter, ps, filter_str, false);
> -	if (!err)
> -		event->filter = filter;
> +	if (!err) {
> +		if (ftrace_event_is_function(call))
> +			err = ftrace_function_set_filter(event, filter);
> +		else
> +			event->filter = filter;
> +	}
>  
>  free_ps:
>  	filter_opstack_clear(ps);
> @@ -1940,7 +2035,7 @@ free_ps:
>  	kfree(ps);
>  
>  free_filter:
> -	if (err)
> +	if (err || ftrace_event_is_function(call))
>  		__free_filter(filter);
>  
>  out_unlock:
> diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> index 7b035ab..46c35e2 100644
> --- a/kernel/trace/trace_export.c
> +++ b/kernel/trace/trace_export.c
> @@ -208,4 +208,9 @@ struct ftrace_event_call __used event_##call = {			\
>  struct ftrace_event_call __used						\
>  __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
>  
> +int ftrace_event_is_function(struct ftrace_event_call *call)
> +{
> +	return call == &event_function;
> +}
> +
>  #include "trace_entries.h"




* Re: [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf
  2011-11-28 19:58   ` Steven Rostedt
  2011-11-28 20:03     ` Peter Zijlstra
@ 2011-11-28 20:08     ` Peter Zijlstra
  2011-11-28 20:10       ` Peter Zijlstra
  1 sibling, 1 reply; 186+ messages in thread
From: Peter Zijlstra @ 2011-11-28 20:08 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 14:58 -0500, Steven Rostedt wrote:
> > +static int perf_ftrace_function_register(struct perf_event *event)
> > +{
> > +     struct ftrace_ops *ops = &event->ftrace_ops;
> > +
> > +     ops->flags |= FTRACE_OPS_FL_CONTROL;
> > +     atomic_set(&ops->disabled, 1);
> > +     ops->func = perf_ftrace_function_call;
> > +     return register_ftrace_function(ops);
> 
> When is ADD called? Because as soon as you register this function, even
> though you have it "disabled" the system takes about a 13% impact on
> performance just by calling this.

Typically at context switch time.

> > +static void perf_ftrace_function_enable(struct perf_event *event)
> > +{
> > +     struct ftrace_ops *ops = &event->ftrace_ops;
> > +     enable_ftrace_function(ops);
> 
> Is it really an issue that we shouldn't call the full blown register
> instead? I'm not really understanding why this is a problem. Note, one
> of the improvements to ftrace in the near future is to enable ftrace
> without stop_machine. 

Yeah, but still not something you want to do during context switches,
right?


* Re: [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf
  2011-11-28 20:08     ` Peter Zijlstra
@ 2011-11-28 20:10       ` Peter Zijlstra
  2011-11-28 20:16         ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Peter Zijlstra @ 2011-11-28 20:10 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 21:08 +0100, Peter Zijlstra wrote:
> > Is it really an issue that we shouldn't call the full blown register
> > instead? I'm not really understanding why this is a problem. Note, one
> > of the improvements to ftrace in the near future is to enable ftrace
> > without stop_machine. 
> 
> Yeah, but still not something you want to do during context switches,
> right? 

Also, that's x86, there's still archs that need stop_machine(), right?


* Re: [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-28 20:02     ` Peter Zijlstra
  2011-11-28 20:05       ` Peter Zijlstra
@ 2011-11-28 20:12       ` Steven Rostedt
  2011-11-28 20:15         ` Peter Zijlstra
  1 sibling, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 20:12 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 21:02 +0100, Peter Zijlstra wrote:
> On Mon, 2011-11-28 at 14:26 -0500, Steven Rostedt wrote:
> > 
> > Are jump labels safe in NMI context yet? If not, this will need to wait
> > till we make it so. 
> 
> I don't think they are, we currently very much rely on that
> stop_machine() crap. NMIs can go straight through that.

Ftrace has a way around it, it would be trivial (well maybe ;) to add
the same to jump labels.

> 
> I think you can make it work with the stop_machine()-less approach,
> because then the NMI will trap on the INT3 which will wait for
> completion, sync and resume the NMI.
> 
> That of course relies on the NMI vs IRET crap getting sorted.
> 
> But even then, that's highly arch specific and I'm not sure we can make
> all archs that support both jump_label and NMIs work.

Actually, from what I've been told, x86 seems to be the only arch that
does crazy things with NMIs. Most of the other archs do NMI when the system
is dead. That is, there's no return to normal system processing once an
NMI is hit.

Another thing is, most systems are OK with modifying code on SMP, and
stop_machine isn't even needed for that. I have code to test this out on
PowerPC, and it seems to work well.

-- Steve




* Re: [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf
  2011-11-28 20:03     ` Peter Zijlstra
@ 2011-11-28 20:13       ` Steven Rostedt
  2011-11-29 10:10         ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 20:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 21:03 +0100, Peter Zijlstra wrote:
> On Mon, 2011-11-28 at 14:58 -0500, Steven Rostedt wrote:
> > >   perf stat -e ftrace:function ls
> > 
> > Question. This is a root only command, correct? Otherwise, we are
> > allowing any user to create a large performance impact to the system. 
> 
> Typically not, although I haven't looked at Jiri's implementation of the
> function tracepoint.
> 

I would not allow function tracing for non-root users. It's just way too
invasive to let such a thing happen.

-- Steve





* Re: [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-28 20:05       ` Peter Zijlstra
@ 2011-11-28 20:14         ` Steven Rostedt
  2011-11-28 20:20           ` Peter Zijlstra
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 20:14 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 21:05 +0100, Peter Zijlstra wrote:

> Note that this of course precludes ever using jump_labels from do_int3
> handlers ;-)

Hmm, how many do_int3 handlers are there?

Oh, I bet we could trace kprobe handlers, and this would break that.

I'm a bit nervous about adding jump_labels into function tracing
handlers.

-- Steve



* Re: [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-28 20:12       ` Steven Rostedt
@ 2011-11-28 20:15         ` Peter Zijlstra
  2011-11-28 20:24           ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Peter Zijlstra @ 2011-11-28 20:15 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 15:12 -0500, Steven Rostedt wrote:
> Actually, from what I've been told, x86 seems to be the only arch that
> does crazy things with NMIs. Most the other archs do NMI when the system
> is dead. That is, there's no return to normal system processing once an
> NMI is hit. 

Sparc64 implements effective NMIs by playing games with their interrupt
priority levels. local_irq_disable() disables only the lower 15 (0-14)
levels, and their PMU interrupts at level 15.

That generates an effective NMI (interrupt not blocked by
local_irq_disable()).


* Re: [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf
  2011-11-28 20:10       ` Peter Zijlstra
@ 2011-11-28 20:16         ` Steven Rostedt
  2011-11-28 20:18           ` Peter Zijlstra
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 20:16 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 21:10 +0100, Peter Zijlstra wrote:

> > Yeah, but still not something you want to do during context switches,
> > right? 
> 
> Also, that's x86, there's still archs that need stop_machine(), right?

s390 may need it, but only until we add a workaround there too.

But yeah, if this happens at context switch, then we want the fast
enable/disable that Jiri made.

-- Steve





* Re: [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf
  2011-11-28 20:16         ` Steven Rostedt
@ 2011-11-28 20:18           ` Peter Zijlstra
  0 siblings, 0 replies; 186+ messages in thread
From: Peter Zijlstra @ 2011-11-28 20:18 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 15:16 -0500, Steven Rostedt wrote:
> On Mon, 2011-11-28 at 21:10 +0100, Peter Zijlstra wrote:
> 
> > > Yeah, but still not something you want to do during context switches,
> > > right? 
> > 
> > Also, that's x86, there's still archs that need stop_machine(), right?
> 
> s390 may need it, but till we add a work around there too.
> 
> But yeah, if this happens at context switch, then we want the fast
> enable/disable that Jiri made.

Right, so the way tracepoint->perf works currently is we enable the
tracepoint when we create the event. The tracepoint handler runs a
per-cpu hlist. perf then registers with the per-cpu hlist on context
switch.




* Re: [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-28 20:14         ` Steven Rostedt
@ 2011-11-28 20:20           ` Peter Zijlstra
  0 siblings, 0 replies; 186+ messages in thread
From: Peter Zijlstra @ 2011-11-28 20:20 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, 2011-11-28 at 15:14 -0500, Steven Rostedt wrote:
> Hmm, how many do_int3 handlers are there?
> 
Dunno, but there's the regular SIGTRAP one I guess, there's kprobes and
there's going to be uprobes. kgdb looks to be having one as well.

> Oh, I bet we could trace kprobe handlers, and this would break that.
> 
That would be lovely pain indeed ;-)


* Re: [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-27 18:04 ` [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
  2011-11-28 19:26   ` Steven Rostedt
@ 2011-11-28 20:21   ` Steven Rostedt
  2011-11-29 10:07     ` Jiri Olsa
  1 sibling, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 20:21 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: fweisbec, mingo, paulus, acme, linux-kernel, Peter Zijlstra


>  
> @@ -311,6 +321,9 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
>  	if (ret < 0)
>  		return ret;
>  
> +	if (ops->flags & FTRACE_OPS_FL_CONTROL)
> +		jump_label_dec(&ftrace_ops_control);
> +
>  	if (ftrace_enabled)
>  		update_ftrace_function();
>  
> @@ -3577,8 +3590,14 @@ ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
>  	preempt_disable_notrace();
>  	op = rcu_dereference_raw(ftrace_ops_list);
>  	while (op != &ftrace_list_end) {
> +		if (static_branch(&ftrace_ops_control))

Instead of doing a static_branch() here, which makes me really nervous,
because this is called in function trace context, which has some strict
rules of its own, and is probably prone to recursion, we could add
another "ops" similar to the global_ops.

We could make a control_ops, and add all ops with the
FTRACE_OPS_FL_CONTROL flag set to it. And then this function will have
its own loop that it will check the disabled flag, for the ops
registered to it.

This code doesn't need to be touched, we just add a layer of redirection
for control ops and it will solve the jump_label issue.

-- Steve

> +			if ((op->flags & FTRACE_OPS_FL_CONTROL) &&
> +			    atomic_read(&op->disabled))
> +				goto next;
> +
>  		if (ftrace_ops_test(op, ip))
>  			op->func(ip, parent_ip);
> + next:
>  		op = rcu_dereference_raw(op->next);
>  	};
>  	preempt_enable_notrace();




* Re: [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-28 20:15         ` Peter Zijlstra
@ 2011-11-28 20:24           ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-11-28 20:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, fweisbec, mingo, paulus, acme, linux-kernel, David Miller

On Mon, 2011-11-28 at 21:15 +0100, Peter Zijlstra wrote:
> On Mon, 2011-11-28 at 15:12 -0500, Steven Rostedt wrote:
> > Actually, from what I've been told, x86 seems to be the only arch that
> > does crazy things with NMIs. Most the other archs do NMI when the system
> > is dead. That is, there's no return to normal system processing once an
> > NMI is hit. 
> 
> Sparc64 implements effective NMIs by playing games with their interrupt
> priority levels. local_irq_disable() disable the lower 15 (0-14) levels
> only, and their PMU interrupts at level 15.
> 
> That generates an effective NMI (interrupt not blocked by
> local_irq_disable()).

[ Added David ]

I think it's ok on sparc to modify code on one processor while another
processor is executing it. If not then we need the tricks that x86 does,
but if its ok, then we don't even need stop_machine();

-- Steve




* Re: [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface
  2011-11-28 20:21   ` Steven Rostedt
@ 2011-11-29 10:07     ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-11-29 10:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, linux-kernel, Peter Zijlstra

On Mon, Nov 28, 2011 at 03:21:33PM -0500, Steven Rostedt wrote:
> 
> >  
> > @@ -311,6 +321,9 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
> >  	if (ret < 0)
> >  		return ret;
> >  
> > +	if (ops->flags & FTRACE_OPS_FL_CONTROL)
> > +		jump_label_dec(&ftrace_ops_control);
> > +
> >  	if (ftrace_enabled)
> >  		update_ftrace_function();
> >  
> > @@ -3577,8 +3590,14 @@ ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
> >  	preempt_disable_notrace();
> >  	op = rcu_dereference_raw(ftrace_ops_list);
> >  	while (op != &ftrace_list_end) {
> > +		if (static_branch(&ftrace_ops_control))
> 
> Instead of doing a static_branch() here, which makes me really nervous,
> because this is called in function trace context, which has some strict
> rules of its own, and is probably prone to recursion, we could add
> another "ops" similar to the global_ops.
> 
> We could make a control_ops, and add all ops with the
> FTRACE_OPS_FL_CONTROL flag set to it. And then this function will have
> its own loop that it will check the disabled flag, for the ops
> registered to it.
> 
> This code doesn't need to be touched, we just add a layer of redirection
> for control ops and it will solve the jump_label issue.

sounds good, I'll make the change

thanks,
jirka

> 
> -- Steve
> 
> > +			if ((op->flags & FTRACE_OPS_FL_CONTROL) &&
> > +			    atomic_read(&op->disabled))
> > +				goto next;
> > +
> >  		if (ftrace_ops_test(op, ip))
> >  			op->func(ip, parent_ip);
> > + next:
> >  		op = rcu_dereference_raw(op->next);
> >  	};
> >  	preempt_enable_notrace();
> 
> 

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf
  2011-11-28 20:13       ` Steven Rostedt
@ 2011-11-29 10:10         ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-11-29 10:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, fweisbec, mingo, paulus, acme, linux-kernel

On Mon, Nov 28, 2011 at 03:13:09PM -0500, Steven Rostedt wrote:
> On Mon, 2011-11-28 at 21:03 +0100, Peter Zijlstra wrote:
> > On Mon, 2011-11-28 at 14:58 -0500, Steven Rostedt wrote:
> > > >   perf stat -e ftrace:function ls
> > > 
> > > Question. This is a root only command, correct? Otherwise, we are
> > > allowing any user to create a large performance impact to the system. 
> > 
> > Typically not, although I haven't looked at Jiri's implementation of the
> > function tracepoint.
> > 
> 
> I would not allow function tracing for non root users. It's just way too
> invasive to let such a thing happen.

I believe it's for root only, I'll double check

jirka

> 
> -- Steve
> 
> 
> 

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 8/9] ftrace, perf: Add FILTER_TRACE_FN event field type
  2011-11-28 20:01   ` Steven Rostedt
@ 2011-11-29 10:14     ` Jiri Olsa
  2011-11-29 11:22     ` Jiri Olsa
  1 sibling, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-11-29 10:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, linux-kernel, Peter Zijlstra

On Mon, Nov 28, 2011 at 03:01:23PM -0500, Steven Rostedt wrote:
> BTW, Please Cc Peter Zijlstra too, as he maintains perf inside the
> kernel.
> 
> On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> > Adding FILTER_TRACE_FN event field type for function tracepoint
> > event, so it can be properly recognized within filtering code.
> 
> -ECHANGELOGTOOSHORT
> 
> I'm not sure what this is for.

I need a way to distinguish the function trace field for the next patch;
I'll make the changelog more descriptive or do it some other way.. ;)

thanks,
jirka

> 
> -- Steve
> 
> > 
> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> > ---
> >  include/linux/ftrace_event.h       |    1 +
> >  kernel/trace/trace_events_filter.c |    7 ++++++-
> >  kernel/trace/trace_export.c        |   25 ++++++++++++++++++++-----
> >  3 files changed, 27 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > index 2bf677c..dd478fc 100644
> > --- a/include/linux/ftrace_event.h
> > +++ b/include/linux/ftrace_event.h
> > @@ -245,6 +245,7 @@ enum {
> >  	FILTER_STATIC_STRING,
> >  	FILTER_DYN_STRING,
> >  	FILTER_PTR_STRING,
> > +	FILTER_TRACE_FN,
> >  };
> >  
> >  #define EVENT_STORAGE_SIZE 128
> > diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> > index fdc6d22..7b0b04c 100644
> > --- a/kernel/trace/trace_events_filter.c
> > +++ b/kernel/trace/trace_events_filter.c
> > @@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
> >  	return FILTER_OTHER;
> >  }
> >  
> > +static bool is_function_field(struct ftrace_event_field *field)
> > +{
> > +	return field->filter_type == FILTER_TRACE_FN;
> > +}
> > +
> >  static bool is_string_field(struct ftrace_event_field *field)
> >  {
> >  	return field->filter_type == FILTER_DYN_STRING ||
> > @@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
> >  			fn = filter_pred_strloc;
> >  		else
> >  			fn = filter_pred_pchar;
> > -	} else {
> > +	} else if (!is_function_field(field)) {
> >  		if (field->is_signed)
> >  			ret = strict_strtoll(pred->regex.pattern, 0, &val);
> >  		else
> > diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> > index 62e86a5..7b035ab 100644
> > --- a/kernel/trace/trace_export.c
> > +++ b/kernel/trace/trace_export.c
> > @@ -67,7 +67,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
> >  	ret = trace_define_field(event_call, #type, #item,		\
> >  				 offsetof(typeof(field), item),		\
> >  				 sizeof(field.item),			\
> > -				 is_signed_type(type), FILTER_OTHER);	\
> > +				 is_signed_type(type), filter_type);	\
> >  	if (ret)							\
> >  		return ret;
> >  
> > @@ -77,7 +77,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
> >  				 offsetof(typeof(field),		\
> >  					  container.item),		\
> >  				 sizeof(field.container.item),		\
> > -				 is_signed_type(type), FILTER_OTHER);	\
> > +				 is_signed_type(type), filter_type);	\
> >  	if (ret)							\
> >  		return ret;
> >  
> > @@ -91,7 +91,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
> >  		ret = trace_define_field(event_call, event_storage, #item, \
> >  				 offsetof(typeof(field), item),		\
> >  				 sizeof(field.item),			\
> > -				 is_signed_type(type), FILTER_OTHER);	\
> > +				 is_signed_type(type), filter_type);	\
> >  		mutex_unlock(&event_storage_mutex);			\
> >  		if (ret)						\
> >  			return ret;					\
> > @@ -104,7 +104,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
> >  				 offsetof(typeof(field),		\
> >  					  container.item),		\
> >  				 sizeof(field.container.item),		\
> > -				 is_signed_type(type), FILTER_OTHER);	\
> > +				 is_signed_type(type), filter_type);	\
> >  	if (ret)							\
> >  		return ret;
> >  
> > @@ -112,10 +112,24 @@ static void __always_unused ____ftrace_check_##name(void)	\
> >  #define __dynamic_array(type, item)					\
> >  	ret = trace_define_field(event_call, #type, #item,		\
> >  				 offsetof(typeof(field), item),		\
> > -				 0, is_signed_type(type), FILTER_OTHER);\
> > +				 0, is_signed_type(type), filter_type);\
> >  	if (ret)							\
> >  		return ret;
> >  
> > +#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
> > +#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
> > +#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
> > +
> >  #undef FTRACE_ENTRY
> >  #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
> >  int									\
> > @@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
> >  {									\
> >  	struct struct_name field;					\
> >  	int ret;							\
> > +	int filter_type = FILTER_TYPE(id);				\
> >  									\
> >  	tstruct;							\
> >  									\
> 
> 

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 3/9] ftrace: Fix shutdown to disable calls properly
  2011-11-28 19:18   ` Steven Rostedt
@ 2011-11-29 11:21     ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-11-29 11:21 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: fweisbec, mingo, paulus, acme, linux-kernel

On Mon, Nov 28, 2011 at 02:18:48PM -0500, Steven Rostedt wrote:
> On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> > Each ftrace_startup call increases the call record's flag,
> > so we always need to decrease it when shutting down the
> > ftrace_ops.
> 
> No, that's not how this works. I probably should comment this code
> better, because it caused me to reread it too ;)
> 
> > 
> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> > ---
> >  kernel/trace/ftrace.c |    3 +--
> >  1 files changed, 1 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index c6d0293..0ca0c0d 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -1744,8 +1744,7 @@ static void ftrace_shutdown(struct ftrace_ops *ops, int command)
> >  	if (ops != &global_ops || !global_start_up)
> >  		ops->flags &= ~FTRACE_OPS_FL_ENABLED;
> >  
> > -	if (!ftrace_start_up)
> > -		command |= FTRACE_DISABLE_CALLS;
> > +	command |= FTRACE_DISABLE_CALLS;
> 
> FTRACE_DISABLE_CALLS will disable *all* functions for all tracers. If
> you are tracing with ftrace and perf, and one calls this with
> FTRACE_DISABLE_CALLS then both will no long be tracing anything.
> 
> When you call unregister_ftrace_function() it will disable the functions
> that you have enabled by the ops.
> 
> Nacked-by: Steven Rostedt <rostedt@goodmis.org.

oops, I should read/test more carefully.. however the problem remains; here's
one possible scenario of the issue:

	- enable function trace like:
		echo "mm_*" > ./set_ftrace_filter 
		echo function > ./current_tracer 
	- run:
		./perf record  -e ftrace:function cal
		- this runs: register_ftrace_function/unregister_ftrace_function

	- bad result: after perf is finished, function trace will have all the functions enabled

The reason is that after perf record calls unregister_ftrace_function,
only the function records' flags (struct dyn_ftrace::flags) are changed, but
the functions' mcount calls are not replaced accordingly.

In our case, when the function tracer again becomes the only tracer, it will
be plugged directly into the entry_BIT.S code, and will thus get all the
functions from the previous perf ftrace_ops without any filtering.

I think we need to project the function record flags state into the
function call replacement during unregister_ftrace_function.

jirka

> 
> -- Steve
> 
> 
> >  
> >  	if (saved_ftrace_func != ftrace_trace_function) {
> >  		saved_ftrace_func = ftrace_trace_function;
> 
> 

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 8/9] ftrace, perf: Add FILTER_TRACE_FN event field type
  2011-11-28 20:01   ` Steven Rostedt
  2011-11-29 10:14     ` Jiri Olsa
@ 2011-11-29 11:22     ` Jiri Olsa
  2011-11-29 11:51       ` Peter Zijlstra
  1 sibling, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-11-29 11:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, linux-kernel, Peter Zijlstra

On Mon, Nov 28, 2011 at 03:01:23PM -0500, Steven Rostedt wrote:
> BTW, Please Cc Peter Zijlstra too, as he maintains perf inside the
> kernel.

thanks! omitted by mistake..

> 
> On Sun, 2011-11-27 at 19:04 +0100, Jiri Olsa wrote:
> > Adding FILTER_TRACE_FN event field type for function tracepoint
> > event, so it can be properly recognized within filtering code.
> 
> -ECHANGELOGTOOSHORT
> 
> I'm not sure what this is for.
> 
> -- Steve
> 
> > 
> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> > ---
> >  include/linux/ftrace_event.h       |    1 +
> >  kernel/trace/trace_events_filter.c |    7 ++++++-
> >  kernel/trace/trace_export.c        |   25 ++++++++++++++++++++-----
> >  3 files changed, 27 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > index 2bf677c..dd478fc 100644
> > --- a/include/linux/ftrace_event.h
> > +++ b/include/linux/ftrace_event.h
> > @@ -245,6 +245,7 @@ enum {
> >  	FILTER_STATIC_STRING,
> >  	FILTER_DYN_STRING,
> >  	FILTER_PTR_STRING,
> > +	FILTER_TRACE_FN,
> >  };
> >  
> >  #define EVENT_STORAGE_SIZE 128
> > diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> > index fdc6d22..7b0b04c 100644
> > --- a/kernel/trace/trace_events_filter.c
> > +++ b/kernel/trace/trace_events_filter.c
> > @@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
> >  	return FILTER_OTHER;
> >  }
> >  
> > +static bool is_function_field(struct ftrace_event_field *field)
> > +{
> > +	return field->filter_type == FILTER_TRACE_FN;
> > +}
> > +
> >  static bool is_string_field(struct ftrace_event_field *field)
> >  {
> >  	return field->filter_type == FILTER_DYN_STRING ||
> > @@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
> >  			fn = filter_pred_strloc;
> >  		else
> >  			fn = filter_pred_pchar;
> > -	} else {
> > +	} else if (!is_function_field(field)) {
> >  		if (field->is_signed)
> >  			ret = strict_strtoll(pred->regex.pattern, 0, &val);
> >  		else
> > diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> > index 62e86a5..7b035ab 100644
> > --- a/kernel/trace/trace_export.c
> > +++ b/kernel/trace/trace_export.c
> > @@ -67,7 +67,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
> >  	ret = trace_define_field(event_call, #type, #item,		\
> >  				 offsetof(typeof(field), item),		\
> >  				 sizeof(field.item),			\
> > -				 is_signed_type(type), FILTER_OTHER);	\
> > +				 is_signed_type(type), filter_type);	\
> >  	if (ret)							\
> >  		return ret;
> >  
> > @@ -77,7 +77,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
> >  				 offsetof(typeof(field),		\
> >  					  container.item),		\
> >  				 sizeof(field.container.item),		\
> > -				 is_signed_type(type), FILTER_OTHER);	\
> > +				 is_signed_type(type), filter_type);	\
> >  	if (ret)							\
> >  		return ret;
> >  
> > @@ -91,7 +91,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
> >  		ret = trace_define_field(event_call, event_storage, #item, \
> >  				 offsetof(typeof(field), item),		\
> >  				 sizeof(field.item),			\
> > -				 is_signed_type(type), FILTER_OTHER);	\
> > +				 is_signed_type(type), filter_type);	\
> >  		mutex_unlock(&event_storage_mutex);			\
> >  		if (ret)						\
> >  			return ret;					\
> > @@ -104,7 +104,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
> >  				 offsetof(typeof(field),		\
> >  					  container.item),		\
> >  				 sizeof(field.container.item),		\
> > -				 is_signed_type(type), FILTER_OTHER);	\
> > +				 is_signed_type(type), filter_type);	\
> >  	if (ret)							\
> >  		return ret;
> >  
> > @@ -112,10 +112,24 @@ static void __always_unused ____ftrace_check_##name(void)	\
> >  #define __dynamic_array(type, item)					\
> >  	ret = trace_define_field(event_call, #type, #item,		\
> >  				 offsetof(typeof(field), item),		\
> > -				 0, is_signed_type(type), FILTER_OTHER);\
> > +				 0, is_signed_type(type), filter_type);\
> >  	if (ret)							\
> >  		return ret;
> >  
> > +#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
> > +#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
> > +#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
> > +#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
> > +
> >  #undef FTRACE_ENTRY
> >  #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
> >  int									\
> > @@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
> >  {									\
> >  	struct struct_name field;					\
> >  	int ret;							\
> > +	int filter_type = FILTER_TYPE(id);				\
> >  									\
> >  	tstruct;							\
> >  									\
> 
> 

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 8/9] ftrace, perf: Add FILTER_TRACE_FN event field type
  2011-11-29 11:22     ` Jiri Olsa
@ 2011-11-29 11:51       ` Peter Zijlstra
  2011-11-29 12:21         ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Peter Zijlstra @ 2011-11-29 11:51 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Steven Rostedt, fweisbec, mingo, paulus, acme, linux-kernel

On Tue, 2011-11-29 at 12:22 +0100, Jiri Olsa wrote:
> On Mon, Nov 28, 2011 at 03:01:23PM -0500, Steven Rostedt wrote:
> > BTW, Please Cc Peter Zijlstra too, as he maintains perf inside the
> > kernel.
> 
> thanks! omitted by mistake..

I hope not trimming the email was a mistake too? :-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 8/9] ftrace, perf: Add FILTER_TRACE_FN event field type
  2011-11-29 11:51       ` Peter Zijlstra
@ 2011-11-29 12:21         ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-11-29 12:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, fweisbec, mingo, paulus, acme, linux-kernel

On Tue, Nov 29, 2011 at 12:51:40PM +0100, Peter Zijlstra wrote:
> On Tue, 2011-11-29 at 12:22 +0100, Jiri Olsa wrote:
> > On Mon, Nov 28, 2011 at 03:01:23PM -0500, Steven Rostedt wrote:
> > > BTW, Please Cc Peter Zijlstra too, as he maintains perf inside the
> > > kernel.
> > 
> > thanks! omitted by mistake..
> 
> I hope not trimming the email was a mistake too? :-)

Acked-by: Jiri Olsa <jolsa@redhat.com>

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [RFCv2] ftrace, perf: Adding support to use function trace
  2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
                   ` (8 preceding siblings ...)
  2011-11-27 18:04 ` [PATCH 9/9] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2011-12-05 17:22 ` Jiri Olsa
  2011-12-05 17:22   ` [PATCHv2 01/10] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
                     ` (11 more replies)
  9 siblings, 12 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

hi,
here's another version of perf support for function trace
with filter. The changeset is working and hopefully is not
introducing more bugs.. ;) still testing though..

It's still marked RFC since I'm not sure there's a better way to go
with patches 2 and 10. Also I'm not sure whether the condition fixed
by patch 7 was intentional. The rest is a fixed/updated version
of the v1 changes.

attached patches:
 01/10 ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update
 02/10 ftrace: Change mcount call replacement logic
 03/10 ftrace: Add enable/disable ftrace_ops control interface
 04/10 ftrace, perf: Add open/close tracepoint perf registration actions
 05/10 ftrace, perf: Add add/del tracepoint perf registration actions
 06/10 ftrace, perf: Add support to use function tracepoint in perf
 07/10 ftrace: Change filter/notrace set functions to return exit code
 08/10 ftrace, perf: Distinguish ftrace function event field type
 09/10 ftrace, perf: Add filter support for function trace event
 10/10 ftrace, graph: Add global_ops filter callback for graph tracing

v2 changes:
 01/10 - keeping the old fix instead of adding a hash_has_contents func;
         I'll send a separate patchset for this
 02/10 - using different way to avoid the issue (3/9 in v1)
 03/10 - using the way proposed by Steven for controling ftrace_ops
         (4/9 in v1)
 06/10 - added a check ensuring the ftrace:function event can be used by
         root only (7/9 in v1)
 08/10 - added more description (8/9 in v1)
 09/10 - changed '&&' operator to '||' which seems more suitable
         in this case (9/9 in v1)

thanks for comments,
jirka
---
 include/linux/ftrace.h             |   18 +++-
 include/linux/ftrace_event.h       |    9 +-
 include/linux/perf_event.h         |    3 +
 kernel/trace/ftrace.c              |  202 +++++++++++++++++++++++++++++------
 kernel/trace/trace.h               |   11 ++-
 kernel/trace/trace_event_perf.c    |  212 +++++++++++++++++++++++++++++-------
 kernel/trace/trace_events.c        |   12 ++-
 kernel/trace/trace_events_filter.c |  116 ++++++++++++++++++-
 kernel/trace/trace_export.c        |   53 ++++++++-
 kernel/trace/trace_kprobe.c        |    8 +-
 kernel/trace/trace_syscalls.c      |   18 +++-
 11 files changed, 562 insertions(+), 100 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCHv2 01/10] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
@ 2011-12-05 17:22   ` Jiri Olsa
  2011-12-05 17:22   ` [PATCHv2 02/10] ftrace: Change mcount call replacement logic Jiri Olsa
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

We need to check the existence of the other_hash before
we touch its count variable.

This issue is hit only when a non-global ftrace_ops is used.
The global ftrace_ops is initialized with empty hashes.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/ftrace.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b1e8943..c6d0293 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1372,7 +1372,8 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
 			if (filter_hash && in_hash && !in_other_hash)
 				match = 1;
 			else if (!filter_hash && in_hash &&
-				 (in_other_hash || !other_hash->count))
+				 (in_other_hash ||
+				  !other_hash || !other_hash->count))
 				match = 1;
 		}
 		if (!match)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCHv2 02/10] ftrace: Change mcount call replacement logic
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
  2011-12-05 17:22   ` [PATCHv2 01/10] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
@ 2011-12-05 17:22   ` Jiri Olsa
  2011-12-19 19:03     ` Steven Rostedt
                       ` (2 more replies)
  2011-12-05 17:22   ` [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
                     ` (9 subsequent siblings)
  11 siblings, 3 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

The current logic of updating calls is to do the mcount replacement
only when an ftrace_ops is being registered. When an ftrace_ops is
being unregistered, all calls are disabled only if it was the last
registered ftrace_ops.

This is an issue when an ftrace_ops without the FTRACE_OPS_FL_GLOBAL
flag is being unregistered, because all the functions stay enabled
and are thus inherited by global_ops, as in the following scenario:

  - set a restricting global filter
  - enable the function tracer
  - register/unregister an ftrace_ops with flags != FTRACE_OPS_FL_GLOBAL
    and with no filter

Now global_ops will get called for all the functions regardless of the
global_ops filter. So we need all functions that were enabled via
this ftrace_ops and are not part of the global filter to be disabled.

Note, currently if there are only global ftrace_ops registered,
there's no filter hash check and the filter is represented only
by enabled records.

Changing the ftrace_shutdown logic to ensure the replacement
is called for each ftrace_ops being unregistered.

Also renaming FTRACE_ENABLE_CALLS to FTRACE_UPDATE_CALLS,
which seems more suitable now.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/ftrace.c |   27 +++++++++++++--------------
 1 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c6d0293..b79ab33 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -948,7 +948,7 @@ struct ftrace_func_probe {
 };
 
 enum {
-	FTRACE_ENABLE_CALLS		= (1 << 0),
+	FTRACE_UPDATE_CALLS		= (1 << 0),
 	FTRACE_DISABLE_CALLS		= (1 << 1),
 	FTRACE_UPDATE_TRACE_FUNC	= (1 << 2),
 	FTRACE_START_FUNC_RET		= (1 << 3),
@@ -1520,7 +1520,7 @@ int ftrace_text_reserved(void *start, void *end)
 
 
 static int
-__ftrace_replace_code(struct dyn_ftrace *rec, int enable)
+__ftrace_replace_code(struct dyn_ftrace *rec, int update)
 {
 	unsigned long ftrace_addr;
 	unsigned long flag = 0UL;
@@ -1528,17 +1528,17 @@ __ftrace_replace_code(struct dyn_ftrace *rec, int enable)
 	ftrace_addr = (unsigned long)FTRACE_ADDR;
 
 	/*
-	 * If we are enabling tracing:
+	 * If we are updating calls:
 	 *
 	 *   If the record has a ref count, then we need to enable it
 	 *   because someone is using it.
 	 *
 	 *   Otherwise we make sure its disabled.
 	 *
-	 * If we are disabling tracing, then disable all records that
+	 * If we are disabling calls, then disable all records that
 	 * are enabled.
 	 */
-	if (enable && (rec->flags & ~FTRACE_FL_MASK))
+	if (update && (rec->flags & ~FTRACE_FL_MASK))
 		flag = FTRACE_FL_ENABLED;
 
 	/* If the state of this record hasn't changed, then do nothing */
@@ -1554,7 +1554,7 @@ __ftrace_replace_code(struct dyn_ftrace *rec, int enable)
 	return ftrace_make_nop(NULL, rec, ftrace_addr);
 }
 
-static void ftrace_replace_code(int enable)
+static void ftrace_replace_code(int update)
 {
 	struct dyn_ftrace *rec;
 	struct ftrace_page *pg;
@@ -1568,7 +1568,7 @@ static void ftrace_replace_code(int enable)
 		if (rec->flags & FTRACE_FL_FREE)
 			continue;
 
-		failed = __ftrace_replace_code(rec, enable);
+		failed = __ftrace_replace_code(rec, update);
 		if (failed) {
 			ftrace_bug(failed, rec->ip);
 			/* Stop processing */
@@ -1624,7 +1624,7 @@ static int __ftrace_modify_code(void *data)
 	 */
 	function_trace_stop++;
 
-	if (*command & FTRACE_ENABLE_CALLS)
+	if (*command & FTRACE_UPDATE_CALLS)
 		ftrace_replace_code(1);
 	else if (*command & FTRACE_DISABLE_CALLS)
 		ftrace_replace_code(0);
@@ -1692,7 +1692,7 @@ static int ftrace_startup(struct ftrace_ops *ops, int command)
 		return -ENODEV;
 
 	ftrace_start_up++;
-	command |= FTRACE_ENABLE_CALLS;
+	command |= FTRACE_UPDATE_CALLS;
 
 	/* ops marked global share the filter hashes */
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
@@ -1744,8 +1744,7 @@ static void ftrace_shutdown(struct ftrace_ops *ops, int command)
 	if (ops != &global_ops || !global_start_up)
 		ops->flags &= ~FTRACE_OPS_FL_ENABLED;
 
-	if (!ftrace_start_up)
-		command |= FTRACE_DISABLE_CALLS;
+	command |= FTRACE_UPDATE_CALLS;
 
 	if (saved_ftrace_func != ftrace_trace_function) {
 		saved_ftrace_func = ftrace_trace_function;
@@ -1767,7 +1766,7 @@ static void ftrace_startup_sysctl(void)
 	saved_ftrace_func = NULL;
 	/* ftrace_start_up is true if we want ftrace running */
 	if (ftrace_start_up)
-		ftrace_run_update_code(FTRACE_ENABLE_CALLS);
+		ftrace_run_update_code(FTRACE_UPDATE_CALLS);
 }
 
 static void ftrace_shutdown_sysctl(void)
@@ -2920,7 +2919,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
 	if (!ret && ops->flags & FTRACE_OPS_FL_ENABLED
 	    && ftrace_enabled)
-		ftrace_run_update_code(FTRACE_ENABLE_CALLS);
+		ftrace_run_update_code(FTRACE_UPDATE_CALLS);
 
 	mutex_unlock(&ftrace_lock);
 
@@ -3108,7 +3107,7 @@ ftrace_regex_release(struct inode *inode, struct file *file)
 				       orig_hash, iter->hash);
 		if (!ret && (iter->ops->flags & FTRACE_OPS_FL_ENABLED)
 		    && ftrace_enabled)
-			ftrace_run_update_code(FTRACE_ENABLE_CALLS);
+			ftrace_run_update_code(FTRACE_UPDATE_CALLS);
 
 		mutex_unlock(&ftrace_lock);
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
  2011-12-05 17:22   ` [PATCHv2 01/10] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
  2011-12-05 17:22   ` [PATCHv2 02/10] ftrace: Change mcount call replacement logic Jiri Olsa
@ 2011-12-05 17:22   ` Jiri Olsa
  2011-12-19 19:19     ` Steven Rostedt
  2011-12-19 19:35     ` Steven Rostedt
  2011-12-05 17:22   ` [PATCHv2 04/10] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
                     ` (8 subsequent siblings)
  11 siblings, 2 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding a way to temporarily enable/disable an ftrace_ops. The change
follows the same pattern as the 'global' ftrace_ops.

Introducing two new globals - control_ops and ftrace_control_list -
which take over all ftrace_ops registered with the FTRACE_OPS_FL_CONTROL
flag. In addition, a new per-cpu counter called 'disabled' is added to
ftrace_ops to provide the control information for each cpu.

When an ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.

The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides a function which iterates ftrace_control_list
and checks the 'disabled' flag on the current cpu.

Adding two inline functions, enable_ftrace_function/disable_ftrace_function,
which enable/disable the ftrace_ops for the current cpu.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |   14 ++++++
 kernel/trace/ftrace.c  |  115 +++++++++++++++++++++++++++++++++++++++++++-----
 kernel/trace/trace.h   |    2 +
 3 files changed, 120 insertions(+), 11 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 26eafce..b223944 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -35,12 +35,14 @@ enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	void __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +99,18 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+static inline void enable_ftrace_function(struct ftrace_ops *ops)
+{
+	atomic_t *disabled = this_cpu_ptr(ops->disabled);
+	atomic_dec(disabled);
+}
+
+static inline void disable_ftrace_function(struct ftrace_ops *ops)
+{
+	atomic_t *disabled = this_cpu_ptr(ops->disabled);
+	atomic_inc(disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b79ab33..c2fa233 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -87,12 +87,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -166,6 +168,38 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		atomic_set(per_cpu_ptr(ops->disabled, cpu), 1);
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	atomic_t *disabled;
+
+	disabled = alloc_percpu(atomic_t);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
+static int control_ops_is_disabled(struct ftrace_ops *ops)
+{
+	atomic_t *disabled = this_cpu_ptr(ops->disabled);
+	return atomic_read(disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -221,7 +255,7 @@ static void update_ftrace_function(void)
 #endif
 }
 
-static void add_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
+static void __add_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 {
 	ops->next = *list;
 	/*
@@ -233,7 +267,7 @@ static void add_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	rcu_assign_pointer(*list, ops);
 }
 
-static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
+static int __remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 {
 	struct ftrace_ops **p;
 
@@ -257,6 +291,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_ops(struct ftrace_ops **list,
+			   struct ftrace_ops *main_ops,
+			   struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	__add_ftrace_ops(list, ops);
+	if (first)
+		__add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_ops(struct ftrace_ops **list,
+			      struct ftrace_ops *main_ops,
+			      struct ftrace_ops *ops)
+{
+	int ret = __remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = __remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -268,17 +322,23 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+#define FL_GLOBAL_CONTROL (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+	if ((ops->flags & FL_GLOBAL_CONTROL) == FL_GLOBAL_CONTROL)
+		return -EINVAL;
+#undef FL_GLOBAL_CONTROL
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_ops(&ftrace_control_list, &control_ops, ops);
 	} else
-		add_ftrace_ops(&ftrace_ops_list, ops);
+		__add_ftrace_ops(&ftrace_ops_list, ops);
 
 	if (ftrace_enabled)
 		update_ftrace_function();
@@ -300,13 +360,16 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_ops(&ftrace_global_list, &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_ops(&ftrace_control_list,
+					&control_ops, ops);
+		if (!ret)
+			control_ops_free(ops);
 	} else
-		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
+		ret = __remove_ftrace_ops(&ftrace_ops_list, ops);
 
 	if (ret < 0)
 		return ret;
@@ -3562,6 +3625,36 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!control_ops_is_disabled(op) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	};
+	preempt_enable_notrace();
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f8ec229..da05926 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCHv2 04/10] ftrace, perf: Add open/close tracepoint perf registration actions
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
                     ` (2 preceding siblings ...)
  2011-12-05 17:22   ` [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2011-12-05 17:22   ` Jiri Olsa
  2011-12-05 17:22   ` [PATCHv2 05/10] ftrace, perf: Add add/del " Jiri Olsa
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.

The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.

The open/close actions are invoked for each tracepoint user when
opening/closing the event.
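A tiny userspace model of the refcounting described above may make the split clearer: REGISTER/UNREGISTER fire only for the first/last user of a tracepoint, while OPEN/CLOSE fire for every perf event. All names here are illustrative, not kernel code.

```c
#include <assert.h>

/* Models tp_event->perf_refcount and the four registration actions:
 * REGISTER runs only for the first user, UNREGISTER only for the last,
 * OPEN/CLOSE run for every event. */
static int perf_refcount;   /* number of perf events on this tracepoint */
static int register_calls;  /* net TRACE_REG_PERF_REGISTER count */
static int open_calls;      /* net TRACE_REG_PERF_OPEN count */

static int model_event_init(void)
{
	if (perf_refcount++ == 0)
		register_calls++;      /* TRACE_REG_PERF_REGISTER */
	open_calls++;                  /* TRACE_REG_PERF_OPEN */
	return 0;
}

static void model_event_destroy(void)
{
	open_calls--;                  /* TRACE_REG_PERF_CLOSE */
	if (--perf_refcount == 0)
		register_calls--;      /* TRACE_REG_PERF_UNREGISTER */
}
```

With two events open on the same tracepoint, the model registers once but opens twice, mirroring the description above.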

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    6 +-
 kernel/trace/trace.h            |    5 ++
 kernel/trace/trace_event_perf.c |  116 +++++++++++++++++++++++++--------------
 kernel/trace/trace_events.c     |   10 ++-
 kernel/trace/trace_kprobe.c     |    6 ++-
 kernel/trace/trace_syscalls.c   |   14 +++-
 6 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c3da42d..195e360 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -146,6 +146,8 @@ enum trace_reg {
 	TRACE_REG_UNREGISTER,
 	TRACE_REG_PERF_REGISTER,
 	TRACE_REG_PERF_UNREGISTER,
+	TRACE_REG_PERF_OPEN,
+	TRACE_REG_PERF_CLOSE,
 };
 
 struct ftrace_event_call;
@@ -157,7 +159,7 @@ struct ftrace_event_class {
 	void			*perf_probe;
 #endif
 	int			(*reg)(struct ftrace_event_call *event,
-				       enum trace_reg type);
+				       enum trace_reg type, void *data);
 	int			(*define_fields)(struct ftrace_event_call *);
 	struct list_head	*(*get_fields)(struct ftrace_event_call *);
 	struct list_head	fields;
@@ -165,7 +167,7 @@ struct ftrace_event_class {
 };
 
 extern int ftrace_event_reg(struct ftrace_event_call *event,
-			    enum trace_reg type);
+			    enum trace_reg type, void *data);
 
 enum {
 	TRACE_EVENT_FL_ENABLED_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index da05926..476527a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -827,4 +827,9 @@ extern const char *__stop___trace_bprintk_fmt[];
 	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #include "trace_entries.h"
 
+#ifdef CONFIG_PERF_EVENTS
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data);
+#endif
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 19a359d..0cfcc37 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -44,23 +44,17 @@ static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 	return 0;
 }
 
-static int perf_trace_event_init(struct ftrace_event_call *tp_event,
-				 struct perf_event *p_event)
+static int perf_trace_event_reg(struct ftrace_event_call *tp_event,
+				struct perf_event *p_event)
 {
 	struct hlist_head __percpu *list;
-	int ret;
+	int ret = -ENOMEM;
 	int cpu;
 
-	ret = perf_trace_event_perm(tp_event, p_event);
-	if (ret)
-		return ret;
-
 	p_event->tp_event = tp_event;
 	if (tp_event->perf_refcount++ > 0)
 		return 0;
 
-	ret = -ENOMEM;
-
 	list = alloc_percpu(struct hlist_head);
 	if (!list)
 		goto fail;
@@ -83,7 +77,7 @@ static int perf_trace_event_init(struct ftrace_event_call *tp_event,
 		}
 	}
 
-	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER);
+	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER, NULL);
 	if (ret)
 		goto fail;
 
@@ -108,6 +102,69 @@ fail:
 	return ret;
 }
 
+static void perf_trace_event_unreg(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	int i;
+
+	if (--tp_event->perf_refcount > 0)
+		goto out;
+
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER, NULL);
+
+	/*
+	 * Ensure our callback won't be called anymore. The buffers
+	 * will be freed after that.
+	 */
+	tracepoint_synchronize_unregister();
+
+	free_percpu(tp_event->perf_events);
+	tp_event->perf_events = NULL;
+
+	if (!--total_ref_count) {
+		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
+			free_percpu(perf_trace_buf[i]);
+			perf_trace_buf[i] = NULL;
+		}
+	}
+out:
+	module_put(tp_event->mod);
+}
+
+static int perf_trace_event_open(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_OPEN, p_event);
+}
+
+static void perf_trace_event_close(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
+}
+
+static int perf_trace_event_init(struct ftrace_event_call *tp_event,
+				 struct perf_event *p_event)
+{
+	int ret;
+
+	ret = perf_trace_event_perm(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_reg(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_open(p_event);
+	if (ret) {
+		perf_trace_event_unreg(p_event);
+		return ret;
+	}
+
+	return 0;
+}
+
 int perf_trace_init(struct perf_event *p_event)
 {
 	struct ftrace_event_call *tp_event;
@@ -130,6 +187,14 @@ int perf_trace_init(struct perf_event *p_event)
 	return ret;
 }
 
+void perf_trace_destroy(struct perf_event *p_event)
+{
+	mutex_lock(&event_mutex);
+	perf_trace_event_close(p_event);
+	perf_trace_event_unreg(p_event);
+	mutex_unlock(&event_mutex);
+}
+
 int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct ftrace_event_call *tp_event = p_event->tp_event;
@@ -154,37 +219,6 @@ void perf_trace_del(struct perf_event *p_event, int flags)
 	hlist_del_rcu(&p_event->hlist_entry);
 }
 
-void perf_trace_destroy(struct perf_event *p_event)
-{
-	struct ftrace_event_call *tp_event = p_event->tp_event;
-	int i;
-
-	mutex_lock(&event_mutex);
-	if (--tp_event->perf_refcount > 0)
-		goto out;
-
-	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER);
-
-	/*
-	 * Ensure our callback won't be called anymore. The buffers
-	 * will be freed after that.
-	 */
-	tracepoint_synchronize_unregister();
-
-	free_percpu(tp_event->perf_events);
-	tp_event->perf_events = NULL;
-
-	if (!--total_ref_count) {
-		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
-			free_percpu(perf_trace_buf[i]);
-			perf_trace_buf[i] = NULL;
-		}
-	}
-out:
-	module_put(tp_event->mod);
-	mutex_unlock(&event_mutex);
-}
-
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 				       struct pt_regs *regs, int *rctxp)
 {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c212a7f..5138fea 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -147,7 +147,8 @@ int trace_event_raw_init(struct ftrace_event_call *call)
 }
 EXPORT_SYMBOL_GPL(trace_event_raw_init);
 
-int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
+int ftrace_event_reg(struct ftrace_event_call *call,
+		     enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -170,6 +171,9 @@ int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
 					    call->class->perf_probe,
 					    call);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
@@ -209,7 +213,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_stop_cmdline_record();
 				call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			call->class->reg(call, TRACE_REG_UNREGISTER);
+			call->class->reg(call, TRACE_REG_UNREGISTER, NULL);
 		}
 		break;
 	case 1:
@@ -218,7 +222,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_start_cmdline_record();
 				call->flags |= TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			ret = call->class->reg(call, TRACE_REG_REGISTER);
+			ret = call->class->reg(call, TRACE_REG_REGISTER, NULL);
 			if (ret) {
 				tracing_stop_cmdline_record();
 				pr_info("event trace: Could not enable event "
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 00d527c..5667f89 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1892,7 +1892,8 @@ static __kprobes void kretprobe_perf_func(struct kretprobe_instance *ri,
 #endif	/* CONFIG_PERF_EVENTS */
 
 static __kprobes
-int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
+int kprobe_register(struct ftrace_event_call *event,
+		    enum trace_reg type, void *data)
 {
 	struct trace_probe *tp = (struct trace_probe *)event->data;
 
@@ -1909,6 +1910,9 @@ int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
 	case TRACE_REG_PERF_UNREGISTER:
 		disable_trace_probe(tp, TP_FLAG_PROFILE);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index cb65454..6916b0d 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -17,9 +17,9 @@ static DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
 static DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 
 static int syscall_enter_define_fields(struct ftrace_event_call *call);
 static int syscall_exit_define_fields(struct ftrace_event_call *call);
@@ -649,7 +649,7 @@ void perf_sysexit_disable(struct ftrace_event_call *call)
 #endif /* CONFIG_PERF_EVENTS */
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -664,13 +664,16 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysenter_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
 }
 
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -685,6 +688,9 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysexit_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCHv2 05/10] ftrace, perf: Add add/del tracepoint perf registration actions
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
                     ` (3 preceding siblings ...)
  2011-12-05 17:22   ` [PATCHv2 04/10] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
@ 2011-12-05 17:22   ` Jiri Olsa
  2011-12-05 17:22   ` [PATCHv2 06/10] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
perf event schedule in/out actions.

The add action is invoked when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.
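The intent is that ADD/DEL toggle a per-event enable state on every context switch, much like the per-CPU disabled counter of the control interface from the earlier patch. A toy model (plain ints standing in for per-CPU atomics; all names are illustrative):

```c
#include <assert.h>

/* Control ops start disabled; scheduling the event in (ADD) decrements
 * the counter, scheduling it out (DEL) increments it again.  The
 * callback only runs while the counter is zero. */
struct toy_ops {
	int disabled;
};

static void toy_register(struct toy_ops *ops)
{
	ops->disabled = 1;             /* registered but not scheduled in */
}

static void toy_add(struct toy_ops *ops)
{
	ops->disabled--;               /* TRACE_REG_PERF_ADD: sched in */
}

static void toy_del(struct toy_ops *ops)
{
	ops->disabled++;               /* TRACE_REG_PERF_DEL: sched out */
}

static int toy_callback_runs(const struct toy_ops *ops)
{
	return !ops->disabled;
}
```

This is why the control ops allocate their disabled state per CPU in the real patch: each CPU schedules the event independently.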

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    2 ++
 kernel/trace/trace_event_perf.c |    4 +++-
 kernel/trace/trace_events.c     |    2 ++
 kernel/trace/trace_kprobe.c     |    2 ++
 kernel/trace/trace_syscalls.c   |    4 ++++
 5 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 195e360..2bf677c 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -148,6 +148,8 @@ enum trace_reg {
 	TRACE_REG_PERF_UNREGISTER,
 	TRACE_REG_PERF_OPEN,
 	TRACE_REG_PERF_CLOSE,
+	TRACE_REG_PERF_ADD,
+	TRACE_REG_PERF_DEL,
 };
 
 struct ftrace_event_call;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0cfcc37..d72af0b 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -211,12 +211,14 @@ int perf_trace_add(struct perf_event *p_event, int flags)
 	list = this_cpu_ptr(pcpu_list);
 	hlist_add_head_rcu(&p_event->hlist_entry, list);
 
-	return 0;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_ADD, p_event);
 }
 
 void perf_trace_del(struct perf_event *p_event, int flags)
 {
+	struct ftrace_event_call *tp_event = p_event->tp_event;
 	hlist_del_rcu(&p_event->hlist_entry);
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_DEL, p_event);
 }
 
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5138fea..079a93a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -173,6 +173,8 @@ int ftrace_event_reg(struct ftrace_event_call *call,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5667f89..580a05e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1912,6 +1912,8 @@ int kprobe_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 6916b0d..dbdd804 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -666,6 +666,8 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
@@ -690,6 +692,8 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCHv2 06/10] ftrace, perf: Add support to use function tracepoint in perf
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
                     ` (4 preceding siblings ...)
  2011-12-05 17:22   ` [PATCHv2 05/10] ftrace, perf: Add add/del " Jiri Olsa
@ 2011-12-05 17:22   ` Jiri Olsa
  2011-12-05 17:22   ` [PATCHv2 07/10] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding perf registration support for the ftrace function event,
so it is now possible to register it via the perf interface.

The perf_event struct statically contains ftrace_ops as a handle
for the function tracer. The function tracer is registered/unregistered
in open/close actions, and enabled/disabled in add/del actions.

It is now possible to use function trace within perf commands
like:

  perf record -e ftrace:function ls
  perf stat -e ftrace:function ls

Allowed only for root.
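One subtle piece of the patch below is the ENTRY_SIZE computation: the record plus its trailing u32 must stay u64-aligned in the perf ring buffer, so the size is rounded up with the u32 included and the u32 subtracted again. A standalone re-statement of that arithmetic (TOY_ALIGN is assumed here to mirror the kernel's ALIGN macro for power-of-two alignments):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define TOY_ALIGN(x, a) (((x) + (a) - 1) & ~((uintptr_t)(a) - 1))

/* Size handed to perf_trace_buf_prepare(): large enough that the
 * record plus a trailing u32 lands on a u64 boundary. */
static size_t entry_size(size_t record_size)
{
	return TOY_ALIGN(record_size + sizeof(uint32_t), sizeof(uint64_t))
		- sizeof(uint32_t);
}
```

Whatever the record size, entry_size() + sizeof(u32) is always a multiple of 8, which is the invariant the BUILD_BUG_ON in the patch guards against exceeding PERF_MAX_TRACE_SIZE.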

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h      |    3 +
 kernel/trace/trace.h            |    2 +
 kernel/trace/trace_event_perf.c |   92 +++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_export.c     |   28 ++++++++++++
 4 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1e9ebe5..6071995 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -847,6 +847,9 @@ struct perf_event {
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
+#ifdef CONFIG_FUNCTION_TRACER
+	struct ftrace_ops               ftrace_ops;
+#endif
 #endif
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 476527a..4199916 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -591,6 +591,8 @@ static inline int ftrace_trace_task(struct task_struct *task)
 static inline int ftrace_is_dead(void) { return 0; }
 #endif
 
+int ftrace_event_is_function(struct ftrace_event_call *call);
+
 /*
  * struct trace_parser - servers for reading the user input separated by spaces
  * @cont: set if the input is not complete - no final space char was found
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index d72af0b..c3bc56b 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -24,6 +24,11 @@ static int	total_ref_count;
 static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
+	/* The ftrace function trace is allowed only for root. */
+	if (ftrace_event_is_function(tp_event) &&
+	    perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	/* No tracing, just counting, so no obvious leak */
 	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
 		return 0;
@@ -250,3 +255,90 @@ __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 	return raw_data;
 }
 EXPORT_SYMBOL_GPL(perf_trace_buf_prepare);
+
+
+static void
+perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_entry *entry;
+	struct hlist_head *head;
+	struct pt_regs regs;
+	int rctx;
+
+#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
+		    sizeof(u64)) - sizeof(u32))
+
+	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
+
+	perf_fetch_caller_regs(&regs);
+
+	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
+	if (!entry)
+		return;
+
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	head = this_cpu_ptr(event_function.perf_events);
+	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
+			      1, &regs, head);
+
+#undef ENTRY_SIZE
+}
+
+static int perf_ftrace_function_register(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+
+	ops->flags |= FTRACE_OPS_FL_CONTROL;
+	ops->func = perf_ftrace_function_call;
+	return register_ftrace_function(ops);
+}
+
+static int perf_ftrace_function_unregister(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	return unregister_ftrace_function(ops);
+}
+
+static void perf_ftrace_function_enable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	enable_ftrace_function(ops);
+}
+
+static void perf_ftrace_function_disable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	disable_ftrace_function(ops);
+}
+
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data)
+{
+	int etype = call->event.type;
+
+	if (etype != TRACE_FN)
+		return -EINVAL;
+
+	switch (type) {
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+		return perf_ftrace_function_register(data);
+	case TRACE_REG_PERF_CLOSE:
+		return perf_ftrace_function_unregister(data);
+	case TRACE_REG_PERF_ADD:
+		perf_ftrace_function_enable(data);
+		return 0;
+	case TRACE_REG_PERF_DEL:
+		perf_ftrace_function_disable(data);
+		return 0;
+	}
+
+	return -EINVAL;
+}
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index bbeec31..867653c 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 
 #include "trace_entries.h"
 
+static int ftrace_event_class_register(struct ftrace_event_call *call,
+				       enum trace_reg type, void *data)
+{
+	switch (type) {
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
+#ifdef CONFIG_PERF_EVENTS
+		return perf_ftrace_event_register(call, type, data);
+#endif
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	}
+
+	return -EINVAL;
+}
+
 #undef __entry
 #define __entry REC
 
@@ -159,6 +181,7 @@ struct ftrace_event_class event_class_ftrace_##call = {			\
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
+	.reg			= ftrace_event_class_register,		\
 };									\
 									\
 struct ftrace_event_call __used event_##call = {			\
@@ -170,4 +193,9 @@ struct ftrace_event_call __used event_##call = {			\
 struct ftrace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
+int ftrace_event_is_function(struct ftrace_event_call *call)
+{
+	return call == &event_function;
+}
+
 #include "trace_entries.h"
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCHv2 07/10] ftrace: Change filter/notrace set functions to return exit code
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
                     ` (5 preceding siblings ...)
  2011-12-05 17:22   ` [PATCHv2 06/10] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
@ 2011-12-05 17:22   ` Jiri Olsa
  2011-12-19 19:22     ` Steven Rostedt
  2011-12-05 17:22   ` [PATCHv2 08/10] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
                     ` (4 subsequent siblings)
  11 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any value. So there's no way for the ftrace_ops
user to tell whether the filter was correctly applied.

The set_ftrace_filter interface returns an error when the filter
does not match anything:

  # echo krava > set_ftrace_filter
  bash: echo: write error: Invalid argument

Changing both ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly or -E* values
in case of error.
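A minimal userspace sketch of the new contract, assuming the same behaviour as the set_ftrace_filter example above (match_count() stands in for ftrace_match_records() and is purely illustrative):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Pretend only "sys_*" patterns match any traced functions. */
static int match_count(const char *pattern)
{
	return strncmp(pattern, "sys_", 4) == 0;
}

/* With this patch, the in-kernel API reports a non-matching filter
 * with -EINVAL instead of silently succeeding. */
static int toy_set_filter(const char *pattern)
{
	if (!match_count(pattern))
		return -EINVAL;
	return 0;
}
```

Callers like the perf function-event filter code in the following patches can then propagate the error back to user space.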

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |    4 ++--
 kernel/trace/ftrace.c  |   16 ++++++++++------
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index b223944..ddd55cc 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -194,9 +194,9 @@ struct dyn_ftrace {
 };
 
 int ftrace_force_update(void);
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c2fa233..2dae0c7 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2975,8 +2975,11 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	mutex_lock(&ftrace_regex_lock);
 	if (reset)
 		ftrace_filter_reset(hash);
-	if (buf)
-		ftrace_match_records(hash, buf, len);
+	if (buf &&
+	    !ftrace_match_records(hash, buf, len)) {
+		ret = -EINVAL;
+		goto out_regex_unlock;
+	}
 
 	mutex_lock(&ftrace_lock);
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -2986,6 +2989,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 
 	mutex_unlock(&ftrace_lock);
 
+ out_regex_unlock:
 	mutex_unlock(&ftrace_regex_lock);
 
 	free_ftrace_hash(hash);
@@ -3002,10 +3006,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
  * Filters denote which functions should be enabled when tracing is enabled.
  * If @buf is NULL and reset is set, all functions will be enabled for tracing.
  */
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 1);
+	return ftrace_set_regex(ops, buf, len, reset, 1);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_filter);
 
@@ -3020,10 +3024,10 @@ EXPORT_SYMBOL_GPL(ftrace_set_filter);
  * is enabled. If @buf is NULL and reset is set, all functions will be enabled
  * for tracing.
  */
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 0);
+	return ftrace_set_regex(ops, buf, len, reset, 0);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_notrace);
 /**
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCHv2 08/10] ftrace, perf: Distinguish ftrace function event field type
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
                     ` (6 preceding siblings ...)
  2011-12-05 17:22   ` [PATCHv2 07/10] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2011-12-05 17:22   ` Jiri Olsa
  2011-12-05 17:22   ` [PATCHv2 09/10] ftrace, perf: Add filter support for function trace event Jiri Olsa
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding a FILTER_TRACE_FN event field type for the function tracepoint
event, so it can be properly recognized within the filtering code.

Currently all fields of ftrace subsystem events share the common
field type FILTER_OTHER. Since the function trace fields need special
care within the filtering code we need to recognize it properly,
hence adding the FILTER_TRACE_FN event type.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h       |    1 +
 kernel/trace/trace_events_filter.c |    7 ++++++-
 kernel/trace/trace_export.c        |   25 ++++++++++++++++++++-----
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 2bf677c..dd478fc 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -245,6 +245,7 @@ enum {
 	FILTER_STATIC_STRING,
 	FILTER_DYN_STRING,
 	FILTER_PTR_STRING,
+	FILTER_TRACE_FN,
 };
 
 #define EVENT_STORAGE_SIZE 128
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index fdc6d22..7b0b04c 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
 	return FILTER_OTHER;
 }
 
+static bool is_function_field(struct ftrace_event_field *field)
+{
+	return field->filter_type == FILTER_TRACE_FN;
+}
+
 static bool is_string_field(struct ftrace_event_field *field)
 {
 	return field->filter_type == FILTER_DYN_STRING ||
@@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else {
+	} else if (!is_function_field(field)) {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 867653c..46c35e2 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -67,7 +67,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -77,7 +77,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -91,7 +91,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 		ret = trace_define_field(event_call, event_storage, #item, \
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 		mutex_unlock(&event_storage_mutex);			\
 		if (ret)						\
 			return ret;					\
@@ -104,7 +104,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -112,10 +112,24 @@ static void __always_unused ____ftrace_check_##name(void)	\
 #define __dynamic_array(type, item)					\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), FILTER_OTHER);\
+				 0, is_signed_type(type), filter_type);\
 	if (ret)							\
 		return ret;
 
+#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
+#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
+#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
+#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
+#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
+#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
+#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
+#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
+#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
+#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
+#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
+
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 int									\
@@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 {									\
 	struct struct_name field;					\
 	int ret;							\
+	int filter_type = FILTER_TYPE(id);				\
 									\
 	tstruct;							\
 									\
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCHv2 09/10] ftrace, perf: Add filter support for function trace event
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
                     ` (7 preceding siblings ...)
  2011-12-05 17:22   ` [PATCHv2 08/10] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
@ 2011-12-05 17:22   ` Jiri Olsa
  2011-12-05 17:22   ` [PATCHv2 10/10] ftrace, graph: Add global_ops filter callback for graph tracing Jiri Olsa
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding support to filter the function trace event via the perf
interface. It is now possible to use the filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only,
and only the operators '==', '!=' and '||' are accepted, ending
up with filter strings like:

  "ip == f1 f2 ..." || "ip != f3 f4 ..." ...

The '==' operator adds a trace filter with the same effect as
adding it via the set_ftrace_filter file.

The '!=' operator adds a trace filter with the same effect as
adding it via the set_ftrace_notrace file.

The right side of the '!=' and '==' operators is a space-separated
list of functions or regexps to be added to the filter. The same
syntax is supported/required as for the set_ftrace_filter and
set_ftrace_notrace files.

The '||' operator connects multiple filter definitions
together. It is possible to have more than one '==' and '!='
operator within one filter string.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/trace.h               |    2 -
 kernel/trace/trace_events_filter.c |  113 +++++++++++++++++++++++++++++++++---
 2 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4199916..9b7a004 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -769,9 +769,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 7b0b04c..479b40b 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -54,6 +54,13 @@ struct filter_op {
 	int precedence;
 };
 
+static struct filter_op filter_ftrace_ops[] = {
+	{ OP_OR,	"||",		1 },
+	{ OP_NE,	"!=",		2 },
+	{ OP_EQ,	"==",		2 },
+	{ OP_NONE,	"OP_NONE",	0 },
+};
+
 static struct filter_op filter_ops[] = {
 	{ OP_OR,	"||",		1 },
 	{ OP_AND,	"&&",		2 },
@@ -81,6 +88,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +104,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +1001,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1353,8 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
+
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1894,6 +1906,83 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int *reset, ret;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	if (filter)
+		ret = ftrace_set_filter(data->ops, buf, len, *reset);
+	else
+		ret = ftrace_set_notrace(data->ops, buf, len, *reset);
+
+	if (*reset)
+		*reset = 0;
+
+	return ret;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	/*
+	  Check the predicate for function trace, verify:
+	   - only '==' and '!=' is used
+	   - the 'ip' field is used
+	*/
+	if (WARN((pred->op != OP_EQ) && (pred->op != OP_NE),
+		 "wrong operator for function filter: %d\n", pred->op))
+		return -EINVAL;
+
+	if (strcmp(field->name, "ip"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID))
+		return WALK_PRED_DEFAULT;
+
+	/* Double checking the predicate is valid for function trace. */
+	*err = ftrace_function_check_pred(pred);
+	if (*err)
+		return WALK_PRED_ABORT;
+
+	*err = __ftrace_function_set_filter(pred->op == OP_EQ,
+					    pred->regex.pattern,
+					    pred->regex.len,
+					    data);
+
+	return (*err) ? WALK_PRED_ABORT : WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1901,6 +1990,7 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	struct event_filter *filter;
 	struct filter_parse_state *ps;
 	struct ftrace_event_call *call;
+	struct filter_op *fops = filter_ops;
 
 	mutex_lock(&event_mutex);
 
@@ -1925,14 +2015,21 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	if (!ps)
 		goto free_filter;
 
-	parse_init(ps, filter_ops, filter_str);
+	if (ftrace_event_is_function(call))
+		fops = filter_ftrace_ops;
+
+	parse_init(ps, fops, filter_str);
 	err = filter_parse(ps);
 	if (err)
 		goto free_ps;
 
 	err = replace_preds(call, filter, ps, filter_str, false);
-	if (!err)
-		event->filter = filter;
+	if (!err) {
+		if (ftrace_event_is_function(call))
+			err = ftrace_function_set_filter(event, filter);
+		else
+			event->filter = filter;
+	}
 
 free_ps:
 	filter_opstack_clear(ps);
@@ -1940,7 +2037,7 @@ free_ps:
 	kfree(ps);
 
 free_filter:
-	if (err)
+	if (err || ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCHv2 10/10] ftrace, graph: Add global_ops filter callback for graph tracing
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
                     ` (8 preceding siblings ...)
  2011-12-05 17:22   ` [PATCHv2 09/10] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2011-12-05 17:22   ` Jiri Olsa
  2011-12-19 19:27     ` Steven Rostedt
  2011-12-19 13:40   ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
  11 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-05 17:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

The function graph tracer should honor the global_ops filter
and process only functions that pass it.

Currently the function graph tracer traces all the functions
enabled for tracing, no matter which ftrace_ops enabled them.

Adding a hook for the graph entry callback, which compares the
function against the global_ops filter and bails out if it does
not match.

This hook is enabled only if there's at least one non-global
ftrace_ops registered.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/ftrace.c |   41 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 2dae0c7..dc49ba6 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -95,10 +95,13 @@ ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
 static struct ftrace_ops control_ops;
+static int non_global_ops_registered;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
 
+static void ftrace_graph_update_filter(void);
+
 /*
  * Traverse the ftrace_global_list, invoking all entries.  The reason that we
  * can use rcu_dereference_raw() is that elements removed from this list
@@ -330,6 +333,9 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
+	if (!(ops->flags & FTRACE_OPS_FL_GLOBAL))
+		non_global_ops_registered++;
+
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
 		add_ftrace_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
@@ -359,6 +365,9 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 	if (FTRACE_WARN_ON(ops == &global_ops))
 		return -EINVAL;
 
+	if (!(ops->flags & FTRACE_OPS_FL_GLOBAL))
+		non_global_ops_registered--;
+
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
 		ret = remove_ftrace_ops(&ftrace_global_list, &global_ops, ops);
 		if (!ret)
@@ -1695,6 +1704,8 @@ static int __ftrace_modify_code(void *data)
 	if (*command & FTRACE_UPDATE_TRACE_FUNC)
 		ftrace_update_ftrace_func(ftrace_trace_function);
 
+	ftrace_graph_update_filter();
+
 	if (*command & FTRACE_START_FUNC_RET)
 		ftrace_enable_ftrace_graph_caller();
 	else if (*command & FTRACE_STOP_FUNC_RET)
@@ -4339,4 +4350,34 @@ void ftrace_graph_stop(void)
 {
 	ftrace_stop();
 }
+
+static trace_func_graph_ent_t ftrace_graph_entry_saved;
+
+int ftrace_graph_entry_filter(struct ftrace_graph_ent *ent)
+{
+	if (ftrace_ops_test(&global_ops, ent->func))
+		return ftrace_graph_entry_saved(ent);
+
+	return 0;
+}
+
+static void ftrace_graph_update_filter(void)
+{
+	bool installed = (ftrace_graph_entry == ftrace_graph_entry_filter);
+
+	if (!ftrace_graph_active)
+		return;
+
+	if (!installed && non_global_ops_registered) {
+		ftrace_graph_entry_saved = ftrace_graph_entry;
+		ftrace_graph_entry = ftrace_graph_entry_filter;
+		return;
+	}
+
+	if (installed && !non_global_ops_registered)
+		ftrace_graph_entry = ftrace_graph_entry_saved;
+}
+
+#else
+static void ftrace_graph_update_filter(void) { }
 #endif
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [RFCv2] ftrace, perf: Adding support to use function trace
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
                     ` (9 preceding siblings ...)
  2011-12-05 17:22   ` [PATCHv2 10/10] ftrace, graph: Add global_ops filter callback for graph tracing Jiri Olsa
@ 2011-12-19 13:40   ` Jiri Olsa
  2011-12-19 16:45     ` Steven Rostedt
  2011-12-19 16:58     ` Frederic Weisbecker
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
  11 siblings, 2 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-19 13:40 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

hi,
any feedback?

thanks,
jirka

On Mon, Dec 05, 2011 at 06:22:46PM +0100, Jiri Olsa wrote:
> hi,
> here's another version of perf support for function trace
> with filter. The changeset is working and hopefully is not
> introducing more bugs.. ;) still testing though..
> 
> It's still marked RFC since I'm not sure there's better way to go
> with patches 2 and 10. Also not sure if the condition fixed by
> patch 7 was not intentional. The rest is fixed/updated version
> of the v1 changes.
> 
> attached patches:
>  01/10 ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update
>  02/10 ftrace: Change mcount call replacement logic
>  03/10 ftrace: Add enable/disable ftrace_ops control interface
>  04/10 ftrace, perf: Add open/close tracepoint perf registration actions
>  05/10 ftrace, perf: Add add/del tracepoint perf registration actions
>  06/10 ftrace, perf: Add support to use function tracepoint in perf
>  07/10 ftrace: Change filter/notrace set functions to return exit code
>  08/10 ftrace, perf: Distinguish ftrace function event field type
>  09/10 ftrace, perf: Add filter support for function trace event
>  10/10 ftrace, graph: Add global_ops filter callback for graph tracing
> 
> v2 changes:
>  01/10 - keeping the old fix instead of adding hash_has_contents func
>          I'll send separating patchset for this
>  02/10 - using different way to avoid the issue (3/9 in v1)
>  03/10 - using the way proposed by Steven for controling ftrace_ops
>          (4/9 in v1)
>  06/10 - added check ensuring the ftrace:function event could be used by
>          root only (7/9 in v1)
>  08/10 - added more description (8/9 in v1)
>  09/10 - changed '&&' operator to '||' which seems more suitable
>          in this case (9/9 in v1)
> 
> thanks for comments,
> jirka
> ---
>  include/linux/ftrace.h             |   18 +++-
>  include/linux/ftrace_event.h       |    9 +-
>  include/linux/perf_event.h         |    3 +
>  kernel/trace/ftrace.c              |  202 +++++++++++++++++++++++++++++------
>  kernel/trace/trace.h               |   11 ++-
>  kernel/trace/trace_event_perf.c    |  212 +++++++++++++++++++++++++++++-------
>  kernel/trace/trace_events.c        |   12 ++-
>  kernel/trace/trace_events_filter.c |  116 ++++++++++++++++++-
>  kernel/trace/trace_export.c        |   53 ++++++++-
>  kernel/trace/trace_kprobe.c        |    8 +-
>  kernel/trace/trace_syscalls.c      |   18 +++-
>  11 files changed, 562 insertions(+), 100 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [RFCv2] ftrace, perf: Adding support to use function trace
  2011-12-19 13:40   ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
@ 2011-12-19 16:45     ` Steven Rostedt
  2011-12-19 16:58     ` Frederic Weisbecker
  1 sibling, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-12-19 16:45 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, 2011-12-19 at 14:40 +0100, Jiri Olsa wrote:
> hi,
> any feedback?
> 

Ah sorry, I think I missed this version. Too many other things going on
at once.

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [RFCv2] ftrace, perf: Adding support to use function trace
  2011-12-19 13:40   ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
  2011-12-19 16:45     ` Steven Rostedt
@ 2011-12-19 16:58     ` Frederic Weisbecker
  1 sibling, 0 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2011-12-19 16:58 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, Dec 19, 2011 at 02:40:24PM +0100, Jiri Olsa wrote:
> hi,
> any feedback?

Sorry for having lost track on this, will have a look soon.

Thanks.

> 
> thanks,
> jirka
> 
> On Mon, Dec 05, 2011 at 06:22:46PM +0100, Jiri Olsa wrote:
> > hi,
> > here's another version of perf support for function trace
> > with filter. The changeset is working and hopefully is not
> > introducing more bugs.. ;) still testing though..
> > 
> > It's still marked RFC since I'm not sure there's better way to go
> > with patches 2 and 10. Also not sure if the condition fixed by
> > patch 7 was not intentional. The rest is fixed/updated version
> > of the v1 changes.
> > 
> > attached patches:
> >  01/10 ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update
> >  02/10 ftrace: Change mcount call replacement logic
> >  03/10 ftrace: Add enable/disable ftrace_ops control interface
> >  04/10 ftrace, perf: Add open/close tracepoint perf registration actions
> >  05/10 ftrace, perf: Add add/del tracepoint perf registration actions
> >  06/10 ftrace, perf: Add support to use function tracepoint in perf
> >  07/10 ftrace: Change filter/notrace set functions to return exit code
> >  08/10 ftrace, perf: Distinguish ftrace function event field type
> >  09/10 ftrace, perf: Add filter support for function trace event
> >  10/10 ftrace, graph: Add global_ops filter callback for graph tracing
> > 
> > v2 changes:
> >  01/10 - keeping the old fix instead of adding hash_has_contents func
> >          I'll send separating patchset for this
> >  02/10 - using different way to avoid the issue (3/9 in v1)
> >  03/10 - using the way proposed by Steven for controling ftrace_ops
> >          (4/9 in v1)
> >  06/10 - added check ensuring the ftrace:function event could be used by
> >          root only (7/9 in v1)
> >  08/10 - added more description (8/9 in v1)
> >  09/10 - changed '&&' operator to '||' which seems more suitable
> >          in this case (9/9 in v1)
> > 
> > thanks for comments,
> > jirka
> > ---
> >  include/linux/ftrace.h             |   18 +++-
> >  include/linux/ftrace_event.h       |    9 +-
> >  include/linux/perf_event.h         |    3 +
> >  kernel/trace/ftrace.c              |  202 +++++++++++++++++++++++++++++------
> >  kernel/trace/trace.h               |   11 ++-
> >  kernel/trace/trace_event_perf.c    |  212 +++++++++++++++++++++++++++++-------
> >  kernel/trace/trace_events.c        |   12 ++-
> >  kernel/trace/trace_events_filter.c |  116 ++++++++++++++++++-
> >  kernel/trace/trace_export.c        |   53 ++++++++-
> >  kernel/trace/trace_kprobe.c        |    8 +-
> >  kernel/trace/trace_syscalls.c      |   18 +++-
> >  11 files changed, 562 insertions(+), 100 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv2 02/10] ftrace: Change mcount call replacement logic
  2011-12-05 17:22   ` [PATCHv2 02/10] ftrace: Change mcount call replacement logic Jiri Olsa
@ 2011-12-19 19:03     ` Steven Rostedt
  2011-12-20 13:10       ` Jiri Olsa
  2011-12-20 19:39     ` Steven Rostedt
  2012-01-08  9:13     ` [tip:perf/core] ftrace: Fix unregister ftrace_ops accounting tip-bot for Jiri Olsa
  2 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-12-19 19:03 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
> The current logic of updating calls is to do the mcount replacement
> only when ftrace_ops is being registered. When ftrace_ops is being
> unregistered then only in case it was the last registered ftrace_ops,
> all calls are disabled.
> 
> This is an issue when an ftrace_ops without the FTRACE_OPS_FL_GLOBAL
> flag is being unregistered, because all the functions stay enabled
> and are thus inherited by global_ops, like in the following scenario:
> 
>   - set restricting global filter
>   - enable function trace
>   - register/unregister ftrace_ops with flags != FTRACE_OPS_FL_GLOBAL
>     and with no filter

I don't see this problem. I just changed stack_tracer to have its own
filter (I've been wanting to do that for a long time, so when I saw this
email, I decided it's a good time to implement it).

Here's what I did:

# echo schedule > set_ftrace_filter
# cat set_ftrace_filter
schedule
# cat enabled_functions
schedule (1)
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
# cat enabled_functions
do_one_initcall (1)
match_dev_by_uuid (1)
name_to_dev_t (1)
idle_notifier_unregister (1)
idle_notifier_register (1)
start_thread_common.constprop.6 (1)
enter_idle (1)
exit_idle (1)
cpu_idle (1)
__show_regs (1)
release_thread (1)
[...]
_cond_resched (1)
preempt_schedule_irq (1)
schedule (2)
io_schedule (1)
yield_to (1)
yield (1)

// note that schedule is (2)

# echo 0 > /proc/sys/kernel/stack_tracer_enabled
# cat enabled_functions
schedule (1)


> 
> Now global_ops will pick up all the functions regardless of the
> global_ops filter. So we need all functions that were enabled via
> this ftrace_ops and are not part of the global filter to be disabled.

The global functions are not at issue here. What do you see?

Maybe I fixed something as I'm using the latest tip/perf/core. Note, I
can send you the stack_tracer patch if you want to take a look at this
example. I need to clean it up too.

> 
> Note, currently if there are only global ftrace_ops registered,
> there's no filter hash check and the filter is represented only
> by enabled records.
> 
> Changing the ftrace_shutdown logic to ensure the replacement
> is called for each ftrace_ops being unregistered.
> 
> Also changing FTRACE_ENABLE_CALLS into FTRACE_UPDATE_CALLS
> calls which seems more suitable now.

I still think this patch is wrong. What's the problem you are seeing?

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-05 17:22   ` [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2011-12-19 19:19     ` Steven Rostedt
  2011-12-19 19:35     ` Steven Rostedt
  1 sibling, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-12-19 19:19 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
> Adding a way to temporarily enable/disable ftrace_ops. The change
> follows the same approach as the 'global' ftrace_ops handling.
> 
> Introducing 2 global ftrace_ops - control_ops and ftrace_control_list -
> which take over all ftrace_ops registered with the FTRACE_OPS_FL_CONTROL
> flag. In addition, a new per-cpu flag called 'disabled' is added to
> ftrace_ops to provide the control information for each cpu.
> 
> When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
> set as disabled for all cpus.
> 
> The ftrace_control_list contains all the registered 'control' ftrace_ops.
> The control_ops provides function which iterates ftrace_control_list
> and does the check for 'disabled' flag on current cpu.
> 
> Adding 2 inline functions, enable_ftrace_function/disable_ftrace_function,
> which enable/disable the ftrace_ops for the current cpu.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/ftrace.h |   14 ++++++
>  kernel/trace/ftrace.c  |  115 +++++++++++++++++++++++++++++++++++++++++++-----
>  kernel/trace/trace.h   |    2 +
>  3 files changed, 120 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 26eafce..b223944 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -35,12 +35,14 @@ enum {
>  	FTRACE_OPS_FL_ENABLED		= 1 << 0,
>  	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
>  	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
> +	FTRACE_OPS_FL_CONTROL		= 1 << 3,
>  };
>  
>  struct ftrace_ops {
>  	ftrace_func_t			func;
>  	struct ftrace_ops		*next;
>  	unsigned long			flags;
> +	void __percpu			*disabled;
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  	struct ftrace_hash		*notrace_hash;
>  	struct ftrace_hash		*filter_hash;
> @@ -97,6 +99,18 @@ int register_ftrace_function(struct ftrace_ops *ops);
>  int unregister_ftrace_function(struct ftrace_ops *ops);
>  void clear_ftrace_function(void);
>  
> +static inline void enable_ftrace_function(struct ftrace_ops *ops)
> +{
> +	atomic_t *disabled = this_cpu_ptr(ops->disabled);
> +	atomic_dec(disabled);
> +}
> +
> +static inline void disable_ftrace_function(struct ftrace_ops *ops)
> +{
> +	atomic_t *disabled = this_cpu_ptr(ops->disabled);
> +	atomic_inc(disabled);
> +}
> +
>  extern void ftrace_stub(unsigned long a0, unsigned long a1);
>  
>  #else /* !CONFIG_FUNCTION_TRACER */
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index b79ab33..c2fa233 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -87,12 +87,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
>  };
>  
>  static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
> +static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
>  static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
>  ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
>  static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
>  ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
>  ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
>  static struct ftrace_ops global_ops;
> +static struct ftrace_ops control_ops;
>  
>  static void
>  ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
> @@ -166,6 +168,38 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
>  }
>  #endif
>  
> +static void control_ops_disable_all(struct ftrace_ops *ops)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu)
> +		atomic_set(per_cpu_ptr(ops->disabled, cpu), 1);
> +}
> +
> +static int control_ops_alloc(struct ftrace_ops *ops)
> +{
> +	atomic_t *disabled;
> +
> +	disabled = alloc_percpu(atomic_t);
> +	if (!disabled)
> +		return -ENOMEM;
> +
> +	ops->disabled = disabled;
> +	control_ops_disable_all(ops);
> +	return 0;
> +}
> +
> +static void control_ops_free(struct ftrace_ops *ops)
> +{
> +	free_percpu(ops->disabled);
> +}
> +
> +static int control_ops_is_disabled(struct ftrace_ops *ops)
> +{
> +	atomic_t *disabled = this_cpu_ptr(ops->disabled);
> +	return atomic_read(disabled);
> +}
> +
>  static void update_global_ops(void)
>  {
>  	ftrace_func_t func;
> @@ -221,7 +255,7 @@ static void update_ftrace_function(void)
>  #endif
>  }
>  
> -static void add_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
> +static void __add_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)

Don't do the above (see below).

>  {
>  	ops->next = *list;
>  	/*
> @@ -233,7 +267,7 @@ static void add_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
>  	rcu_assign_pointer(*list, ops);
>  }
>  
> -static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
> +static int __remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
>  {
>  	struct ftrace_ops **p;
>  
> @@ -257,6 +291,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
>  	return 0;
>  }
>  
> +static void add_ftrace_ops(struct ftrace_ops **list,
> +			   struct ftrace_ops *main_ops,
> +			   struct ftrace_ops *ops)

Call this add_ftrace_list_ops or something (see below).

> +{
> +	int first = *list == &ftrace_list_end;
> +	__add_ftrace_ops(list, ops);
> +	if (first)
> +		__add_ftrace_ops(&ftrace_ops_list, main_ops);
> +}
> +
> +static int remove_ftrace_ops(struct ftrace_ops **list,
> +			      struct ftrace_ops *main_ops,
> +			      struct ftrace_ops *ops)
> +{
> +	int ret = __remove_ftrace_ops(list, ops);
> +	if (!ret && *list == &ftrace_list_end)
> +		ret = __remove_ftrace_ops(&ftrace_ops_list, main_ops);
> +	return ret;
> +}
> +
>  static int __register_ftrace_function(struct ftrace_ops *ops)
>  {
>  	if (ftrace_disabled)
> @@ -268,17 +322,23 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
>  	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
>  		return -EBUSY;
>  
> +#define FL_GLOBAL_CONTROL (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
> +	if ((ops->flags & FL_GLOBAL_CONTROL) == FL_GLOBAL_CONTROL)
> +		return -EINVAL;
> +#undef FL_GLOBAL_CONTROL

The above is ugly. Just define a FL_GLOBAL_CONTROL near the top or
something.

> +
>  	if (!core_kernel_data((unsigned long)ops))
>  		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
>  
>  	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
> -		int first = ftrace_global_list == &ftrace_list_end;
> -		add_ftrace_ops(&ftrace_global_list, ops);
> +		add_ftrace_ops(&ftrace_global_list, &global_ops, ops);
>  		ops->flags |= FTRACE_OPS_FL_ENABLED;
> -		if (first)
> -			add_ftrace_ops(&ftrace_ops_list, &global_ops);
> +	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
> +		if (control_ops_alloc(ops))
> +			return -ENOMEM;
> +		add_ftrace_ops(&ftrace_control_list, &control_ops, ops);
>  	} else
> -		add_ftrace_ops(&ftrace_ops_list, ops);
> +		__add_ftrace_ops(&ftrace_ops_list, ops);


Don't use the names __add_ftrace_ops() and add_ftrace_ops() together; it's
confusing and error prone. Keep add_ftrace_ops() as the original,
and name your new function "add_ftrace_list_ops()" or something.


>  
>  	if (ftrace_enabled)
>  		update_ftrace_function();
> @@ -300,13 +360,16 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
>  		return -EINVAL;
>  
>  	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
> -		ret = remove_ftrace_ops(&ftrace_global_list, ops);
> -		if (!ret && ftrace_global_list == &ftrace_list_end)
> -			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
> +		ret = remove_ftrace_ops(&ftrace_global_list, &global_ops, ops);
>  		if (!ret)
>  			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
> +	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
> +		ret = remove_ftrace_ops(&ftrace_control_list,
> +					&control_ops, ops);
> +		if (!ret)

You need a synchronize_sched() here, otherwise you can have functions
still accessing the ops' per-cpu disabled counters. A process on another
CPU could be about to use the ops; although you removed it from the
list, it is still being used on that CPU. If you free now without a
synchronize_sched(), the other CPU can be using the freed memory.

> +			control_ops_free(ops);
>  	} else
> -		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
> +		ret = __remove_ftrace_ops(&ftrace_ops_list, ops);

Same thing goes with the remove_ftrace_ops and __remove_ftrace_ops().


>  
>  	if (ret < 0)
>  		return ret;
> @@ -3562,6 +3625,36 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
>  #endif /* CONFIG_DYNAMIC_FTRACE */
>  
>  static void
> +ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
> +{
> +	struct ftrace_ops *op;
> +
> +	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
> +		return;
> +
> +	/*
> +	 * Some of the ops may be dynamically allocated,
> +	 * they must be freed after a synchronize_sched().
> +	 */
> +	preempt_disable_notrace();
> +	trace_recursion_set(TRACE_CONTROL_BIT);
> +	op = rcu_dereference_raw(ftrace_control_list);
> +	while (op != &ftrace_list_end) {
> +		if (!control_ops_is_disabled(op) &&
> +		    ftrace_ops_test(op, ip))
> +			op->func(ip, parent_ip);
> +
> +		op = rcu_dereference_raw(op->next);
> +	};
> +	preempt_enable_notrace();
> +	trace_recursion_clear(TRACE_CONTROL_BIT);

Shouldn't the above be reversed to match the preempt_disable() and
trace_recursion_set()? That is:

	preempt_disable();
	trace_recursion_set();
	[...]
	trace_recursion_clear();
	preempt_enable();

That would make it symmetric.


-- Steve

> +}
> +
> +static struct ftrace_ops control_ops = {
> +	.func = ftrace_ops_control_func,
> +};
> +
> +static void
>  ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
>  {
>  	struct ftrace_ops *op;
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index f8ec229..da05926 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -288,6 +288,8 @@ struct tracer {
>  /* for function tracing recursion */
>  #define TRACE_INTERNAL_BIT		(1<<11)
>  #define TRACE_GLOBAL_BIT		(1<<12)
> +#define TRACE_CONTROL_BIT		(1<<13)
> +
>  /*
>   * Abuse of the trace_recursion.
>   * As we need a way to maintain state if we are tracing the function



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv2 07/10] ftrace: Change filter/notrace set functions to return exit code
  2011-12-05 17:22   ` [PATCHv2 07/10] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2011-12-19 19:22     ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-12-19 19:22 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
> Currently the ftrace_set_filter and ftrace_set_notrace functions
> do not return any return code, so there's no way for an ftrace_ops
> user to tell whether the filter was correctly applied.
> 
> The set_ftrace_filter interface returns error in case the filter
> did not match:
> 
>   # echo krava > set_ftrace_filter
>   bash: echo: write error: Invalid argument
> 
> Changing both ftrace_set_filter and ftrace_set_notrace functions
> to return zero if the filter was applied correctly or -E* values
> in case of error.

This looks like a proper fix. Move this to the front of the patch set,
as this and patch 1 can go into ftrace now as fixes.

-- Steve

> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>




* Re: [PATCHv2 10/10] ftrace, graph: Add global_ops filter callback for graph tracing
  2011-12-05 17:22   ` [PATCHv2 10/10] ftrace, graph: Add global_ops filter callback for graph tracing Jiri Olsa
@ 2011-12-19 19:27     ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-12-19 19:27 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
> The function graph tracer should depend on the global_ops filter,
> and process only functions that pass the global_ops filter.
> 
> Currently the function graph tracer gets all the functions
> enabled for tracing no matter what ftrace_ops enabled them.
> 
> Adding a hook for the graph entry callback, which ensures the
> function is compared against the global_ops filter and bails
> out if it does not match.
> 
> This hook is enabled only if there's at least one non global
> ftrace_ops registered.


Sorry, I don't like this fix. Right now just let function graph tracer
act weird. If we put in this workaround, it would let us be lazy and not
work on function graph for a proper fix.

The function graph code needs an overhaul anyway. A proper fix may
require fixes in the arch code where the assembly is, as well as a bit
of rewriting of the original code.

This is on my todo list, although it is a bit low priority now. With
this patch set coming in, I can up the priority on the real fix. So
please remove this patch, but keep pinging me to fix it for real ;)

-- Steve




* Re: [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-05 17:22   ` [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
  2011-12-19 19:19     ` Steven Rostedt
@ 2011-12-19 19:35     ` Steven Rostedt
  2011-12-20 14:57       ` Jiri Olsa
  1 sibling, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-12-19 19:35 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel,
	aarapov, Christoph Lameter, Thomas Gleixner

On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
>  
> +static inline void enable_ftrace_function(struct ftrace_ops *ops)
> +{
> +	atomic_t *disabled = this_cpu_ptr(ops->disabled);
> +	atomic_dec(disabled);
> +}
> +
> +static inline void disable_ftrace_function(struct ftrace_ops *ops)
> +{
> +	atomic_t *disabled = this_cpu_ptr(ops->disabled);
> +	atomic_inc(disabled);
> +}
> +

The above should be renamed to ftrace_function_enable/disable(), and
they should pass in the cpu. There may be a case we want to disable
ftrace functions on another CPU.

Not to mention, this is the perfect example of "this_cpu_ptr" being used
incorrectly. It's not made for this purpose, and again the naming of
"this_cpu" totally confuses other kernel developers. We need to change
this name as it was agreed to at Kernel Summit.

If the above is called with preemption enabled, it will not do what is
expected. We could disable function tracing on one CPU and then
re-enable it for another CPU even though it is already enabled.

-- Steve




* Re: [PATCHv2 02/10] ftrace: Change mcount call replacement logic
  2011-12-19 19:03     ` Steven Rostedt
@ 2011-12-20 13:10       ` Jiri Olsa
  2011-12-20 16:33         ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-20 13:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, Dec 19, 2011 at 02:03:00PM -0500, Steven Rostedt wrote:
> On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
> > The current logic of updating calls is to do the mcount replacement
> > only when ftrace_ops is being registered. When ftrace_ops is being
> > unregistered then only in case it was the last registered ftrace_ops,
> > all calls are disabled.
> > 
> > This is an issue when an ftrace_ops without the FTRACE_OPS_FL_GLOBAL flag
> > is being unregistered, because all the functions stay enabled
> > and are thus inherited by global_ops, as in the following scenario:
> > 
> >   - set restricting global filter
> >   - enable function trace
> >   - register/unregister ftrace_ops with flags != FTRACE_OPS_FL_GLOBAL
> >     and with no filter
> 
> I don't see this problem. I just changed stack_tracer to have its own
> filter (I've been wanting to do that for a long time, so when I saw this
> email, I decided it's a good time to implement it).
> 
> Here's what I did:
> 
> # echo schedule > set_ftrace_filter
> # cat set_ftrace_filter
> schedule
> # cat enabled_functions
> schedule (1)
> # echo 1 > /proc/sys/kernel/stack_tracer_enabled
> # cat enabled_functions
> do_one_initcall (1)
> match_dev_by_uuid (1)
> name_to_dev_t (1)
> idle_notifier_unregister (1)
> idle_notifier_register (1)
> start_thread_common.constprop.6 (1)
> enter_idle (1)
> exit_idle (1)
> cpu_idle (1)
> __show_regs (1)
> release_thread (1)
> [...]
> _cond_resched (1)
> preempt_schedule_irq (1)
> schedule (2)
> io_schedule (1)
> yield_to (1)
> yield (1)
> 
> // note that schedule is (2)
> 
> # echo 0 > /proc/sys/kernel/stack_tracer_enabled
> # cat enabled_functions
> schedule (1)
> 
> 
> > 
> > Now the global_ops will see all the functions regardless of the
> > global_ops filter. So we need all functions that were enabled via
> > this ftrace_ops and are not part of the global filter to be disabled.
> 
> The global functions are not at issue here. What do you see?
> 
> Maybe I fixed something as I'm using the latest tip/perf/core. Note, I
> can send you the stack_tracer patch if you want to take a look at this
> example. I need to clean it up too.

that would be great, thanks

> 
> > 
> > Note, currently if there are only global ftrace_ops registered,
> > there's no filter hash check and the filter is represented only
> > by enabled records.
> > 
> > Changing the ftrace_shutdown logic to ensure the replacement
> > is called for each ftrace_ops being unregistered.
> > 
> > Also changing FTRACE_ENABLE_CALLS into FTRACE_UPDATE_CALLS
> > calls which seems more suitable now.
> 
> I still think this patch is wrong. What's the problem you are seeing?

let me try with an example..

say we have only 2 traceable functions - A and B ;)

1) set global filter for function A with 'echo A > ./set_ftrace_filter'
	a - A is put to the global_ops filter

2) enable function trace with 'echo function > current_tracer'
	a - register_ftrace_function is called with function trace ftrace_ops (GLOBAL flag)
	b - update_ftrace_function is called, setting ftrace_ops callback function
	    to be called directly from the assembly entry_* code
	c - ftrace_hash_rec_enable is called, and dyn_ftrace record
	    for function A is updated:
		A::flags|FL_MASK = 1
	d - ftrace_replace_code(1) is called, and function A is
	    brought in to life

3) enable function trace via perf ftrace_ops
	a - register_ftrace_function is called with perf event ftrace_ops (!GLOBAL flag)
	b - update_ftrace_function is called, setting ftrace_ops_list_func
	    function to be called from the assembly entry_* code and
	    handle the ftrace_ops' dispatch
	c - ftrace_hash_rec_enable is called, and A and B dyn_ftrace records
	    are updated:
		A::flags|FL_MASK = 2
		B::flags|FL_MASK = 1
	d - ftrace_replace_code(1) is called, and function B is
	    brought in to life

4) disable function trace via perf ftrace_ops
	a - unregister_ftrace_function is called with perf event ftrace_ops (same as in step 3)
	b - update_ftrace_function is called, setting global ftrace_ops (from step 2)
	    callback function to be called directly from the assembly entry_* code
	c - ftrace_hash_rec_disable is called, and A and B dyn_ftrace
	    records are updated:
		A::flags|FL_MASK = 1
		B::flags|FL_MASK = 0
	d - ??? see below..

Now, only the global function trace ftrace_ops is enabled (from step 2),
but both A and B are alive and feeding the tracer despite its filter (function A),
because its ftrace_ops is directly linked to the assembly entry_* code (step 4b).

The reason is that even though we updated B's dyn_ftrace record (step 4c)
to be 'B::flags|FL_MASK == 0', we did not update the B call itself to disable it.

If we'd call ftrace_replace_code(1) at 4d), the B function would be
disabled, and we would get the expected behaviour.

So that's the reason I think we should update the calls (mcount call code)
each time we unregister the ftrace_ops, because some records could be
enabled only for the specific ftrace_ops and we need to put them down
when we disable this ftrace_ops.


hopefully it makes sense.. :)

thanks,
jirka


* Re: [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-19 19:35     ` Steven Rostedt
@ 2011-12-20 14:57       ` Jiri Olsa
  2011-12-20 15:25         ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-20 14:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel,
	aarapov, Christoph Lameter, Thomas Gleixner

On Mon, Dec 19, 2011 at 02:35:35PM -0500, Steven Rostedt wrote:
> On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
> >  
> > +static inline void enable_ftrace_function(struct ftrace_ops *ops)
> > +{
> > +	atomic_t *disabled = this_cpu_ptr(ops->disabled);
> > +	atomic_dec(disabled);
> > +}
> > +
> > +static inline void disable_ftrace_function(struct ftrace_ops *ops)
> > +{
> > +	atomic_t *disabled = this_cpu_ptr(ops->disabled);
> > +	atomic_inc(disabled);
> > +}
> > +
> 
> The above should be renamed to ftrace_function_enable/disable(), and
> they should pass in the cpu. There may be a case we want to disable
> ftrace functions on another CPU.
> 
> Not to mention, this is the perfect example of "this_cpu_ptr" being used
> incorrectly. It's not made for this purpose, and again the naming of
> "this_cpu" totally confuses other kernel developers. We need to change
> this name as it was agreed to at Kernel Summit.

ok, will make the changes

> 
> If the above is called with preemption enabled, it will not do what is
> expected. We could disable function tracing on one CPU and then
> re-enable it for another CPU even though it is already enabled.

It is only called inside the perf reg callback within the
schedule function, where preemption is disabled.

The ftrace_function_enable is called when a task is scheduled in
on the respective cpu. Likewise, ftrace_function_disable is called
when a task is scheduled out on the respective cpu.

thanks,
jirka


* Re: [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-20 14:57       ` Jiri Olsa
@ 2011-12-20 15:25         ` Steven Rostedt
  2011-12-20 15:35           ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-12-20 15:25 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel,
	aarapov, Christoph Lameter, Thomas Gleixner

On Tue, 2011-12-20 at 15:57 +0100, Jiri Olsa wrote:

> > 
> > If the above is called with preemption enabled, it will not do what is
> > expected. We could disable function tracing on one CPU and then
> > re-enable it for another CPU even though it is already enabled.
> 
> It is only called inside perf reg callback within the
> schedule function where the preemption is disabled.
> 
> The ftrace_function_enable is called when task is scheduled in
> on respective cpu. Likewise the ftrace_function_disable is called 
> when task is scheduled out on respective cpu.

Yes I know how you use it, but this is an open API. It may only be
used by perf today, but that doesn't mean that it won't be used by
others. There's no documentation on how to use it. I don't look at this
and say, "oh this is used by perf, we only need to worry about how perf
uses it". That doesn't scale. It needs to be documented how to use
it, and if it requires preemption disabled when calling it, that fact
should definitely be stated, and it may need a
WARN_ON(preempt_count()) or something.

-- Steve




* Re: [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-20 15:25         ` Steven Rostedt
@ 2011-12-20 15:35           ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-20 15:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel,
	aarapov, Christoph Lameter, Thomas Gleixner

On Tue, Dec 20, 2011 at 10:25:53AM -0500, Steven Rostedt wrote:
> On Tue, 2011-12-20 at 15:57 +0100, Jiri Olsa wrote:
> 
> > > 
> > > If the above is called with preemption enabled, it will not do what is
> > > expected. We could disable function tracing on one CPU and then
> > > re-enable it for another CPU even though it is already enabled.
> > 
> > It is only called inside perf reg callback within the
> > schedule function where the preemption is disabled.
> > 
> > The ftrace_function_enable is called when task is scheduled in
> > on respective cpu. Likewise the ftrace_function_disable is called 
> > when task is scheduled out on respective cpu.
> 
> Yes I know how you use it, but this is an open API. It may only be
> used by perf today, but that doesn't mean that it won't be used by
> others. There's no documentation on how to use it. I don't look at this
> and say, "oh this is used by perf, we only need to worry about how perf
> uses it". That doesn't scale. It needs to be documented how to use
> it, and if it requires preemption disabled when calling it, that fact
> should definitely be stated, and it may need a
> WARN_ON(preempt_count()) or something.

I've already added

        if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
                return;

will add similar for preemption and make comments for both functions

thanks,
jirka

> 
> -- Steve
> 
> 


* Re: [PATCHv2 02/10] ftrace: Change mcount call replacement logic
  2011-12-20 13:10       ` Jiri Olsa
@ 2011-12-20 16:33         ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-12-20 16:33 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Tue, 2011-12-20 at 14:10 +0100, Jiri Olsa wrote:

> let me try with an example..
> 
> say we have only 2 traceable functions - A and B ;)
> 
> 1) set global filter for function A with 'echo A > ./set_ftrace_filter'
> 	a - A is put to the global_ops filter
> 
> 2) enable function trace with 'echo function > current_tracer'
> 	a - register_ftrace_function is called with function trace ftrace_ops (GLOBAL flag)
> 	b - update_ftrace_function is called, setting ftrace_ops callback function
> 	    to be called directly from the assembly entry_* code
> 	c - ftrace_hash_rec_enable is called, and dyn_ftrace record
> 	    for function A is updated:
> 		A::flags|FL_MASK = 1
> 	d - ftrace_replace_code(1) is called, and function A is
> 	    brought in to life
> 
> 3) enable function trace via perf ftrace_ops
> 	a - register_ftrace_function is called with perf event ftrace_ops (!GLOBAL flag)
> 	b - update_ftrace_function is called, setting ftrace_ops_list_func
> 	    function to be called from the assembly entry_* code and
> 	    handle the ftrace_ops' dispatch
> 	c - ftrace_hash_rec_enable is called, and A and B dyn_ftrace records
> 	    are updated:
> 		A::flags|FL_MASK = 2
> 		B::flags|FL_MASK = 1
> 	d - ftrace_replace_code(1) is called, and function B is
> 	    brought in to life
> 
> 4) disable function trace via perf ftrace_ops
> 	a - unregister_ftrace_function is called with perf event ftrace_ops (same as in step 3)
> 	b - update_ftrace_function is called, setting global ftrace_ops (from step 2)
> 	    callback function to be called directly from the assembly entry_* code
> 	c - ftrace_hash_rec_disable is called, and A and B dyn_ftrace
> 	    records are updated:
> 		A::flags|FL_MASK = 1
> 		B::flags|FL_MASK = 0
> 	d - ??? see below..
> 
> Now, only the global function trace ftrace_ops is enabled (from step 2),
> but both A and B are alive and feeding the tracer despite its filter (function A),
> because its ftrace_ops is directly linked to the assembly entry_* code (step 4b).
> 
> The reason is that even though we updated B's dyn_ftrace record (step 4c)
> to be 'B::flags|FL_MASK == 0', we did not update the B call itself to disable it.
> 
> If we'd call ftrace_replace_code(1) at 4d), the B function would be
> disabled, and we would get the expected behaviour.
> 
> So that's the reason I think we should update the calls (mcount call code)
> each time we unregister the ftrace_ops, because some records could be
> enabled only for the specific ftrace_ops and we need to put them down
> when we disable this ftrace_ops.
> 
> 
> hopefully it makes sense.. :)

Ah, OK, I am able to trigger this. I see what you mean. Yeah, now that
we have a counter, it should enable/disable based on the counter. And
the DISABLE is still needed just for the "disable regardless" case
(/proc/sys/kernel/ftrace_enabled = 0).

I'll apply your patch and see how it works. Thanks!

-- Steve




* Re: [PATCHv2 02/10] ftrace: Change mcount call replacement logic
  2011-12-05 17:22   ` [PATCHv2 02/10] ftrace: Change mcount call replacement logic Jiri Olsa
  2011-12-19 19:03     ` Steven Rostedt
@ 2011-12-20 19:39     ` Steven Rostedt
  2011-12-21  9:57       ` Jiri Olsa
  2012-01-08  9:13     ` [tip:perf/core] ftrace: Fix unregister ftrace_ops accounting tip-bot for Jiri Olsa
  2 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-12-20 19:39 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
> The current logic of updating calls is to do the mcount replacement
> only when ftrace_ops is being registered. When ftrace_ops is being
> unregistered then only in case it was the last registered ftrace_ops,
> all calls are disabled.
> 
> This is an issue when ftrace_ops without FTRACE_OPS_FL_GLOBAL flag

Actually it has nothing to do with the global flag. The bug just takes two
different ftrace_ops, as I can make the problem go either way (by
disabling the global tracer I screwed up the stack_tracer). I was also
able to screw up the stack tracer with the function_profile_ops, which
does not have the flag set either.

I was able to trigger this bug with just using the new stack_trace ops
and the function profiling ops. Neither of them have the global flag
set.


> is being unregistered, because all the functions stay enabled
> and are thus inherited by global_ops, as in the following scenario:
> 
>   - set restricting global filter
>   - enable function trace
>   - register/unregister ftrace_ops with flags != FTRACE_OPS_FL_GLOBAL
>     and with no filter

The real problem is:

	- set restricting filter on one ftrace_ops
	- enable function tracing for that ftrace_ops
	- register/unregister another ftrace_ops with no filter
	- Now the first ftrace_ops has all the functions of the last

> 
> Now the global_ops will see all the functions regardless of the
> global_ops filter. So we need all functions that were enabled via
> this ftrace_ops and are not part of the global filter to be disabled.
> 
> Note, currently if there are only global ftrace_ops registered,
> there's no filter hash check and the filter is represented only
> by enabled records.

This isn't totally true either, as function_profile_call has the same
issue. This bug does exist in mainline today. If you unconfigure
function graph tracing, then function profiling uses the function tracer
instead of the function graph tracer to profile.

Here, I just reproduced it:

 # cd /debug/tracing/
 # echo schedule > set_ftrace_filter
 # echo 1 > function_profile_enabled
 # echo 0 > function_profile_enabled
 # cat set_ftrace_filter
schedule
 # cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 87869/87869   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
     kworker/1:1-49    [001] d..2    13.258661: __rcu_read_unlock <-__schedule
     kworker/1:1-49    [001] d..2    13.258661: _raw_spin_lock <-__schedule
     kworker/1:1-49    [001] d..2    13.258661: add_preempt_count <-_raw_spin_lock
     kworker/1:1-49    [001] d..3    13.258662: put_prev_task_fair <-__schedule
     kworker/1:1-49    [001] d..3    13.258662: pick_next_task_fair <-pick_next_task
     kworker/1:1-49    [001] d..3    13.258662: pick_next_task_stop <-pick_next_task
     kworker/1:1-49    [001] d..3    13.258663: pick_next_task_rt <-pick_next_task
     kworker/1:1-49    [001] d..3    13.258663: pick_next_task_fair <-pick_next_task
     kworker/1:1-49    [001] d..3    13.258663: pick_next_task_idle <-pick_next_task
     kworker/1:1-49    [001] d..3    13.258664: sched_switch: prev_comm=kworker/1:1 prev_pid=49 prev_prio=120 prev_state=S ==> next_comm=kworker/0:0 next_pid=0 next_prio=120
     kworker/1:1-49    [001] d..3    13.258664: __switch_to <-__schedule
          <idle>-0     [001] ...1    13.258665: __schedule <-schedule
          <idle>-0     [001] ...1    13.258665: add_preempt_count <-__schedule
          <idle>-0     [001] ...2    13.258665: rcu_note_context_switch <-__schedule
          <idle>-0     [001] ...2    13.258665: rcu_utilization: Start context switch

I'll put this to the front of my patch queue and also add it to urgent.
I'll also rewrite your change log to remove the reference to the global
ops.

-- Steve

> 
> Changing the ftrace_shutdown logic to ensure the replacement
> is called for each ftrace_ops being unregistered.
> 
> Also changing FTRACE_ENABLE_CALLS into FTRACE_UPDATE_CALLS
> calls which seems more suitable now.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---




* Re: [PATCHv2 02/10] ftrace: Change mcount call replacement logic
  2011-12-20 19:39     ` Steven Rostedt
@ 2011-12-21  9:57       ` Jiri Olsa
  2011-12-21 11:34         ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21  9:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Tue, Dec 20, 2011 at 02:39:09PM -0500, Steven Rostedt wrote:
> On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:

SNIP

>           <idle>-0     [001] ...1    13.258665: __schedule <-schedule
>           <idle>-0     [001] ...1    13.258665: add_preempt_count <-__schedule
>           <idle>-0     [001] ...2    13.258665: rcu_note_context_switch <-__schedule
>           <idle>-0     [001] ...2    13.258665: rcu_utilization: Start context switch
> 
> I'll put this to the front of my patch queue and also add it to urgent.
> I'll also rewrite your change log to remove the reference to the global

ok, I'll send new version without this one

thanks,
jirka


* Re: [PATCHv2 02/10] ftrace: Change mcount call replacement logic
  2011-12-21  9:57       ` Jiri Olsa
@ 2011-12-21 11:34         ` Steven Rostedt
  2011-12-21 11:35           ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-12-21 11:34 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, 2011-12-21 at 10:57 +0100, Jiri Olsa wrote:
> On Tue, Dec 20, 2011 at 02:39:09PM -0500, Steven Rostedt wrote:
> > On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
> 
> SNIP
> 
> >           <idle>-0     [001] ...1    13.258665: __schedule <-schedule
> >           <idle>-0     [001] ...1    13.258665: add_preempt_count <-__schedule
> >           <idle>-0     [001] ...2    13.258665: rcu_note_context_switch <-__schedule
> >           <idle>-0     [001] ...2    13.258665: rcu_utilization: Start context switch
> > 
> > I'll put this to the front of my patch queue and also add it to urgent.
> > I'll also rewrite your change log to remove the reference to the global
> 
> ok, I'll send new version without this one

No need, I've already queued it and will be the first patch in the
patchset I send out soon.

-- Steve




* Re: [PATCHv2 02/10] ftrace: Change mcount call replacement logic
  2011-12-21 11:34         ` Steven Rostedt
@ 2011-12-21 11:35           ` Steven Rostedt
  2011-12-21 11:40             ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-12-21 11:35 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, 2011-12-21 at 06:34 -0500, Steven Rostedt wrote:
> On Wed, 2011-12-21 at 10:57 +0100, Jiri Olsa wrote:
> > On Tue, Dec 20, 2011 at 02:39:09PM -0500, Steven Rostedt wrote:
> > > On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
> > 
> > SNIP
> > 
> > >           <idle>-0     [001] ...1    13.258665: __schedule <-schedule
> > >           <idle>-0     [001] ...1    13.258665: add_preempt_count <-__schedule
> > >           <idle>-0     [001] ...2    13.258665: rcu_note_context_switch <-__schedule
> > >           <idle>-0     [001] ...2    13.258665: rcu_utilization: Start context switch
> > > 
> > > I'll put this to the front of my patch queue and also add it to urgent.
> > > I'll also rewrite your change log to remove the reference to the global
> > 
> > ok, I'll send new version without this one
> 
> No need, I've already queued it and will be the first patch in the
> patchset I send out soon.
> 

Or do you mean you'll send out a new patch series without this patch.

/me should not send email out when he first wakes up.

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv2 02/10] ftrace: Change mcount call replacement logic
  2011-12-21 11:35           ` Steven Rostedt
@ 2011-12-21 11:40             ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 11:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, Dec 21, 2011 at 06:35:54AM -0500, Steven Rostedt wrote:
> On Wed, 2011-12-21 at 06:34 -0500, Steven Rostedt wrote:
> > On Wed, 2011-12-21 at 10:57 +0100, Jiri Olsa wrote:
> > > On Tue, Dec 20, 2011 at 02:39:09PM -0500, Steven Rostedt wrote:
> > > > On Mon, 2011-12-05 at 18:22 +0100, Jiri Olsa wrote:
> > > 
> > > SNIP
> > > 
> > > >           <idle>-0     [001] ...1    13.258665: __schedule <-schedule
> > > >           <idle>-0     [001] ...1    13.258665: add_preempt_count <-__schedule
> > > >           <idle>-0     [001] ...2    13.258665: rcu_note_context_switch <-__schedule
> > > >           <idle>-0     [001] ...2    13.258665: rcu_utilization: Start context switch
> > > > 
> > > > I'll put this to the front of my patch queue and also add it to urgent.
> > > > I'll also rewrite your change log to remove the reference to the global
> > > 
> > > ok, I'll send new version without this one
> > 
> > No need, I've already queued it and will be the first patch in the
> > patchset I send out soon.
> > 
> 
> Or do you mean you'll send out a new patch series without this patch.

yep :)

> 
> /me should not send email out when he first wakes up.
> 
> -- Steve
> 
> 

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCHv3 0/8] ftrace, perf: Adding support to use function trace
  2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
                     ` (10 preceding siblings ...)
  2011-12-19 13:40   ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
@ 2011-12-21 11:48   ` Jiri Olsa
  2011-12-21 11:48     ` [PATCH 1/8] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
                       ` (8 more replies)
  11 siblings, 9 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 11:48 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

hi,
here's another version of perf support for function trace
with filter. 

attached patches:
  1/8 ftrace: Change filter/notrace set functions to return exit code
  2/8 ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update
  3/8 ftrace: Add enable/disable ftrace_ops control interface
  4/8 ftrace, perf: Add open/close tracepoint perf registration actions
  5/8 ftrace, perf: Add add/del tracepoint perf registration actions
  6/8 ftrace, perf: Add support to use function tracepoint in perf
  7/8 ftrace, perf: Distinguish ftrace function event field type
  8/8 ftrace, perf: Add filter support for function trace event

v3 changes:
  3/8 - renamed __add/remove_ftrace_ops
      - fixed preempt_enable/recursion_clear order in ftrace_ops_control_func
      - renamed/commented API functions -  enable/disable_ftrace_function
  
  omitted the graph tracer workaround patch 10/10

v2 changes:
 01/10 - keeping the old fix instead of adding hash_has_contents func
         I'll send a separate patchset for this
 02/10 - using different way to avoid the issue (3/9 in v1)
 03/10 - using the way proposed by Steven for controling ftrace_ops
         (4/9 in v1)
 06/10 - added check ensuring the ftrace:function event could be used by
         root only (7/9 in v1)
 08/10 - added more description (8/9 in v1)
 09/10 - changed '&&' operator to '||' which seems more suitable
         in this case (9/9 in v1)

thanks,
jirka
---
 include/linux/ftrace.h             |   46 ++++++++-
 include/linux/ftrace_event.h       |    9 +-
 include/linux/perf_event.h         |    3 +
 kernel/trace/ftrace.c              |  135 ++++++++++++++++++++---
 kernel/trace/trace.h               |   11 ++-
 kernel/trace/trace_event_perf.c    |  212 +++++++++++++++++++++++++++++-------
 kernel/trace/trace_events.c        |   12 ++-
 kernel/trace/trace_events_filter.c |  116 ++++++++++++++++++-
 kernel/trace/trace_export.c        |   53 ++++++++-
 kernel/trace/trace_kprobe.c        |    8 +-
 kernel/trace/trace_syscalls.c      |   18 +++-
 11 files changed, 541 insertions(+), 82 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCH 1/8] ftrace: Change filter/notrace set functions to return exit code
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
@ 2011-12-21 11:48     ` Jiri Olsa
  2011-12-21 11:48     ` [PATCH 2/8] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 11:48 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any value, so there's no way for an ftrace_ops
user to tell whether the filter was correctly applied.

The set_ftrace_filter interface returns error in case the filter
did not match:

  # echo krava > set_ftrace_filter
  bash: echo: write error: Invalid argument

Changing both ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly or -E* values
in case of error.
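With this change a caller can react to a non-matching filter instead of
silently proceeding. Below is a minimal user-space model of the new
contract; the real ftrace_set_filter() is a kernel API, so the function
name, the plain substring match (replacing the kernel's glob matching),
and the sample function list are all illustrative stand-ins:

```c
#include <errno.h>
#include <string.h>

/* Stand-in for ftrace_set_filter() after this patch: return 0 when the
 * pattern matches at least one available function, -EINVAL otherwise.
 * A simple substring test replaces the kernel's glob matching here. */
static int ftrace_set_filter_model(const char *available[], int n,
				   const char *buf)
{
	for (int i = 0; i < n; i++)
		if (strstr(available[i], buf))
			return 0;	/* filter applied */
	return -EINVAL;			/* no match: the 'echo krava' case above */
}
```

A caller can then propagate the error to user space rather than ignore
a filter that matched nothing.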

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |    4 ++--
 kernel/trace/ftrace.c  |   16 ++++++++++------
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 26eafce..523640f 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -180,9 +180,9 @@ struct dyn_ftrace {
 };
 
 int ftrace_force_update(void);
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 25b4f4d..09007c0 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2911,8 +2911,11 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	mutex_lock(&ftrace_regex_lock);
 	if (reset)
 		ftrace_filter_reset(hash);
-	if (buf)
-		ftrace_match_records(hash, buf, len);
+	if (buf &&
+	    !ftrace_match_records(hash, buf, len)) {
+		ret = -EINVAL;
+		goto out_regex_unlock;
+	}
 
 	mutex_lock(&ftrace_lock);
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -2922,6 +2925,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 
 	mutex_unlock(&ftrace_lock);
 
+ out_regex_unlock:
 	mutex_unlock(&ftrace_regex_lock);
 
 	free_ftrace_hash(hash);
@@ -2938,10 +2942,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
  * Filters denote which functions should be enabled when tracing is enabled.
  * If @buf is NULL and reset is set, all functions will be enabled for tracing.
  */
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 1);
+	return ftrace_set_regex(ops, buf, len, reset, 1);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_filter);
 
@@ -2956,10 +2960,10 @@ EXPORT_SYMBOL_GPL(ftrace_set_filter);
  * is enabled. If @buf is NULL and reset is set, all functions will be enabled
  * for tracing.
  */
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 0);
+	return ftrace_set_regex(ops, buf, len, reset, 0);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_notrace);
 /**
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 2/8] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
  2011-12-21 11:48     ` [PATCH 1/8] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2011-12-21 11:48     ` Jiri Olsa
  2011-12-21 15:23       ` Steven Rostedt
  2011-12-21 11:48     ` [PATCH 3/8] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
                       ` (6 subsequent siblings)
  8 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 11:48 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

We need to check the existence of the other_hash before
we touch its count variable.

This issue is hit only when a non-global ftrace_ops is used;
the global ftrace_ops is initialized with empty hashes.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/ftrace.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 09007c0..7eb702f 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1372,7 +1372,8 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
 			if (filter_hash && in_hash && !in_other_hash)
 				match = 1;
 			else if (!filter_hash && in_hash &&
-				 (in_other_hash || !other_hash->count))
+				 (in_other_hash ||
+				  !other_hash || !other_hash->count))
 				match = 1;
 		}
 		if (!match)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 3/8] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
  2011-12-21 11:48     ` [PATCH 1/8] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
  2011-12-21 11:48     ` [PATCH 2/8] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
@ 2011-12-21 11:48     ` Jiri Olsa
  2011-12-21 16:01       ` Steven Rostedt
  2011-12-21 11:48     ` [PATCH 4/8] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
                       ` (5 subsequent siblings)
  8 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 11:48 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same scheme as the 'global' ftrace_ops handling.

Introducing a global control_ops ftrace_ops and a ftrace_control_list,
which take over all ftrace_ops registered with the FTRACE_OPS_FL_CONTROL
flag. In addition, a new per-cpu counter called 'disabled' is added to
ftrace_ops to provide the control information for each cpu.

When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.

The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides a function which iterates ftrace_control_list
and checks the 'disabled' counter on the current cpu.

Adding 2 inline functions, ftrace_function_enable/ftrace_function_disable,
which enable/disable the ftrace_ops for a given cpu.
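The counting scheme can be modelled in user space with C11 atomics.
This sketch only mirrors the logic of control_ops_alloc(),
ftrace_function_enable/disable() and control_ops_is_disabled() from the
patch; the fixed cpu array stands in for alloc_percpu() and all names
here are illustrative:

```c
#include <stdatomic.h>

#define NR_CPUS 4

/* Per-ops 'disabled' state: one atomic counter per cpu, initialized
 * to 1 (disabled), decremented to enable, incremented to disable. */
struct ops_model {
	atomic_int disabled[NR_CPUS];
};

static void ops_init(struct ops_model *ops)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		atomic_store(&ops->disabled[cpu], 1);	/* start disabled everywhere */
}

static void ops_enable(struct ops_model *ops, int cpu)
{
	atomic_fetch_sub(&ops->disabled[cpu], 1);	/* ftrace_function_enable() */
}

static void ops_disable(struct ops_model *ops, int cpu)
{
	atomic_fetch_add(&ops->disabled[cpu], 1);	/* ftrace_function_disable() */
}

static int ops_is_disabled(struct ops_model *ops, int cpu)
{
	return atomic_load(&ops->disabled[cpu]) != 0;	/* control_ops_is_disabled() */
}
```

Because it is a counter rather than a flag, nested enable/disable pairs
balance out, which is what lets the perf sched in/out hooks drive it.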

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |   42 +++++++++++++++++
 kernel/trace/ftrace.c  |  116 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/trace/trace.h   |    2 +
 3 files changed, 153 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 523640f..67b8236 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -35,12 +35,14 @@ enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	void __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +99,46 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+/**
+ * ftrace_function_enable - enable controlled ftrace_ops on given cpu
+ *
+ * This function enables tracing on given cpu by decreasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled;
+
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)) ||
+			 !preempt_count())
+		return;
+
+	disabled = per_cpu_ptr(ops->disabled, cpu);
+	atomic_dec(disabled);
+}
+
+/**
+ * ftrace_function_disable - disable controlled ftrace_ops on given cpu
+ *
+ * This function disables tracing on given cpu by increasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_disable(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled;
+
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)) ||
+			 !preempt_count())
+		return;
+
+	disabled = per_cpu_ptr(ops->disabled, cpu);
+	atomic_inc(disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7eb702f..1b56013 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -60,6 +60,8 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+#define FL_GLOBAL_CONTROL (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
@@ -87,12 +89,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -166,6 +170,38 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		atomic_set(per_cpu_ptr(ops->disabled, cpu), 1);
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	atomic_t *disabled;
+
+	disabled = alloc_percpu(atomic_t);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
+static int control_ops_is_disabled(struct ftrace_ops *ops)
+{
+	atomic_t *disabled = this_cpu_ptr(ops->disabled);
+	return atomic_read(disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -257,6 +293,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_list_ops(struct ftrace_ops **list,
+				struct ftrace_ops *main_ops,
+				struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	add_ftrace_ops(list, ops);
+	if (first)
+		add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_list_ops(struct ftrace_ops **list,
+				  struct ftrace_ops *main_ops,
+				  struct ftrace_ops *ops)
+{
+	int ret = remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -268,15 +324,19 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+	if ((ops->flags & FL_GLOBAL_CONTROL) == FL_GLOBAL_CONTROL)
+		return -EINVAL;
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -300,11 +360,23 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_list_ops(&ftrace_global_list,
+					     &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_list_ops(&ftrace_control_list,
+					     &control_ops, ops);
+		if (!ret) {
+			/*
+			 * The ftrace_ops is now removed from the list,
+			 * so there'll be no new users. We must ensure
+			 * all current users are done before we free
+			 * the control data.
+			 */
+			synchronize_sched();
+			control_ops_free(ops);
+		}
 	} else
 		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -3566,6 +3638,36 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!control_ops_is_disabled(op) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	}
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2c26574..41c54e3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 4/8] ftrace, perf: Add open/close tracepoint perf registration actions
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
                       ` (2 preceding siblings ...)
  2011-12-21 11:48     ` [PATCH 3/8] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2011-12-21 11:48     ` Jiri Olsa
  2011-12-21 11:48     ` [PATCH 5/8] ftrace, perf: Add add/del " Jiri Olsa
                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 11:48 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.

The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.

The open/close actions are invoked for each tracepoint user when
opening/closing the event.
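The split can be modelled with a simple refcount: REGISTER/UNREGISTER
bracket the whole lifetime of the tracepoint, while OPEN/CLOSE bracket
each individual perf event. A user-space sketch of the
perf_trace_event_reg()/perf_trace_event_unreg() pattern, where plain
counters stand in for the real reg() callback and all names are
illustrative:

```c
/* Counters recording how often each trace_reg action fired. */
static int perf_refcount, registers, opens, closes, unregisters;

/* Model of perf_trace_event_init(): register on first user, open always. */
static void event_init(void)
{
	if (perf_refcount++ == 0)
		registers++;	/* TRACE_REG_PERF_REGISTER: first user only */
	opens++;		/* TRACE_REG_PERF_OPEN: every user */
}

/* Model of perf_trace_destroy(): close always, unregister on last user. */
static void event_destroy(void)
{
	closes++;		/* TRACE_REG_PERF_CLOSE: every user */
	if (--perf_refcount == 0)
		unregisters++;	/* TRACE_REG_PERF_UNREGISTER: last user only */
}
```

Opening three events on the same tracepoint thus registers once but
opens three times, giving each event a hook for its own state.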

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    6 +-
 kernel/trace/trace.h            |    5 ++
 kernel/trace/trace_event_perf.c |  116 +++++++++++++++++++++++++--------------
 kernel/trace/trace_events.c     |   10 ++-
 kernel/trace/trace_kprobe.c     |    6 ++-
 kernel/trace/trace_syscalls.c   |   14 +++-
 6 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c3da42d..195e360 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -146,6 +146,8 @@ enum trace_reg {
 	TRACE_REG_UNREGISTER,
 	TRACE_REG_PERF_REGISTER,
 	TRACE_REG_PERF_UNREGISTER,
+	TRACE_REG_PERF_OPEN,
+	TRACE_REG_PERF_CLOSE,
 };
 
 struct ftrace_event_call;
@@ -157,7 +159,7 @@ struct ftrace_event_class {
 	void			*perf_probe;
 #endif
 	int			(*reg)(struct ftrace_event_call *event,
-				       enum trace_reg type);
+				       enum trace_reg type, void *data);
 	int			(*define_fields)(struct ftrace_event_call *);
 	struct list_head	*(*get_fields)(struct ftrace_event_call *);
 	struct list_head	fields;
@@ -165,7 +167,7 @@ struct ftrace_event_class {
 };
 
 extern int ftrace_event_reg(struct ftrace_event_call *event,
-			    enum trace_reg type);
+			    enum trace_reg type, void *data);
 
 enum {
 	TRACE_EVENT_FL_ENABLED_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 41c54e3..85732a8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -828,4 +828,9 @@ extern const char *__stop___trace_bprintk_fmt[];
 	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #include "trace_entries.h"
 
+#ifdef CONFIG_PERF_EVENTS
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data);
+#endif
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 19a359d..0cfcc37 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -44,23 +44,17 @@ static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 	return 0;
 }
 
-static int perf_trace_event_init(struct ftrace_event_call *tp_event,
-				 struct perf_event *p_event)
+static int perf_trace_event_reg(struct ftrace_event_call *tp_event,
+				struct perf_event *p_event)
 {
 	struct hlist_head __percpu *list;
-	int ret;
+	int ret = -ENOMEM;
 	int cpu;
 
-	ret = perf_trace_event_perm(tp_event, p_event);
-	if (ret)
-		return ret;
-
 	p_event->tp_event = tp_event;
 	if (tp_event->perf_refcount++ > 0)
 		return 0;
 
-	ret = -ENOMEM;
-
 	list = alloc_percpu(struct hlist_head);
 	if (!list)
 		goto fail;
@@ -83,7 +77,7 @@ static int perf_trace_event_init(struct ftrace_event_call *tp_event,
 		}
 	}
 
-	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER);
+	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER, NULL);
 	if (ret)
 		goto fail;
 
@@ -108,6 +102,69 @@ fail:
 	return ret;
 }
 
+static void perf_trace_event_unreg(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	int i;
+
+	if (--tp_event->perf_refcount > 0)
+		goto out;
+
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER, NULL);
+
+	/*
+	 * Ensure our callback won't be called anymore. The buffers
+	 * will be freed after that.
+	 */
+	tracepoint_synchronize_unregister();
+
+	free_percpu(tp_event->perf_events);
+	tp_event->perf_events = NULL;
+
+	if (!--total_ref_count) {
+		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
+			free_percpu(perf_trace_buf[i]);
+			perf_trace_buf[i] = NULL;
+		}
+	}
+out:
+	module_put(tp_event->mod);
+}
+
+static int perf_trace_event_open(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_OPEN, p_event);
+}
+
+static void perf_trace_event_close(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
+}
+
+static int perf_trace_event_init(struct ftrace_event_call *tp_event,
+				 struct perf_event *p_event)
+{
+	int ret;
+
+	ret = perf_trace_event_perm(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_reg(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_open(p_event);
+	if (ret) {
+		perf_trace_event_unreg(p_event);
+		return ret;
+	}
+
+	return 0;
+}
+
 int perf_trace_init(struct perf_event *p_event)
 {
 	struct ftrace_event_call *tp_event;
@@ -130,6 +187,14 @@ int perf_trace_init(struct perf_event *p_event)
 	return ret;
 }
 
+void perf_trace_destroy(struct perf_event *p_event)
+{
+	mutex_lock(&event_mutex);
+	perf_trace_event_close(p_event);
+	perf_trace_event_unreg(p_event);
+	mutex_unlock(&event_mutex);
+}
+
 int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct ftrace_event_call *tp_event = p_event->tp_event;
@@ -154,37 +219,6 @@ void perf_trace_del(struct perf_event *p_event, int flags)
 	hlist_del_rcu(&p_event->hlist_entry);
 }
 
-void perf_trace_destroy(struct perf_event *p_event)
-{
-	struct ftrace_event_call *tp_event = p_event->tp_event;
-	int i;
-
-	mutex_lock(&event_mutex);
-	if (--tp_event->perf_refcount > 0)
-		goto out;
-
-	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER);
-
-	/*
-	 * Ensure our callback won't be called anymore. The buffers
-	 * will be freed after that.
-	 */
-	tracepoint_synchronize_unregister();
-
-	free_percpu(tp_event->perf_events);
-	tp_event->perf_events = NULL;
-
-	if (!--total_ref_count) {
-		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
-			free_percpu(perf_trace_buf[i]);
-			perf_trace_buf[i] = NULL;
-		}
-	}
-out:
-	module_put(tp_event->mod);
-	mutex_unlock(&event_mutex);
-}
-
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 				       struct pt_regs *regs, int *rctxp)
 {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c212a7f..5138fea 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -147,7 +147,8 @@ int trace_event_raw_init(struct ftrace_event_call *call)
 }
 EXPORT_SYMBOL_GPL(trace_event_raw_init);
 
-int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
+int ftrace_event_reg(struct ftrace_event_call *call,
+		     enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -170,6 +171,9 @@ int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
 					    call->class->perf_probe,
 					    call);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
@@ -209,7 +213,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_stop_cmdline_record();
 				call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			call->class->reg(call, TRACE_REG_UNREGISTER);
+			call->class->reg(call, TRACE_REG_UNREGISTER, NULL);
 		}
 		break;
 	case 1:
@@ -218,7 +222,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_start_cmdline_record();
 				call->flags |= TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			ret = call->class->reg(call, TRACE_REG_REGISTER);
+			ret = call->class->reg(call, TRACE_REG_REGISTER, NULL);
 			if (ret) {
 				tracing_stop_cmdline_record();
 				pr_info("event trace: Could not enable event "
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 00d527c..5667f89 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1892,7 +1892,8 @@ static __kprobes void kretprobe_perf_func(struct kretprobe_instance *ri,
 #endif	/* CONFIG_PERF_EVENTS */
 
 static __kprobes
-int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
+int kprobe_register(struct ftrace_event_call *event,
+		    enum trace_reg type, void *data)
 {
 	struct trace_probe *tp = (struct trace_probe *)event->data;
 
@@ -1909,6 +1910,9 @@ int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
 	case TRACE_REG_PERF_UNREGISTER:
 		disable_trace_probe(tp, TP_FLAG_PROFILE);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 5f35f6f..8599c1d 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -18,9 +18,9 @@ static DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
 static DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 
 static int syscall_enter_define_fields(struct ftrace_event_call *call);
 static int syscall_exit_define_fields(struct ftrace_event_call *call);
@@ -650,7 +650,7 @@ void perf_sysexit_disable(struct ftrace_event_call *call)
 #endif /* CONFIG_PERF_EVENTS */
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -665,13 +665,16 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysenter_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
 }
 
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -686,6 +689,9 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysexit_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 5/8] ftrace, perf: Add add/del tracepoint perf registration actions
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
                       ` (3 preceding siblings ...)
  2011-12-21 11:48     ` [PATCH 4/8] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
@ 2011-12-21 11:48     ` Jiri Olsa
  2011-12-21 11:48     ` [PATCH 6/8] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 11:48 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
perf event schedule in/out actions.

The add action is invoked when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.
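Together with the open/close actions from the previous patch, each perf
event now sees a fixed callback sequence over its lifetime. A toy
harness recording that order; the reg() stand-in and the log buffer are
illustrative, not kernel code:

```c
#include <string.h>

static char log_buf[256];

/* Stand-in for tp_event->class->reg(): record which action fired. */
static void reg(const char *action)
{
	strcat(log_buf, action);
	strcat(log_buf, " ");
}

/* One event's life: opened, scheduled in and out twice, then closed. */
static void run_event_lifecycle(void)
{
	reg("REGISTER");	/* first user: perf_trace_event_reg() */
	reg("OPEN");		/* every user: perf_trace_event_open() */
	for (int i = 0; i < 2; i++) {
		reg("ADD");	/* perf_trace_add(): event scheduled in */
		reg("DEL");	/* perf_trace_del(): event scheduled out */
	}
	reg("CLOSE");		/* perf_trace_event_close() */
	reg("UNREGISTER");	/* last user: perf_trace_event_unreg() */
}
```

The ADD/DEL pair is what later patches use to enable/disable the
controlled ftrace_ops on exactly the cpus where the event is running.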

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    2 ++
 kernel/trace/trace_event_perf.c |    4 +++-
 kernel/trace/trace_events.c     |    2 ++
 kernel/trace/trace_kprobe.c     |    2 ++
 kernel/trace/trace_syscalls.c   |    4 ++++
 5 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 195e360..2bf677c 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -148,6 +148,8 @@ enum trace_reg {
 	TRACE_REG_PERF_UNREGISTER,
 	TRACE_REG_PERF_OPEN,
 	TRACE_REG_PERF_CLOSE,
+	TRACE_REG_PERF_ADD,
+	TRACE_REG_PERF_DEL,
 };
 
 struct ftrace_event_call;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0cfcc37..d72af0b 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -211,12 +211,14 @@ int perf_trace_add(struct perf_event *p_event, int flags)
 	list = this_cpu_ptr(pcpu_list);
 	hlist_add_head_rcu(&p_event->hlist_entry, list);
 
-	return 0;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_ADD, p_event);
 }
 
 void perf_trace_del(struct perf_event *p_event, int flags)
 {
+	struct ftrace_event_call *tp_event = p_event->tp_event;
 	hlist_del_rcu(&p_event->hlist_entry);
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_DEL, p_event);
 }
 
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5138fea..079a93a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -173,6 +173,8 @@ int ftrace_event_reg(struct ftrace_event_call *call,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5667f89..580a05e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1912,6 +1912,8 @@ int kprobe_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8599c1d..5e4f62e 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -667,6 +667,8 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
@@ -691,6 +693,8 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
-- 
1.7.1



* [PATCH 6/8] ftrace, perf: Add support to use function tracepoint in perf
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
                       ` (4 preceding siblings ...)
  2011-12-21 11:48     ` [PATCH 5/8] ftrace, perf: Add add/del " Jiri Olsa
@ 2011-12-21 11:48     ` Jiri Olsa
  2011-12-21 11:48     ` [PATCH 7/8] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 11:48 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding perf registration support for the ftrace function event,
so it is now possible to register it via the perf interface.

The perf_event struct statically contains ftrace_ops as a handle
for the function tracer. The function tracer is registered/unregistered
in the open/close actions, and enabled/disabled in the add/del actions.

It is now possible to use the function tracer within perf commands
like:

  perf record -e ftrace:function ls
  perf stat -e ftrace:function ls

Allowed only for root.
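
As a rough userspace sketch (all names hypothetical, not actual kernel
code), the open/add/del/close lifecycle described above behaves like
this, with a plain counter standing in for the per-cpu 'disabled' state:

```c
#include <assert.h>

/* Hypothetical stand-in for one perf event's embedded ftrace_ops. */
static struct {
	int registered;	/* set by open, cleared by close        */
	int disabled;	/* > 0 while the event is scheduled out */
} demo_ops;

static void demo_open(void)  { demo_ops.registered = 1; demo_ops.disabled = 1; }
static void demo_close(void) { demo_ops.registered = 0; }
static void demo_add(void)   { demo_ops.disabled--; }	/* scheduled in  */
static void demo_del(void)   { demo_ops.disabled++; }	/* scheduled out */

/* The function trace callback fires only between add and del. */
static int demo_would_trace(void)
{
	return demo_ops.registered && demo_ops.disabled == 0;
}
```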

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h      |    3 +
 kernel/trace/trace.h            |    2 +
 kernel/trace/trace_event_perf.c |   92 +++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_export.c     |   28 ++++++++++++
 4 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 564769c..47a9df6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -848,6 +848,9 @@ struct perf_event {
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
+#ifdef CONFIG_FUNCTION_TRACER
+	struct ftrace_ops               ftrace_ops;
+#endif
 #endif
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 85732a8..e88e58a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -591,6 +591,8 @@ static inline int ftrace_trace_task(struct task_struct *task)
 static inline int ftrace_is_dead(void) { return 0; }
 #endif
 
+int ftrace_event_is_function(struct ftrace_event_call *call);
+
 /*
  * struct trace_parser - servers for reading the user input separated by spaces
  * @cont: set if the input is not complete - no final space char was found
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index d72af0b..57eb232 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -24,6 +24,11 @@ static int	total_ref_count;
 static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
+	/* The ftrace function trace is allowed only for root. */
+	if (ftrace_event_is_function(tp_event) &&
+	    perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	/* No tracing, just counting, so no obvious leak */
 	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
 		return 0;
@@ -250,3 +255,90 @@ __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 	return raw_data;
 }
 EXPORT_SYMBOL_GPL(perf_trace_buf_prepare);
+
+
+static void
+perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_entry *entry;
+	struct hlist_head *head;
+	struct pt_regs regs;
+	int rctx;
+
+#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
+		    sizeof(u64)) - sizeof(u32))
+
+	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
+
+	perf_fetch_caller_regs(&regs);
+
+	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
+	if (!entry)
+		return;
+
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	head = this_cpu_ptr(event_function.perf_events);
+	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
+			      1, &regs, head);
+
+#undef ENTRY_SIZE
+}
+
+static int perf_ftrace_function_register(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+
+	ops->flags |= FTRACE_OPS_FL_CONTROL;
+	ops->func = perf_ftrace_function_call;
+	return register_ftrace_function(ops);
+}
+
+static int perf_ftrace_function_unregister(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	return unregister_ftrace_function(ops);
+}
+
+static void perf_ftrace_function_enable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	ftrace_function_enable(ops, smp_processor_id());
+}
+
+static void perf_ftrace_function_disable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	ftrace_function_disable(ops, smp_processor_id());
+}
+
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data)
+{
+	int etype = call->event.type;
+
+	if (etype != TRACE_FN)
+		return -EINVAL;
+
+	switch (type) {
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+		return perf_ftrace_function_register(data);
+	case TRACE_REG_PERF_CLOSE:
+		return perf_ftrace_function_unregister(data);
+	case TRACE_REG_PERF_ADD:
+		perf_ftrace_function_enable(data);
+		return 0;
+	case TRACE_REG_PERF_DEL:
+		perf_ftrace_function_disable(data);
+		return 0;
+	}
+
+	return -EINVAL;
+}
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index bbeec31..867653c 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 
 #include "trace_entries.h"
 
+static int ftrace_event_class_register(struct ftrace_event_call *call,
+				       enum trace_reg type, void *data)
+{
+	switch (type) {
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
+#ifdef CONFIG_PERF_EVENTS
+		return perf_ftrace_event_register(call, type, data);
+#endif
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	}
+
+	return -EINVAL;
+}
+
 #undef __entry
 #define __entry REC
 
@@ -159,6 +181,7 @@ struct ftrace_event_class event_class_ftrace_##call = {			\
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
+	.reg			= ftrace_event_class_register,		\
 };									\
 									\
 struct ftrace_event_call __used event_##call = {			\
@@ -170,4 +193,9 @@ struct ftrace_event_call __used event_##call = {			\
 struct ftrace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
+int ftrace_event_is_function(struct ftrace_event_call *call)
+{
+	return call == &event_function;
+}
+
 #include "trace_entries.h"
-- 
1.7.1



* [PATCH 7/8] ftrace, perf: Distinguish ftrace function event field type
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
                       ` (5 preceding siblings ...)
  2011-12-21 11:48     ` [PATCH 6/8] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
@ 2011-12-21 11:48     ` Jiri Olsa
  2011-12-21 11:48     ` [PATCH 8/8] ftrace, perf: Add filter support for function trace event Jiri Olsa
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 11:48 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding the FILTER_TRACE_FN event field type for the function tracepoint
event, so it can be properly recognized within the filtering code.

Currently all fields of ftrace subsystem events share the common
field type FILTER_OTHER. Since the function trace fields need special
care within the filtering code, we need to recognize them properly,
hence the addition of the FILTER_TRACE_FN event type.
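
The per-entry mapping in the patch below uses preprocessor token
pasting; a standalone sketch of the trick (constant values here are
arbitrary, and only the TRACE_FN entry maps to the special type):

```c
#include <assert.h>

#define DEMO_FILTER_OTHER	0
#define DEMO_FILTER_TRACE_FN	5

/* One mapping macro per entry id, as in the patch. */
#define DEMO_FILTER_TYPE_TRACE_FN		DEMO_FILTER_TRACE_FN
#define DEMO_FILTER_TYPE_TRACE_GRAPH_ENT	DEMO_FILTER_OTHER
/* DEMO_FILTER_TYPE(TRACE_FN) pastes into DEMO_FILTER_TYPE_TRACE_FN. */
#define DEMO_FILTER_TYPE(arg)			DEMO_FILTER_TYPE_##arg

static int demo_trace_fn_type(void)  { return DEMO_FILTER_TYPE(TRACE_FN); }
static int demo_graph_ent_type(void) { return DEMO_FILTER_TYPE(TRACE_GRAPH_ENT); }
```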

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h       |    1 +
 kernel/trace/trace_events_filter.c |    7 ++++++-
 kernel/trace/trace_export.c        |   25 ++++++++++++++++++++-----
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 2bf677c..dd478fc 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -245,6 +245,7 @@ enum {
 	FILTER_STATIC_STRING,
 	FILTER_DYN_STRING,
 	FILTER_PTR_STRING,
+	FILTER_TRACE_FN,
 };
 
 #define EVENT_STORAGE_SIZE 128
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index f04cc31..66b74ab 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
 	return FILTER_OTHER;
 }
 
+static bool is_function_field(struct ftrace_event_field *field)
+{
+	return field->filter_type == FILTER_TRACE_FN;
+}
+
 static bool is_string_field(struct ftrace_event_field *field)
 {
 	return field->filter_type == FILTER_DYN_STRING ||
@@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else {
+	} else if (!is_function_field(field)) {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 867653c..46c35e2 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -67,7 +67,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -77,7 +77,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -91,7 +91,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 		ret = trace_define_field(event_call, event_storage, #item, \
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 		mutex_unlock(&event_storage_mutex);			\
 		if (ret)						\
 			return ret;					\
@@ -104,7 +104,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -112,10 +112,24 @@ static void __always_unused ____ftrace_check_##name(void)	\
 #define __dynamic_array(type, item)					\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), FILTER_OTHER);\
+				 0, is_signed_type(type), filter_type);\
 	if (ret)							\
 		return ret;
 
+#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
+#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
+#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
+#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
+#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
+#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
+#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
+#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
+#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
+#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
+#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
+
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 int									\
@@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 {									\
 	struct struct_name field;					\
 	int ret;							\
+	int filter_type = FILTER_TYPE(id);				\
 									\
 	tstruct;							\
 									\
-- 
1.7.1



* [PATCH 8/8] ftrace, perf: Add filter support for function trace event
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
                       ` (6 preceding siblings ...)
  2011-12-21 11:48     ` [PATCH 7/8] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
@ 2011-12-21 11:48     ` Jiri Olsa
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 11:48 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding support to filter the function trace event via the perf
interface. It is now possible to use the filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only, and only the
'==', '!=' and '||' operators are accepted, ending up with filter
strings like:

  "ip == f1 f2 ..." || "ip != f3 f4 ..." ...

The '==' operator adds trace filter with same effect as would
be added via set_ftrace_filter file.

The '!=' operator adds trace filter with same effect as would
be added via set_ftrace_notrace file.

The right side of the '==' and '!=' operators is a space-separated
list of functions or regexps to be added to the filter. The same
syntax is supported/required as for the set_ftrace_filter and
set_ftrace_notrace files.

The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' or '!='
operator within one filter string.
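
A rough userspace sketch (hypothetical helper, not the kernel parser)
of how one predicate in such a string is classified, routing '==' to
the filter list and '!=' to the notrace list:

```c
#include <assert.h>
#include <string.h>

/*
 * Classify one predicate of the form "ip == patterns" or
 * "ip != patterns": returns 1 for a filter ('=='), 0 for a
 * notrace ('!='), and -1 on anything else (only the 'ip'
 * field is allowed).
 */
static int demo_classify_pred(const char *pred)
{
	if (strncmp(pred, "ip ", 3) != 0)
		return -1;
	pred += 3;
	if (strncmp(pred, "== ", 3) == 0)
		return 1;	/* acts like set_ftrace_filter  */
	if (strncmp(pred, "!= ", 3) == 0)
		return 0;	/* acts like set_ftrace_notrace */
	return -1;
}
```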

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/trace.h               |    2 -
 kernel/trace/trace_events_filter.c |  113 +++++++++++++++++++++++++++++++++---
 2 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e88e58a..4ec6d18 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -770,9 +770,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 66b74ab..600bb1e 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -54,6 +54,13 @@ struct filter_op {
 	int precedence;
 };
 
+static struct filter_op filter_ftrace_ops[] = {
+	{ OP_OR,	"||",		1 },
+	{ OP_NE,	"!=",		2 },
+	{ OP_EQ,	"==",		2 },
+	{ OP_NONE,	"OP_NONE",	0 },
+};
+
 static struct filter_op filter_ops[] = {
 	{ OP_OR,	"||",		1 },
 	{ OP_AND,	"&&",		2 },
@@ -81,6 +88,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +104,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +1001,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1353,8 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
+
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1894,6 +1906,83 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int *reset, ret;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	if (filter)
+		ret = ftrace_set_filter(data->ops, buf, len, *reset);
+	else
+		ret = ftrace_set_notrace(data->ops, buf, len, *reset);
+
+	if (*reset)
+		*reset = 0;
+
+	return ret;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	/*
+	  Check the predicate for function trace, verify:
+	   - only '==' and '!=' is used
+	   - the 'ip' field is used
+	*/
+	if (WARN((pred->op != OP_EQ) && (pred->op != OP_NE),
+		 "wrong operator for function filter: %d\n", pred->op))
+		return -EINVAL;
+
+	if (strcmp(field->name, "ip"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID))
+		return WALK_PRED_DEFAULT;
+
+	/* Double checking the predicate is valid for function trace. */
+	*err = ftrace_function_check_pred(pred);
+	if (*err)
+		return WALK_PRED_ABORT;
+
+	*err = __ftrace_function_set_filter(pred->op == OP_EQ,
+					    pred->regex.pattern,
+					    pred->regex.len,
+					    data);
+
+	return (*err) ? WALK_PRED_ABORT : WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1901,6 +1990,7 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	struct event_filter *filter;
 	struct filter_parse_state *ps;
 	struct ftrace_event_call *call;
+	struct filter_op *fops = filter_ops;
 
 	mutex_lock(&event_mutex);
 
@@ -1925,14 +2015,21 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	if (!ps)
 		goto free_filter;
 
-	parse_init(ps, filter_ops, filter_str);
+	if (ftrace_event_is_function(call))
+		fops = filter_ftrace_ops;
+
+	parse_init(ps, fops, filter_str);
 	err = filter_parse(ps);
 	if (err)
 		goto free_ps;
 
 	err = replace_preds(call, filter, ps, filter_str, false);
-	if (!err)
-		event->filter = filter;
+	if (!err) {
+		if (ftrace_event_is_function(call))
+			err = ftrace_function_set_filter(event, filter);
+		else
+			event->filter = filter;
+	}
 
 free_ps:
 	filter_opstack_clear(ps);
@@ -1940,7 +2037,7 @@ free_ps:
 	kfree(ps);
 
 free_filter:
-	if (err)
+	if (err || ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
-- 
1.7.1



* Re: [PATCH 2/8] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update
  2011-12-21 11:48     ` [PATCH 2/8] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
@ 2011-12-21 15:23       ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-12-21 15:23 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, 2011-12-21 at 12:48 +0100, Jiri Olsa wrote:
> We need to check the existence of the other_hash before
> we touch its count variable.
> 
> This issue is hit only when a non-global ftrace_ops is used.
> The global ftrace_ops is initialized with empty hashes.
> 

As this wasn't the first time this bug showed up, I fixed this by this
patch:

https://lkml.org/lkml/2011/12/21/197

So you can remove this patch as well.

-- Steve

> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  kernel/trace/ftrace.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 09007c0..7eb702f 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1372,7 +1372,8 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
>  			if (filter_hash && in_hash && !in_other_hash)
>  				match = 1;
>  			else if (!filter_hash && in_hash &&
> -				 (in_other_hash || !other_hash->count))
> +				 (in_other_hash ||
> +				  !other_hash || !other_hash->count))
>  				match = 1;
>  		}
>  		if (!match)




* Re: [PATCH 3/8] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-21 11:48     ` [PATCH 3/8] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2011-12-21 16:01       ` Steven Rostedt
  2011-12-21 16:43         ` Jiri Olsa
  2012-01-24  1:26         ` Frederic Weisbecker
  0 siblings, 2 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-12-21 16:01 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, 2011-12-21 at 12:48 +0100, Jiri Olsa wrote:
> Adding a way to temporarily enable/disable ftrace_ops. The change
> follows the same approach as the 'global' ftrace_ops.
> 
> Introducing 2 global ftrace_ops - control_ops and ftrace_control_list,
> which take over all ftrace_ops registered with the FTRACE_OPS_FL_CONTROL
> flag. In addition, a new per-cpu flag called 'disabled' is added to
> ftrace_ops to provide the control information for each cpu.
> 
> When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
> set as disabled for all cpus.
> 
> The ftrace_control_list contains all the registered 'control' ftrace_ops.
> The control_ops provides a function which iterates ftrace_control_list
> and checks the 'disabled' flag on the current cpu.
> 
> Adding 2 inline functions, ftrace_function_enable/ftrace_function_disable,
> which enable/disable the ftrace_ops for a given cpu.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/ftrace.h |   42 +++++++++++++++++
>  kernel/trace/ftrace.c  |  116 +++++++++++++++++++++++++++++++++++++++++++++---
>  kernel/trace/trace.h   |    2 +
>  3 files changed, 153 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 523640f..67b8236 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -35,12 +35,14 @@ enum {
>  	FTRACE_OPS_FL_ENABLED		= 1 << 0,
>  	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
>  	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
> +	FTRACE_OPS_FL_CONTROL		= 1 << 3,
>  };
>  
>  struct ftrace_ops {
>  	ftrace_func_t			func;
>  	struct ftrace_ops		*next;
>  	unsigned long			flags;
> +	void __percpu			*disabled;
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  	struct ftrace_hash		*notrace_hash;
>  	struct ftrace_hash		*filter_hash;
> @@ -97,6 +99,46 @@ int register_ftrace_function(struct ftrace_ops *ops);
>  int unregister_ftrace_function(struct ftrace_ops *ops);
>  void clear_ftrace_function(void);
>  
> +/**
> + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> + *
> + * This function enables tracing on given cpu by decreasing
> + * the per cpu control variable.
> + * It must be called with preemption disabled and only on
> + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> + */
> +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> +{
> +	atomic_t *disabled;
> +
> +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)) ||
> +			 !preempt_count())

The WARN_ON_ONCE() should also include the !preempt_count().


> +		return;
> +
> +	disabled = per_cpu_ptr(ops->disabled, cpu);
> +	atomic_dec(disabled);
> +}
> +
> +/**
> > + * ftrace_function_disable - disable controlled ftrace_ops on given cpu
> > + *
> > + * This function disables tracing on given cpu by increasing
> > + * the per cpu control variable.
> + * It must be called with preemption disabled and only on
> + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> + */
> +static inline void ftrace_function_disable(struct ftrace_ops *ops, int cpu)
> +{
> +	atomic_t *disabled;
> +
> +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)) ||
> +			 !preempt_count())

Same here.

> +		return;
> +
> +	disabled = per_cpu_ptr(ops->disabled, cpu);
> +	atomic_inc(disabled);
> +}
> +
>  extern void ftrace_stub(unsigned long a0, unsigned long a1);
>  
>  #else /* !CONFIG_FUNCTION_TRACER */
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 7eb702f..1b56013 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -60,6 +60,8 @@
>  #define FTRACE_HASH_DEFAULT_BITS 10
>  #define FTRACE_HASH_MAX_BITS 12
>  
> +#define FL_GLOBAL_CONTROL (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
> +
>  /* ftrace_enabled is a method to turn ftrace on or off */
>  int ftrace_enabled __read_mostly;
>  static int last_ftrace_enabled;
> @@ -87,12 +89,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
>  };
>  
>  static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
> +static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
>  static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
>  ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
>  static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
>  ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
>  ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
>  static struct ftrace_ops global_ops;
> +static struct ftrace_ops control_ops;
>  
>  static void
>  ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
> @@ -166,6 +170,38 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
>  }
>  #endif
>  
> +static void control_ops_disable_all(struct ftrace_ops *ops)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu)
> +		atomic_set(per_cpu_ptr(ops->disabled, cpu), 1);
> +}
> +
> +static int control_ops_alloc(struct ftrace_ops *ops)
> +{
> +	atomic_t *disabled;
> +
> +	disabled = alloc_percpu(atomic_t);
> +	if (!disabled)
> +		return -ENOMEM;
> +
> +	ops->disabled = disabled;
> +	control_ops_disable_all(ops);
> +	return 0;
> +}
> +
> +static void control_ops_free(struct ftrace_ops *ops)
> +{
> +	free_percpu(ops->disabled);
> +}
> +
> +static int control_ops_is_disabled(struct ftrace_ops *ops)
> +{
> +	atomic_t *disabled = this_cpu_ptr(ops->disabled);

Again, the use of "this_cpu_ptr" is wrong. Gah! We should nuke all of
that crap.


> +	return atomic_read(disabled);
> +}
> +
>  static void update_global_ops(void)
>  {
>  	ftrace_func_t func;
> @@ -257,6 +293,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
>  	return 0;
>  }
>  
> +static void add_ftrace_list_ops(struct ftrace_ops **list,
> +				struct ftrace_ops *main_ops,
> +				struct ftrace_ops *ops)
> +{
> +	int first = *list == &ftrace_list_end;
> +	add_ftrace_ops(list, ops);
> +	if (first)
> +		add_ftrace_ops(&ftrace_ops_list, main_ops);
> +}
> +
> +static int remove_ftrace_list_ops(struct ftrace_ops **list,
> +				  struct ftrace_ops *main_ops,
> +				  struct ftrace_ops *ops)
> +{
> +	int ret = remove_ftrace_ops(list, ops);
> +	if (!ret && *list == &ftrace_list_end)
> +		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
> +	return ret;
> +}
> +
>  static int __register_ftrace_function(struct ftrace_ops *ops)
>  {
>  	if (ftrace_disabled)
> @@ -268,15 +324,19 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
>  	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
>  		return -EBUSY;
>  
> +	if ((ops->flags & FL_GLOBAL_CONTROL) == FL_GLOBAL_CONTROL)

No biggy, but I usually find:

	if (ops->flags & FL_GLOBAL_CONTROL)

more readable. With what you have, I looked at that condition three
times to figure out what was different between what was '&'d with the
flags and what was being equal to. Usually the ((flags & X) == Y) is
done to check if a subset of bits are set within a mask of bits.


> +		return -EINVAL;
> +
>  	if (!core_kernel_data((unsigned long)ops))
>  		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
>  
>  	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
> -		int first = ftrace_global_list == &ftrace_list_end;
> -		add_ftrace_ops(&ftrace_global_list, ops);
> +		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);

Much better and easier to read :)

>  		ops->flags |= FTRACE_OPS_FL_ENABLED;
> -		if (first)
> -			add_ftrace_ops(&ftrace_ops_list, &global_ops);
> +	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
> +		if (control_ops_alloc(ops))
> +			return -ENOMEM;
> +		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
>  	} else
>  		add_ftrace_ops(&ftrace_ops_list, ops);
>  

The rest looks good.

-- Steve




* Re: [PATCH 3/8] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-21 16:01       ` Steven Rostedt
@ 2011-12-21 16:43         ` Jiri Olsa
  2011-12-21 16:55           ` Steven Rostedt
  2012-01-24  1:26         ` Frederic Weisbecker
  1 sibling, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 16:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, Dec 21, 2011 at 11:01:33AM -0500, Steven Rostedt wrote:
> On Wed, 2011-12-21 at 12:48 +0100, Jiri Olsa wrote:

SNIP

> > +/**
> > + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> > + *
> > + * This function enables tracing on given cpu by decreasing
> > + * the per cpu control variable.
> > + * It must be called with preemption disabled and only on
> > + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> > + */
> > +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> > +{
> > +	atomic_t *disabled;
> > +
> > +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)) ||
> > +			 !preempt_count())
> 
> The WARN_ON_ONCE() should also include the !preempt_count().
> 

ouch, that was the initial intention.. need an eye doctor ;)

> 
> > +		return;
> > +
> > +	disabled = per_cpu_ptr(ops->disabled, cpu);
> > +	atomic_dec(disabled);
> > +}
> > +

SNIP

> > +
> > +static int control_ops_is_disabled(struct ftrace_ops *ops)
> > +{
> > +	atomic_t *disabled = this_cpu_ptr(ops->disabled);
> 
> Again, the use of "this_cpu_ptr" is wrong. Gah! We should nuke all of
> that crap.

will nuke this one..

> 
> 

> >  static int __register_ftrace_function(struct ftrace_ops *ops)
> >  {
> >  	if (ftrace_disabled)
> > @@ -268,15 +324,19 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
> >  	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
> >  		return -EBUSY;
> >  
> > +	if ((ops->flags & FL_GLOBAL_CONTROL) == FL_GLOBAL_CONTROL)
> 
> No biggy, but I usually find:
> 
> 	if (ops->flags & FL_GLOBAL_CONTROL)
> 
> more readable. With what you have, I looked at that condition three
> times to figure out what was different between what was '&'d with the
> flags and what was being equal too. Usually the ((flags & X) == Y) is
> done to check if a subset of bits are set within a mask of bits.

Well, that's what I need to do here. Bail out if both bits are set,
since we don't support both global and control flags set at the same
time. I'll add some comment to it.

thanks,
jirka

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 3/8] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-21 16:43         ` Jiri Olsa
@ 2011-12-21 16:55           ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2011-12-21 16:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, 2011-12-21 at 17:43 +0100, Jiri Olsa wrote:

> > more readable. With what you have, I looked at that condition three
> > times to figure out what was different between what was '&'d with the
> > flags and what was being equal too. Usually the ((flags & X) == Y) is
> > done to check if a subset of bits are set within a mask of bits.
> 
> Well, thats what I need to do here. Bail out if both bits are set,
> since we dont support both global and control flags set at the same
> time.. I'll add some comment to it.
> 

Ah that's right, that's not a single bit. OK, you need to rename
FL_GLOBAL_CONTROL to FL_GLOBAL_CONTROL_BITS or _MASK. _MASK may be
better, as I think it's used more often for this case.

Otherwise it looks like a single bit. Then you won't even need the
comment. But I won't stop you from adding one.

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCHv4 0/8] ftrace, perf: Adding support to use function trace
  2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
                       ` (7 preceding siblings ...)
  2011-12-21 11:48     ` [PATCH 8/8] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2011-12-21 18:56     ` Jiri Olsa
  2011-12-21 18:56       ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
                         ` (8 more replies)
  8 siblings, 9 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 18:56 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

hi,
here's another version of perf support for function trace
with filter. 

attached patches:
  1/7 ftrace: Change filter/notrace set functions to return exit code
  2/7 ftrace: Add enable/disable ftrace_ops control interface
  3/7 ftrace, perf: Add open/close tracepoint perf registration actions
  4/7 ftrace, perf: Add add/del tracepoint perf registration actions
  5/7 ftrace, perf: Add support to use function tracepoint in perf
  6/7 ftrace, perf: Distinguish ftrace function event field type
  7/7 ftrace, perf: Add filter support for function trace event

v4 changes:
  2/7 - FL_GLOBAL_CONTROL changed to FL_GLOBAL_CONTROL_MASK
      - changed WARN_ON_ONCE() to include the !preempt_count()
      - changed this_cpu_ptr to per_cpu_ptr

  omitted Fix possible NULL dereferencing in __ftrace_hash_rec_update
  (2/8 in v3)

v3 changes:
  3/8 - renamed __add/remove_ftrace_ops
      - fixed preempt_enable/recursion_clear order in ftrace_ops_control_func
      - renamed/commented API functions - enable/disable_ftrace_function

  omitted graph tracer workaround patch 10/10

v2 changes:
 01/10 - keeping the old fix instead of adding hash_has_contents func
         I'll send a separate patchset for this
 02/10 - using different way to avoid the issue (3/9 in v1)
 03/10 - using the way proposed by Steven for controlling ftrace_ops
         (4/9 in v1)
 06/10 - added check ensuring the ftrace:function event could be used by
         root only (7/9 in v1)
 08/10 - added more description (8/9 in v1)
 09/10 - changed '&&' operator to '||' which seems more suitable
         in this case (9/9 in v1)

thanks,
jirka
---
 include/linux/ftrace.h             |   46 ++++++++-
 include/linux/ftrace_event.h       |    9 +-
 include/linux/perf_event.h         |    3 +
 kernel/trace/ftrace.c              |  135 +++++++++++++++++++++--
 kernel/trace/trace.h               |   11 ++-
 kernel/trace/trace_event_perf.c    |  212 +++++++++++++++++++++++++++++-------
 kernel/trace/trace_events.c        |   12 ++-
 kernel/trace/trace_events_filter.c |  116 ++++++++++++++++++-
 kernel/trace/trace_export.c        |   53 ++++++++-
 kernel/trace/trace_kprobe.c        |    8 +-
 kernel/trace/trace_syscalls.c      |   18 +++-
 11 files changed, 542 insertions(+), 81 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
@ 2011-12-21 18:56       ` Jiri Olsa
  2011-12-22  0:12         ` Steven Rostedt
  2011-12-21 18:56       ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
                         ` (7 subsequent siblings)
  8 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 18:56 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any return code, so there's no way for an ftrace_ops
user to tell whether the filter was correctly applied.

The set_ftrace_filter interface returns an error in case the filter
does not match:

  # echo krava > set_ftrace_filter
  bash: echo: write error: Invalid argument

Changing both ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly or -E* values
in case of error.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |    4 ++--
 kernel/trace/ftrace.c  |   16 ++++++++++------
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 26eafce..523640f 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -180,9 +180,9 @@ struct dyn_ftrace {
 };
 
 int ftrace_force_update(void);
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b79ab33..7eb702f 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2912,8 +2912,11 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	mutex_lock(&ftrace_regex_lock);
 	if (reset)
 		ftrace_filter_reset(hash);
-	if (buf)
-		ftrace_match_records(hash, buf, len);
+	if (buf &&
+	    !ftrace_match_records(hash, buf, len)) {
+		ret = -EINVAL;
+		goto out_regex_unlock;
+	}
 
 	mutex_lock(&ftrace_lock);
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -2923,6 +2926,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 
 	mutex_unlock(&ftrace_lock);
 
+ out_regex_unlock:
 	mutex_unlock(&ftrace_regex_lock);
 
 	free_ftrace_hash(hash);
@@ -2939,10 +2943,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
  * Filters denote which functions should be enabled when tracing is enabled.
  * If @buf is NULL and reset is set, all functions will be enabled for tracing.
  */
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 1);
+	return ftrace_set_regex(ops, buf, len, reset, 1);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_filter);
 
@@ -2957,10 +2961,10 @@ EXPORT_SYMBOL_GPL(ftrace_set_filter);
  * is enabled. If @buf is NULL and reset is set, all functions will be enabled
  * for tracing.
  */
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 0);
+	return ftrace_set_regex(ops, buf, len, reset, 0);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_notrace);
 /**
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
  2011-12-21 18:56       ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2011-12-21 18:56       ` Jiri Olsa
  2011-12-21 18:56       ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
                         ` (6 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 18:56 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same approach used for the 'global' ftrace_ops.

Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
which take over all ftrace_ops registered with the FTRACE_OPS_FL_CONTROL
flag. In addition, a new per-cpu counter called 'disabled' is added to
ftrace_ops to provide the control information for each cpu.

When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.

The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides a function which iterates the ftrace_control_list
and checks the 'disabled' counter on the current cpu.

Adding 2 inline functions ftrace_function_enable/ftrace_function_disable,
which enable/disable the ftrace_ops for a given cpu.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |   42 +++++++++++++++++
 kernel/trace/ftrace.c  |  119 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/trace/trace.h   |    2 +
 3 files changed, 156 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 523640f..0d43a2b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -35,12 +35,14 @@ enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	void __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +99,46 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+/**
+ * ftrace_function_enable - enable controlled ftrace_ops on given cpu
+ *
+ * This function enables tracing on given cpu by decreasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled;
+
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
+			 !preempt_count()))
+		return;
+
+	disabled = per_cpu_ptr(ops->disabled, cpu);
+	atomic_dec(disabled);
+}
+
+/**
+ * ftrace_function_disable - disable controlled ftrace_ops on given cpu
+ *
+ * This function disables tracing on given cpu by increasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_disable(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled;
+
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
+			 !preempt_count()))
+		return;
+
+	disabled = per_cpu_ptr(ops->disabled, cpu);
+	atomic_inc(disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7eb702f..3c75928 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -60,6 +60,8 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+#define FL_GLOBAL_CONTROL_MASK (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
@@ -87,12 +89,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -166,6 +170,38 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		atomic_set(per_cpu_ptr(ops->disabled, cpu), 1);
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	atomic_t *disabled;
+
+	disabled = alloc_percpu(atomic_t);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
+static int control_ops_is_disabled(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled = per_cpu_ptr(ops->disabled, cpu);
+	return atomic_read(disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -257,6 +293,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_list_ops(struct ftrace_ops **list,
+				struct ftrace_ops *main_ops,
+				struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	add_ftrace_ops(list, ops);
+	if (first)
+		add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_list_ops(struct ftrace_ops **list,
+				  struct ftrace_ops *main_ops,
+				  struct ftrace_ops *ops)
+{
+	int ret = remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -268,15 +324,20 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+	/* We don't support both control and global flags set. */
+	if ((ops->flags & FL_GLOBAL_CONTROL_MASK) == FL_GLOBAL_CONTROL_MASK)
+		return -EINVAL;
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -300,11 +361,23 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_list_ops(&ftrace_global_list,
+					     &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_list_ops(&ftrace_control_list,
+					     &control_ops, ops);
+		if (!ret) {
+			/*
+			 * The ftrace_ops is now removed from the list,
+			 * so there'll be no new users. We must ensure
+			 * all current users are done before we free
+			 * the control data.
+			 */
+			synchronize_sched();
+			control_ops_free(ops);
+		}
 	} else
 		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -3566,6 +3639,38 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+	int cpu;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	cpu = smp_processor_id();
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!control_ops_is_disabled(op, cpu) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	};
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2c26574..41c54e3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
  2011-12-21 18:56       ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
  2011-12-21 18:56       ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2011-12-21 18:56       ` Jiri Olsa
  2011-12-21 18:56       ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
                         ` (5 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 18:56 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.

The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.

The open/close actions are invoked for each tracepoint user when
opening/closing the event.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    6 +-
 kernel/trace/trace.h            |    5 ++
 kernel/trace/trace_event_perf.c |  116 +++++++++++++++++++++++++--------------
 kernel/trace/trace_events.c     |   10 ++-
 kernel/trace/trace_kprobe.c     |    6 ++-
 kernel/trace/trace_syscalls.c   |   14 +++-
 6 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c3da42d..195e360 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -146,6 +146,8 @@ enum trace_reg {
 	TRACE_REG_UNREGISTER,
 	TRACE_REG_PERF_REGISTER,
 	TRACE_REG_PERF_UNREGISTER,
+	TRACE_REG_PERF_OPEN,
+	TRACE_REG_PERF_CLOSE,
 };
 
 struct ftrace_event_call;
@@ -157,7 +159,7 @@ struct ftrace_event_class {
 	void			*perf_probe;
 #endif
 	int			(*reg)(struct ftrace_event_call *event,
-				       enum trace_reg type);
+				       enum trace_reg type, void *data);
 	int			(*define_fields)(struct ftrace_event_call *);
 	struct list_head	*(*get_fields)(struct ftrace_event_call *);
 	struct list_head	fields;
@@ -165,7 +167,7 @@ struct ftrace_event_class {
 };
 
 extern int ftrace_event_reg(struct ftrace_event_call *event,
-			    enum trace_reg type);
+			    enum trace_reg type, void *data);
 
 enum {
 	TRACE_EVENT_FL_ENABLED_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 41c54e3..85732a8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -828,4 +828,9 @@ extern const char *__stop___trace_bprintk_fmt[];
 	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #include "trace_entries.h"
 
+#ifdef CONFIG_PERF_EVENTS
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data);
+#endif
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 19a359d..0cfcc37 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -44,23 +44,17 @@ static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 	return 0;
 }
 
-static int perf_trace_event_init(struct ftrace_event_call *tp_event,
-				 struct perf_event *p_event)
+static int perf_trace_event_reg(struct ftrace_event_call *tp_event,
+				struct perf_event *p_event)
 {
 	struct hlist_head __percpu *list;
-	int ret;
+	int ret = -ENOMEM;
 	int cpu;
 
-	ret = perf_trace_event_perm(tp_event, p_event);
-	if (ret)
-		return ret;
-
 	p_event->tp_event = tp_event;
 	if (tp_event->perf_refcount++ > 0)
 		return 0;
 
-	ret = -ENOMEM;
-
 	list = alloc_percpu(struct hlist_head);
 	if (!list)
 		goto fail;
@@ -83,7 +77,7 @@ static int perf_trace_event_init(struct ftrace_event_call *tp_event,
 		}
 	}
 
-	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER);
+	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER, NULL);
 	if (ret)
 		goto fail;
 
@@ -108,6 +102,69 @@ fail:
 	return ret;
 }
 
+static void perf_trace_event_unreg(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	int i;
+
+	if (--tp_event->perf_refcount > 0)
+		goto out;
+
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER, NULL);
+
+	/*
+	 * Ensure our callback won't be called anymore. The buffers
+	 * will be freed after that.
+	 */
+	tracepoint_synchronize_unregister();
+
+	free_percpu(tp_event->perf_events);
+	tp_event->perf_events = NULL;
+
+	if (!--total_ref_count) {
+		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
+			free_percpu(perf_trace_buf[i]);
+			perf_trace_buf[i] = NULL;
+		}
+	}
+out:
+	module_put(tp_event->mod);
+}
+
+static int perf_trace_event_open(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_OPEN, p_event);
+}
+
+static void perf_trace_event_close(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
+}
+
+static int perf_trace_event_init(struct ftrace_event_call *tp_event,
+				 struct perf_event *p_event)
+{
+	int ret;
+
+	ret = perf_trace_event_perm(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_reg(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_open(p_event);
+	if (ret) {
+		perf_trace_event_unreg(p_event);
+		return ret;
+	}
+
+	return 0;
+}
+
 int perf_trace_init(struct perf_event *p_event)
 {
 	struct ftrace_event_call *tp_event;
@@ -130,6 +187,14 @@ int perf_trace_init(struct perf_event *p_event)
 	return ret;
 }
 
+void perf_trace_destroy(struct perf_event *p_event)
+{
+	mutex_lock(&event_mutex);
+	perf_trace_event_close(p_event);
+	perf_trace_event_unreg(p_event);
+	mutex_unlock(&event_mutex);
+}
+
 int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct ftrace_event_call *tp_event = p_event->tp_event;
@@ -154,37 +219,6 @@ void perf_trace_del(struct perf_event *p_event, int flags)
 	hlist_del_rcu(&p_event->hlist_entry);
 }
 
-void perf_trace_destroy(struct perf_event *p_event)
-{
-	struct ftrace_event_call *tp_event = p_event->tp_event;
-	int i;
-
-	mutex_lock(&event_mutex);
-	if (--tp_event->perf_refcount > 0)
-		goto out;
-
-	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER);
-
-	/*
-	 * Ensure our callback won't be called anymore. The buffers
-	 * will be freed after that.
-	 */
-	tracepoint_synchronize_unregister();
-
-	free_percpu(tp_event->perf_events);
-	tp_event->perf_events = NULL;
-
-	if (!--total_ref_count) {
-		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
-			free_percpu(perf_trace_buf[i]);
-			perf_trace_buf[i] = NULL;
-		}
-	}
-out:
-	module_put(tp_event->mod);
-	mutex_unlock(&event_mutex);
-}
-
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 				       struct pt_regs *regs, int *rctxp)
 {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c212a7f..5138fea 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -147,7 +147,8 @@ int trace_event_raw_init(struct ftrace_event_call *call)
 }
 EXPORT_SYMBOL_GPL(trace_event_raw_init);
 
-int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
+int ftrace_event_reg(struct ftrace_event_call *call,
+		     enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -170,6 +171,9 @@ int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
 					    call->class->perf_probe,
 					    call);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
@@ -209,7 +213,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_stop_cmdline_record();
 				call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			call->class->reg(call, TRACE_REG_UNREGISTER);
+			call->class->reg(call, TRACE_REG_UNREGISTER, NULL);
 		}
 		break;
 	case 1:
@@ -218,7 +222,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_start_cmdline_record();
 				call->flags |= TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			ret = call->class->reg(call, TRACE_REG_REGISTER);
+			ret = call->class->reg(call, TRACE_REG_REGISTER, NULL);
 			if (ret) {
 				tracing_stop_cmdline_record();
 				pr_info("event trace: Could not enable event "
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 00d527c..5667f89 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1892,7 +1892,8 @@ static __kprobes void kretprobe_perf_func(struct kretprobe_instance *ri,
 #endif	/* CONFIG_PERF_EVENTS */
 
 static __kprobes
-int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
+int kprobe_register(struct ftrace_event_call *event,
+		    enum trace_reg type, void *data)
 {
 	struct trace_probe *tp = (struct trace_probe *)event->data;
 
@@ -1909,6 +1910,9 @@ int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
 	case TRACE_REG_PERF_UNREGISTER:
 		disable_trace_probe(tp, TP_FLAG_PROFILE);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 5f35f6f..8599c1d 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -18,9 +18,9 @@ static DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
 static DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 
 static int syscall_enter_define_fields(struct ftrace_event_call *call);
 static int syscall_exit_define_fields(struct ftrace_event_call *call);
@@ -650,7 +650,7 @@ void perf_sysexit_disable(struct ftrace_event_call *call)
 #endif /* CONFIG_PERF_EVENTS */
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -665,13 +665,16 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysenter_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
 }
 
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -686,6 +689,9 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysexit_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 4/7] ftrace, perf: Add add/del tracepoint perf registration actions
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                         ` (2 preceding siblings ...)
  2011-12-21 18:56       ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
@ 2011-12-21 18:56       ` Jiri Olsa
  2011-12-21 18:56       ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 18:56 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
perf event schedule in/out actions.

The add action is invoked when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    2 ++
 kernel/trace/trace_event_perf.c |    4 +++-
 kernel/trace/trace_events.c     |    2 ++
 kernel/trace/trace_kprobe.c     |    2 ++
 kernel/trace/trace_syscalls.c   |    4 ++++
 5 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 195e360..2bf677c 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -148,6 +148,8 @@ enum trace_reg {
 	TRACE_REG_PERF_UNREGISTER,
 	TRACE_REG_PERF_OPEN,
 	TRACE_REG_PERF_CLOSE,
+	TRACE_REG_PERF_ADD,
+	TRACE_REG_PERF_DEL,
 };
 
 struct ftrace_event_call;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0cfcc37..d72af0b 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -211,12 +211,14 @@ int perf_trace_add(struct perf_event *p_event, int flags)
 	list = this_cpu_ptr(pcpu_list);
 	hlist_add_head_rcu(&p_event->hlist_entry, list);
 
-	return 0;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_ADD, p_event);
 }
 
 void perf_trace_del(struct perf_event *p_event, int flags)
 {
+	struct ftrace_event_call *tp_event = p_event->tp_event;
 	hlist_del_rcu(&p_event->hlist_entry);
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_DEL, p_event);
 }
 
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5138fea..079a93a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -173,6 +173,8 @@ int ftrace_event_reg(struct ftrace_event_call *call,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5667f89..580a05e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1912,6 +1912,8 @@ int kprobe_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8599c1d..5e4f62e 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -667,6 +667,8 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
@@ -691,6 +693,8 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
-- 
1.7.1



* [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                         ` (3 preceding siblings ...)
  2011-12-21 18:56       ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
@ 2011-12-21 18:56       ` Jiri Olsa
  2011-12-21 18:56       ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 18:56 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding perf registration support for the ftrace function event,
so it is now possible to register it via the perf interface.

The perf_event struct statically contains an ftrace_ops as a handle
for the function tracer. The function tracer is registered/unregistered
in the open/close actions, and enabled/disabled in the add/del actions.

It is now possible to use function trace within perf commands
like:

  perf record -e ftrace:function ls
  perf stat -e ftrace:function ls

Allowed only for root.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h      |    3 +
 kernel/trace/trace.h            |    2 +
 kernel/trace/trace_event_perf.c |   92 +++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_export.c     |   28 ++++++++++++
 4 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0b91db2..5003be6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -858,6 +858,9 @@ struct perf_event {
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
+#ifdef CONFIG_FUNCTION_TRACER
+	struct ftrace_ops               ftrace_ops;
+#endif
 #endif
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 85732a8..e88e58a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -591,6 +591,8 @@ static inline int ftrace_trace_task(struct task_struct *task)
 static inline int ftrace_is_dead(void) { return 0; }
 #endif
 
+int ftrace_event_is_function(struct ftrace_event_call *call);
+
 /*
  * struct trace_parser - servers for reading the user input separated by spaces
  * @cont: set if the input is not complete - no final space char was found
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index d72af0b..57eb232 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -24,6 +24,11 @@ static int	total_ref_count;
 static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
+	/* The ftrace function trace is allowed only for root. */
+	if (ftrace_event_is_function(tp_event) &&
+	    perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	/* No tracing, just counting, so no obvious leak */
 	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
 		return 0;
@@ -250,3 +255,90 @@ __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 	return raw_data;
 }
 EXPORT_SYMBOL_GPL(perf_trace_buf_prepare);
+
+
+static void
+perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_entry *entry;
+	struct hlist_head *head;
+	struct pt_regs regs;
+	int rctx;
+
+#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
+		    sizeof(u64)) - sizeof(u32))
+
+	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
+
+	perf_fetch_caller_regs(&regs);
+
+	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
+	if (!entry)
+		return;
+
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	head = this_cpu_ptr(event_function.perf_events);
+	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
+			      1, &regs, head);
+
+#undef ENTRY_SIZE
+}
+
+static int perf_ftrace_function_register(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+
+	ops->flags |= FTRACE_OPS_FL_CONTROL;
+	ops->func = perf_ftrace_function_call;
+	return register_ftrace_function(ops);
+}
+
+static int perf_ftrace_function_unregister(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	return unregister_ftrace_function(ops);
+}
+
+static void perf_ftrace_function_enable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	ftrace_function_enable(ops, smp_processor_id());
+}
+
+static void perf_ftrace_function_disable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	ftrace_function_disable(ops, smp_processor_id());
+}
+
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data)
+{
+	int etype = call->event.type;
+
+	if (etype != TRACE_FN)
+		return -EINVAL;
+
+	switch (type) {
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+		return perf_ftrace_function_register(data);
+	case TRACE_REG_PERF_CLOSE:
+		return perf_ftrace_function_unregister(data);
+	case TRACE_REG_PERF_ADD:
+		perf_ftrace_function_enable(data);
+		return 0;
+	case TRACE_REG_PERF_DEL:
+		perf_ftrace_function_disable(data);
+		return 0;
+	}
+
+	return -EINVAL;
+}
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index bbeec31..867653c 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 
 #include "trace_entries.h"
 
+static int ftrace_event_class_register(struct ftrace_event_call *call,
+				       enum trace_reg type, void *data)
+{
+	switch (type) {
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
+#ifdef CONFIG_PERF_EVENTS
+		return perf_ftrace_event_register(call, type, data);
+#endif
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	}
+
+	return -EINVAL;
+}
+
 #undef __entry
 #define __entry REC
 
@@ -159,6 +181,7 @@ struct ftrace_event_class event_class_ftrace_##call = {			\
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
+	.reg			= ftrace_event_class_register,		\
 };									\
 									\
 struct ftrace_event_call __used event_##call = {			\
@@ -170,4 +193,9 @@ struct ftrace_event_call __used event_##call = {			\
 struct ftrace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
+int ftrace_event_is_function(struct ftrace_event_call *call)
+{
+	return call == &event_function;
+}
+
 #include "trace_entries.h"
-- 
1.7.1



* [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                         ` (4 preceding siblings ...)
  2011-12-21 18:56       ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
@ 2011-12-21 18:56       ` Jiri Olsa
  2011-12-21 18:56       ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 18:56 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding the FILTER_TRACE_FN event field type for the function tracepoint
event, so it can be properly recognized within the filtering code.

Currently all fields of ftrace subsystem events share the common
field type FILTER_OTHER. Since the function trace fields need special
care within the filtering code, we need to recognize them properly,
hence the new FILTER_TRACE_FN event type.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h       |    1 +
 kernel/trace/trace_events_filter.c |    7 ++++++-
 kernel/trace/trace_export.c        |   25 ++++++++++++++++++++-----
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 2bf677c..dd478fc 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -245,6 +245,7 @@ enum {
 	FILTER_STATIC_STRING,
 	FILTER_DYN_STRING,
 	FILTER_PTR_STRING,
+	FILTER_TRACE_FN,
 };
 
 #define EVENT_STORAGE_SIZE 128
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index f04cc31..66b74ab 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
 	return FILTER_OTHER;
 }
 
+static bool is_function_field(struct ftrace_event_field *field)
+{
+	return field->filter_type == FILTER_TRACE_FN;
+}
+
 static bool is_string_field(struct ftrace_event_field *field)
 {
 	return field->filter_type == FILTER_DYN_STRING ||
@@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else {
+	} else if (!is_function_field(field)) {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 867653c..46c35e2 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -67,7 +67,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -77,7 +77,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -91,7 +91,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 		ret = trace_define_field(event_call, event_storage, #item, \
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 		mutex_unlock(&event_storage_mutex);			\
 		if (ret)						\
 			return ret;					\
@@ -104,7 +104,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -112,10 +112,24 @@ static void __always_unused ____ftrace_check_##name(void)	\
 #define __dynamic_array(type, item)					\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), FILTER_OTHER);\
+				 0, is_signed_type(type), filter_type);\
 	if (ret)							\
 		return ret;
 
+#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
+#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
+#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
+#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
+#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
+#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
+#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
+#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
+#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
+#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
+#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
+
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 int									\
@@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 {									\
 	struct struct_name field;					\
 	int ret;							\
+	int filter_type = FILTER_TYPE(id);				\
 									\
 	tstruct;							\
 									\
-- 
1.7.1



* [PATCH 7/7] ftrace, perf: Add filter support for function trace event
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                         ` (5 preceding siblings ...)
  2011-12-21 18:56       ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
@ 2011-12-21 18:56       ` Jiri Olsa
  2011-12-21 22:07         ` Frederic Weisbecker
  2011-12-21 19:02       ` [PATCHv4 0/7] ftrace, perf: Adding support to use function trace Jiri Olsa
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
  8 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 18:56 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding support to filter the function trace event via the perf
interface. It is now possible to use the filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only, and the
following operators are accepted: '==', '!=' and '||', ending
up with filter strings like:

  "ip == f1 f2 ..." || "ip != f3 f4 ..." ...

The '==' operator adds a trace filter with the same effect as one
added via the set_ftrace_filter file.

The '!=' operator adds a trace filter with the same effect as one
added via the set_ftrace_notrace file.

The right side of the '==' and '!=' operators is a space-separated
list of functions or regexps to be added to the filter. The same
syntax is supported/required as for the set_ftrace_filter and
set_ftrace_notrace files.

The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operator within one filter string.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/trace.h               |    2 -
 kernel/trace/trace_events_filter.c |  113 +++++++++++++++++++++++++++++++++---
 2 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e88e58a..4ec6d18 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -770,9 +770,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 66b74ab..600bb1e 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -54,6 +54,13 @@ struct filter_op {
 	int precedence;
 };
 
+static struct filter_op filter_ftrace_ops[] = {
+	{ OP_OR,	"||",		1 },
+	{ OP_NE,	"!=",		2 },
+	{ OP_EQ,	"==",		2 },
+	{ OP_NONE,	"OP_NONE",	0 },
+};
+
 static struct filter_op filter_ops[] = {
 	{ OP_OR,	"||",		1 },
 	{ OP_AND,	"&&",		2 },
@@ -81,6 +88,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +104,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +1001,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1353,8 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
+
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1894,6 +1906,83 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int *reset, ret;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	if (filter)
+		ret = ftrace_set_filter(data->ops, buf, len, *reset);
+	else
+		ret = ftrace_set_notrace(data->ops, buf, len, *reset);
+
+	if (*reset)
+		*reset = 0;
+
+	return ret;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	/*
+	  Check the predicate for function trace, verify:
+	   - only '==' and '!=' is used
+	   - the 'ip' field is used
+	*/
+	if (WARN((pred->op != OP_EQ) && (pred->op != OP_NE),
+		 "wrong operator for function filter: %d\n", pred->op))
+		return -EINVAL;
+
+	if (strcmp(field->name, "ip"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID))
+		return WALK_PRED_DEFAULT;
+
+	/* Double checking the predicate is valid for function trace. */
+	*err = ftrace_function_check_pred(pred);
+	if (*err)
+		return WALK_PRED_ABORT;
+
+	*err = __ftrace_function_set_filter(pred->op == OP_EQ,
+					    pred->regex.pattern,
+					    pred->regex.len,
+					    data);
+
+	return (*err) ? WALK_PRED_ABORT : WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1901,6 +1990,7 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	struct event_filter *filter;
 	struct filter_parse_state *ps;
 	struct ftrace_event_call *call;
+	struct filter_op *fops = filter_ops;
 
 	mutex_lock(&event_mutex);
 
@@ -1925,14 +2015,21 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	if (!ps)
 		goto free_filter;
 
-	parse_init(ps, filter_ops, filter_str);
+	if (ftrace_event_is_function(call))
+		fops = filter_ftrace_ops;
+
+	parse_init(ps, fops, filter_str);
 	err = filter_parse(ps);
 	if (err)
 		goto free_ps;
 
 	err = replace_preds(call, filter, ps, filter_str, false);
-	if (!err)
-		event->filter = filter;
+	if (!err) {
+		if (ftrace_event_is_function(call))
+			err = ftrace_function_set_filter(event, filter);
+		else
+			event->filter = filter;
+	}
 
 free_ps:
 	filter_opstack_clear(ps);
@@ -1940,7 +2037,7 @@ free_ps:
 	kfree(ps);
 
 free_filter:
-	if (err)
+	if (err || ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
-- 
1.7.1



* Re: [PATCHv4 0/7] ftrace, perf: Adding support to use function trace
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                         ` (6 preceding siblings ...)
  2011-12-21 18:56       ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2011-12-21 19:02       ` Jiri Olsa
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-21 19:02 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

should be 0/7 instead of 0/8 ;)

On Wed, Dec 21, 2011 at 07:56:24PM +0100, Jiri Olsa wrote:
> hi,
> here's another version of perf support for function trace
> with filter. 
> 
> attached patches:
>   1/7 ftrace: Change filter/notrace set functions to return exit code
>   2/7 ftrace: Add enable/disable ftrace_ops control interface
>   3/7 ftrace, perf: Add open/close tracepoint perf registration actions
>   4/7 ftrace, perf: Add add/del tracepoint perf registration actions
>   5/7 ftrace, perf: Add support to use function tracepoint in perf
>   6/7 ftrace, perf: Distinguish ftrace function event field type
>   7/7 ftrace, perf: Add filter support for function trace event
> 
> v4 changes:
>   2/7 - FL_GLOBAL_CONTROL changed to FL_GLOBAL_CONTROL_MASK
>       - changed WARN_ON_ONCE() to include the !preempt_count()
>       - changed this_cpu_ptr to per_cpu_ptr
> 
>   omitted Fix possible NULL dereferencing in __ftrace_hash_rec_update
>   (2/8 in v3)
> 
> v3 changes:
>   3/8 - renamed __add/remove_ftrace_ops
>       - fixed preempt_enable/recursion_clear order in ftrace_ops_control_func
>       - renamed/commented API functions -  enable/disable_ftrace_function
>   
>   omitted graph tracer workaround patch 10/10
> 
> v2 changes:
>  01/10 - keeping the old fix instead of adding hash_has_contents func
>          I'll send a separate patchset for this
>  02/10 - using different way to avoid the issue (3/9 in v1)
>  03/10 - using the way proposed by Steven for controling ftrace_ops
>          (4/9 in v1)
>  06/10 - added check ensuring the ftrace:function event could be used by
>          root only (7/9 in v1)
>  08/10 - added more description (8/9 in v1)
>  09/10 - changed '&&' operator to '||' which seems more suitable
>          in this case (9/9 in v1)
> 
> thanks,
> jirka
> ---
>  include/linux/ftrace.h             |   46 ++++++++-
>  include/linux/ftrace_event.h       |    9 +-
>  include/linux/perf_event.h         |    3 +
>  kernel/trace/ftrace.c              |  135 +++++++++++++++++++++--
>  kernel/trace/trace.h               |   11 ++-
>  kernel/trace/trace_event_perf.c    |  212 +++++++++++++++++++++++++++++-------
>  kernel/trace/trace_events.c        |   12 ++-
>  kernel/trace/trace_events_filter.c |  116 ++++++++++++++++++-
>  kernel/trace/trace_export.c        |   53 ++++++++-
>  kernel/trace/trace_kprobe.c        |    8 +-
>  kernel/trace/trace_syscalls.c      |   18 +++-
>  11 files changed, 542 insertions(+), 81 deletions(-)


* Re: [PATCH 7/7] ftrace, perf: Add filter support for function trace event
  2011-12-21 18:56       ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2011-12-21 22:07         ` Frederic Weisbecker
  2011-12-22 12:55           ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Frederic Weisbecker @ 2011-12-21 22:07 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, Dec 21, 2011 at 07:56:31PM +0100, Jiri Olsa wrote:
> Adding support to filter function trace event via perf
> interface. It is now possible to use filter interface
> in the perf tool like:
> 
>   perf record -e ftrace:function --filter="(ip == mm_*)" ls
> 
> The filter syntax is restricted to the 'ip' field only,
> and following operators are accepted '==' '!=' '||', ending
> up with the filter strings like:
> 
>   "ip == f1 f2 ..." || "ip != f3 f4 ..." ...

Having the functions separated like this sort of violates the
grammar of the filtering interface.

The typical way to do this would have been to stringify the
functions: ip == "f1 f2"

I feel a bit uncomfortable with "ip == f1 f2" scheme but perhaps
we can live with that. Especially as otherwise that would
require us to type "ip == \"f1 f2\"" for the whole filtering expression.

Thoughts?


* Re: [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code
  2011-12-21 18:56       ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2011-12-22  0:12         ` Steven Rostedt
  2011-12-22  8:01           ` [PATCHv5 " Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2011-12-22  0:12 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, 2011-12-21 at 19:56 +0100, Jiri Olsa wrote:
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index b79ab33..7eb702f 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2912,8 +2912,11 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
>  	mutex_lock(&ftrace_regex_lock);
>  	if (reset)
>  		ftrace_filter_reset(hash);
> -	if (buf)
> -		ftrace_match_records(hash, buf, len);
> +	if (buf &&
> +	    !ftrace_match_records(hash, buf, len)) {

I'm fine with this patch, but the above line break is ugly. Put that on
one line, and it still is < 80 characters.

I could just take this patch and fix the above. No need to make a new
patch set for this one fix.

-- Steve

> +		ret = -EINVAL;
> +		goto out_regex_unlock;
> +	}
>  
>  	mutex_lock(&ftrace_lock);
>  	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
> @@ -2923,6 +2926,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
>  
>  	mutex_unlock(&ftrace_lock);
>  
> + out_regex_unlock:
>  	mutex_unlock(&ftrace_regex_lock);
>  
>  	free_ftrace_hash(hash);




* [PATCHv5 1/7] ftrace: Change filter/notrace set functions to return exit code
  2011-12-22  0:12         ` Steven Rostedt
@ 2011-12-22  8:01           ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2011-12-22  8:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, Dec 21, 2011 at 07:12:12PM -0500, Steven Rostedt wrote:
> On Wed, 2011-12-21 at 19:56 +0100, Jiri Olsa wrote:
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index b79ab33..7eb702f 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -2912,8 +2912,11 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
> >  	mutex_lock(&ftrace_regex_lock);
> >  	if (reset)
> >  		ftrace_filter_reset(hash);
> > -	if (buf)
> > -		ftrace_match_records(hash, buf, len);
> > +	if (buf &&
> > +	    !ftrace_match_records(hash, buf, len)) {
> 
> I'm fine with this patch, but the above line break is ugly. Put that on
> one line, and it still is < 80 characters.
> 
> I could just take this patch and fix the above. No need to make a new
> patch set for this one fix.

fixed version attached, thanks
jirka

---
Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any return code, so there is no way for an ftrace_ops
user to tell whether the filter was correctly applied.

The set_ftrace_filter interface returns an error in case the filter
does not match:

  # echo krava > set_ftrace_filter
  bash: echo: write error: Invalid argument

Changing both the ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly, or -E* values
in case of error.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |    4 ++--
 kernel/trace/ftrace.c  |   15 +++++++++------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 26eafce..523640f 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -180,9 +180,9 @@ struct dyn_ftrace {
 };
 
 int ftrace_force_update(void);
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b79ab33..46e1031 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2912,8 +2912,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	mutex_lock(&ftrace_regex_lock);
 	if (reset)
 		ftrace_filter_reset(hash);
-	if (buf)
-		ftrace_match_records(hash, buf, len);
+	if (buf && !ftrace_match_records(hash, buf, len)) {
+		ret = -EINVAL;
+		goto out_regex_unlock;
+	}
 
 	mutex_lock(&ftrace_lock);
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -2923,6 +2925,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 
 	mutex_unlock(&ftrace_lock);
 
+ out_regex_unlock:
 	mutex_unlock(&ftrace_regex_lock);
 
 	free_ftrace_hash(hash);
@@ -2939,10 +2942,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
  * Filters denote which functions should be enabled when tracing is enabled.
  * If @buf is NULL and reset is set, all functions will be enabled for tracing.
  */
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 1);
+	return ftrace_set_regex(ops, buf, len, reset, 1);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_filter);
 
@@ -2957,10 +2960,10 @@ EXPORT_SYMBOL_GPL(ftrace_set_filter);
  * is enabled. If @buf is NULL and reset is set, all functions will be enabled
  * for tracing.
  */
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 0);
+	return ftrace_set_regex(ops, buf, len, reset, 0);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_notrace);
 /**
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [PATCH 7/7] ftrace, perf: Add filter support for function trace event
  2011-12-21 22:07         ` Frederic Weisbecker
@ 2011-12-22 12:55           ` Jiri Olsa
  2011-12-22 15:26             ` [PATCHvFIXED " Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-22 12:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, Dec 21, 2011 at 11:07:58PM +0100, Frederic Weisbecker wrote:
> On Wed, Dec 21, 2011 at 07:56:31PM +0100, Jiri Olsa wrote:
> > Adding support to filter function trace event via perf
> > interface. It is now possible to use filter interface
> > in the perf tool like:
> > 
> >   perf record -e ftrace:function --filter="(ip == mm_*)" ls
> > 
> > The filter syntax is restricted to the 'ip' field only,
> > and the following operators are accepted '==' '!=' '||', ending
> > up with the filter strings like:
> > 
> >   "ip == f1 f2 ..." || "ip != f3 f4 ..." ...
> 
> Having the functions separated like this sort of violates the
> grammar of the filtering interface.
> 
> The typical way to do this would have been to stringify the
> functions: ip == "f1 f2"
> 
> I feel a bit uncomfortable with "ip == f1 f2" scheme but perhaps
> we can live with that. Especially as otherwise that would
> require us to type "ip == \"f1 f2\"" for the whole filtering expression.

ugh, just realized there's a problem with this in the patch actually,
and it's not working as expected. I'll send out a new version soon..

thanks,
jirka

> 
> Thoughts?

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHvFIXED 7/7] ftrace, perf: Add filter support for function trace event
  2011-12-22 12:55           ` Jiri Olsa
@ 2011-12-22 15:26             ` Jiri Olsa
  2011-12-24  2:35               ` Frederic Weisbecker
  0 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2011-12-22 15:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Thu, Dec 22, 2011 at 01:55:58PM +0100, Jiri Olsa wrote:
> On Wed, Dec 21, 2011 at 11:07:58PM +0100, Frederic Weisbecker wrote:
> > On Wed, Dec 21, 2011 at 07:56:31PM +0100, Jiri Olsa wrote:
> > > Adding support to filter function trace event via perf
> > > interface. It is now possible to use filter interface
> > > in the perf tool like:
> > > 
> > >   perf record -e ftrace:function --filter="(ip == mm_*)" ls
> > > 
> > > The filter syntax is restricted to the 'ip' field only,
> > > and the following operators are accepted '==' '!=' '||', ending
> > > up with the filter strings like:
> > > 
> > >   "ip == f1 f2 ..." || "ip != f3 f4 ..." ...
> > 
> > Having the functions separated like this sort of violates the
> > grammar of the filtering interface.
> > 
> > The typical way to do this would have been to stringify the
> > functions: ip == "f1 f2"
> > 
> > I feel a bit uncomfortable with "ip == f1 f2" scheme but perhaps
> > we can live with that. Especially as otherwise that would
> > require us to type "ip == \"f1 f2\"" for the whole filtering expression.
> 
> ugh, just realized there's a problem with this in the patch actually,
> and it's not working as expected. I'll send out new version soon.. 
> 
> thanks,
> jirka
> 
> > 
> > Thoughts?

how about this one.. ;)

for some reason I presumed ftrace_set_filter would deal with ' '
as a filter separator.. but the splitting needs to be done before
we call this function... (ftrace_set_notrace respectively)

so now you could use one of following:

perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls

jirka

---
Adding support to filter the function trace event via the perf
interface. It is now possible to use the filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only,
and the following operators are accepted '==' '!=' '||', ending
up with filter strings like:

  ip == f1[, ]f2 ... || ip != f3[, ]f4 ...

with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"'.

The '==' operator adds trace filter with same effect as would
be added via set_ftrace_filter file.

The '!=' operator adds trace filter with same effect as would
be added via set_ftrace_notrace file.

The right side of the '!=' and '==' operators is a list of functions
or regexps to be added to the filter, separated by commas or spaces.

The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operators within one filter string.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h             |    1 +
 kernel/trace/ftrace.c              |    6 ++
 kernel/trace/trace.h               |    2 -
 kernel/trace/trace_event_perf.c    |    4 +-
 kernel/trace/trace_events_filter.c |  162 ++++++++++++++++++++++++++++++++++--
 5 files changed, 164 insertions(+), 11 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 0d43a2b..40bf05f 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -228,6 +228,7 @@ int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
+void ftrace_free_filter(struct ftrace_ops *ops);
 
 int register_ftrace_command(struct ftrace_func_command *cmd);
 int unregister_ftrace_command(struct ftrace_func_command *cmd);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7af5fb3..693df34 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1193,6 +1193,12 @@ static void free_ftrace_hash_rcu(struct ftrace_hash *hash)
 	call_rcu_sched(&hash->rcu, __free_ftrace_hash_rcu);
 }
 
+void ftrace_free_filter(struct ftrace_ops *ops)
+{
+	free_ftrace_hash(ops->filter_hash);
+	free_ftrace_hash(ops->notrace_hash);
+}
+
 static struct ftrace_hash *alloc_ftrace_hash(int size_bits)
 {
 	struct ftrace_hash *hash;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e88e58a..4ec6d18 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -770,9 +770,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 57eb232..220b50a 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -298,7 +298,9 @@ static int perf_ftrace_function_register(struct perf_event *event)
 static int perf_ftrace_function_unregister(struct perf_event *event)
 {
 	struct ftrace_ops *ops = &event->ftrace_ops;
-	return unregister_ftrace_function(ops);
+	int ret = unregister_ftrace_function(ops);
+	ftrace_free_filter(ops);
+	return ret;
 }
 
 static void perf_ftrace_function_enable(struct perf_event *event)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 66b74ab..c9a7bce 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -54,6 +54,13 @@ struct filter_op {
 	int precedence;
 };
 
+static struct filter_op filter_ftrace_ops[] = {
+	{ OP_OR,	"||",		1 },
+	{ OP_NE,	"!=",		2 },
+	{ OP_EQ,	"==",		2 },
+	{ OP_NONE,	"OP_NONE",	0 },
+};
+
 static struct filter_op filter_ops[] = {
 	{ OP_OR,	"||",		1 },
 	{ OP_AND,	"&&",		2 },
@@ -81,6 +88,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +104,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +1001,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1353,8 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
+
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1894,6 +1906,132 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+static char **
+ftrace_function_filter_re(char *buf, int len, int *count)
+{
+	char *str, *sep, **re;
+
+	str = kstrndup(buf, len, GFP_KERNEL);
+	if (!str)
+		return NULL;
+
+	/*
+	 * The argv_split function takes white space
+	 * as a separator, so convert ',' into spaces.
+	 */
+	while((sep = strchr(str, ','))) {
+		*sep = ' ';
+	}
+
+	re = argv_split(GFP_KERNEL, str, count);
+	kfree(str);
+	return re;
+}
+
+static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
+				      int reset, char *re, int len)
+{
+	int ret;
+
+	if (filter)
+		ret = ftrace_set_filter(ops, re, len, reset);
+	else
+		ret = ftrace_set_notrace(ops, re, len, reset);
+
+	return ret;
+}
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int i, re_cnt, ret;
+	int *reset;
+	char **re;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	/*
+	 * The 'ip' field could have multiple filters set, separated
+	 * either by space or comma. We first split the filter and apply
+	 * all pieces separately.
+	 */
+	re = ftrace_function_filter_re(buf, len, &re_cnt);
+	if (!re)
+		return -EINVAL;
+
+	for (i = 0; i < re_cnt; i++) {
+		ret = ftrace_function_set_regexp(data->ops, filter, *reset,
+						 re[i], strlen(re[i]));
+		if (ret)
+			break;
+
+		if (*reset)
+			*reset = 0;
+	}
+
+	argv_free(re);
+	return ret;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	/*
+	 * Check the predicate for function trace, verify:
+	 *  - only '==' and '!=' is used
+	 *  - the 'ip' field is used
+	 */
+	if (WARN((pred->op != OP_EQ) && (pred->op != OP_NE),
+		 "wrong operator for function filter: %d\n", pred->op))
+		return -EINVAL;
+
+	if (strcmp(field->name, "ip"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID))
+		return WALK_PRED_DEFAULT;
+
+	/* Double checking the predicate is valid for function trace. */
+	*err = ftrace_function_check_pred(pred);
+	if (*err)
+		return WALK_PRED_ABORT;
+
+	*err = __ftrace_function_set_filter(pred->op == OP_EQ,
+					    pred->regex.pattern,
+					    pred->regex.len,
+					    data);
+
+	return (*err) ? WALK_PRED_ABORT : WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1901,6 +2039,7 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	struct event_filter *filter;
 	struct filter_parse_state *ps;
 	struct ftrace_event_call *call;
+	struct filter_op *fops = filter_ops;
 
 	mutex_lock(&event_mutex);
 
@@ -1925,14 +2064,21 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	if (!ps)
 		goto free_filter;
 
-	parse_init(ps, filter_ops, filter_str);
+	if (ftrace_event_is_function(call))
+		fops = filter_ftrace_ops;
+
+	parse_init(ps, fops, filter_str);
 	err = filter_parse(ps);
 	if (err)
 		goto free_ps;
 
 	err = replace_preds(call, filter, ps, filter_str, false);
-	if (!err)
-		event->filter = filter;
+	if (!err) {
+		if (ftrace_event_is_function(call))
+			err = ftrace_function_set_filter(event, filter);
+		else
+			event->filter = filter;
+	}
 
 free_ps:
 	filter_opstack_clear(ps);
@@ -1940,7 +2086,7 @@ free_ps:
 	kfree(ps);
 
 free_filter:
-	if (err)
+	if (err || ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [PATCHvFIXED 7/7] ftrace, perf: Add filter support for function trace event
  2011-12-22 15:26             ` [PATCHvFIXED " Jiri Olsa
@ 2011-12-24  2:35               ` Frederic Weisbecker
  0 siblings, 0 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2011-12-24  2:35 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Thu, Dec 22, 2011 at 04:26:09PM +0100, Jiri Olsa wrote:
> On Thu, Dec 22, 2011 at 01:55:58PM +0100, Jiri Olsa wrote:
> > On Wed, Dec 21, 2011 at 11:07:58PM +0100, Frederic Weisbecker wrote:
> > > On Wed, Dec 21, 2011 at 07:56:31PM +0100, Jiri Olsa wrote:
> > > > Adding support to filter function trace event via perf
> > > > interface. It is now possible to use filter interface
> > > > in the perf tool like:
> > > > 
> > > >   perf record -e ftrace:function --filter="(ip == mm_*)" ls
> > > > 
> > > > The filter syntax is restricted to the 'ip' field only,
> > > > and the following operators are accepted '==' '!=' '||', ending
> > > > up with the filter strings like:
> > > > 
> > > >   "ip == f1 f2 ..." || "ip != f3 f4 ..." ...
> > > 
> > > Having the functions separated like this sort of violates the
> > > grammar of the filtering interface.
> > > 
> > > The typical way to do this would have been to stringify the
> > > functions: ip == "f1 f2"
> > > 
> > > I feel a bit uncomfortable with "ip == f1 f2" scheme but perhaps
> > > we can live with that. Especially as otherwise that would
> > > require us to type "ip == \"f1 f2\"" for the whole filtering expression.
> > 
> > ugh, just realized there's a problem with this in the patch actually,
> > and it's not working as expected. I'll send out new version soon.. 
> > 
> > thanks,
> > jirka
> > 
> > > 
> > > Thoughts?
> 
> how about this one.. ;)
> 
> for some reason I presumed ftrace_set_filter would deal with ' '
> as a filter separator.. but it needs to be done before we use
> this function... (ftrace_set_notrace respectively)
> 
> so now you could use one of following:
> 
> perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
> perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
> perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls

Nice! ;)

Thanks!

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCHv5 0/7] ftrace, perf: Adding support to use function trace
  2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                         ` (7 preceding siblings ...)
  2011-12-21 19:02       ` [PATCHv4 0/7] ftrace, perf: Adding support to use function trace Jiri Olsa
@ 2012-01-02  9:04       ` Jiri Olsa
  2012-01-02  9:04         ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
                           ` (8 more replies)
  8 siblings, 9 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-02  9:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

hi,
here's another version of perf support for function trace
with filter. 

attached patches:
  1/7 ftrace: Change filter/notrace set functions to return exit code
  2/7 ftrace: Add enable/disable ftrace_ops control interface
  3/7 ftrace, perf: Add open/close tracepoint perf registration actions
  4/7 ftrace, perf: Add add/del tracepoint perf registration actions
  5/7 ftrace, perf: Add support to use function tracepoint in perf
  6/7 ftrace, perf: Distinguish ftrace function event field type
  7/7 ftrace, perf: Add filter support for function trace event

v5 changes:
  7/7 - fixed to properly support ',' in filter expressions

v4 changes:
  2/7 - FL_GLOBAL_CONTROL changed to FL_GLOBAL_CONTROL_MASK
      - changed WARN_ON_ONCE() to include the !preempt_count()
      - changed this_cpu_ptr to per_cpu_ptr

  omitted Fix possible NULL dereferencing in __ftrace_hash_rec_update
  (2/8 in v3)

v3 changes:
  3/8 - renamed __add/remove_ftrace_ops
      - fixed preempt_enable/recursion_clear order in ftrace_ops_control_func
      - renamed/commented API functions -  enable/disable_ftrace_function
  
  omitted graph tracer workaround patch 10/10

v2 changes:
 01/10 - keeping the old fix instead of adding a hash_has_contents func;
         I'll send a separate patchset for this
 02/10 - using different way to avoid the issue (3/9 in v1)
 03/10 - using the way proposed by Steven for controling ftrace_ops
         (4/9 in v1)
 06/10 - added check ensuring the ftrace:function event could be used by
         root only (7/9 in v1)
 08/10 - added more description (8/9 in v1)
 09/10 - changed '&&' operator to '||' which seems more suitable
         in this case (9/9 in v1)

thanks,
jirka
---
 include/linux/ftrace.h             |   47 ++++++++-
 include/linux/ftrace_event.h       |    9 +-
 include/linux/perf_event.h         |    3 +
 kernel/trace/ftrace.c              |  140 +++++++++++++++++++++--
 kernel/trace/trace.h               |   11 ++-
 kernel/trace/trace_event_perf.c    |  214 +++++++++++++++++++++++++++++-------
 kernel/trace/trace_events.c        |   12 ++-
 kernel/trace/trace_events_filter.c |  164 ++++++++++++++++++++++++++-
 kernel/trace/trace_export.c        |   53 ++++++++-
 kernel/trace/trace_kprobe.c        |    8 +-
 kernel/trace/trace_syscalls.c      |   18 +++-
 11 files changed, 598 insertions(+), 81 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
@ 2012-01-02  9:04         ` Jiri Olsa
  2012-02-17 13:46           ` [tip:perf/core] ftrace: Change filter/ notrace " tip-bot for Jiri Olsa
  2012-01-02  9:04         ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
                           ` (7 subsequent siblings)
  8 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-02  9:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any value. So there's no way for an ftrace_ops
user to tell whether the filter was correctly applied.

The set_ftrace_filter interface returns error in case the filter
did not match:

  # echo krava > set_ftrace_filter
  bash: echo: write error: Invalid argument

Changing both ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly or -E* values
in case of error.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |    4 ++--
 kernel/trace/ftrace.c  |   15 +++++++++------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 26eafce..523640f 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -180,9 +180,9 @@ struct dyn_ftrace {
 };
 
 int ftrace_force_update(void);
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b79ab33..46e1031 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2912,8 +2912,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	mutex_lock(&ftrace_regex_lock);
 	if (reset)
 		ftrace_filter_reset(hash);
-	if (buf)
-		ftrace_match_records(hash, buf, len);
+	if (buf && !ftrace_match_records(hash, buf, len)) {
+		ret = -EINVAL;
+		goto out_regex_unlock;
+	}
 
 	mutex_lock(&ftrace_lock);
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -2923,6 +2925,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 
 	mutex_unlock(&ftrace_lock);
 
+ out_regex_unlock:
 	mutex_unlock(&ftrace_regex_lock);
 
 	free_ftrace_hash(hash);
@@ -2939,10 +2942,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
  * Filters denote which functions should be enabled when tracing is enabled.
  * If @buf is NULL and reset is set, all functions will be enabled for tracing.
  */
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 1);
+	return ftrace_set_regex(ops, buf, len, reset, 1);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_filter);
 
@@ -2957,10 +2960,10 @@ EXPORT_SYMBOL_GPL(ftrace_set_filter);
  * is enabled. If @buf is NULL and reset is set, all functions will be enabled
  * for tracing.
  */
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 0);
+	return ftrace_set_regex(ops, buf, len, reset, 0);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_notrace);
 /**
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
  2012-01-02  9:04         ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2012-01-02  9:04         ` Jiri Olsa
  2012-01-17  1:42           ` Frederic Weisbecker
  2012-01-02  9:04         ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
                           ` (6 subsequent siblings)
  8 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-02  9:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same way as 'global' ftrace_ops are done.

Introducing 2 global ftrace_ops - control_ops and ftrace_control_list,
which take over all ftrace_ops registered with the FTRACE_OPS_FL_CONTROL
flag. In addition, a new per cpu counter called 'disabled' is added to
ftrace_ops to provide the control information for each cpu.

When an ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.

The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides a function which iterates the ftrace_control_list
and checks the per cpu 'disabled' counter on the current cpu.

Adding 2 inline functions ftrace_function_enable/ftrace_function_disable,
which enable/disable the ftrace_ops for given cpu.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |   42 +++++++++++++++++
 kernel/trace/ftrace.c  |  119 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/trace/trace.h   |    2 +
 3 files changed, 156 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 523640f..0d43a2b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -35,12 +35,14 @@ enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	void __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +99,46 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+/**
+ * ftrace_function_enable - enable controlled ftrace_ops on given cpu
+ *
+ * This function enables tracing on given cpu by decreasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled;
+
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
+			 !preempt_count()))
+		return;
+
+	disabled = per_cpu_ptr(ops->disabled, cpu);
+	atomic_dec(disabled);
+}
+
+/**
+ * ftrace_function_disable - disable controlled ftrace_ops on given cpu
+ *
+ * This function disables tracing on given cpu by increasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_disable(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled;
+
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
+			 !preempt_count()))
+		return;
+
+	disabled = per_cpu_ptr(ops->disabled, cpu);
+	atomic_inc(disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 46e1031..7af5fb3 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -60,6 +60,8 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+#define FL_GLOBAL_CONTROL_MASK (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
@@ -87,12 +89,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -166,6 +170,38 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		atomic_set(per_cpu_ptr(ops->disabled, cpu), 1);
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	atomic_t *disabled;
+
+	disabled = alloc_percpu(atomic_t);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
+static int control_ops_is_disabled(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled = per_cpu_ptr(ops->disabled, cpu);
+	return atomic_read(disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -257,6 +293,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_list_ops(struct ftrace_ops **list,
+				struct ftrace_ops *main_ops,
+				struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	add_ftrace_ops(list, ops);
+	if (first)
+		add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_list_ops(struct ftrace_ops **list,
+				  struct ftrace_ops *main_ops,
+				  struct ftrace_ops *ops)
+{
+	int ret = remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -268,15 +324,20 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+	/* We don't support both control and global flags set. */
+	if ((ops->flags & FL_GLOBAL_CONTROL_MASK) == FL_GLOBAL_CONTROL_MASK)
+		return -EINVAL;
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -300,11 +361,23 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_list_ops(&ftrace_global_list,
+					     &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_list_ops(&ftrace_control_list,
+					     &control_ops, ops);
+		if (!ret) {
+			/*
+			 * The ftrace_ops is now removed from the list,
+			 * so there'll be no new users. We must ensure
+			 * all current users are done before we free
+			 * the control data.
+			 */
+			synchronize_sched();
+			control_ops_free(ops);
+		}
 	} else
 		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -3565,6 +3638,38 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+	int cpu;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	cpu = smp_processor_id();
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!control_ops_is_disabled(op, cpu) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	};
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2c26574..41c54e3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
  2012-01-02  9:04         ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
  2012-01-02  9:04         ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2012-01-02  9:04         ` Jiri Olsa
  2012-01-02  9:04         ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
                           ` (5 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-02  9:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.

The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.

The open/close actions are invoked for each tracepoint user when
opening/closing the event.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    6 +-
 kernel/trace/trace.h            |    5 ++
 kernel/trace/trace_event_perf.c |  116 +++++++++++++++++++++++++--------------
 kernel/trace/trace_events.c     |   10 ++-
 kernel/trace/trace_kprobe.c     |    6 ++-
 kernel/trace/trace_syscalls.c   |   14 +++-
 6 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c3da42d..195e360 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -146,6 +146,8 @@ enum trace_reg {
 	TRACE_REG_UNREGISTER,
 	TRACE_REG_PERF_REGISTER,
 	TRACE_REG_PERF_UNREGISTER,
+	TRACE_REG_PERF_OPEN,
+	TRACE_REG_PERF_CLOSE,
 };
 
 struct ftrace_event_call;
@@ -157,7 +159,7 @@ struct ftrace_event_class {
 	void			*perf_probe;
 #endif
 	int			(*reg)(struct ftrace_event_call *event,
-				       enum trace_reg type);
+				       enum trace_reg type, void *data);
 	int			(*define_fields)(struct ftrace_event_call *);
 	struct list_head	*(*get_fields)(struct ftrace_event_call *);
 	struct list_head	fields;
@@ -165,7 +167,7 @@ struct ftrace_event_class {
 };
 
 extern int ftrace_event_reg(struct ftrace_event_call *event,
-			    enum trace_reg type);
+			    enum trace_reg type, void *data);
 
 enum {
 	TRACE_EVENT_FL_ENABLED_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 41c54e3..85732a8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -828,4 +828,9 @@ extern const char *__stop___trace_bprintk_fmt[];
 	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #include "trace_entries.h"
 
+#ifdef CONFIG_PERF_EVENTS
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data);
+#endif
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 19a359d..0cfcc37 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -44,23 +44,17 @@ static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 	return 0;
 }
 
-static int perf_trace_event_init(struct ftrace_event_call *tp_event,
-				 struct perf_event *p_event)
+static int perf_trace_event_reg(struct ftrace_event_call *tp_event,
+				struct perf_event *p_event)
 {
 	struct hlist_head __percpu *list;
-	int ret;
+	int ret = -ENOMEM;
 	int cpu;
 
-	ret = perf_trace_event_perm(tp_event, p_event);
-	if (ret)
-		return ret;
-
 	p_event->tp_event = tp_event;
 	if (tp_event->perf_refcount++ > 0)
 		return 0;
 
-	ret = -ENOMEM;
-
 	list = alloc_percpu(struct hlist_head);
 	if (!list)
 		goto fail;
@@ -83,7 +77,7 @@ static int perf_trace_event_init(struct ftrace_event_call *tp_event,
 		}
 	}
 
-	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER);
+	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER, NULL);
 	if (ret)
 		goto fail;
 
@@ -108,6 +102,69 @@ fail:
 	return ret;
 }
 
+static void perf_trace_event_unreg(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	int i;
+
+	if (--tp_event->perf_refcount > 0)
+		goto out;
+
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER, NULL);
+
+	/*
+	 * Ensure our callback won't be called anymore. The buffers
+	 * will be freed after that.
+	 */
+	tracepoint_synchronize_unregister();
+
+	free_percpu(tp_event->perf_events);
+	tp_event->perf_events = NULL;
+
+	if (!--total_ref_count) {
+		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
+			free_percpu(perf_trace_buf[i]);
+			perf_trace_buf[i] = NULL;
+		}
+	}
+out:
+	module_put(tp_event->mod);
+}
+
+static int perf_trace_event_open(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_OPEN, p_event);
+}
+
+static void perf_trace_event_close(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
+}
+
+static int perf_trace_event_init(struct ftrace_event_call *tp_event,
+				 struct perf_event *p_event)
+{
+	int ret;
+
+	ret = perf_trace_event_perm(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_reg(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_open(p_event);
+	if (ret) {
+		perf_trace_event_unreg(p_event);
+		return ret;
+	}
+
+	return 0;
+}
+
 int perf_trace_init(struct perf_event *p_event)
 {
 	struct ftrace_event_call *tp_event;
@@ -130,6 +187,14 @@ int perf_trace_init(struct perf_event *p_event)
 	return ret;
 }
 
+void perf_trace_destroy(struct perf_event *p_event)
+{
+	mutex_lock(&event_mutex);
+	perf_trace_event_close(p_event);
+	perf_trace_event_unreg(p_event);
+	mutex_unlock(&event_mutex);
+}
+
 int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct ftrace_event_call *tp_event = p_event->tp_event;
@@ -154,37 +219,6 @@ void perf_trace_del(struct perf_event *p_event, int flags)
 	hlist_del_rcu(&p_event->hlist_entry);
 }
 
-void perf_trace_destroy(struct perf_event *p_event)
-{
-	struct ftrace_event_call *tp_event = p_event->tp_event;
-	int i;
-
-	mutex_lock(&event_mutex);
-	if (--tp_event->perf_refcount > 0)
-		goto out;
-
-	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER);
-
-	/*
-	 * Ensure our callback won't be called anymore. The buffers
-	 * will be freed after that.
-	 */
-	tracepoint_synchronize_unregister();
-
-	free_percpu(tp_event->perf_events);
-	tp_event->perf_events = NULL;
-
-	if (!--total_ref_count) {
-		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
-			free_percpu(perf_trace_buf[i]);
-			perf_trace_buf[i] = NULL;
-		}
-	}
-out:
-	module_put(tp_event->mod);
-	mutex_unlock(&event_mutex);
-}
-
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 				       struct pt_regs *regs, int *rctxp)
 {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c212a7f..5138fea 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -147,7 +147,8 @@ int trace_event_raw_init(struct ftrace_event_call *call)
 }
 EXPORT_SYMBOL_GPL(trace_event_raw_init);
 
-int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
+int ftrace_event_reg(struct ftrace_event_call *call,
+		     enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -170,6 +171,9 @@ int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
 					    call->class->perf_probe,
 					    call);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
@@ -209,7 +213,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_stop_cmdline_record();
 				call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			call->class->reg(call, TRACE_REG_UNREGISTER);
+			call->class->reg(call, TRACE_REG_UNREGISTER, NULL);
 		}
 		break;
 	case 1:
@@ -218,7 +222,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_start_cmdline_record();
 				call->flags |= TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			ret = call->class->reg(call, TRACE_REG_REGISTER);
+			ret = call->class->reg(call, TRACE_REG_REGISTER, NULL);
 			if (ret) {
 				tracing_stop_cmdline_record();
 				pr_info("event trace: Could not enable event "
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 00d527c..5667f89 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1892,7 +1892,8 @@ static __kprobes void kretprobe_perf_func(struct kretprobe_instance *ri,
 #endif	/* CONFIG_PERF_EVENTS */
 
 static __kprobes
-int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
+int kprobe_register(struct ftrace_event_call *event,
+		    enum trace_reg type, void *data)
 {
 	struct trace_probe *tp = (struct trace_probe *)event->data;
 
@@ -1909,6 +1910,9 @@ int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
 	case TRACE_REG_PERF_UNREGISTER:
 		disable_trace_probe(tp, TP_FLAG_PROFILE);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 5f35f6f..8599c1d 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -18,9 +18,9 @@ static DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
 static DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 
 static int syscall_enter_define_fields(struct ftrace_event_call *call);
 static int syscall_exit_define_fields(struct ftrace_event_call *call);
@@ -650,7 +650,7 @@ void perf_sysexit_disable(struct ftrace_event_call *call)
 #endif /* CONFIG_PERF_EVENTS */
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -665,13 +665,16 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysenter_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
 }
 
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -686,6 +689,9 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysexit_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
-- 
1.7.1



* [PATCH 4/7] ftrace, perf: Add add/del tracepoint perf registration actions
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
                           ` (2 preceding siblings ...)
  2012-01-02  9:04         ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
@ 2012-01-02  9:04         ` Jiri Olsa
  2012-01-02  9:04         ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
                           ` (4 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-02  9:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
perf event schedule in/out actions.

The add action is invoked when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    2 ++
 kernel/trace/trace_event_perf.c |    4 +++-
 kernel/trace/trace_events.c     |    2 ++
 kernel/trace/trace_kprobe.c     |    2 ++
 kernel/trace/trace_syscalls.c   |    4 ++++
 5 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 195e360..2bf677c 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -148,6 +148,8 @@ enum trace_reg {
 	TRACE_REG_PERF_UNREGISTER,
 	TRACE_REG_PERF_OPEN,
 	TRACE_REG_PERF_CLOSE,
+	TRACE_REG_PERF_ADD,
+	TRACE_REG_PERF_DEL,
 };
 
 struct ftrace_event_call;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0cfcc37..d72af0b 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -211,12 +211,14 @@ int perf_trace_add(struct perf_event *p_event, int flags)
 	list = this_cpu_ptr(pcpu_list);
 	hlist_add_head_rcu(&p_event->hlist_entry, list);
 
-	return 0;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_ADD, p_event);
 }
 
 void perf_trace_del(struct perf_event *p_event, int flags)
 {
+	struct ftrace_event_call *tp_event = p_event->tp_event;
 	hlist_del_rcu(&p_event->hlist_entry);
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_DEL, p_event);
 }
 
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5138fea..079a93a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -173,6 +173,8 @@ int ftrace_event_reg(struct ftrace_event_call *call,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5667f89..580a05e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1912,6 +1912,8 @@ int kprobe_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8599c1d..5e4f62e 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -667,6 +667,8 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
@@ -691,6 +693,8 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
-- 
1.7.1



* [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
                           ` (3 preceding siblings ...)
  2012-01-02  9:04         ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
@ 2012-01-02  9:04         ` Jiri Olsa
  2012-01-02  9:04         ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
                           ` (3 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-02  9:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding perf registration support for the ftrace function event,
so it is now possible to register it via the perf interface.

The perf_event struct statically contains ftrace_ops as a handle
for the function tracer. The function tracer is registered/unregistered
in the open/close actions, and enabled/disabled in the add/del actions.

It is now possible to use function trace within perf commands
like:

  perf record -e ftrace:function ls
  perf stat -e ftrace:function ls

Allowed only for root.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h      |    3 +
 kernel/trace/trace.h            |    2 +
 kernel/trace/trace_event_perf.c |   92 +++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_export.c     |   28 ++++++++++++
 4 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0b91db2..5003be6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -858,6 +858,9 @@ struct perf_event {
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
+#ifdef CONFIG_FUNCTION_TRACER
+	struct ftrace_ops               ftrace_ops;
+#endif
 #endif
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 85732a8..e88e58a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -591,6 +591,8 @@ static inline int ftrace_trace_task(struct task_struct *task)
 static inline int ftrace_is_dead(void) { return 0; }
 #endif
 
+int ftrace_event_is_function(struct ftrace_event_call *call);
+
 /*
  * struct trace_parser - servers for reading the user input separated by spaces
  * @cont: set if the input is not complete - no final space char was found
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index d72af0b..57eb232 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -24,6 +24,11 @@ static int	total_ref_count;
 static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
+	/* The ftrace function trace is allowed only for root. */
+	if (ftrace_event_is_function(tp_event) &&
+	    perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	/* No tracing, just counting, so no obvious leak */
 	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
 		return 0;
@@ -250,3 +255,90 @@ __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 	return raw_data;
 }
 EXPORT_SYMBOL_GPL(perf_trace_buf_prepare);
+
+
+static void
+perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_entry *entry;
+	struct hlist_head *head;
+	struct pt_regs regs;
+	int rctx;
+
+#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
+		    sizeof(u64)) - sizeof(u32))
+
+	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
+
+	perf_fetch_caller_regs(&regs);
+
+	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
+	if (!entry)
+		return;
+
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	head = this_cpu_ptr(event_function.perf_events);
+	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
+			      1, &regs, head);
+
+#undef ENTRY_SIZE
+}
+
+static int perf_ftrace_function_register(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+
+	ops->flags |= FTRACE_OPS_FL_CONTROL;
+	ops->func = perf_ftrace_function_call;
+	return register_ftrace_function(ops);
+}
+
+static int perf_ftrace_function_unregister(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	return unregister_ftrace_function(ops);
+}
+
+static void perf_ftrace_function_enable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	ftrace_function_enable(ops, smp_processor_id());
+}
+
+static void perf_ftrace_function_disable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	ftrace_function_disable(ops, smp_processor_id());
+}
+
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data)
+{
+	int etype = call->event.type;
+
+	if (etype != TRACE_FN)
+		return -EINVAL;
+
+	switch (type) {
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+		return perf_ftrace_function_register(data);
+	case TRACE_REG_PERF_CLOSE:
+		return perf_ftrace_function_unregister(data);
+	case TRACE_REG_PERF_ADD:
+		perf_ftrace_function_enable(data);
+		return 0;
+	case TRACE_REG_PERF_DEL:
+		perf_ftrace_function_disable(data);
+		return 0;
+	}
+
+	return -EINVAL;
+}
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index bbeec31..867653c 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 
 #include "trace_entries.h"
 
+static int ftrace_event_class_register(struct ftrace_event_call *call,
+				       enum trace_reg type, void *data)
+{
+	switch (type) {
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
+#ifdef CONFIG_PERF_EVENTS
+		return perf_ftrace_event_register(call, type, data);
+#endif
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	}
+
+	return -EINVAL;
+}
+
 #undef __entry
 #define __entry REC
 
@@ -159,6 +181,7 @@ struct ftrace_event_class event_class_ftrace_##call = {			\
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
+	.reg			= ftrace_event_class_register,		\
 };									\
 									\
 struct ftrace_event_call __used event_##call = {			\
@@ -170,4 +193,9 @@ struct ftrace_event_call __used event_##call = {			\
 struct ftrace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
+int ftrace_event_is_function(struct ftrace_event_call *call)
+{
+	return call == &event_function;
+}
+
 #include "trace_entries.h"
-- 
1.7.1



* [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
                           ` (4 preceding siblings ...)
  2012-01-02  9:04         ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
@ 2012-01-02  9:04         ` Jiri Olsa
  2012-01-02  9:04         ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
                           ` (2 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-02  9:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding the FILTER_TRACE_FN event field type for the function tracepoint
event, so it can be properly recognized within the filtering code.

Currently all fields of ftrace subsystem events share the common
field type FILTER_OTHER. Since the function trace fields need special
care within the filtering code, they must be recognized properly,
hence the addition of the FILTER_TRACE_FN event type.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h       |    1 +
 kernel/trace/trace_events_filter.c |    7 ++++++-
 kernel/trace/trace_export.c        |   25 ++++++++++++++++++++-----
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 2bf677c..dd478fc 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -245,6 +245,7 @@ enum {
 	FILTER_STATIC_STRING,
 	FILTER_DYN_STRING,
 	FILTER_PTR_STRING,
+	FILTER_TRACE_FN,
 };
 
 #define EVENT_STORAGE_SIZE 128
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index f04cc31..66b74ab 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
 	return FILTER_OTHER;
 }
 
+static bool is_function_field(struct ftrace_event_field *field)
+{
+	return field->filter_type == FILTER_TRACE_FN;
+}
+
 static bool is_string_field(struct ftrace_event_field *field)
 {
 	return field->filter_type == FILTER_DYN_STRING ||
@@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else {
+	} else if (!is_function_field(field)) {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 867653c..46c35e2 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -67,7 +67,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -77,7 +77,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -91,7 +91,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 		ret = trace_define_field(event_call, event_storage, #item, \
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 		mutex_unlock(&event_storage_mutex);			\
 		if (ret)						\
 			return ret;					\
@@ -104,7 +104,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -112,10 +112,24 @@ static void __always_unused ____ftrace_check_##name(void)	\
 #define __dynamic_array(type, item)					\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), FILTER_OTHER);\
+				 0, is_signed_type(type), filter_type);\
 	if (ret)							\
 		return ret;
 
+#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
+#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
+#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
+#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
+#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
+#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
+#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
+#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
+#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
+#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
+#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
+
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 int									\
@@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 {									\
 	struct struct_name field;					\
 	int ret;							\
+	int filter_type = FILTER_TYPE(id);				\
 									\
 	tstruct;							\
 									\
-- 
1.7.1



* [PATCH 7/7] ftrace, perf: Add filter support for function trace event
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
                           ` (5 preceding siblings ...)
  2012-01-02  9:04         ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
@ 2012-01-02  9:04         ` Jiri Olsa
  2012-01-16 23:59           ` Steven Rostedt
  2012-01-16  8:57         ` [PATCHv5 0/7] ftrace, perf: Adding support to use function trace Jiri Olsa
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
  8 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-02  9:04 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding support for filtering the function trace event via the perf
interface. It is now possible to use the filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only, and only
the operators '==', '!=' and '||' are accepted, ending up with
filter strings like:

  ip == f1[, ]f2 ... || ip != f3[, ]f4 ...

with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"'.

The '==' operator adds a trace filter with the same effect as
writing to the set_ftrace_filter file.

The '!=' operator adds a trace filter with the same effect as
writing to the set_ftrace_notrace file.

The right side of the '==' and '!=' operators is a list of functions
or regexps, separated by spaces, to be added to the filter.

The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operator within one filter string.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h             |    1 +
 kernel/trace/ftrace.c              |    6 ++
 kernel/trace/trace.h               |    2 -
 kernel/trace/trace_event_perf.c    |    4 +-
 kernel/trace/trace_events_filter.c |  161 ++++++++++++++++++++++++++++++++++--
 5 files changed, 163 insertions(+), 11 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 0d43a2b..40bf05f 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -228,6 +228,7 @@ int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
+void ftrace_free_filter(struct ftrace_ops *ops);
 
 int register_ftrace_command(struct ftrace_func_command *cmd);
 int unregister_ftrace_command(struct ftrace_func_command *cmd);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7af5fb3..693df34 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1193,6 +1193,12 @@ static void free_ftrace_hash_rcu(struct ftrace_hash *hash)
 	call_rcu_sched(&hash->rcu, __free_ftrace_hash_rcu);
 }
 
+void ftrace_free_filter(struct ftrace_ops *ops)
+{
+	free_ftrace_hash(ops->filter_hash);
+	free_ftrace_hash(ops->notrace_hash);
+}
+
 static struct ftrace_hash *alloc_ftrace_hash(int size_bits)
 {
 	struct ftrace_hash *hash;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e88e58a..4ec6d18 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -770,9 +770,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 57eb232..220b50a 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -298,7 +298,9 @@ static int perf_ftrace_function_register(struct perf_event *event)
 static int perf_ftrace_function_unregister(struct perf_event *event)
 {
 	struct ftrace_ops *ops = &event->ftrace_ops;
-	return unregister_ftrace_function(ops);
+	int ret = unregister_ftrace_function(ops);
+	ftrace_free_filter(ops);
+	return ret;
 }
 
 static void perf_ftrace_function_enable(struct perf_event *event)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 66b74ab..23170cc 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -54,6 +54,13 @@ struct filter_op {
 	int precedence;
 };
 
+static struct filter_op filter_ftrace_ops[] = {
+	{ OP_OR,	"||",		1 },
+	{ OP_NE,	"!=",		2 },
+	{ OP_EQ,	"==",		2 },
+	{ OP_NONE,	"OP_NONE",	0 },
+};
+
 static struct filter_op filter_ops[] = {
 	{ OP_OR,	"||",		1 },
 	{ OP_AND,	"&&",		2 },
@@ -81,6 +88,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +104,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +1001,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1353,8 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
+
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1894,6 +1906,131 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+static char **
+ftrace_function_filter_re(char *buf, int len, int *count)
+{
+	char *str, *sep, **re;
+
+	str = kstrndup(buf, len, GFP_KERNEL);
+	if (!str)
+		return NULL;
+
+	/*
+	 * The argv_split function takes white space
+	 * as a separator, so convert ',' into spaces.
+	 */
+	while ((sep = strchr(str, ',')))
+		*sep = ' ';
+
+	re = argv_split(GFP_KERNEL, str, count);
+	kfree(str);
+	return re;
+}
+
+static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
+				      int reset, char *re, int len)
+{
+	int ret;
+
+	if (filter)
+		ret = ftrace_set_filter(ops, re, len, reset);
+	else
+		ret = ftrace_set_notrace(ops, re, len, reset);
+
+	return ret;
+}
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int i, re_cnt, ret;
+	int *reset;
+	char **re;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	/*
+	 * The 'ip' field could have multiple filters set, separated
+	 * either by space or comma. We first cut the filter and apply
+	 * all pieces separately.
+	 */
+	re = ftrace_function_filter_re(buf, len, &re_cnt);
+	if (!re)
+		return -EINVAL;
+
+	for (i = 0; i < re_cnt; i++) {
+		ret = ftrace_function_set_regexp(data->ops, filter, *reset,
+						 re[i], strlen(re[i]));
+		if (ret)
+			break;
+
+		if (*reset)
+			*reset = 0;
+	}
+
+	argv_free(re);
+	return ret;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	/*
+	 * Check the predicate for function trace, verify:
+	 *  - only '==' and '!=' is used
+	 *  - the 'ip' field is used
+	 */
+	if (WARN((pred->op != OP_EQ) && (pred->op != OP_NE),
+		 "wrong operator for function filter: %d\n", pred->op))
+		return -EINVAL;
+
+	if (strcmp(field->name, "ip"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID))
+		return WALK_PRED_DEFAULT;
+
+	/* Double checking the predicate is valid for function trace. */
+	*err = ftrace_function_check_pred(pred);
+	if (*err)
+		return WALK_PRED_ABORT;
+
+	*err = __ftrace_function_set_filter(pred->op == OP_EQ,
+					    pred->regex.pattern,
+					    pred->regex.len,
+					    data);
+
+	return (*err) ? WALK_PRED_ABORT : WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1901,6 +2038,7 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	struct event_filter *filter;
 	struct filter_parse_state *ps;
 	struct ftrace_event_call *call;
+	struct filter_op *fops = filter_ops;
 
 	mutex_lock(&event_mutex);
 
@@ -1925,14 +2063,21 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 	if (!ps)
 		goto free_filter;
 
-	parse_init(ps, filter_ops, filter_str);
+	if (ftrace_event_is_function(call))
+		fops = filter_ftrace_ops;
+
+	parse_init(ps, fops, filter_str);
 	err = filter_parse(ps);
 	if (err)
 		goto free_ps;
 
 	err = replace_preds(call, filter, ps, filter_str, false);
-	if (!err)
-		event->filter = filter;
+	if (!err) {
+		if (ftrace_event_is_function(call))
+			err = ftrace_function_set_filter(event, filter);
+		else
+			event->filter = filter;
+	}
 
 free_ps:
 	filter_opstack_clear(ps);
@@ -1940,7 +2085,7 @@ free_ps:
 	kfree(ps);
 
 free_filter:
-	if (err)
+	if (err || ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [tip:perf/core] ftrace: Fix unregister ftrace_ops accounting
  2011-12-05 17:22   ` [PATCHv2 02/10] ftrace: Change mcount call replacement logic Jiri Olsa
  2011-12-19 19:03     ` Steven Rostedt
  2011-12-20 19:39     ` Steven Rostedt
@ 2012-01-08  9:13     ` tip-bot for Jiri Olsa
  2 siblings, 0 replies; 186+ messages in thread
From: tip-bot for Jiri Olsa @ 2012-01-08  9:13 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, rostedt, tglx, jolsa

Commit-ID:  30fb6aa74011dcf595f306ca2727254d708b786e
Gitweb:     http://git.kernel.org/tip/30fb6aa74011dcf595f306ca2727254d708b786e
Author:     Jiri Olsa <jolsa@redhat.com>
AuthorDate: Mon, 5 Dec 2011 18:22:48 +0100
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Wed, 21 Dec 2011 07:09:14 -0500

ftrace: Fix unregister ftrace_ops accounting

Multiple users of the function tracer can register their functions
with the ftrace_ops structure. The accounting within ftrace will
update the counter on each function record that is being traced.
When the ftrace_ops filtering adds or removes functions, the
function records will be updated accordingly if the ftrace_ops is
still registered.

When a ftrace_ops is removed, the counters of the function records
that it traces are decremented. When they reach zero, the functions
they represent are modified to stop calling the mcount code.

When changes are made, the code is updated via stop_machine() with
a command passed to the function to tell it what to do. There is an
ENABLE and DISABLE command that tells the called function to enable
or disable the functions. But ENABLE is really a misnomer, as it
should just update the records: records that have been enabled but
now have a count of zero should be disabled.

The DISABLE command is used to disable all functions regardless of
their counter values. This is the big off switch and is not the
complement of the ENABLE command.

To make matters worse, when a ftrace_ops is unregistered while another
ftrace_ops is still registered, neither the DISABLE nor the ENABLE
command is set when calling into the stop_machine() function, and the
records will not be updated to match their counters. A command is
passed to that function that will update the mcount code to call the
registered callback directly if it is the only one left. This means
that the ftrace_ops that is still registered will have its callback
called by all functions that have been set for it, as well as by those
set for the ftrace_ops that was just unregistered.

Here's a way to trigger this bug. Compile the kernel with
CONFIG_FUNCTION_PROFILER set and with CONFIG_FUNCTION_GRAPH not set:

 CONFIG_FUNCTION_PROFILER=y
 # CONFIG_FUNCTION_GRAPH is not set

This will force the function profiler to use the function tracer instead
of the function graph tracer.

  # cd /sys/kernel/debug/tracing
  # echo schedule > set_ftrace_filter
  # echo function > current_tracer
  # cat set_ftrace_filter
 schedule
  # cat trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 692/68108025   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
      kworker/0:2-909   [000] ....   531.235574: schedule <-worker_thread
           <idle>-0     [001] .N..   531.235575: schedule <-cpu_idle
      kworker/0:2-909   [000] ....   531.235597: schedule <-worker_thread
             sshd-2563  [001] ....   531.235647: schedule <-schedule_hrtimeout_range_clock

  # echo 1 > function_profile_enabled
  # echo 0 > function_profile_enabled
  # cat set_ftrace_filter
 schedule
  # cat trace
 # tracer: function
 #
 # entries-in-buffer/entries-written: 159701/118821262   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
           <idle>-0     [002] ...1   604.870655: local_touch_nmi <-cpu_idle
           <idle>-0     [002] d..1   604.870655: enter_idle <-cpu_idle
           <idle>-0     [002] d..1   604.870656: atomic_notifier_call_chain <-enter_idle
           <idle>-0     [002] d..1   604.870656: __atomic_notifier_call_chain <-atomic_notifier_call_chain

The same problem could have happened with the trace_probe_ops,
but they are modified with the set_ftrace_filter file, which does
the update when the file is closed.

The simple solution is to change ENABLE to UPDATE and call it every
time an ftrace_ops is unregistered.
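
For illustration (a userspace C sketch, not kernel code): the UPDATE
semantics can be modeled roughly as below, where struct rec and
replace_code() are hypothetical stand-ins for struct dyn_ftrace and
ftrace_replace_code().

```c
#include <assert.h>

enum { FL_ENABLED = 1 << 0 };

/* Hypothetical stand-in for struct dyn_ftrace: a ref count of how
 * many registered ftrace_ops trace this function, plus flags. */
struct rec {
	int refcnt;
	int flags;
};

/* UPDATE semantics: enable a record iff someone still references it,
 * disable it otherwise.  With update == 0 (the big-off-switch DISABLE
 * command), every record is disabled regardless of its ref count. */
static void replace_code(struct rec *recs, int n, int update)
{
	for (int i = 0; i < n; i++) {
		int flag = (update && recs[i].refcnt) ? FL_ENABLED : 0;

		recs[i].flags = (recs[i].flags & ~FL_ENABLED) | flag;
	}
}
```

The bug above corresponds to running neither command on unregister: a
record whose ref count just dropped to zero keeps FL_ENABLED until
something else forces an update.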

Link: http://lkml.kernel.org/r/1323105776-26961-3-git-send-email-jolsa@redhat.com

Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ftrace.c |   27 +++++++++++++--------------
 1 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b1e8943..25b4f4d 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -948,7 +948,7 @@ struct ftrace_func_probe {
 };
 
 enum {
-	FTRACE_ENABLE_CALLS		= (1 << 0),
+	FTRACE_UPDATE_CALLS		= (1 << 0),
 	FTRACE_DISABLE_CALLS		= (1 << 1),
 	FTRACE_UPDATE_TRACE_FUNC	= (1 << 2),
 	FTRACE_START_FUNC_RET		= (1 << 3),
@@ -1519,7 +1519,7 @@ int ftrace_text_reserved(void *start, void *end)
 
 
 static int
-__ftrace_replace_code(struct dyn_ftrace *rec, int enable)
+__ftrace_replace_code(struct dyn_ftrace *rec, int update)
 {
 	unsigned long ftrace_addr;
 	unsigned long flag = 0UL;
@@ -1527,17 +1527,17 @@ __ftrace_replace_code(struct dyn_ftrace *rec, int enable)
 	ftrace_addr = (unsigned long)FTRACE_ADDR;
 
 	/*
-	 * If we are enabling tracing:
+	 * If we are updating calls:
 	 *
 	 *   If the record has a ref count, then we need to enable it
 	 *   because someone is using it.
 	 *
 	 *   Otherwise we make sure its disabled.
 	 *
-	 * If we are disabling tracing, then disable all records that
+	 * If we are disabling calls, then disable all records that
 	 * are enabled.
 	 */
-	if (enable && (rec->flags & ~FTRACE_FL_MASK))
+	if (update && (rec->flags & ~FTRACE_FL_MASK))
 		flag = FTRACE_FL_ENABLED;
 
 	/* If the state of this record hasn't changed, then do nothing */
@@ -1553,7 +1553,7 @@ __ftrace_replace_code(struct dyn_ftrace *rec, int enable)
 	return ftrace_make_nop(NULL, rec, ftrace_addr);
 }
 
-static void ftrace_replace_code(int enable)
+static void ftrace_replace_code(int update)
 {
 	struct dyn_ftrace *rec;
 	struct ftrace_page *pg;
@@ -1567,7 +1567,7 @@ static void ftrace_replace_code(int enable)
 		if (rec->flags & FTRACE_FL_FREE)
 			continue;
 
-		failed = __ftrace_replace_code(rec, enable);
+		failed = __ftrace_replace_code(rec, update);
 		if (failed) {
 			ftrace_bug(failed, rec->ip);
 			/* Stop processing */
@@ -1623,7 +1623,7 @@ static int __ftrace_modify_code(void *data)
 	 */
 	function_trace_stop++;
 
-	if (*command & FTRACE_ENABLE_CALLS)
+	if (*command & FTRACE_UPDATE_CALLS)
 		ftrace_replace_code(1);
 	else if (*command & FTRACE_DISABLE_CALLS)
 		ftrace_replace_code(0);
@@ -1691,7 +1691,7 @@ static int ftrace_startup(struct ftrace_ops *ops, int command)
 		return -ENODEV;
 
 	ftrace_start_up++;
-	command |= FTRACE_ENABLE_CALLS;
+	command |= FTRACE_UPDATE_CALLS;
 
 	/* ops marked global share the filter hashes */
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
@@ -1743,8 +1743,7 @@ static void ftrace_shutdown(struct ftrace_ops *ops, int command)
 	if (ops != &global_ops || !global_start_up)
 		ops->flags &= ~FTRACE_OPS_FL_ENABLED;
 
-	if (!ftrace_start_up)
-		command |= FTRACE_DISABLE_CALLS;
+	command |= FTRACE_UPDATE_CALLS;
 
 	if (saved_ftrace_func != ftrace_trace_function) {
 		saved_ftrace_func = ftrace_trace_function;
@@ -1766,7 +1765,7 @@ static void ftrace_startup_sysctl(void)
 	saved_ftrace_func = NULL;
 	/* ftrace_start_up is true if we want ftrace running */
 	if (ftrace_start_up)
-		ftrace_run_update_code(FTRACE_ENABLE_CALLS);
+		ftrace_run_update_code(FTRACE_UPDATE_CALLS);
 }
 
 static void ftrace_shutdown_sysctl(void)
@@ -2919,7 +2918,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
 	if (!ret && ops->flags & FTRACE_OPS_FL_ENABLED
 	    && ftrace_enabled)
-		ftrace_run_update_code(FTRACE_ENABLE_CALLS);
+		ftrace_run_update_code(FTRACE_UPDATE_CALLS);
 
 	mutex_unlock(&ftrace_lock);
 
@@ -3107,7 +3106,7 @@ ftrace_regex_release(struct inode *inode, struct file *file)
 				       orig_hash, iter->hash);
 		if (!ret && (iter->ops->flags & FTRACE_OPS_FL_ENABLED)
 		    && ftrace_enabled)
-			ftrace_run_update_code(FTRACE_ENABLE_CALLS);
+			ftrace_run_update_code(FTRACE_UPDATE_CALLS);
 
 		mutex_unlock(&ftrace_lock);
 	}


* Re: [PATCHv5 0/7] ftrace, perf: Adding support to use function trace
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
                           ` (6 preceding siblings ...)
  2012-01-02  9:04         ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2012-01-16  8:57         ` Jiri Olsa
  2012-01-16 16:17           ` Steven Rostedt
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
  8 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-16  8:57 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

hi,
any feedback?

thanks,
jirka

On Mon, Jan 02, 2012 at 10:04:13AM +0100, Jiri Olsa wrote:
> hi,
> here's another version of perf support for function trace
> with filter. 
> 
> attached patches:
>   1/7 ftrace: Change filter/notrace set functions to return exit code
>   2/7 ftrace: Add enable/disable ftrace_ops control interface
>   3/7 ftrace, perf: Add open/close tracepoint perf registration actions
>   4/7 ftrace, perf: Add add/del tracepoint perf registration actions
>   5/7 ftrace, perf: Add support to use function tracepoint in perf
>   6/7 ftrace, perf: Distinguish ftrace function event field type
>   7/7 ftrace, perf: Add filter support for function trace event
> 
> v5 changes:
>   7/7 - fixed to properly support ',' in filter expressions
> 
> v4 changes:
>   2/7 - FL_GLOBAL_CONTROL changed to FL_GLOBAL_CONTROL_MASK
>       - changed WARN_ON_ONCE() to include the !preempt_count()
>       - changed this_cpu_ptr to per_cpu_ptr
> 
>   omitted Fix possible NULL dereferencing in __ftrace_hash_rec_update
>   (2/8 in v3)
> 
> v3 changes:
>   3/8 - renamed __add/remove_ftrace_ops
>       - fixed preempt_enable/recursion_clear order in ftrace_ops_control_func
>       - renamed/commented API functions -  enable/disable_ftrace_function
>   
>   omitted graph tracer workaround patch 10/10
> 
> v2 changes:
>  01/10 - keeping the old fix instead of adding hash_has_contents func
>          I'll send separating patchset for this
>  02/10 - using different way to avoid the issue (3/9 in v1)
>  03/10 - using the way proposed by Steven for controling ftrace_ops
>          (4/9 in v1)
>  06/10 - added check ensuring the ftrace:function event could be used by
>          root only (7/9 in v1)
>  08/10 - added more description (8/9 in v1)
>  09/10 - changed '&&' operator to '||' which seems more suitable
>          in this case (9/9 in v1)
> 
> thanks,
> jirka
> ---
>  include/linux/ftrace.h             |   47 ++++++++-
>  include/linux/ftrace_event.h       |    9 +-
>  include/linux/perf_event.h         |    3 +
>  kernel/trace/ftrace.c              |  140 +++++++++++++++++++++--
>  kernel/trace/trace.h               |   11 ++-
>  kernel/trace/trace_event_perf.c    |  214 +++++++++++++++++++++++++++++-------
>  kernel/trace/trace_events.c        |   12 ++-
>  kernel/trace/trace_events_filter.c |  164 ++++++++++++++++++++++++++-
>  kernel/trace/trace_export.c        |   53 ++++++++-
>  kernel/trace/trace_kprobe.c        |    8 +-
>  kernel/trace/trace_syscalls.c      |   18 +++-
>  11 files changed, 598 insertions(+), 81 deletions(-)


* Re: [PATCHv5 0/7] ftrace, perf: Adding support to use function trace
  2012-01-16  8:57         ` [PATCHv5 0/7] ftrace, perf: Adding support to use function trace Jiri Olsa
@ 2012-01-16 16:17           ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2012-01-16 16:17 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, 2012-01-16 at 09:57 +0100, Jiri Olsa wrote:
> hi,
> any feedback?
> 

Sorry,

Got lost in the after break email. Will look at it this week though.

Thanks,

-- Steve




* Re: [PATCH 7/7] ftrace, perf: Add filter support for function trace event
  2012-01-02  9:04         ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2012-01-16 23:59           ` Steven Rostedt
  2012-01-18 13:45             ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2012-01-16 23:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, 2012-01-02 at 10:04 +0100, Jiri Olsa wrote:
> Adding support to filter function trace event via perf
> interface. It is now possible to use filter interface
> in the perf tool like:
> 
>   perf record -e ftrace:function --filter="(ip == mm_*)" ls
> 
> The filter syntax is restricted to the the 'ip' field only,
> and following operators are accepted '==' '!=' '||', ending
> up with the filter strings like:
> 
>   ip == f1[, ]f2 ... || ip != f3[, ]f4 ...
> 
> with comma ',' or space ' ' as a function separator. If the
> space ' ' is used as a separator, the right side of the
> assignment needs to be enclosed in double quotes '"'.
> 
> The '==' operator adds trace filter with same effect as would
> be added via set_ftrace_filter file.
> 
> The '!=' operator adds trace filter with same effect as would
> be added via set_ftrace_notrace file.
> 
> The right side of the '!=', '==' operators is list of functions
> or regexp. to be added to filter separated by space.
> 
> The '||' operator is used for connecting multiple filter definitions
> together. It is possible to have more than one '==' and '!='
> operators within one filter string.

Hate to ask you this, but can you rebase this patch against latest
tip/perf/core? Things have changed that cause this patch not to apply.

I'll go ahead and test your other 6 patches.

Thanks!

-- Steve




* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-02  9:04         ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2012-01-17  1:42           ` Frederic Weisbecker
  2012-01-17  2:07             ` Steven Rostedt
  2012-01-18 13:59             ` Jiri Olsa
  0 siblings, 2 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-17  1:42 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, Jan 02, 2012 at 10:04:15AM +0100, Jiri Olsa wrote:
> Adding a way to temporarily enable/disable ftrace_ops. The change
> follows the same way as 'global' ftrace_ops are done.
> 
> Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
> which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL
> flag. In addition new per cpu flag called 'disabled' is also added to
> ftrace_ops to provide the control information for each cpu.
> 
> When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
> set as disabled for all cpus.
> 
> The ftrace_control_list contains all the registered 'control' ftrace_ops.
> The control_ops provides function which iterates ftrace_control_list
> and does the check for 'disabled' flag on current cpu.
> 
> Adding 2 inline functions ftrace_function_enable/ftrace_function_disable,
> which enable/disable the ftrace_ops for given cpu.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---

So this is used to implement pmu->add() / -> del(). But perf_tp_event_match()
already takes care of that by checking PERF_HES_STOPPED.

Now what you are doing here is an interesting optimization as it doesn't even
call on ftrace_ops that have called pmu->del().

I'm not against the idea but I want to ensure this is really your purpose
and it would be nice to put some words about that in the changelog as
well as in PATCH 4/7.
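
(For anyone skimming: a minimal userspace C model of the per-cpu
gating being discussed -- all names below are hypothetical stand-ins,
assuming the semantics from the changelog: control ops start disabled
on every cpu, and the shared dispatch skips ops disabled on the
current cpu.)

```c
#include <assert.h>

#define NR_CPUS	4
#define MAX_OPS	8

/* Hypothetical model of a 'control' ftrace_ops: a per-cpu disabled
 * flag plus a hit counter standing in for the traced callback. */
struct ctl_op {
	int disabled[NR_CPUS];
	int hits;
};

static struct ctl_op *ctl_list[MAX_OPS];
static int nr_ops;

/* Registration marks the op disabled on every cpu, as the changelog
 * describes. */
static void ctl_register(struct ctl_op *op)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		op->disabled[cpu] = 1;
	ctl_list[nr_ops++] = op;
}

/* Stand-in for the shared control_ops callback: iterate the control
 * list and skip ops disabled on the current cpu, so an event that
 * called pmu->del() costs nothing beyond the flag test. */
static void control_func(int cpu)
{
	for (int i = 0; i < nr_ops; i++)
		if (!ctl_list[i]->disabled[cpu])
			ctl_list[i]->hits++;
}
```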

>  include/linux/ftrace.h |   42 +++++++++++++++++
>  kernel/trace/ftrace.c  |  119 +++++++++++++++++++++++++++++++++++++++++++++---
>  kernel/trace/trace.h   |    2 +
>  3 files changed, 156 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 523640f..0d43a2b 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -35,12 +35,14 @@ enum {
>  	FTRACE_OPS_FL_ENABLED		= 1 << 0,
>  	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
>  	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
> +	FTRACE_OPS_FL_CONTROL		= 1 << 3,

Please comment the role of this flag. In fact it would be nice to have
a comment that explains all these. GLOBAL and DYNAMIC don't actually give
much clues alone.

Thanks.


* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-17  1:42           ` Frederic Weisbecker
@ 2012-01-17  2:07             ` Steven Rostedt
  2012-01-17  2:29               ` Frederic Weisbecker
  2012-01-18 13:59             ` Jiri Olsa
  1 sibling, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2012-01-17  2:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Tue, 2012-01-17 at 02:42 +0100, Frederic Weisbecker wrote:
>  
> > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > index 523640f..0d43a2b 100644
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -35,12 +35,14 @@ enum {
> >  	FTRACE_OPS_FL_ENABLED		= 1 << 0,
> >  	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
> >  	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
> > +	FTRACE_OPS_FL_CONTROL		= 1 << 3,
> 
> Please comment the role of this flag. In fact it would be nice to have
> a comment that explains all these. GLOBAL and DYNAMIC don't actually give
> much clues alone.

Someone else asked about commenting these (was it you?). I probably
should, as it is confusing as to what they are used for.

I just figured everyone just figures it out by looking at the code ;-)

-- Steve




* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-17  2:07             ` Steven Rostedt
@ 2012-01-17  2:29               ` Frederic Weisbecker
  0 siblings, 0 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-17  2:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, Jan 16, 2012 at 09:07:35PM -0500, Steven Rostedt wrote:
> On Tue, 2012-01-17 at 02:42 +0100, Frederic Weisbecker wrote:
> >  
> > > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > > index 523640f..0d43a2b 100644
> > > --- a/include/linux/ftrace.h
> > > +++ b/include/linux/ftrace.h
> > > @@ -35,12 +35,14 @@ enum {
> > >  	FTRACE_OPS_FL_ENABLED		= 1 << 0,
> > >  	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
> > >  	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
> > > +	FTRACE_OPS_FL_CONTROL		= 1 << 3,
> > 
> > Please comment the role of this flag. In fact it would be nice to have
> > a comment that explains all these. GLOBAL and DYNAMIC don't actually give
> > much clues alone.
> 
> Someone else asked about commenting these (was it you?).

Probably, as I asked you the meaning of that GLOBAL on irc :)

> I probably should, as it is confusing to what they are used for.

Yeah that would be nice.

> I just figured every just figures it out by looking at the code ;-)

That's often what happens when the reader of the code is also the writer :)
Also, there is no user that doesn't have FTRACE_OPS_FL_GLOBAL yet, so
it's hard to find a counter example.


* Re: [PATCH 7/7] ftrace, perf: Add filter support for function trace event
  2012-01-16 23:59           ` Steven Rostedt
@ 2012-01-18 13:45             ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-18 13:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, Jan 16, 2012 at 06:59:08PM -0500, Steven Rostedt wrote:
> On Mon, 2012-01-02 at 10:04 +0100, Jiri Olsa wrote:
> > Adding support to filter function trace event via perf
> > interface. It is now possible to use filter interface
> > in the perf tool like:
> > 
> >   perf record -e ftrace:function --filter="(ip == mm_*)" ls
> > 
> > The filter syntax is restricted to the the 'ip' field only,
> > and following operators are accepted '==' '!=' '||', ending
> > up with the filter strings like:
> > 
> >   ip == f1[, ]f2 ... || ip != f3[, ]f4 ...
> > 
> > with comma ',' or space ' ' as a function separator. If the
> > space ' ' is used as a separator, the right side of the
> > assignment needs to be enclosed in double quotes '"'.
> > 
> > The '==' operator adds trace filter with same effect as would
> > be added via set_ftrace_filter file.
> > 
> > The '!=' operator adds trace filter with same effect as would
> > be added via set_ftrace_notrace file.
> > 
> > The right side of the '!=', '==' operators is list of functions
> > or regexp. to be added to filter separated by space.
> > 
> > The '||' operator is used for connecting multiple filter definitions
> > together. It is possible to have more than one '==' and '!='
> > operators within one filter string.
> 
> Hate to ask you this, but can you rebase this patch against latest
> tip/perf/core? Things have changed that cause this patch not to apply.

yep, attaching rebased patch ;)

thanks,
jirka


---
Adding support to filter function trace event via perf
interface. It is now possible to use filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only, and the
following operators are accepted: '==', '!=' and '||', ending up
with filter strings like:

  ip == f1[, ]f2 ... || ip != f3[, ]f4 ...

with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"'.

The '==' operator adds a trace filter with the same effect as one
added via the set_ftrace_filter file.

The '!=' operator adds a trace filter with the same effect as one
added via the set_ftrace_notrace file.

The right side of the '!=' and '==' operators is a list of functions
or regexps to be added to the filter, separated by spaces.

The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operators within one filter string.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h             |    1 +
 kernel/trace/ftrace.c              |    6 ++
 kernel/trace/trace.h               |    2 -
 kernel/trace/trace_event_perf.c    |    4 +-
 kernel/trace/trace_events_filter.c |  169 +++++++++++++++++++++++++++++++++---
 5 files changed, 168 insertions(+), 14 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 2a6d9af..797a5a5 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -226,6 +226,7 @@ int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
+void ftrace_free_filter(struct ftrace_ops *ops);
 
 int register_ftrace_command(struct ftrace_func_command *cmd);
 int unregister_ftrace_command(struct ftrace_func_command *cmd);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 45c9b0c..9935a2a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1192,6 +1192,12 @@ static void free_ftrace_hash_rcu(struct ftrace_hash *hash)
 	call_rcu_sched(&hash->rcu, __free_ftrace_hash_rcu);
 }
 
+void ftrace_free_filter(struct ftrace_ops *ops)
+{
+	free_ftrace_hash(ops->filter_hash);
+	free_ftrace_hash(ops->notrace_hash);
+}
+
 static struct ftrace_hash *alloc_ftrace_hash(int size_bits)
 {
 	struct ftrace_hash *hash;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e88e58a..4ec6d18 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -770,9 +770,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 57eb232..220b50a 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -298,7 +298,9 @@ static int perf_ftrace_function_register(struct perf_event *event)
 static int perf_ftrace_function_unregister(struct perf_event *event)
 {
 	struct ftrace_ops *ops = &event->ftrace_ops;
-	return unregister_ftrace_function(ops);
+	int ret = unregister_ftrace_function(ops);
+	ftrace_free_filter(ops);
+	return ret;
 }
 
 static void perf_ftrace_function_enable(struct perf_event *event)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index eb04a2a..c8a64ec 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -54,6 +54,13 @@ struct filter_op {
 	int precedence;
 };
 
+static struct filter_op filter_ftrace_ops[] = {
+	{ OP_OR,	"||",		1 },
+	{ OP_NE,	"!=",		2 },
+	{ OP_EQ,	"==",		2 },
+	{ OP_NONE,	"OP_NONE",	0 },
+};
+
 static struct filter_op filter_ops[] = {
 	{ OP_OR,	"||",		1 },
 	{ OP_AND,	"&&",		2 },
@@ -81,6 +88,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +104,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +1001,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1353,7 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1743,8 +1754,8 @@ static int replace_system_preds(struct event_subsystem *system,
 	return -ENOMEM;
 }
 
-static int create_filter_start(char *filter_str, bool set_str,
-			       struct filter_parse_state **psp,
+static int create_filter_start(char *filter_str, struct filter_op *fops,
+			       bool set_str, struct filter_parse_state **psp,
 			       struct event_filter **filterp)
 {
 	struct event_filter *filter;
@@ -1770,7 +1781,7 @@ static int create_filter_start(char *filter_str, bool set_str,
 	*filterp = filter;
 	*psp = ps;
 
-	parse_init(ps, filter_ops, filter_str);
+	parse_init(ps, fops, filter_str);
 	err = filter_parse(ps);
 	if (err && set_str)
 		append_filter_err(ps, filter);
@@ -1808,9 +1819,13 @@ static int create_filter(struct ftrace_event_call *call,
 {
 	struct event_filter *filter = NULL;
 	struct filter_parse_state *ps = NULL;
+	struct filter_op *fops = filter_ops;
 	int err;
 
-	err = create_filter_start(filter_str, set_str, &ps, &filter);
+	if (ftrace_event_is_function(call))
+		fops = filter_ftrace_ops;
+
+	err = create_filter_start(filter_str, fops, set_str, &ps, &filter);
 	if (!err) {
 		err = replace_preds(call, filter, ps, filter_str, false);
 		if (err && set_str)
@@ -1838,7 +1853,7 @@ static int create_system_filter(struct event_subsystem *system,
 	struct filter_parse_state *ps = NULL;
 	int err;
 
-	err = create_filter_start(filter_str, true, &ps, &filter);
+	err = create_filter_start(filter_str, filter_ops, true, &ps, &filter);
 	if (!err) {
 		err = replace_system_preds(system, ps, filter_str);
 		if (!err) {
@@ -1955,6 +1970,131 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+static char **
+ftrace_function_filter_re(char *buf, int len, int *count)
+{
+	char *str, *sep, **re;
+
+	str = kstrndup(buf, len, GFP_KERNEL);
+	if (!str)
+		return NULL;
+
+	/*
+	 * The argv_split function takes white space
+	 * as a separator, so convert ',' into spaces.
+	 */
+	while ((sep = strchr(str, ',')))
+		*sep = ' ';
+
+	re = argv_split(GFP_KERNEL, str, count);
+	kfree(str);
+	return re;
+}
+
+static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
+				      int reset, char *re, int len)
+{
+	int ret;
+
+	if (filter)
+		ret = ftrace_set_filter(ops, re, len, reset);
+	else
+		ret = ftrace_set_notrace(ops, re, len, reset);
+
+	return ret;
+}
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int i, re_cnt, ret = -EINVAL;
+	int *reset;
+	char **re;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	/*
+	 * The 'ip' field could have multiple filters set, separated
+	 * either by space or comma. We first cut the filter and apply
+	 * all pieces separately.
+	 */
+	re = ftrace_function_filter_re(buf, len, &re_cnt);
+	if (!re)
+		return -EINVAL;
+
+	for (i = 0; i < re_cnt; i++) {
+		ret = ftrace_function_set_regexp(data->ops, filter, *reset,
+						 re[i], strlen(re[i]));
+		if (ret)
+			break;
+
+		if (*reset)
+			*reset = 0;
+	}
+
+	argv_free(re);
+	return ret;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	/*
+	 * Check the predicate for function trace, verify:
+	 *  - only '==' and '!=' are used
+	 *  - the 'ip' field is used
+	 */
+	if (WARN((pred->op != OP_EQ) && (pred->op != OP_NE),
+		 "wrong operator for function filter: %d\n", pred->op))
+		return -EINVAL;
+
+	if (strcmp(field->name, "ip"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID))
+		return WALK_PRED_DEFAULT;
+
+	/* Double checking the predicate is valid for function trace. */
+	*err = ftrace_function_check_pred(pred);
+	if (*err)
+		return WALK_PRED_ABORT;
+
+	*err = __ftrace_function_set_filter(pred->op == OP_EQ,
+					    pred->regex.pattern,
+					    pred->regex.len,
+					    data);
+
+	return (*err) ? WALK_PRED_ABORT : WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1975,9 +2115,16 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 		goto out_unlock;
 
 	err = create_filter(call, filter_str, false, &filter);
-	if (!err)
-		event->filter = filter;
+	if (err)
+		goto free_filter;
+
+	if (ftrace_event_is_function(call))
+		err = ftrace_function_set_filter(event, filter);
 	else
+		event->filter = filter;
+
+free_filter:
+	if (err ||  ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-17  1:42           ` Frederic Weisbecker
  2012-01-17  2:07             ` Steven Rostedt
@ 2012-01-18 13:59             ` Jiri Olsa
  1 sibling, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-18 13:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Tue, Jan 17, 2012 at 02:42:12AM +0100, Frederic Weisbecker wrote:
> On Mon, Jan 02, 2012 at 10:04:15AM +0100, Jiri Olsa wrote:
> > Adding a way to temporarily enable/disable ftrace_ops. The change
> > follows the same way as 'global' ftrace_ops are done.
> > 
> > Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
> > which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL
> > flag. In addition new per cpu flag called 'disabled' is also added to
> > ftrace_ops to provide the control information for each cpu.
> > 
> > When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
> > set as disabled for all cpus.
> > 
> > The ftrace_control_list contains all the registered 'control' ftrace_ops.
> > The control_ops provides function which iterates ftrace_control_list
> > and does the check for 'disabled' flag on current cpu.
> > 
> > Adding 2 inline functions ftrace_function_enable/ftrace_function_disable,
> > which enable/disable the ftrace_ops for given cpu.
> > 
> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> > ---
> 
> So this is used to implement pmu->add() / -> del(). But perf_tp_event_match()
> already takes care of that by checking PERF_HES_STOPPED.
> 
> Now what you are doing here is an interesting optimization as it doesn't even
> call on ftrace_ops that have called pmu->del().
> 
> I'm not against the idea but I want to ensure this is really your purpose
> and it would be nice to put some words about that in the changelog as
> well as in PATCH 4/7.

well, to tell the truth I missed the PERF_HES_STOPPED possibility ;)
since I was up to disabling ftrace_ops completely when it's not used..

but as you said it's faster, so I'd stay with it..

I'll try to make some comment about that and repost the patch

> 
> >  include/linux/ftrace.h |   42 +++++++++++++++++
> >  kernel/trace/ftrace.c  |  119 +++++++++++++++++++++++++++++++++++++++++++++---
> >  kernel/trace/trace.h   |    2 +
> >  3 files changed, 156 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > index 523640f..0d43a2b 100644
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -35,12 +35,14 @@ enum {
> >  	FTRACE_OPS_FL_ENABLED		= 1 << 0,
> >  	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
> >  	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
> > +	FTRACE_OPS_FL_CONTROL		= 1 << 3,
> 
> Please comment the role of this flag. In fact it would be nice to have
> a comment that explains all these. GLOBAL and DYNAMIC don't actually give
> much clues alone.

I'll comment those as well..

thanks,
jirka

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCHv6 0/7] ftrace, perf: Adding support to use function trace
  2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
                           ` (7 preceding siblings ...)
  2012-01-16  8:57         ` [PATCHv5 0/7] ftrace, perf: Adding support to use function trace Jiri Olsa
@ 2012-01-18 18:44         ` Jiri Olsa
  2012-01-18 18:44           ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
                             ` (8 more replies)
  8 siblings, 9 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-18 18:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

hi,
here's another version of perf support for function trace
with filter. 

attached patches:
  1/7 ftrace: Change filter/notrace set functions to return exit code
  2/7 ftrace: Add enable/disable ftrace_ops control interface
  3/7 ftrace, perf: Add open/close tracepoint perf registration actions
  4/7 ftrace, perf: Add add/del tracepoint perf registration actions
  5/7 ftrace, perf: Add support to use function tracepoint in perf
  6/7 ftrace, perf: Distinguish ftrace function event field type
  7/7 ftrace, perf: Add filter support for function trace event

v6 changes:
  2/7 - added comments to FTRACE_OPS_FL_* bits enum
  5/7 - added more info to the change log regarding ftrace_ops enable/disable
  7/7 - rebased to the latest filter changes

v5 changes:
  7/7 - fixed to properly support ',' in filter expressions

v4 changes:
  2/7 - FL_GLOBAL_CONTROL changed to FL_GLOBAL_CONTROL_MASK
      - changed WARN_ON_ONCE() to include the !preempt_count()
      - changed this_cpu_ptr to per_cpu_ptr

  omitted Fix possible NULL dereferencing in __ftrace_hash_rec_update
  (2/8 in v3)

v3 changes:
  3/8 - renamed __add/remove_ftrace_ops
      - fixed preempt_enable/recursion_clear order in ftrace_ops_control_func
      - renamed/commented API functions -  enable/disable_ftrace_function
  
  omitted graph tracer workaround patch 10/10

v2 changes:
 01/10 - keeping the old fix instead of adding hash_has_contents func
         I'll send a separate patchset for this
 02/10 - using different way to avoid the issue (3/9 in v1)
 03/10 - using the way proposed by Steven for controling ftrace_ops
         (4/9 in v1)
 06/10 - added check ensuring the ftrace:function event could be used by
         root only (7/9 in v1)
 08/10 - added more description (8/9 in v1)
 09/10 - changed '&&' operator to '||' which seems more suitable
         in this case (9/9 in v1)

thanks,
jirka
---
 include/linux/ftrace.h             |   61 ++++++++++-
 include/linux/ftrace_event.h       |    9 +-
 include/linux/perf_event.h         |    3 +
 kernel/trace/ftrace.c              |  140 +++++++++++++++++++++--
 kernel/trace/trace.h               |   11 ++-
 kernel/trace/trace_event_perf.c    |  214 +++++++++++++++++++++++++++++-------
 kernel/trace/trace_events.c        |   12 ++-
 kernel/trace/trace_events_filter.c |  172 +++++++++++++++++++++++++++--
 kernel/trace/trace_export.c        |   53 ++++++++-
 kernel/trace/trace_kprobe.c        |    8 +-
 kernel/trace/trace_syscalls.c      |   18 +++-
 11 files changed, 617 insertions(+), 84 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
@ 2012-01-18 18:44           ` Jiri Olsa
  2012-01-19 16:31             ` Frederic Weisbecker
  2012-01-18 18:44           ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
                             ` (7 subsequent siblings)
  8 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-18 18:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any error code, so there's no way for an ftrace_ops
user to tell whether the filter was correctly applied.

The set_ftrace_filter interface returns error in case the filter
did not match:

  # echo krava > set_ftrace_filter
  bash: echo: write error: Invalid argument

Changing both ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly or -E* values
in case of error.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |    4 ++--
 kernel/trace/ftrace.c  |   15 +++++++++------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 028e26f..f33fb3b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -178,9 +178,9 @@ struct dyn_ftrace {
 };
 
 int ftrace_force_update(void);
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 683d559..e2e0597 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3146,8 +3146,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	mutex_lock(&ftrace_regex_lock);
 	if (reset)
 		ftrace_filter_reset(hash);
-	if (buf)
-		ftrace_match_records(hash, buf, len);
+	if (buf && !ftrace_match_records(hash, buf, len)) {
+		ret = -EINVAL;
+		goto out_regex_unlock;
+	}
 
 	mutex_lock(&ftrace_lock);
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -3157,6 +3159,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 
 	mutex_unlock(&ftrace_lock);
 
+ out_regex_unlock:
 	mutex_unlock(&ftrace_regex_lock);
 
 	free_ftrace_hash(hash);
@@ -3173,10 +3176,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
  * Filters denote which functions should be enabled when tracing is enabled.
  * If @buf is NULL and reset is set, all functions will be enabled for tracing.
  */
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 1);
+	return ftrace_set_regex(ops, buf, len, reset, 1);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_filter);
 
@@ -3191,10 +3194,10 @@ EXPORT_SYMBOL_GPL(ftrace_set_filter);
  * is enabled. If @buf is NULL and reset is set, all functions will be enabled
  * for tracing.
  */
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 0);
+	return ftrace_set_regex(ops, buf, len, reset, 0);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_notrace);
 /**
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
  2012-01-18 18:44           ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2012-01-18 18:44           ` Jiri Olsa
  2012-01-20 17:02             ` Frederic Weisbecker
  2012-01-18 18:44           ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
                             ` (6 subsequent siblings)
  8 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-18 18:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same way as 'global' ftrace_ops are done.

Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL
flag. In addition new per cpu flag called 'disabled' is also added to
ftrace_ops to provide the control information for each cpu.

When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.

The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides function which iterates ftrace_control_list
and does the check for 'disabled' flag on current cpu.

Adding 2 inline functions ftrace_function_enable/ftrace_function_disable,
which enable/disable the ftrace_ops for given cpu.
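
The per-cpu control scheme can be modeled outside the kernel (a
hypothetical Python sketch for illustration, not the actual
implementation): registration leaves every cpu disabled (counter 1),
enable/disable decrement/increment that cpu's counter, and the
dispatching function only calls ops whose counter for the current cpu
is zero.

```python
class ControlOps:
    """Toy model of a FTRACE_OPS_FL_CONTROL ftrace_ops: one 'disabled'
    counter per cpu, starting disabled everywhere (as on registration)."""
    def __init__(self, func, ncpus):
        self.func = func
        self.disabled = [1] * ncpus   # models control_ops_disable_all()

    def enable(self, cpu):            # models ftrace_function_enable()
        self.disabled[cpu] -= 1

    def disable(self, cpu):           # models ftrace_function_disable()
        self.disabled[cpu] += 1

    def is_disabled(self, cpu):       # models control_ops_is_disabled()
        return self.disabled[cpu] > 0

def control_list_func(ops_list, cpu, ip):
    """Model of ftrace_ops_control_func: call only ops enabled on cpu."""
    for op in ops_list:
        if not op.is_disabled(cpu):
            op.func(ip)
```

A counter rather than a boolean lets enable/disable nest per cpu, which
matches the atomic_inc/atomic_dec pairing in the patch.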

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |   56 ++++++++++++++++++++++
 kernel/trace/ftrace.c  |  119 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/trace/trace.h   |    2 +
 3 files changed, 170 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index f33fb3b..d3f529c 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -31,16 +31,32 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
 
+/*
+ * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
+ * set in the flags member.
+ *
+ * ENABLED - set/unset when ftrace_ops is registered/unregistered
+ * GLOBAL  - set manually by ftrace_ops user to denote the ftrace_ops
+ *           is part of the global tracers sharing the same filter
+ *           via set_ftrace_* debugfs files.
+ * DYNAMIC - set when ftrace_ops is registered to denote dynamically
+ *           allocated ftrace_ops which need special care
+ * CONTROL - set manually by ftrace_ops user to denote the ftrace_ops
+ *           could be controlled by the following calls:
+ *           ftrace_function_enable, ftrace_function_disable
+ */
 enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	void __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +113,46 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+/**
+ * ftrace_function_enable - enable controlled ftrace_ops on given cpu
+ *
+ * This function enables tracing on given cpu by decreasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled;
+
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
+			 !preempt_count()))
+		return;
+
+	disabled = per_cpu_ptr(ops->disabled, cpu);
+	atomic_dec(disabled);
+}
+
+/**
+ * ftrace_function_disable - disable controlled ftrace_ops on given cpu
+ *
+ * This function disables tracing on given cpu by increasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_disable(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled;
+
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
+			 !preempt_count()))
+		return;
+
+	disabled = per_cpu_ptr(ops->disabled, cpu);
+	atomic_inc(disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e2e0597..45c9b0c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -62,6 +62,8 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+#define FL_GLOBAL_CONTROL_MASK (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
@@ -89,12 +91,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -168,6 +172,38 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		atomic_set(per_cpu_ptr(ops->disabled, cpu), 1);
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	atomic_t *disabled;
+
+	disabled = alloc_percpu(atomic_t);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
+static int control_ops_is_disabled(struct ftrace_ops *ops, int cpu)
+{
+	atomic_t *disabled = per_cpu_ptr(ops->disabled, cpu);
+	return atomic_read(disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -259,6 +295,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_list_ops(struct ftrace_ops **list,
+				struct ftrace_ops *main_ops,
+				struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	add_ftrace_ops(list, ops);
+	if (first)
+		add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_list_ops(struct ftrace_ops **list,
+				  struct ftrace_ops *main_ops,
+				  struct ftrace_ops *ops)
+{
+	int ret = remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -270,15 +326,20 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+	/* We don't support both control and global flags set. */
+	if ((ops->flags & FL_GLOBAL_CONTROL_MASK) == FL_GLOBAL_CONTROL_MASK)
+		return -EINVAL;
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -302,11 +363,23 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_list_ops(&ftrace_global_list,
+					     &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_list_ops(&ftrace_control_list,
+					     &control_ops, ops);
+		if (!ret) {
+			/*
+			 * The ftrace_ops is now removed from the list,
+			 * so there'll be no new users. We must ensure
+			 * all current users are done before we free
+			 * the control data.
+			 */
+			synchronize_sched();
+			control_ops_free(ops);
+		}
 	} else
 		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -3874,6 +3947,38 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+	int cpu;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	cpu = smp_processor_id();
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!control_ops_is_disabled(op, cpu) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	}
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2c26574..41c54e3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
  2012-01-18 18:44           ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
  2012-01-18 18:44           ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2012-01-18 18:44           ` Jiri Olsa
  2012-01-18 18:44           ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
                             ` (5 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-18 18:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.

The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.

The open/close actions are invoked for each tracepoint user when
opening/closing the event.
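
The refcounting that drives these actions can be sketched as follows (a
hypothetical Python model of the register/open pairing for illustration,
not kernel code): REGISTER fires only when the first user opens the
event, UNREGISTER only when the last one closes it, while OPEN/CLOSE
fire for every user.

```python
REGISTER, UNREGISTER, OPEN, CLOSE = "register", "unregister", "open", "close"

class TpEvent:
    """Toy model of a tracepoint event's perf registration refcount."""
    def __init__(self):
        self.refcount = 0
        self.log = []

    def reg(self, action):            # models tp_event->class->reg()
        self.log.append(action)

    def perf_open(self):
        self.refcount += 1
        if self.refcount == 1:        # first user: register the probe
            self.reg(REGISTER)
        self.reg(OPEN)                # every user gets the open action

    def perf_close(self):
        self.reg(CLOSE)               # every user gets the close action
        self.refcount -= 1
        if self.refcount == 0:        # last user: unregister the probe
            self.reg(UNREGISTER)
```

Two users opening and closing the same event thus see one
register/unregister pair but two open/close pairs.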

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    6 +-
 kernel/trace/trace.h            |    5 ++
 kernel/trace/trace_event_perf.c |  116 +++++++++++++++++++++++++--------------
 kernel/trace/trace_events.c     |   10 ++-
 kernel/trace/trace_kprobe.c     |    6 ++-
 kernel/trace/trace_syscalls.c   |   14 +++-
 6 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c3da42d..195e360 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -146,6 +146,8 @@ enum trace_reg {
 	TRACE_REG_UNREGISTER,
 	TRACE_REG_PERF_REGISTER,
 	TRACE_REG_PERF_UNREGISTER,
+	TRACE_REG_PERF_OPEN,
+	TRACE_REG_PERF_CLOSE,
 };
 
 struct ftrace_event_call;
@@ -157,7 +159,7 @@ struct ftrace_event_class {
 	void			*perf_probe;
 #endif
 	int			(*reg)(struct ftrace_event_call *event,
-				       enum trace_reg type);
+				       enum trace_reg type, void *data);
 	int			(*define_fields)(struct ftrace_event_call *);
 	struct list_head	*(*get_fields)(struct ftrace_event_call *);
 	struct list_head	fields;
@@ -165,7 +167,7 @@ struct ftrace_event_class {
 };
 
 extern int ftrace_event_reg(struct ftrace_event_call *event,
-			    enum trace_reg type);
+			    enum trace_reg type, void *data);
 
 enum {
 	TRACE_EVENT_FL_ENABLED_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 41c54e3..85732a8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -828,4 +828,9 @@ extern const char *__stop___trace_bprintk_fmt[];
 	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #include "trace_entries.h"
 
+#ifdef CONFIG_PERF_EVENTS
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data);
+#endif
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 19a359d..0cfcc37 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -44,23 +44,17 @@ static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 	return 0;
 }
 
-static int perf_trace_event_init(struct ftrace_event_call *tp_event,
-				 struct perf_event *p_event)
+static int perf_trace_event_reg(struct ftrace_event_call *tp_event,
+				struct perf_event *p_event)
 {
 	struct hlist_head __percpu *list;
-	int ret;
+	int ret = -ENOMEM;
 	int cpu;
 
-	ret = perf_trace_event_perm(tp_event, p_event);
-	if (ret)
-		return ret;
-
 	p_event->tp_event = tp_event;
 	if (tp_event->perf_refcount++ > 0)
 		return 0;
 
-	ret = -ENOMEM;
-
 	list = alloc_percpu(struct hlist_head);
 	if (!list)
 		goto fail;
@@ -83,7 +77,7 @@ static int perf_trace_event_init(struct ftrace_event_call *tp_event,
 		}
 	}
 
-	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER);
+	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER, NULL);
 	if (ret)
 		goto fail;
 
@@ -108,6 +102,69 @@ fail:
 	return ret;
 }
 
+static void perf_trace_event_unreg(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	int i;
+
+	if (--tp_event->perf_refcount > 0)
+		goto out;
+
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER, NULL);
+
+	/*
+	 * Ensure our callback won't be called anymore. The buffers
+	 * will be freed after that.
+	 */
+	tracepoint_synchronize_unregister();
+
+	free_percpu(tp_event->perf_events);
+	tp_event->perf_events = NULL;
+
+	if (!--total_ref_count) {
+		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
+			free_percpu(perf_trace_buf[i]);
+			perf_trace_buf[i] = NULL;
+		}
+	}
+out:
+	module_put(tp_event->mod);
+}
+
+static int perf_trace_event_open(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_OPEN, p_event);
+}
+
+static void perf_trace_event_close(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
+}
+
+static int perf_trace_event_init(struct ftrace_event_call *tp_event,
+				 struct perf_event *p_event)
+{
+	int ret;
+
+	ret = perf_trace_event_perm(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_reg(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_open(p_event);
+	if (ret) {
+		perf_trace_event_unreg(p_event);
+		return ret;
+	}
+
+	return 0;
+}
+
 int perf_trace_init(struct perf_event *p_event)
 {
 	struct ftrace_event_call *tp_event;
@@ -130,6 +187,14 @@ int perf_trace_init(struct perf_event *p_event)
 	return ret;
 }
 
+void perf_trace_destroy(struct perf_event *p_event)
+{
+	mutex_lock(&event_mutex);
+	perf_trace_event_close(p_event);
+	perf_trace_event_unreg(p_event);
+	mutex_unlock(&event_mutex);
+}
+
 int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct ftrace_event_call *tp_event = p_event->tp_event;
@@ -154,37 +219,6 @@ void perf_trace_del(struct perf_event *p_event, int flags)
 	hlist_del_rcu(&p_event->hlist_entry);
 }
 
-void perf_trace_destroy(struct perf_event *p_event)
-{
-	struct ftrace_event_call *tp_event = p_event->tp_event;
-	int i;
-
-	mutex_lock(&event_mutex);
-	if (--tp_event->perf_refcount > 0)
-		goto out;
-
-	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER);
-
-	/*
-	 * Ensure our callback won't be called anymore. The buffers
-	 * will be freed after that.
-	 */
-	tracepoint_synchronize_unregister();
-
-	free_percpu(tp_event->perf_events);
-	tp_event->perf_events = NULL;
-
-	if (!--total_ref_count) {
-		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
-			free_percpu(perf_trace_buf[i]);
-			perf_trace_buf[i] = NULL;
-		}
-	}
-out:
-	module_put(tp_event->mod);
-	mutex_unlock(&event_mutex);
-}
-
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 				       struct pt_regs *regs, int *rctxp)
 {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c212a7f..5138fea 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -147,7 +147,8 @@ int trace_event_raw_init(struct ftrace_event_call *call)
 }
 EXPORT_SYMBOL_GPL(trace_event_raw_init);
 
-int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
+int ftrace_event_reg(struct ftrace_event_call *call,
+		     enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -170,6 +171,9 @@ int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
 					    call->class->perf_probe,
 					    call);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
@@ -209,7 +213,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_stop_cmdline_record();
 				call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			call->class->reg(call, TRACE_REG_UNREGISTER);
+			call->class->reg(call, TRACE_REG_UNREGISTER, NULL);
 		}
 		break;
 	case 1:
@@ -218,7 +222,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_start_cmdline_record();
 				call->flags |= TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			ret = call->class->reg(call, TRACE_REG_REGISTER);
+			ret = call->class->reg(call, TRACE_REG_REGISTER, NULL);
 			if (ret) {
 				tracing_stop_cmdline_record();
 				pr_info("event trace: Could not enable event "
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 00d527c..5667f89 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1892,7 +1892,8 @@ static __kprobes void kretprobe_perf_func(struct kretprobe_instance *ri,
 #endif	/* CONFIG_PERF_EVENTS */
 
 static __kprobes
-int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
+int kprobe_register(struct ftrace_event_call *event,
+		    enum trace_reg type, void *data)
 {
 	struct trace_probe *tp = (struct trace_probe *)event->data;
 
@@ -1909,6 +1910,9 @@ int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
 	case TRACE_REG_PERF_UNREGISTER:
 		disable_trace_probe(tp, TP_FLAG_PROFILE);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index cb65454..6916b0d 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -17,9 +17,9 @@ static DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
 static DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 
 static int syscall_enter_define_fields(struct ftrace_event_call *call);
 static int syscall_exit_define_fields(struct ftrace_event_call *call);
@@ -649,7 +649,7 @@ void perf_sysexit_disable(struct ftrace_event_call *call)
 #endif /* CONFIG_PERF_EVENTS */
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -664,13 +664,16 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysenter_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
 }
 
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -685,6 +688,9 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysexit_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 4/7] ftrace, perf: Add add/del tracepoint perf registration actions
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
                             ` (2 preceding siblings ...)
  2012-01-18 18:44           ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
@ 2012-01-18 18:44           ` Jiri Olsa
  2012-01-18 18:44           ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
                             ` (4 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-18 18:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
perf event schedule in/out actions.

The add action is invoked when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    2 ++
 kernel/trace/trace_event_perf.c |    4 +++-
 kernel/trace/trace_events.c     |    2 ++
 kernel/trace/trace_kprobe.c     |    2 ++
 kernel/trace/trace_syscalls.c   |    4 ++++
 5 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 195e360..2bf677c 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -148,6 +148,8 @@ enum trace_reg {
 	TRACE_REG_PERF_UNREGISTER,
 	TRACE_REG_PERF_OPEN,
 	TRACE_REG_PERF_CLOSE,
+	TRACE_REG_PERF_ADD,
+	TRACE_REG_PERF_DEL,
 };
 
 struct ftrace_event_call;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0cfcc37..d72af0b 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -211,12 +211,14 @@ int perf_trace_add(struct perf_event *p_event, int flags)
 	list = this_cpu_ptr(pcpu_list);
 	hlist_add_head_rcu(&p_event->hlist_entry, list);
 
-	return 0;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_ADD, p_event);
 }
 
 void perf_trace_del(struct perf_event *p_event, int flags)
 {
+	struct ftrace_event_call *tp_event = p_event->tp_event;
 	hlist_del_rcu(&p_event->hlist_entry);
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_DEL, p_event);
 }
 
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5138fea..079a93a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -173,6 +173,8 @@ int ftrace_event_reg(struct ftrace_event_call *call,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5667f89..580a05e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1912,6 +1912,8 @@ int kprobe_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 6916b0d..dbdd804 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -666,6 +666,8 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
@@ -690,6 +692,8 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
                             ` (3 preceding siblings ...)
  2012-01-18 18:44           ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
@ 2012-01-18 18:44           ` Jiri Olsa
  2012-01-18 18:44           ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
                             ` (3 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-18 18:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding perf registration support for the ftrace function event,
so it is now possible to register it via the perf interface.

The perf_event struct statically contains ftrace_ops as a handle
for the function tracer. The function tracer is registered/unregistered
in the open/close actions.

To be efficient, we enable/disable the ftrace_ops each time the traced
process is scheduled in/out (via the TRACE_REG_PERF_(ADD|DEL) handlers).
This way tracing is enabled only while the process is running.
This approach is used intentionally instead of the event's hw state
PERF_HES_STOPPED, which would not disable the ftrace_ops.

It is now possible to use function trace within perf commands
like:

  perf record -e ftrace:function ls
  perf stat -e ftrace:function ls

Allowed only for root.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h      |    3 +
 kernel/trace/trace.h            |    2 +
 kernel/trace/trace_event_perf.c |   92 +++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_export.c     |   28 ++++++++++++
 4 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0b91db2..5003be6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -858,6 +858,9 @@ struct perf_event {
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
+#ifdef CONFIG_FUNCTION_TRACER
+	struct ftrace_ops               ftrace_ops;
+#endif
 #endif
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 85732a8..e88e58a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -591,6 +591,8 @@ static inline int ftrace_trace_task(struct task_struct *task)
 static inline int ftrace_is_dead(void) { return 0; }
 #endif
 
+int ftrace_event_is_function(struct ftrace_event_call *call);
+
 /*
  * struct trace_parser - servers for reading the user input separated by spaces
  * @cont: set if the input is not complete - no final space char was found
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index d72af0b..57eb232 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -24,6 +24,11 @@ static int	total_ref_count;
 static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
+	/* The ftrace function trace is allowed only for root. */
+	if (ftrace_event_is_function(tp_event) &&
+	    perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	/* No tracing, just counting, so no obvious leak */
 	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
 		return 0;
@@ -250,3 +255,90 @@ __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 	return raw_data;
 }
 EXPORT_SYMBOL_GPL(perf_trace_buf_prepare);
+
+
+static void
+perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_entry *entry;
+	struct hlist_head *head;
+	struct pt_regs regs;
+	int rctx;
+
+#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
+		    sizeof(u64)) - sizeof(u32))
+
+	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
+
+	perf_fetch_caller_regs(&regs);
+
+	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
+	if (!entry)
+		return;
+
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	head = this_cpu_ptr(event_function.perf_events);
+	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
+			      1, &regs, head);
+
+#undef ENTRY_SIZE
+}
+
+static int perf_ftrace_function_register(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+
+	ops->flags |= FTRACE_OPS_FL_CONTROL;
+	ops->func = perf_ftrace_function_call;
+	return register_ftrace_function(ops);
+}
+
+static int perf_ftrace_function_unregister(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	return unregister_ftrace_function(ops);
+}
+
+static void perf_ftrace_function_enable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	ftrace_function_enable(ops, smp_processor_id());
+}
+
+static void perf_ftrace_function_disable(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	ftrace_function_disable(ops, smp_processor_id());
+}
+
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data)
+{
+	int etype = call->event.type;
+
+	if (etype != TRACE_FN)
+		return -EINVAL;
+
+	switch (type) {
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+		return perf_ftrace_function_register(data);
+	case TRACE_REG_PERF_CLOSE:
+		return perf_ftrace_function_unregister(data);
+	case TRACE_REG_PERF_ADD:
+		perf_ftrace_function_enable(data);
+		return 0;
+	case TRACE_REG_PERF_DEL:
+		perf_ftrace_function_disable(data);
+		return 0;
+	}
+
+	return -EINVAL;
+}
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index bbeec31..867653c 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 
 #include "trace_entries.h"
 
+static int ftrace_event_class_register(struct ftrace_event_call *call,
+				       enum trace_reg type, void *data)
+{
+	switch (type) {
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
+#ifdef CONFIG_PERF_EVENTS
+		return perf_ftrace_event_register(call, type, data);
+#endif
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	}
+
+	return -EINVAL;
+}
+
 #undef __entry
 #define __entry REC
 
@@ -159,6 +181,7 @@ struct ftrace_event_class event_class_ftrace_##call = {			\
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
+	.reg			= ftrace_event_class_register,		\
 };									\
 									\
 struct ftrace_event_call __used event_##call = {			\
@@ -170,4 +193,9 @@ struct ftrace_event_call __used event_##call = {			\
 struct ftrace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
+int ftrace_event_is_function(struct ftrace_event_call *call)
+{
+	return call == &event_function;
+}
+
 #include "trace_entries.h"
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
                             ` (4 preceding siblings ...)
  2012-01-18 18:44           ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
@ 2012-01-18 18:44           ` Jiri Olsa
  2012-01-18 18:44           ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
                             ` (2 subsequent siblings)
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-18 18:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding the FILTER_TRACE_FN event field type for the function
tracepoint event, so it can be properly recognized within the
filtering code.

Currently all fields of ftrace subsystem events share the common
field type FILTER_OTHER. Since the function trace fields need special
care within the filtering code, we need to recognize them properly,
hence the addition of the FILTER_TRACE_FN field type.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h       |    1 +
 kernel/trace/trace_events_filter.c |    7 ++++++-
 kernel/trace/trace_export.c        |   25 ++++++++++++++++++++-----
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 2bf677c..dd478fc 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -245,6 +245,7 @@ enum {
 	FILTER_STATIC_STRING,
 	FILTER_DYN_STRING,
 	FILTER_PTR_STRING,
+	FILTER_TRACE_FN,
 };
 
 #define EVENT_STORAGE_SIZE 128
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 24aee71..eb04a2a 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
 	return FILTER_OTHER;
 }
 
+static bool is_function_field(struct ftrace_event_field *field)
+{
+	return field->filter_type == FILTER_TRACE_FN;
+}
+
 static bool is_string_field(struct ftrace_event_field *field)
 {
 	return field->filter_type == FILTER_DYN_STRING ||
@@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else {
+	} else if (!is_function_field(field)) {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 867653c..46c35e2 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -67,7 +67,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -77,7 +77,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -91,7 +91,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 		ret = trace_define_field(event_call, event_storage, #item, \
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 		mutex_unlock(&event_storage_mutex);			\
 		if (ret)						\
 			return ret;					\
@@ -104,7 +104,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -112,10 +112,24 @@ static void __always_unused ____ftrace_check_##name(void)	\
 #define __dynamic_array(type, item)					\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), FILTER_OTHER);\
+				 0, is_signed_type(type), filter_type);\
 	if (ret)							\
 		return ret;
 
+#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
+#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
+#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
+#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
+#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
+#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
+#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
+#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
+#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
+#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
+#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
+
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 int									\
@@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 {									\
 	struct struct_name field;					\
 	int ret;							\
+	int filter_type = FILTER_TYPE(id);				\
 									\
 	tstruct;							\
 									\
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 7/7] ftrace, perf: Add filter support for function trace event
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
                             ` (5 preceding siblings ...)
  2012-01-18 18:44           ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
@ 2012-01-18 18:44           ` Jiri Olsa
  2012-01-18 21:43           ` [PATCHv6 0/7] ftrace, perf: Adding support to use function trace Steven Rostedt
  2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
  8 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-18 18:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding support to filter the function trace event via the perf
interface. It is now possible to use the filter interface in the
perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only, and
the following operators are accepted: '==', '!=' and '||',
ending up with filter strings like:

  ip == f1[, ]f2 ... || ip != f3[, ]f4 ...

with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"'.

The '==' operator adds trace filter with same effect as would
be added via set_ftrace_filter file.

The '!=' operator adds trace filter with same effect as would
be added via set_ftrace_notrace file.

The right side of the '!=' and '==' operators is a list of
functions or regexps to be added to the filter, separated by spaces.

The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operator within one filter string.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h             |    1 +
 kernel/trace/ftrace.c              |    6 ++
 kernel/trace/trace.h               |    2 -
 kernel/trace/trace_event_perf.c    |    4 +-
 kernel/trace/trace_events_filter.c |  169 +++++++++++++++++++++++++++++++++---
 5 files changed, 168 insertions(+), 14 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index d3f529c..60781ab 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -240,6 +240,7 @@ int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
+void ftrace_free_filter(struct ftrace_ops *ops);
 
 int register_ftrace_command(struct ftrace_func_command *cmd);
 int unregister_ftrace_command(struct ftrace_func_command *cmd);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 45c9b0c..9935a2a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1192,6 +1192,12 @@ static void free_ftrace_hash_rcu(struct ftrace_hash *hash)
 	call_rcu_sched(&hash->rcu, __free_ftrace_hash_rcu);
 }
 
+void ftrace_free_filter(struct ftrace_ops *ops)
+{
+	free_ftrace_hash(ops->filter_hash);
+	free_ftrace_hash(ops->notrace_hash);
+}
+
 static struct ftrace_hash *alloc_ftrace_hash(int size_bits)
 {
 	struct ftrace_hash *hash;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e88e58a..4ec6d18 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -770,9 +770,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 57eb232..220b50a 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -298,7 +298,9 @@ static int perf_ftrace_function_register(struct perf_event *event)
 static int perf_ftrace_function_unregister(struct perf_event *event)
 {
 	struct ftrace_ops *ops = &event->ftrace_ops;
-	return unregister_ftrace_function(ops);
+	int ret = unregister_ftrace_function(ops);
+	ftrace_free_filter(ops);
+	return ret;
 }
 
 static void perf_ftrace_function_enable(struct perf_event *event)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index eb04a2a..c8a64ec 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -54,6 +54,13 @@ struct filter_op {
 	int precedence;
 };
 
+static struct filter_op filter_ftrace_ops[] = {
+	{ OP_OR,	"||",		1 },
+	{ OP_NE,	"!=",		2 },
+	{ OP_EQ,	"==",		2 },
+	{ OP_NONE,	"OP_NONE",	0 },
+};
+
 static struct filter_op filter_ops[] = {
 	{ OP_OR,	"||",		1 },
 	{ OP_AND,	"&&",		2 },
@@ -81,6 +88,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +104,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +1001,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1353,7 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1743,8 +1754,8 @@ static int replace_system_preds(struct event_subsystem *system,
 	return -ENOMEM;
 }
 
-static int create_filter_start(char *filter_str, bool set_str,
-			       struct filter_parse_state **psp,
+static int create_filter_start(char *filter_str, struct filter_op *fops,
+			       bool set_str, struct filter_parse_state **psp,
 			       struct event_filter **filterp)
 {
 	struct event_filter *filter;
@@ -1770,7 +1781,7 @@ static int create_filter_start(char *filter_str, bool set_str,
 	*filterp = filter;
 	*psp = ps;
 
-	parse_init(ps, filter_ops, filter_str);
+	parse_init(ps, fops, filter_str);
 	err = filter_parse(ps);
 	if (err && set_str)
 		append_filter_err(ps, filter);
@@ -1808,9 +1819,13 @@ static int create_filter(struct ftrace_event_call *call,
 {
 	struct event_filter *filter = NULL;
 	struct filter_parse_state *ps = NULL;
+	struct filter_op *fops = filter_ops;
 	int err;
 
-	err = create_filter_start(filter_str, set_str, &ps, &filter);
+	if (ftrace_event_is_function(call))
+		fops = filter_ftrace_ops;
+
+	err = create_filter_start(filter_str, fops, set_str, &ps, &filter);
 	if (!err) {
 		err = replace_preds(call, filter, ps, filter_str, false);
 		if (err && set_str)
@@ -1838,7 +1853,7 @@ static int create_system_filter(struct event_subsystem *system,
 	struct filter_parse_state *ps = NULL;
 	int err;
 
-	err = create_filter_start(filter_str, true, &ps, &filter);
+	err = create_filter_start(filter_str, filter_ops, true, &ps, &filter);
 	if (!err) {
 		err = replace_system_preds(system, ps, filter_str);
 		if (!err) {
@@ -1955,6 +1970,131 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+static char **
+ftrace_function_filter_re(char *buf, int len, int *count)
+{
+	char *str, *sep, **re;
+
+	str = kstrndup(buf, len, GFP_KERNEL);
+	if (!str)
+		return NULL;
+
+	/*
+	 * The argv_split function takes white space
+	 * as a separator, so convert ',' into spaces.
+	 */
+	while ((sep = strchr(str, ',')))
+		*sep = ' ';
+
+	re = argv_split(GFP_KERNEL, str, count);
+	kfree(str);
+	return re;
+}
+
+static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
+				      int reset, char *re, int len)
+{
+	int ret;
+
+	if (filter)
+		ret = ftrace_set_filter(ops, re, len, reset);
+	else
+		ret = ftrace_set_notrace(ops, re, len, reset);
+
+	return ret;
+}
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int i, re_cnt, ret;
+	int *reset;
+	char **re;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	/*
+	 * The 'ip' field could have multiple filters set, separated
+	 * either by space or comma. We first cut the filter and apply
+	 * all pieces separately.
+	 */
+	re = ftrace_function_filter_re(buf, len, &re_cnt);
+	if (!re)
+		return -EINVAL;
+
+	for (i = 0; i < re_cnt; i++) {
+		ret = ftrace_function_set_regexp(data->ops, filter, *reset,
+						 re[i], strlen(re[i]));
+		if (ret)
+			break;
+
+		if (*reset)
+			*reset = 0;
+	}
+
+	argv_free(re);
+	return ret;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	/*
+	 * Check the predicate for function trace, verify:
+	 *  - only '==' and '!=' are used
+	 *  - the 'ip' field is used
+	 */
+	if (WARN((pred->op != OP_EQ) && (pred->op != OP_NE),
+		 "wrong operator for function filter: %d\n", pred->op))
+		return -EINVAL;
+
+	if (strcmp(field->name, "ip"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID))
+		return WALK_PRED_DEFAULT;
+
+	/* Double checking the predicate is valid for function trace. */
+	*err = ftrace_function_check_pred(pred);
+	if (*err)
+		return WALK_PRED_ABORT;
+
+	*err = __ftrace_function_set_filter(pred->op == OP_EQ,
+					    pred->regex.pattern,
+					    pred->regex.len,
+					    data);
+
+	return (*err) ? WALK_PRED_ABORT : WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1975,9 +2115,16 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 		goto out_unlock;
 
 	err = create_filter(call, filter_str, false, &filter);
-	if (!err)
-		event->filter = filter;
+	if (err)
+		goto free_filter;
+
+	if (ftrace_event_is_function(call))
+		err = ftrace_function_set_filter(event, filter);
 	else
+		event->filter = filter;
+
+free_filter:
+	if (err || ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [PATCHv6 0/7] ftrace, perf: Adding support to use function trace
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
                             ` (6 preceding siblings ...)
  2012-01-18 18:44           ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2012-01-18 21:43           ` Steven Rostedt
  2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
  8 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2012-01-18 21:43 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

Frederic,

Can you review and ACK this patch set (if you don't find anything wrong
with it). I can then run it through a bunch of tests.

Thanks,

-- Steve



On Wed, 2012-01-18 at 19:44 +0100, Jiri Olsa wrote:
> hi,
> here's another version of perf support for function trace
> with filter. 
> 
> attached patches:
>   1/7 ftrace: Change filter/notrace set functions to return exit code
>   2/7 ftrace: Add enable/disable ftrace_ops control interface
>   3/7 ftrace, perf: Add open/close tracepoint perf registration actions
>   4/7 ftrace, perf: Add add/del tracepoint perf registration actions
>   5/7 ftrace, perf: Add support to use function tracepoint in perf
>   6/7 ftrace, perf: Distinguish ftrace function event field type
>   7/7 ftrace, perf: Add filter support for function trace event
> 
> v6 changes:
>   2/7 - added comments to FTRACE_OPS_FL_* bits enum
>   5/7 - added more info to the change log regarding ftrace_ops enable/disable
>   7/7 - rebased to the latest filter changes
> 
> v5 changes:
>   7/7 - fixed to properly support ',' in filter expressions
> 
> v4 changes:
>   2/7 - FL_GLOBAL_CONTROL changed to FL_GLOBAL_CONTROL_MASK
>       - changed WARN_ON_ONCE() to include the !preempt_count()
>       - changed this_cpu_ptr to per_cpu_ptr
> 
>   omitted Fix possible NULL dereferencing in __ftrace_hash_rec_update
>   (2/8 in v3)
> 
> v3 changes:
>   3/8 - renamed __add/remove_ftrace_ops
>       - fixed preempt_enable/recursion_clear order in ftrace_ops_control_func
>       - renamed/commented API functions -  enable/disable_ftrace_function
>   
>   omitted graph tracer workaround patch 10/10
> 
> v2 changes:
>  01/10 - keeping the old fix instead of adding hash_has_contents func
>          I'll send a separate patchset for this
>  02/10 - using different way to avoid the issue (3/9 in v1)
>  03/10 - using the way proposed by Steven for controlling ftrace_ops
>          (4/9 in v1)
>  06/10 - added check ensuring the ftrace:function event could be used by
>          root only (7/9 in v1)
>  08/10 - added more description (8/9 in v1)
>  09/10 - changed '&&' operator to '||' which seems more suitable
>          in this case (9/9 in v1)
> 
> thanks,
> jirka
> ---
>  include/linux/ftrace.h             |   61 ++++++++++-
>  include/linux/ftrace_event.h       |    9 +-
>  include/linux/perf_event.h         |    3 +
>  kernel/trace/ftrace.c              |  140 +++++++++++++++++++++--
>  kernel/trace/trace.h               |   11 ++-
>  kernel/trace/trace_event_perf.c    |  214 +++++++++++++++++++++++++++++-------
>  kernel/trace/trace_events.c        |   12 ++-
>  kernel/trace/trace_events_filter.c |  172 +++++++++++++++++++++++++++--
>  kernel/trace/trace_export.c        |   53 ++++++++-
>  kernel/trace/trace_kprobe.c        |    8 +-
>  kernel/trace/trace_syscalls.c      |   18 +++-
>  11 files changed, 617 insertions(+), 84 deletions(-)



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code
  2012-01-18 18:44           ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2012-01-19 16:31             ` Frederic Weisbecker
  0 siblings, 0 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-19 16:31 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, Jan 18, 2012 at 07:44:29PM +0100, Jiri Olsa wrote:
> Currently the ftrace_set_filter and ftrace_set_notrace functions
> do not return any return code. So there's no way for ftrace_ops
> user to tell whether the filter was correctly applied.
> 
> The set_ftrace_filter interface returns error in case the filter
> did not match:
> 
>   # echo krava > set_ftrace_filter
>   bash: echo: write error: Invalid argument
> 
> Changing both ftrace_set_filter and ftrace_set_notrace functions
> to return zero if the filter was applied correctly or -E* values
> in case of error.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-18 18:44           ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2012-01-20 17:02             ` Frederic Weisbecker
  2012-01-25 23:13               ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-20 17:02 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, Jan 18, 2012 at 07:44:30PM +0100, Jiri Olsa wrote:
> Adding a way to temporarily enable/disable ftrace_ops. The change
> follows the same way as 'global' ftrace_ops are done.
> 
> Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
> which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL
> flag. In addition new per cpu flag called 'disabled' is also added to
> ftrace_ops to provide the control information for each cpu.
> 
> When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
> set as disabled for all cpus.
> 
> The ftrace_control_list contains all the registered 'control' ftrace_ops.
> The control_ops provides function which iterates ftrace_control_list
> and does the check for 'disabled' flag on current cpu.
> 
> Adding 2 inline functions ftrace_function_enable/ftrace_function_disable,
> which enable/disable the ftrace_ops for given cpu.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/ftrace.h |   56 ++++++++++++++++++++++
>  kernel/trace/ftrace.c  |  119 +++++++++++++++++++++++++++++++++++++++++++++---
>  kernel/trace/trace.h   |    2 +
>  3 files changed, 170 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index f33fb3b..d3f529c 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -31,16 +31,32 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
>  
>  typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
>  
> +/*
> + * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
> + * set in the flags member.
> + *
> + * ENABLED - set/unset when ftrace_ops is registered/unregistered
> + * GLOBAL  - set manually by ftrace_ops user to denote the ftrace_ops
> + *           is part of the global tracers sharing the same filter
> + *           via set_ftrace_* debugfs files.
> + * DYNAMIC - set when ftrace_ops is registered to denote dynamically
> + *           allocated ftrace_ops which need special care
> + * CONTROL - set manually by ftrace_ops user to denote the ftrace_ops
> + *           could be controlled by following calls:
> + *           ftrace_function_enable, ftrace_function_disable
> + */

Nice!

But I have some more comments:


>  enum {
>  	FTRACE_OPS_FL_ENABLED		= 1 << 0,
>  	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
>  	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
> +	FTRACE_OPS_FL_CONTROL		= 1 << 3,
>  };
>  
>  struct ftrace_ops {
>  	ftrace_func_t			func;
>  	struct ftrace_ops		*next;
>  	unsigned long			flags;
> +	void __percpu			*disabled;
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  	struct ftrace_hash		*notrace_hash;
>  	struct ftrace_hash		*filter_hash;
> @@ -97,6 +113,46 @@ int register_ftrace_function(struct ftrace_ops *ops);
>  int unregister_ftrace_function(struct ftrace_ops *ops);
>  void clear_ftrace_function(void);
>  
> +/**
> + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> + *
> + * This function enables tracing on given cpu by decreasing
> + * the per cpu control variable.
> + * It must be called with preemption disabled and only on
> + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> + */
> +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> +{
> +	atomic_t *disabled;
> +
> +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
> +			 !preempt_count()))
> +		return;
> +
> +	disabled = per_cpu_ptr(ops->disabled, cpu);
> +	atomic_dec(disabled);
> +}

As you're using this for the local CPU exclusively, I suggest you rather
rename it to "ftrace_function_{dis,en}able_cpu(struct ftrace_ops *ops)"
and use __get_cpu_var() that does the preempt check for you.

[...]
> +static void control_ops_disable_all(struct ftrace_ops *ops)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu)
> +		atomic_set(per_cpu_ptr(ops->disabled, cpu), 1);
> +}
> +
> +static int control_ops_alloc(struct ftrace_ops *ops)
> +{
> +	atomic_t *disabled;
> +
> +	disabled = alloc_percpu(atomic_t);
> +	if (!disabled)
> +		return -ENOMEM;
> +
> +	ops->disabled = disabled;
> +	control_ops_disable_all(ops);
> +	return 0;
> +}
> +
> +static void control_ops_free(struct ftrace_ops *ops)
> +{
> +	free_percpu(ops->disabled);
> +}
> +
> +static int control_ops_is_disabled(struct ftrace_ops *ops, int cpu)
> +{
> +	atomic_t *disabled = per_cpu_ptr(ops->disabled, cpu);
> +	return atomic_read(disabled);

I think this is checked only locally. Better use __get_cpu_var().
Also note atomic_read() doesn't involve an smp barrier.

atomic_inc/dec are smp safe wrt. ordering. But atomic_set() and atomic_read()
are not. I believe this is safe because we still have PERF_HES_STOPPED check.

And also it seems we read the value from the same CPU we have set it. So
we actually don't need SMP ordering. But then this raise the question of
the relevance of using atomic ops. Normal values would do the trick.

[...]
>  static void
> +ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
> +{
> +	struct ftrace_ops *op;
> +	int cpu;
> +
> +	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
> +		return;
> +
> +	/*
> +	 * Some of the ops may be dynamically allocated,
> +	 * they must be freed after a synchronize_sched().
> +	 */
> +	preempt_disable_notrace();
> +	trace_recursion_set(TRACE_CONTROL_BIT);
> +	cpu = smp_processor_id();
> +	op = rcu_dereference_raw(ftrace_control_list);
> +	while (op != &ftrace_list_end) {
> +		if (!control_ops_is_disabled(op, cpu) &&
> +		    ftrace_ops_test(op, ip))
> +			op->func(ip, parent_ip);
> +
> +		op = rcu_dereference_raw(op->next);

Should it be rcu_dereference_sched() ?

> +	};
> +	trace_recursion_clear(TRACE_CONTROL_BIT);
> +	preempt_enable_notrace();
> +}
> +
> +static struct ftrace_ops control_ops = {
> +	.func = ftrace_ops_control_func,
> +};

So note this patch is optimizing for the off case (when
we have called pmu->del()), but at the cost of having an
impact in the on case with having at least one level
of multiplexing (and two on the worst case if we have ftrace
running in parallel but this is enough a corner case that we
don't care).

But this is perhaps still a win.

> +
> +static void
>  ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
>  {
>  	struct ftrace_ops *op;
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 2c26574..41c54e3 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -288,6 +288,8 @@ struct tracer {
>  /* for function tracing recursion */
>  #define TRACE_INTERNAL_BIT		(1<<11)
>  #define TRACE_GLOBAL_BIT		(1<<12)
> +#define TRACE_CONTROL_BIT		(1<<13)
> +
>  /*
>   * Abuse of the trace_recursion.
>   * As we need a way to maintain state if we are tracing the function
> -- 
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 3/8] ftrace: Add enable/disable ftrace_ops control interface
  2011-12-21 16:01       ` Steven Rostedt
  2011-12-21 16:43         ` Jiri Olsa
@ 2012-01-24  1:26         ` Frederic Weisbecker
  1 sibling, 0 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-24  1:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, Dec 21, 2011 at 11:01:33AM -0500, Steven Rostedt wrote:
> On Wed, 2011-12-21 at 12:48 +0100, Jiri Olsa wrote:
> > +static int control_ops_is_disabled(struct ftrace_ops *ops)
> > +{
> > +	atomic_t *disabled = this_cpu_ptr(ops->disabled);
> 
> Again, the use of "this_cpu_ptr" is wrong. Gah! We should nuke all of
> that crap.
> 

Is it? It includes the preemption check if CONFIG_DEBUG_PREEMPT, just
like __get_cpu_var()

Just saying that because in a later version of this patch, Jiri used
per_cpu_ptr(ops->disabled, cpu). And this is the wrong thing to do
given that we always fetch the local pointer and per_cpu_ptr() doesn't
check for preemption disabled.

 
> > +	return atomic_read(disabled);
> > +}


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-20 17:02             ` Frederic Weisbecker
@ 2012-01-25 23:13               ` Steven Rostedt
  2012-01-26  2:37                 ` Frederic Weisbecker
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2012-01-25 23:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, 2012-01-20 at 18:02 +0100, Frederic Weisbecker wrote:
>  
> > +/**
> > + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> > + *
> > + * This function enables tracing on given cpu by decreasing
> > + * the per cpu control variable.
> > + * It must be called with preemption disabled and only on
> > + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> > + */
> > +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> > +{
> > +	atomic_t *disabled;
> > +
> > +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
> > +			 !preempt_count()))
> > +		return;
> > +
> > +	disabled = per_cpu_ptr(ops->disabled, cpu);
> > +	atomic_dec(disabled);
> > +}
> 
> As you're using this for the local CPU exclusively, I suggest you rather
> rename it to "ftrace_function_{dis,en}able_cpu(struct ftrace_ops *ops)"

I wonder if "ftrace_function_local_{dis,en}able(ops)" would be better?
That would match something like local_irq_disable/enable.

> and use __get_cpu_var() that does the preempt check for you.

Hmm, I haven't tried that with allocated per_cpu pointers before. If it
works, sure.

> 
> [...]
> > +static void control_ops_disable_all(struct ftrace_ops *ops)
> > +{
> > +	int cpu;
> > +
> > +	for_each_possible_cpu(cpu)
> > +		atomic_set(per_cpu_ptr(ops->disabled, cpu), 1);
> > +}
> > +
> > +static int control_ops_alloc(struct ftrace_ops *ops)
> > +{
> > +	atomic_t *disabled;
> > +
> > +	disabled = alloc_percpu(atomic_t);
> > +	if (!disabled)
> > +		return -ENOMEM;
> > +
> > +	ops->disabled = disabled;
> > +	control_ops_disable_all(ops);
> > +	return 0;
> > +}
> > +
> > +static void control_ops_free(struct ftrace_ops *ops)
> > +{
> > +	free_percpu(ops->disabled);
> > +}
> > +
> > +static int control_ops_is_disabled(struct ftrace_ops *ops, int cpu)
> > +{
> > +	atomic_t *disabled = per_cpu_ptr(ops->disabled, cpu);
> > +	return atomic_read(disabled);
> 
> I think this is checked only locally. Better use __get_cpu_var().

If it works, sure.

> Also note atomic_read() doesn't involve an smp barrier.

None needed, as this should all be done for the same CPU, and preemption
disabled.


> 
> atomic_inc/dec are smp safe wrt. ordering. But atomic_set() and atomic_read()
> are not. I believe this is safe because we still have PERF_HES_STOPPED check.

It should be safe because smp is not involved. We disable/enable
function tracing per cpu, and then check per cpu if it is running. The
same task will disable or enable it (I believe in the scheduler).

> 
> And also it seems we read the value from the same CPU we have set it. So
> we actually don't need SMP ordering. But then this raises the question of

Right.

> the relevance of using atomic ops. Normal values would do the trick.

Good point. The atomic here isn't needed.

> 
> [...]
> >  static void
> > +ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
> > +{
> > +	struct ftrace_ops *op;
> > +	int cpu;
> > +
> > +	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
> > +		return;
> > +
> > +	/*
> > +	 * Some of the ops may be dynamically allocated,
> > +	 * they must be freed after a synchronize_sched().
> > +	 */
> > +	preempt_disable_notrace();
> > +	trace_recursion_set(TRACE_CONTROL_BIT);
> > +	cpu = smp_processor_id();
> > +	op = rcu_dereference_raw(ftrace_control_list);
> > +	while (op != &ftrace_list_end) {
> > +		if (!control_ops_is_disabled(op, cpu) &&
> > +		    ftrace_ops_test(op, ip))
> > +			op->func(ip, parent_ip);
> > +
> > +		op = rcu_dereference_raw(op->next);
> 
> Should it be rcu_dereference_sched() ?

From the comment posted by Paul McKenney who converted the global_list
ops (that does somewhat the same thing as the control ops here):

/*
 * Traverse the ftrace_global_list, invoking all entries.  The reason that we
 * can use rcu_dereference_raw() is that elements removed from this list
 * are simply leaked, so there is no need to interact with a grace-period
 * mechanism.  The rcu_dereference_raw() calls are needed to handle
 * concurrent insertions into the ftrace_global_list.
 *
 * Silly Alpha and silly pointer-speculation compiler optimizations!
 */


But then reading the commit he has:

    Replace the calls to read_barrier_depends() in
    ftrace_list_func() with rcu_dereference_raw() to improve
    readability.  The reason that we use rcu_dereference_raw() here
    is that removed entries are never freed, instead they are simply
    leaked.  This is one of a very few cases where use of
    rcu_dereference_raw() is the long-term right answer.  And I
    don't yet know of any others.  ;-)

Hmm, and I use the rcu_dereference_raw() in other places in this file,
but those places now get freed. Although, I'm a bit nervous in changing
these to rcu_dereference_sched, because if CONFIG_DEBUG_LOCK_ALLOC is
enabled, then the checks will be done for *every function* called.

We obviously have preemption disabled, or other bad things may happen.
Wonder if we really need this?  Ftrace itself is an internal checker and
not truly a kernel component. It may be "exempt" from these checks ;-)

I could make the switch and see what overhead this causes. It may live
lock the system. It wouldn't be the first time lockdep & ftrace live
locked the system. Or made it so unbearably slow. Lockdep and ftrace do
not play well together. They both are very intrusive. The two remind me
of the United States congress. Where there is two parties trying to take
control of everything, but nothing ever gets done. We end up with a
grid/live lock in the country/computer.
 
> 
> > +	};
> > +	trace_recursion_clear(TRACE_CONTROL_BIT);
> > +	preempt_enable_notrace();
> > +}
> > +
> > +static struct ftrace_ops control_ops = {
> > +	.func = ftrace_ops_control_func,
> > +};
> 
> So note this patch is optimizing for the off case (when
> we have called pmu->del()), but at the cost of having an
> impact in the on case with having at least one level
> > of multiplexing (and two in the worst case if we have ftrace
> > running in parallel, but this is enough of a corner case that we
> don't care).
> 
> But this is perhaps still a win.

There's a lot more overhead elsewhere that this shouldn't be an issue.

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-25 23:13               ` Steven Rostedt
@ 2012-01-26  2:37                 ` Frederic Weisbecker
  2012-01-27 10:37                   ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-26  2:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Wed, Jan 25, 2012 at 06:13:41PM -0500, Steven Rostedt wrote:
> On Fri, 2012-01-20 at 18:02 +0100, Frederic Weisbecker wrote:
> >  
> > > +/**
> > > + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> > > + *
> > > + * This function enables tracing on given cpu by decreasing
> > > + * the per cpu control variable.
> > > + * It must be called with preemption disabled and only on
> > > + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> > > + */
> > > +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> > > +{
> > > +	atomic_t *disabled;
> > > +
> > > +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
> > > +			 !preempt_count()))
> > > +		return;
> > > +
> > > +	disabled = per_cpu_ptr(ops->disabled, cpu);
> > > +	atomic_dec(disabled);
> > > +}
> > 
> > As you're using this for the local CPU exclusively, I suggest you rather
> > rename it to "ftrace_function_{dis,en}able_cpu(struct ftrace_ops *ops)"
> 
> I wonder if "ftrace_function_local_{dis,en}able(ops)" would be better?
> That would match something like local_irq_disable/enable.

Good idea.

> 
> > and use __get_cpu_var() that does the preempt check for you.
> 
> Hmm, I haven't tried that with allocated per_cpu pointers before. If it
> works, sure.

Yeah that works. In fact this_cpu_ptr() would do the best. Both this_cpu_ptr()
and __get_cpu_var() map to the same internal ops with preemption checks.

> > >  static void
> > > +ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
> > > +{
> > > +	struct ftrace_ops *op;
> > > +	int cpu;
> > > +
> > > +	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
> > > +		return;
> > > +
> > > +	/*
> > > +	 * Some of the ops may be dynamically allocated,
> > > +	 * they must be freed after a synchronize_sched().
> > > +	 */
> > > +	preempt_disable_notrace();
> > > +	trace_recursion_set(TRACE_CONTROL_BIT);
> > > +	cpu = smp_processor_id();
> > > +	op = rcu_dereference_raw(ftrace_control_list);
> > > +	while (op != &ftrace_list_end) {
> > > +		if (!control_ops_is_disabled(op, cpu) &&
> > > +		    ftrace_ops_test(op, ip))
> > > +			op->func(ip, parent_ip);
> > > +
> > > +		op = rcu_dereference_raw(op->next);
> > 
> > Should it be rcu_dereference_sched() ?
> 
> From the comment posted by Paul McKenney who converted the global_list
> ops (that does somewhat the same thing as the control ops here):
> 
> /*
>  * Traverse the ftrace_global_list, invoking all entries.  The reason that we
>  * can use rcu_dereference_raw() is that elements removed from this list
>  * are simply leaked, so there is no need to interact with a grace-period
>  * mechanism.  The rcu_dereference_raw() calls are needed to handle
>  * concurrent insertions into the ftrace_global_list.
>  *
>  * Silly Alpha and silly pointer-speculation compiler optimizations!
>  */

Ok.

> 
> 
> But then reading the commit he has:
> 
>     Replace the calls to read_barrier_depends() in
>     ftrace_list_func() with rcu_dereference_raw() to improve
>     readability.  The reason that we use rcu_dereference_raw() here
>     is that removed entries are never freed, instead they are simply
>     leaked.  This is one of a very few cases where use of
>     rcu_dereference_raw() is the long-term right answer.  And I
>     don't yet know of any others.  ;-)
> 
> Hmm, and I use the rcu_dereference_raw() in other places in this file,
> but those places now get freed. Although, I'm a bit nervous in changing
> these to rcu_dereference_sched, because if CONFIG_DEBUG_LOCK_ALLOC is
> enabled, then the checks will be done for *every function* called.
> 
> We obviously have preemption disabled, or other bad things may happen.
> Wonder if we really need this?  Ftrace itself is an internal checker and
> not truly a kernel component. It may be "exempt" from these checks ;-)
> 
> I could make the switch and see what overhead this causes. It may live
> lock the system. It wouldn't be the first time lockdep & ftrace live
> locked the system. Or made it so unbearably slow.

Yeah right. We can probably make this exception for function tracing by
keeping rcu_dereference_raw(). But now that it involves a grace period,
the comment probably should be updated :)

> Lockdep and ftrace do
> not play well together. They both are very intrusive. The two remind me
> of the United States congress. Where there is two parties trying to take
> control of everything, but nothing ever gets done. We end up with a
> grid/live lock in the country/computer.

:o)

> > 
> > > +	};
> > > +	trace_recursion_clear(TRACE_CONTROL_BIT);
> > > +	preempt_enable_notrace();
> > > +}
> > > +
> > > +static struct ftrace_ops control_ops = {
> > > +	.func = ftrace_ops_control_func,
> > > +};
> > 
> > So note this patch is optimizing for the off case (when
> > we have called pmu->del()), but at the cost of having an
> > impact in the on case with having at least one level
> > of multiplexing (and two in the worst case if we have ftrace
> > running in parallel, but this is enough of a corner case that we
> > don't care).
> > 
> > But this is perhaps still a win.
> 
> There's a lot more overhead elsewhere that this shouldn't be an issue.

Yeah that looks fine.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-26  2:37                 ` Frederic Weisbecker
@ 2012-01-27 10:37                   ` Jiri Olsa
  2012-01-27 10:38                     ` Jiri Olsa
  2012-01-27 16:40                     ` Frederic Weisbecker
  0 siblings, 2 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-27 10:37 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Thu, Jan 26, 2012 at 03:37:29AM +0100, Frederic Weisbecker wrote:
> On Wed, Jan 25, 2012 at 06:13:41PM -0500, Steven Rostedt wrote:
> > On Fri, 2012-01-20 at 18:02 +0100, Frederic Weisbecker wrote:
> > >  
> > > > +/**
> > > > + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> > > > + *
> > > > + * This function enables tracing on given cpu by decreasing
> > > > + * the per cpu control variable.
> > > > + * It must be called with preemption disabled and only on
> > > > + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> > > > + */
> > > > +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> > > > +{
> > > > +	atomic_t *disabled;
> > > > +
> > > > +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
> > > > +			 !preempt_count()))
> > > > +		return;
> > > > +
> > > > +	disabled = per_cpu_ptr(ops->disabled, cpu);
> > > > +	atomic_dec(disabled);
> > > > +}
> > > 
> > > As you're using this for the local CPU exclusively, I suggest you rather
> > > rename it to "ftrace_function_{dis,en}able_cpu(struct ftrace_ops *ops)"
> > 
> > I wonder if "ftrace_function_local_{dis,en}able(ops)" would be better?
> > That would match something like local_irq_disable/enable.
> 
> Good idea.
> 
> > 
> > > and use __get_cpu_var() that does the preempt check for you.

I haven't found a preempt check in the this_cpu_ptr path.. not sure if I
missed it.. so I'm keeping the implicit preempt check.
Attaching new version of the patch (I'll need to resend whole serie once
this one is agreed).

changes:
 - using this_cpu_ptr to touch ftrace_ops::disable
 - renamed ftrace_ops::disable API:
	void ftrace_function_local_enable(struct ftrace_ops *ops)
	void ftrace_function_local_disable(struct ftrace_ops *ops)
	int  ftrace_function_local_disabled(struct ftrace_ops *ops)

thanks,
jirka


---
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index f33fb3b..63447bf 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -31,16 +31,32 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
 
+/*
+ * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
+ * set in the flags member.
+ *
+ * ENABLED - set/unset when ftrace_ops is registered/unregistered
+ * GLOBAL  - set manually by ftrace_ops user to denote the ftrace_ops
+ *           is part of the global tracers sharing the same filter
+ *           via set_ftrace_* debugfs files.
+ * DYNAMIC - set when ftrace_ops is registered to denote dynamically
+ *           allocated ftrace_ops which need special care
+ * CONTROL - set manually by ftrace_ops user to denote the ftrace_ops
+ *           could be controlled by following calls:
+ *           ftrace_function_local_enable, ftrace_function_local_disable
+ */
 enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	int __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +113,56 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+/**
+ * ftrace_function_local_enable - enable controlled ftrace_ops on current cpu
+ *
+ * This function enables tracing on current cpu by decreasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_local_enable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
+			 !preempt_count()))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))--;
+}
+
+/**
+ * ftrace_function_local_disable - disable controlled ftrace_ops on current cpu
+ *
+ * This function disables tracing on current cpu by increasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_local_disable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
+			 !preempt_count()))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))++;
+}
+
+/**
+ * ftrace_function_local_disabled - returns ftrace_ops disabled value
+ *                                  on current cpu
+ *
+ * This function returns value of ftrace_ops::disabled on current cpu.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline int ftrace_function_local_disabled(struct ftrace_ops *ops)
+{
+	WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
+		     !preempt_count());
+
+	return *this_cpu_ptr(ops->disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e2e0597..c8d2af2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -62,6 +62,8 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+#define FL_GLOBAL_CONTROL_MASK (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
@@ -89,12 +91,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -168,6 +172,32 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		*per_cpu_ptr(ops->disabled, cpu) = 1;
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	int __percpu *disabled;
+
+	disabled = alloc_percpu(int);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -259,6 +289,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_list_ops(struct ftrace_ops **list,
+				struct ftrace_ops *main_ops,
+				struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	add_ftrace_ops(list, ops);
+	if (first)
+		add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_list_ops(struct ftrace_ops **list,
+				  struct ftrace_ops *main_ops,
+				  struct ftrace_ops *ops)
+{
+	int ret = remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -270,15 +320,20 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+	/* We don't support both control and global flags set. */
+	if ((ops->flags & FL_GLOBAL_CONTROL_MASK) == FL_GLOBAL_CONTROL_MASK)
+		return -EINVAL;
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -302,11 +357,23 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_list_ops(&ftrace_global_list,
+					     &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_list_ops(&ftrace_control_list,
+					     &control_ops, ops);
+		if (!ret) {
+			/*
+			 * The ftrace_ops is now removed from the list,
+			 * so there'll be no new users. We must ensure
+			 * all current users are done before we free
+			 * the control data.
+			 */
+			synchronize_sched();
+			control_ops_free(ops);
+		}
 	} else
 		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -3874,6 +3941,36 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!ftrace_function_local_disabled(op) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	}
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b93ecba..55c6ea0 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function

^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-27 10:37                   ` Jiri Olsa
@ 2012-01-27 10:38                     ` Jiri Olsa
  2012-01-27 16:40                     ` Frederic Weisbecker
  1 sibling, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-27 10:38 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Jan 27, 2012 at 11:37:14AM +0100, Jiri Olsa wrote:
> On Thu, Jan 26, 2012 at 03:37:29AM +0100, Frederic Weisbecker wrote:
> > On Wed, Jan 25, 2012 at 06:13:41PM -0500, Steven Rostedt wrote:
> > > On Fri, 2012-01-20 at 18:02 +0100, Frederic Weisbecker wrote:
> > > >  
> > > > > +/**
> > > > > + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> > > > > + *
> > > > > + * This function enables tracing on given cpu by decreasing
> > > > > + * the per cpu control variable.
> > > > > + * It must be called with preemption disabled and only on
> > > > > + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> > > > > + */
> > > > > +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> > > > > +{
> > > > > +	atomic_t *disabled;
> > > > > +
> > > > > +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
> > > > > +			 !preempt_count()))
> > > > > +		return;
> > > > > +
> > > > > +	disabled = per_cpu_ptr(ops->disabled, cpu);
> > > > > +	atomic_dec(disabled);
> > > > > +}
> > > > 
> > > > As you're using this for the local CPU exclusively, I suggest you rather
> > > > rename it to "ftrace_function_{dis,en}able_cpu(struct ftrace_ops *ops)"
> > > 
> > > I wonder if "ftrace_function_local_{dis,en}able(ops)" would be better?
> > > That would match something like local_irq_disable/enable.
> > 
> > Good idea.
> > 
> > > 
> > > > and use __get_cpu_var() that does the preempt check for you.
> 
> I haven't found preempt check this_cpu_ptr path.. not sure I missed it..
> so I'm keeping the implicit preemt check.
> Attaching new version of the patch (I'll need to resend whole serie once
> this one is agreed).
> 
> changes:
>  - using this_cpu_ptr to touch ftrace_ops::disable
>  - renamed ftrace_ops:disable API:
> 	void ftrace_function_local_enable(struct ftrace_ops *ops)
> 	void ftrace_function_local_disable(struct ftrace_ops *ops)
> 	int  ftrace_function_local_disabled(struct ftrace_ops *ops)

oops, missed one... using int instead of atomic_t for ftrace_ops::disabled

jirka

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-27 10:37                   ` Jiri Olsa
  2012-01-27 10:38                     ` Jiri Olsa
@ 2012-01-27 16:40                     ` Frederic Weisbecker
  2012-01-27 16:54                       ` Jiri Olsa
  1 sibling, 1 reply; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-27 16:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Jan 27, 2012 at 11:37:14AM +0100, Jiri Olsa wrote:
> On Thu, Jan 26, 2012 at 03:37:29AM +0100, Frederic Weisbecker wrote:
> > On Wed, Jan 25, 2012 at 06:13:41PM -0500, Steven Rostedt wrote:
> > > On Fri, 2012-01-20 at 18:02 +0100, Frederic Weisbecker wrote:
> > > >  
> > > > > +/**
> > > > > + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> > > > > + *
> > > > > + * This function enables tracing on given cpu by decreasing
> > > > > + * the per cpu control variable.
> > > > > + * It must be called with preemption disabled and only on
> > > > > + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> > > > > + */
> > > > > +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> > > > > +{
> > > > > +	atomic_t *disabled;
> > > > > +
> > > > > +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
> > > > > +			 !preempt_count()))
> > > > > +		return;
> > > > > +
> > > > > +	disabled = per_cpu_ptr(ops->disabled, cpu);
> > > > > +	atomic_dec(disabled);
> > > > > +}
> > > > 
> > > > As you're using this for the local CPU exclusively, I suggest you rather
> > > > rename it to "ftrace_function_{dis,en}able_cpu(struct ftrace_ops *ops)"
> > > 
> > > I wonder if "ftrace_function_local_{dis,en}able(ops)" would be better?
> > > That would match something like local_irq_disable/enable.
> > 
> > Good idea.
> > 
> > > 
> > > > and use __get_cpu_var() that does the preempt check for you.
> 
> I haven't found preempt check this_cpu_ptr path.. not sure I missed it..
> so I'm keeping the implicit preemt check.

#ifdef CONFIG_DEBUG_PREEMPT
#define my_cpu_offset per_cpu_offset(smp_processor_id())
#else
#define my_cpu_offset __my_cpu_offset
#endif

#ifdef CONFIG_DEBUG_PREEMPT
#define this_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, my_cpu_offset)
#else
#define this_cpu_ptr(ptr) __this_cpu_ptr(ptr)
#endif

And smp_processor_id() has a preemption check.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-27 16:40                     ` Frederic Weisbecker
@ 2012-01-27 16:54                       ` Jiri Olsa
  2012-01-27 17:02                         ` Frederic Weisbecker
  2012-01-27 17:21                         ` Steven Rostedt
  0 siblings, 2 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-27 16:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Jan 27, 2012 at 05:40:49PM +0100, Frederic Weisbecker wrote:
> On Fri, Jan 27, 2012 at 11:37:14AM +0100, Jiri Olsa wrote:
> > On Thu, Jan 26, 2012 at 03:37:29AM +0100, Frederic Weisbecker wrote:
> > > On Wed, Jan 25, 2012 at 06:13:41PM -0500, Steven Rostedt wrote:
> > > > On Fri, 2012-01-20 at 18:02 +0100, Frederic Weisbecker wrote:
> > > > >  
> > > > > > +/**
> > > > > > + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> > > > > > + *
> > > > > > + * This function enables tracing on given cpu by decreasing
> > > > > > + * the per cpu control variable.
> > > > > > + * It must be called with preemption disabled and only on
> > > > > > + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> > > > > > + */
> > > > > > +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> > > > > > +{
> > > > > > +	atomic_t *disabled;
> > > > > > +
> > > > > > +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
> > > > > > +			 !preempt_count()))
> > > > > > +		return;
> > > > > > +
> > > > > > +	disabled = per_cpu_ptr(ops->disabled, cpu);
> > > > > > +	atomic_dec(disabled);
> > > > > > +}
> > > > > 
> > > > > As you're using this for the local CPU exclusively, I suggest you rather
> > > > > rename it to "ftrace_function_{dis,en}able_cpu(struct ftrace_ops *ops)"
> > > > 
> > > > I wonder if "ftrace_function_local_{dis,en}able(ops)" would be better?
> > > > That would match something like local_irq_disable/enable.
> > > 
> > > Good idea.
> > > 
> > > > 
> > > > > and use __get_cpu_var() that does the preempt check for you.
> > 
> > I haven't found preempt check this_cpu_ptr path.. not sure I missed it..
> > so I'm keeping the implicit preemt check.
> 
> #ifdef CONFIG_DEBUG_PREEMPT
> #define my_cpu_offset per_cpu_offset(smp_processor_id())
> #else
> #define my_cpu_offset __my_cpu_offset
> #endif
> 
> #ifdef CONFIG_DEBUG_PREEMPT
> #define this_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, my_cpu_offset)
> #else
> #define this_cpu_ptr(ptr) __this_cpu_ptr(ptr)
> #endif
> 
> And smp_processor_id() has a preemption check.

yay.. ok :) so this one is triggered only if the CONFIG_DEBUG_PREEMPT
option is enabled.. seems to me it'd be better to keep the explicit check anyway.

jirka

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-27 16:54                       ` Jiri Olsa
@ 2012-01-27 17:02                         ` Frederic Weisbecker
  2012-01-27 17:20                           ` Jiri Olsa
  2012-01-27 17:21                         ` Steven Rostedt
  1 sibling, 1 reply; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-27 17:02 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Jan 27, 2012 at 05:54:16PM +0100, Jiri Olsa wrote:
> On Fri, Jan 27, 2012 at 05:40:49PM +0100, Frederic Weisbecker wrote:
> > On Fri, Jan 27, 2012 at 11:37:14AM +0100, Jiri Olsa wrote:
> > > On Thu, Jan 26, 2012 at 03:37:29AM +0100, Frederic Weisbecker wrote:
> > > > On Wed, Jan 25, 2012 at 06:13:41PM -0500, Steven Rostedt wrote:
> > > > > On Fri, 2012-01-20 at 18:02 +0100, Frederic Weisbecker wrote:
> > > > > >  
> > > > > > > +/**
> > > > > > > + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> > > > > > > + *
> > > > > > > + * This function enables tracing on given cpu by decreasing
> > > > > > > + * the per cpu control variable.
> > > > > > > + * It must be called with preemption disabled and only on
> > > > > > > + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> > > > > > > + */
> > > > > > > +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> > > > > > > +{
> > > > > > > +	atomic_t *disabled;
> > > > > > > +
> > > > > > > +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
> > > > > > > +			 !preempt_count()))
> > > > > > > +		return;
> > > > > > > +
> > > > > > > +	disabled = per_cpu_ptr(ops->disabled, cpu);
> > > > > > > +	atomic_dec(disabled);
> > > > > > > +}
> > > > > > 
> > > > > > As you're using this for the local CPU exclusively, I suggest you rather
> > > > > > rename it to "ftrace_function_{dis,en}able_cpu(struct ftrace_ops *ops)"
> > > > > 
> > > > > I wonder if "ftrace_function_local_{dis,en}able(ops)" would be better?
> > > > > That would match something like local_irq_disable/enable.
> > > > 
> > > > Good idea.
> > > > 
> > > > > 
> > > > > > and use __get_cpu_var() that does the preempt check for you.
> > > 
> > > I haven't found preempt check this_cpu_ptr path.. not sure I missed it..
> > > so I'm keeping the implicit preemt check.
> > 
> > #ifdef CONFIG_DEBUG_PREEMPT
> > #define my_cpu_offset per_cpu_offset(smp_processor_id())
> > #else
> > #define my_cpu_offset __my_cpu_offset
> > #endif
> > 
> > #ifdef CONFIG_DEBUG_PREEMPT
> > #define this_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, my_cpu_offset)
> > #else
> > #define this_cpu_ptr(ptr) __this_cpu_ptr(ptr)
> > #endif
> > 
> > And smp_processor_id() has a preemption check.
> 
> yay.. ok :) so this one is triggered only if there's CONFIG_DEBUG_PREEMPT
> option enabled.. seems to me it'd better to keep the implicit check anyway.
> 
> jirka


This is a debugging option deemed to lower runtime debugging checks in
production.

Is there a good reason to keep the check on every case?

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-27 17:02                         ` Frederic Weisbecker
@ 2012-01-27 17:20                           ` Jiri Olsa
  2012-01-28 16:39                             ` Frederic Weisbecker
  0 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-27 17:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Jan 27, 2012 at 06:02:04PM +0100, Frederic Weisbecker wrote:
> On Fri, Jan 27, 2012 at 05:54:16PM +0100, Jiri Olsa wrote:
> > On Fri, Jan 27, 2012 at 05:40:49PM +0100, Frederic Weisbecker wrote:
> > > On Fri, Jan 27, 2012 at 11:37:14AM +0100, Jiri Olsa wrote:
> > > > On Thu, Jan 26, 2012 at 03:37:29AM +0100, Frederic Weisbecker wrote:
> > > > > On Wed, Jan 25, 2012 at 06:13:41PM -0500, Steven Rostedt wrote:
> > > > > > On Fri, 2012-01-20 at 18:02 +0100, Frederic Weisbecker wrote:
> > > > > > >  
> > > > > > > > +/**
> > > > > > > > + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> > > > > > > > + *
> > > > > > > > + * This function enables tracing on given cpu by decreasing
> > > > > > > > + * the per cpu control variable.
> > > > > > > > + * It must be called with preemption disabled and only on
> > > > > > > > + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> > > > > > > > + */
> > > > > > > > +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> > > > > > > > +{
> > > > > > > > +	atomic_t *disabled;
> > > > > > > > +
> > > > > > > > +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
> > > > > > > > +			 !preempt_count()))
> > > > > > > > +		return;
> > > > > > > > +
> > > > > > > > +	disabled = per_cpu_ptr(ops->disabled, cpu);
> > > > > > > > +	atomic_dec(disabled);
> > > > > > > > +}
> > > > > > > 
> > > > > > > As you're using this for the local CPU exclusively, I suggest you rather
> > > > > > > rename it to "ftrace_function_{dis,en}able_cpu(struct ftrace_ops *ops)"
> > > > > > 
> > > > > > I wonder if "ftrace_function_local_{dis,en}able(ops)" would be better?
> > > > > > That would match something like local_irq_disable/enable.
> > > > > 
> > > > > Good idea.
> > > > > 
> > > > > > 
> > > > > > > and use __get_cpu_var() that does the preempt check for you.
> > > > 
> > > > I haven't found preempt check this_cpu_ptr path.. not sure I missed it..
> > > > so I'm keeping the implicit preemt check.
> > > 
> > > #ifdef CONFIG_DEBUG_PREEMPT
> > > #define my_cpu_offset per_cpu_offset(smp_processor_id())
> > > #else
> > > #define my_cpu_offset __my_cpu_offset
> > > #endif
> > > 
> > > #ifdef CONFIG_DEBUG_PREEMPT
> > > #define this_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, my_cpu_offset)
> > > #else
> > > #define this_cpu_ptr(ptr) __this_cpu_ptr(ptr)
> > > #endif
> > > 
> > > And smp_processor_id() has a preemption check.
> > 
> > yay.. ok :) so this one is triggered only if there's CONFIG_DEBUG_PREEMPT
> > option enabled.. seems to me it'd better to keep the implicit check anyway.
> > 
> > jirka
> 
> 
> This is a debugging option deemed to lower runtime debugging checks in
> production.
> 
> Is there a good reason to keep the check on every case?

none I guess, apart from me feeling better.. ;)
attached new version without the preempt_count check in the WARN_ON_ONCE

thanks,
jirka


---
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index f33fb3b..d95df4b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -31,16 +31,32 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
 
+/*
+ * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
+ * set in the flags member.
+ *
+ * ENABLED - set/unset when ftrace_ops is registered/unregistered
+ * GLOBAL  - set manually by ftrace_ops user to denote the ftrace_ops
+ *           is part of the global tracers sharing the same filter
+ *           via set_ftrace_* debugfs files.
+ * DYNAMIC - set when ftrace_ops is registered to denote dynamically
+ *           allocated ftrace_ops which need special care
+ * CONTROL - set manually by ftrace_ops user to denote the ftrace_ops
+ *           could be controlled by following calls:
+ *           ftrace_function_local_enable, ftrace_function_local_disable
+ */
 enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	int __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +113,52 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+/**
+ * ftrace_function_local_enable - enable controlled ftrace_ops on current cpu
+ *
+ * This function enables tracing on current cpu by decreasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_local_enable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))--;
+}
+
+/**
+ * ftrace_function_local_disable - disable controlled ftrace_ops on current cpu
+ *
+ * This function disables tracing on current cpu by increasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline void ftrace_function_local_disable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))++;
+}
+
+/**
+ * ftrace_function_local_disabled - returns ftrace_ops disabled value
+ *                                  on current cpu
+ *
+ * This function returns value of ftrace_ops::disabled on current cpu.
+ * It must be called with preemption disabled and only on
+ * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
+ */
+static inline int ftrace_function_local_disabled(struct ftrace_ops *ops)
+{
+	WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL));
+	return *this_cpu_ptr(ops->disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e2e0597..c8d2af2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -62,6 +62,8 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+#define FL_GLOBAL_CONTROL_MASK (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
@@ -89,12 +91,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -168,6 +172,32 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		*per_cpu_ptr(ops->disabled, cpu) = 1;
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	int __percpu *disabled;
+
+	disabled = alloc_percpu(int);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -259,6 +289,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_list_ops(struct ftrace_ops **list,
+				struct ftrace_ops *main_ops,
+				struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	add_ftrace_ops(list, ops);
+	if (first)
+		add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_list_ops(struct ftrace_ops **list,
+				  struct ftrace_ops *main_ops,
+				  struct ftrace_ops *ops)
+{
+	int ret = remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -270,15 +320,20 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+	/* We don't support both control and global flags set. */
+	if ((ops->flags & FL_GLOBAL_CONTROL_MASK) == FL_GLOBAL_CONTROL_MASK)
+		return -EINVAL;
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -302,11 +357,23 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_list_ops(&ftrace_global_list,
+					     &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_list_ops(&ftrace_control_list,
+					     &control_ops, ops);
+		if (!ret) {
+			/*
+			 * The ftrace_ops is now removed from the list,
+			 * so there'll be no new users. We must ensure
+			 * all current users are done before we free
+			 * the control data.
+			 */
+			synchronize_sched();
+			control_ops_free(ops);
+		}
 	} else
 		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -3874,6 +3941,36 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!ftrace_function_local_disabled(op) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	}
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b93ecba..55c6ea0 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function

^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-27 16:54                       ` Jiri Olsa
  2012-01-27 17:02                         ` Frederic Weisbecker
@ 2012-01-27 17:21                         ` Steven Rostedt
  1 sibling, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2012-01-27 17:21 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Frederic Weisbecker, mingo, paulus, acme, a.p.zijlstra,
	linux-kernel, aarapov

On Fri, 2012-01-27 at 17:54 +0100, Jiri Olsa wrote:

> yay.. ok :) so this one is triggered only if the CONFIG_DEBUG_PREEMPT
> option is enabled.. seems to me it'd be better to keep the implicit check anyway.

If the per_cpu pointer usage already warns if preemption is not
disabled, then we don't need the extra check. I think I was the one to
recommend adding it, but if the warning is already there, I don't think
it is necessary. You can still keep a comment, and even say, the per_cpu
pointer usage will complain when CONFIG_DEBUG_PREEMPT is enabled, if
this is called without preemption disabled.

-- Steve




* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-27 17:20                           ` Jiri Olsa
@ 2012-01-28 16:39                             ` Frederic Weisbecker
  0 siblings, 0 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-28 16:39 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Jan 27, 2012 at 06:20:12PM +0100, Jiri Olsa wrote:
> On Fri, Jan 27, 2012 at 06:02:04PM +0100, Frederic Weisbecker wrote:
> > On Fri, Jan 27, 2012 at 05:54:16PM +0100, Jiri Olsa wrote:
> > > On Fri, Jan 27, 2012 at 05:40:49PM +0100, Frederic Weisbecker wrote:
> > > > On Fri, Jan 27, 2012 at 11:37:14AM +0100, Jiri Olsa wrote:
> > > > > On Thu, Jan 26, 2012 at 03:37:29AM +0100, Frederic Weisbecker wrote:
> > > > > > On Wed, Jan 25, 2012 at 06:13:41PM -0500, Steven Rostedt wrote:
> > > > > > > On Fri, 2012-01-20 at 18:02 +0100, Frederic Weisbecker wrote:
> > > > > > > >  
> > > > > > > > > +/**
> > > > > > > > > + * ftrace_function_enable - enable controlled ftrace_ops on given cpu
> > > > > > > > > + *
> > > > > > > > > + * This function enables tracing on given cpu by decreasing
> > > > > > > > > + * the per cpu control variable.
> > > > > > > > > + * It must be called with preemption disabled and only on
> > > > > > > > > + * ftrace_ops registered with FTRACE_OPS_FL_CONTROL.
> > > > > > > > > + */
> > > > > > > > > +static inline void ftrace_function_enable(struct ftrace_ops *ops, int cpu)
> > > > > > > > > +{
> > > > > > > > > +	atomic_t *disabled;
> > > > > > > > > +
> > > > > > > > > +	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL) ||
> > > > > > > > > +			 !preempt_count()))
> > > > > > > > > +		return;
> > > > > > > > > +
> > > > > > > > > +	disabled = per_cpu_ptr(ops->disabled, cpu);
> > > > > > > > > +	atomic_dec(disabled);
> > > > > > > > > +}
> > > > > > > > 
> > > > > > > > As you're using this for the local CPU exclusively, I suggest you rather
> > > > > > > > rename it to "ftrace_function_{dis,en}able_cpu(struct ftrace_ops *ops)"
> > > > > > > 
> > > > > > > I wonder if "ftrace_function_local_{dis,en}able(ops)" would be better?
> > > > > > > That would match something like local_irq_disable/enable.
> > > > > > 
> > > > > > Good idea.
> > > > > > 
> > > > > > > 
> > > > > > > > and use __get_cpu_var() that does the preempt check for you.
> > > > > 
> > > > > I haven't found a preempt check in the this_cpu_ptr path.. not sure if I missed it..
> > > > > so I'm keeping the implicit preempt check.
> > > > 
> > > > #ifdef CONFIG_DEBUG_PREEMPT
> > > > #define my_cpu_offset per_cpu_offset(smp_processor_id())
> > > > #else
> > > > #define my_cpu_offset __my_cpu_offset
> > > > #endif
> > > > 
> > > > #ifdef CONFIG_DEBUG_PREEMPT
> > > > #define this_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, my_cpu_offset)
> > > > #else
> > > > #define this_cpu_ptr(ptr) __this_cpu_ptr(ptr)
> > > > #endif
> > > > 
> > > > And smp_processor_id() has a preemption check.
> > > 
> > > yay.. ok :) so this one is triggered only if the CONFIG_DEBUG_PREEMPT
> > > option is enabled.. seems to me it'd be better to keep the implicit check anyway.
> > > 
> > > jirka
> > 
> > 
> > This is a debugging option deemed to lower runtime debugging checks in
> > production.
> > 
> > Is there a good reason to keep the check on every case?
> 
> none I guess, apart from me feeling better.. ;)
> attached new version without the preempt_count check in the WARN_ON_ONCE

Looks good.

Thanks.


* [PATCHv7 0/7] ftrace, perf: Adding support to use function trace
  2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
                             ` (7 preceding siblings ...)
  2012-01-18 21:43           ` [PATCHv6 0/7] ftrace, perf: Adding support to use function trace Steven Rostedt
@ 2012-01-28 18:43           ` Jiri Olsa
  2012-01-28 18:43             ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
                               ` (7 more replies)
  8 siblings, 8 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-28 18:43 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

hi,
here's another version of perf support for function trace
with filter. 

attached patches:
  1/7 ftrace: Change filter/notrace set functions to return exit code
  2/7 ftrace: Add enable/disable ftrace_ops control interface
  3/7 ftrace, perf: Add open/close tracepoint perf registration actions
  4/7 ftrace, perf: Add add/del tracepoint perf registration actions
  5/7 ftrace, perf: Add support to use function tracepoint in perf
  6/7 ftrace, perf: Distinguish ftrace function event field type
  7/7 ftrace, perf: Add filter support for function trace event

v7 changes:
  2/7 - using int instead of atomic_t for ftrace_ops::disable
      - using this_cpu_ptr to touch ftrace_ops::disable
      - renamed the ftrace_ops::disable API
          void ftrace_function_local_enable(struct ftrace_ops *ops)
          void ftrace_function_local_disable(struct ftrace_ops *ops)
          int  ftrace_function_local_disabled(struct ftrace_ops *ops)

v6 changes:
  2/7 - added comments to FTRACE_OPS_FL_* bits enum
  5/7 - added more info to the change log regarding ftrace_ops enable/disable
  7/7 - rebased to the latest filter changes

v5 changes:
  7/7 - fixed to properly support ',' in filter expressions

v4 changes:
  2/7 - FL_GLOBAL_CONTROL changed to FL_GLOBAL_CONTROL_MASK
      - changed WARN_ON_ONCE() to include the !preempt_count()
      - changed this_cpu_ptr to per_cpu_ptr

  omitted Fix possible NULL dereferencing in __ftrace_hash_rec_update
  (2/8 in v3)

v3 changes:
  3/8 - renamed __add/remove_ftrace_ops
  3/8 - fixed preempt_enable/recursion_clear order in ftrace_ops_control_func
      - renamed/commented API functions -  enable/disable_ftrace_function
  
  omitted graph tracer workaround patch 10/10

v2 changes:
 01/10 - keeping the old fix instead of adding a hash_has_contents func
         I'll send a separate patchset for this
 02/10 - using a different way to avoid the issue (3/9 in v1)
 03/10 - using the way proposed by Steven for controlling ftrace_ops
         (4/9 in v1)
 06/10 - added a check ensuring the ftrace:function event can be used by
         root only (7/9 in v1)
 08/10 - added more description (8/9 in v1)
 09/10 - changed '&&' operator to '||' which seems more suitable
         in this case (9/9 in v1)

thanks,
jirka
---
 include/linux/ftrace.h             |   70 ++++++++++++-
 include/linux/ftrace_event.h       |    9 +-
 include/linux/perf_event.h         |    3 +
 kernel/trace/ftrace.c              |  132 ++++++++++++++++++++--
 kernel/trace/trace.h               |   11 ++-
 kernel/trace/trace_event_perf.c    |  212 +++++++++++++++++++++++++++++-------
 kernel/trace/trace_events.c        |   12 ++-
 kernel/trace/trace_events_filter.c |  172 +++++++++++++++++++++++++++--
 kernel/trace/trace_export.c        |   53 ++++++++-
 kernel/trace/trace_kprobe.c        |    8 +-
 kernel/trace/trace_syscalls.c      |   18 +++-
 11 files changed, 616 insertions(+), 84 deletions(-)


* [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code
  2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
@ 2012-01-28 18:43             ` Jiri Olsa
  2012-01-30  5:42               ` Frederic Weisbecker
  2012-01-28 18:43             ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
                               ` (6 subsequent siblings)
  7 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-28 18:43 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any error code, so there is no way for an ftrace_ops
user to tell whether the filter was correctly applied.

The set_ftrace_filter interface returns error in case the filter
did not match:

  # echo krava > set_ftrace_filter
  bash: echo: write error: Invalid argument

Changing both ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly or -E* values
in case of error.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |    4 ++--
 kernel/trace/ftrace.c  |   15 +++++++++------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 028e26f..f33fb3b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -178,9 +178,9 @@ struct dyn_ftrace {
 };
 
 int ftrace_force_update(void);
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 683d559..e2e0597 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3146,8 +3146,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	mutex_lock(&ftrace_regex_lock);
 	if (reset)
 		ftrace_filter_reset(hash);
-	if (buf)
-		ftrace_match_records(hash, buf, len);
+	if (buf && !ftrace_match_records(hash, buf, len)) {
+		ret = -EINVAL;
+		goto out_regex_unlock;
+	}
 
 	mutex_lock(&ftrace_lock);
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -3157,6 +3159,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 
 	mutex_unlock(&ftrace_lock);
 
+ out_regex_unlock:
 	mutex_unlock(&ftrace_regex_lock);
 
 	free_ftrace_hash(hash);
@@ -3173,10 +3176,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
  * Filters denote which functions should be enabled when tracing is enabled.
  * If @buf is NULL and reset is set, all functions will be enabled for tracing.
  */
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 1);
+	return ftrace_set_regex(ops, buf, len, reset, 1);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_filter);
 
@@ -3191,10 +3194,10 @@ EXPORT_SYMBOL_GPL(ftrace_set_filter);
  * is enabled. If @buf is NULL and reset is set, all functions will be enabled
  * for tracing.
  */
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 0);
+	return ftrace_set_regex(ops, buf, len, reset, 0);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_notrace);
 /**
-- 
1.7.1



* [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
  2012-01-28 18:43             ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2012-01-28 18:43             ` Jiri Olsa
  2012-01-30  5:59               ` Frederic Weisbecker
  2012-02-03 13:40               ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Steven Rostedt
  2012-01-28 18:43             ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
                               ` (5 subsequent siblings)
  7 siblings, 2 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-01-28 18:43 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same approach as the 'global' ftrace_ops.

Introducing a global control_ops ftrace_ops and a ftrace_control_list,
which take over all ftrace_ops registered with the FTRACE_OPS_FL_CONTROL
flag. In addition, a new per-cpu counter called 'disabled' is added to
ftrace_ops to provide the control information for each cpu.

When an ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.

The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides a function which iterates ftrace_control_list
and checks the 'disabled' counter on the current cpu.

Adding 3 inline functions:
  ftrace_function_local_disable/ftrace_function_local_enable
  - enable/disable the ftrace_ops on current cpu
  ftrace_function_local_disabled
  - get disabled ftrace_ops::disabled value for current cpu

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |   65 ++++++++++++++++++++++++++++
 kernel/trace/ftrace.c  |  111 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/trace/trace.h   |    2 +
 3 files changed, 171 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index f33fb3b..5cb3a51 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -31,16 +31,32 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
 
+/*
+ * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
+ * set in the flags member.
+ *
+ * ENABLED - set/unset when ftrace_ops is registered/unregistered
+ * GLOBAL  - set manually by the ftrace_ops user to denote the ftrace_ops
+ *           is part of the global tracers sharing the same filter
+ *           via set_ftrace_* debugfs files.
+ * DYNAMIC - set when ftrace_ops is registered to denote dynamically
+ *           allocated ftrace_ops which need special care
+ * CONTROL - set manually by the ftrace_ops user to denote the ftrace_ops
+ *           can be controlled by the following calls:
+ *           ftrace_function_local_enable, ftrace_function_local_disable
+ */
 enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	int __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +113,55 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+/**
+ * ftrace_function_local_enable - enable controlled ftrace_ops on current cpu
+ *
+ * This function enables tracing on current cpu by decreasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline void ftrace_function_local_enable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))--;
+}
+
+/**
+ * ftrace_function_local_disable - disable controlled ftrace_ops on current cpu
+ *
+ * This function disables tracing on current cpu by increasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline void ftrace_function_local_disable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))++;
+}
+
+/**
+ * ftrace_function_local_disabled - returns ftrace_ops disabled value
+ *                                  on current cpu
+ *
+ * This function returns value of ftrace_ops::disabled on current cpu.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline int ftrace_function_local_disabled(struct ftrace_ops *ops)
+{
+	WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL));
+	return *this_cpu_ptr(ops->disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e2e0597..c8d2af2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -62,6 +62,8 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+#define FL_GLOBAL_CONTROL_MASK (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
@@ -89,12 +91,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -168,6 +172,32 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		*per_cpu_ptr(ops->disabled, cpu) = 1;
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	int __percpu *disabled;
+
+	disabled = alloc_percpu(int);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -259,6 +289,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_list_ops(struct ftrace_ops **list,
+				struct ftrace_ops *main_ops,
+				struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	add_ftrace_ops(list, ops);
+	if (first)
+		add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_list_ops(struct ftrace_ops **list,
+				  struct ftrace_ops *main_ops,
+				  struct ftrace_ops *ops)
+{
+	int ret = remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -270,15 +320,20 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+	/* We don't support both control and global flags set. */
+	if ((ops->flags & FL_GLOBAL_CONTROL_MASK) == FL_GLOBAL_CONTROL_MASK)
+		return -EINVAL;
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -302,11 +357,23 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_list_ops(&ftrace_global_list,
+					     &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_list_ops(&ftrace_control_list,
+					     &control_ops, ops);
+		if (!ret) {
+			/*
+			 * The ftrace_ops is now removed from the list,
+			 * so there'll be no new users. We must ensure
+			 * all current users are done before we free
+			 * the control data.
+			 */
+			synchronize_sched();
+			control_ops_free(ops);
+		}
 	} else
 		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -3874,6 +3941,36 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!ftrace_function_local_disabled(op) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	}
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b93ecba..55c6ea0 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function
-- 
1.7.1



* [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions
  2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
  2012-01-28 18:43             ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
  2012-01-28 18:43             ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2012-01-28 18:43             ` Jiri Olsa
  2012-02-02 17:35               ` Frederic Weisbecker
  2012-01-28 18:43             ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
                               ` (4 subsequent siblings)
  7 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-28 18:43 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.

The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.

The open/close actions are invoked for each tracepoint user when
opening/closing the event.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    6 +-
 kernel/trace/trace.h            |    5 ++
 kernel/trace/trace_event_perf.c |  116 +++++++++++++++++++++++++--------------
 kernel/trace/trace_events.c     |   10 ++-
 kernel/trace/trace_kprobe.c     |    6 ++-
 kernel/trace/trace_syscalls.c   |   14 +++-
 6 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c3da42d..195e360 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -146,6 +146,8 @@ enum trace_reg {
 	TRACE_REG_UNREGISTER,
 	TRACE_REG_PERF_REGISTER,
 	TRACE_REG_PERF_UNREGISTER,
+	TRACE_REG_PERF_OPEN,
+	TRACE_REG_PERF_CLOSE,
 };
 
 struct ftrace_event_call;
@@ -157,7 +159,7 @@ struct ftrace_event_class {
 	void			*perf_probe;
 #endif
 	int			(*reg)(struct ftrace_event_call *event,
-				       enum trace_reg type);
+				       enum trace_reg type, void *data);
 	int			(*define_fields)(struct ftrace_event_call *);
 	struct list_head	*(*get_fields)(struct ftrace_event_call *);
 	struct list_head	fields;
@@ -165,7 +167,7 @@ struct ftrace_event_class {
 };
 
 extern int ftrace_event_reg(struct ftrace_event_call *event,
-			    enum trace_reg type);
+			    enum trace_reg type, void *data);
 
 enum {
 	TRACE_EVENT_FL_ENABLED_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55c6ea0..0eaf077 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -828,4 +828,9 @@ extern const char *__stop___trace_bprintk_fmt[];
 	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #include "trace_entries.h"
 
+#ifdef CONFIG_PERF_EVENTS
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data);
+#endif
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 19a359d..0cfcc37 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -44,23 +44,17 @@ static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 	return 0;
 }
 
-static int perf_trace_event_init(struct ftrace_event_call *tp_event,
-				 struct perf_event *p_event)
+static int perf_trace_event_reg(struct ftrace_event_call *tp_event,
+				struct perf_event *p_event)
 {
 	struct hlist_head __percpu *list;
-	int ret;
+	int ret = -ENOMEM;
 	int cpu;
 
-	ret = perf_trace_event_perm(tp_event, p_event);
-	if (ret)
-		return ret;
-
 	p_event->tp_event = tp_event;
 	if (tp_event->perf_refcount++ > 0)
 		return 0;
 
-	ret = -ENOMEM;
-
 	list = alloc_percpu(struct hlist_head);
 	if (!list)
 		goto fail;
@@ -83,7 +77,7 @@ static int perf_trace_event_init(struct ftrace_event_call *tp_event,
 		}
 	}
 
-	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER);
+	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER, NULL);
 	if (ret)
 		goto fail;
 
@@ -108,6 +102,69 @@ fail:
 	return ret;
 }
 
+static void perf_trace_event_unreg(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	int i;
+
+	if (--tp_event->perf_refcount > 0)
+		goto out;
+
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER, NULL);
+
+	/*
+	 * Ensure our callback won't be called anymore. The buffers
+	 * will be freed after that.
+	 */
+	tracepoint_synchronize_unregister();
+
+	free_percpu(tp_event->perf_events);
+	tp_event->perf_events = NULL;
+
+	if (!--total_ref_count) {
+		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
+			free_percpu(perf_trace_buf[i]);
+			perf_trace_buf[i] = NULL;
+		}
+	}
+out:
+	module_put(tp_event->mod);
+}
+
+static int perf_trace_event_open(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_OPEN, p_event);
+}
+
+static void perf_trace_event_close(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
+}
+
+static int perf_trace_event_init(struct ftrace_event_call *tp_event,
+				 struct perf_event *p_event)
+{
+	int ret;
+
+	ret = perf_trace_event_perm(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_reg(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_open(p_event);
+	if (ret) {
+		perf_trace_event_unreg(p_event);
+		return ret;
+	}
+
+	return 0;
+}
+
 int perf_trace_init(struct perf_event *p_event)
 {
 	struct ftrace_event_call *tp_event;
@@ -130,6 +187,14 @@ int perf_trace_init(struct perf_event *p_event)
 	return ret;
 }
 
+void perf_trace_destroy(struct perf_event *p_event)
+{
+	mutex_lock(&event_mutex);
+	perf_trace_event_close(p_event);
+	perf_trace_event_unreg(p_event);
+	mutex_unlock(&event_mutex);
+}
+
 int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct ftrace_event_call *tp_event = p_event->tp_event;
@@ -154,37 +219,6 @@ void perf_trace_del(struct perf_event *p_event, int flags)
 	hlist_del_rcu(&p_event->hlist_entry);
 }
 
-void perf_trace_destroy(struct perf_event *p_event)
-{
-	struct ftrace_event_call *tp_event = p_event->tp_event;
-	int i;
-
-	mutex_lock(&event_mutex);
-	if (--tp_event->perf_refcount > 0)
-		goto out;
-
-	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER);
-
-	/*
-	 * Ensure our callback won't be called anymore. The buffers
-	 * will be freed after that.
-	 */
-	tracepoint_synchronize_unregister();
-
-	free_percpu(tp_event->perf_events);
-	tp_event->perf_events = NULL;
-
-	if (!--total_ref_count) {
-		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
-			free_percpu(perf_trace_buf[i]);
-			perf_trace_buf[i] = NULL;
-		}
-	}
-out:
-	module_put(tp_event->mod);
-	mutex_unlock(&event_mutex);
-}
-
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 				       struct pt_regs *regs, int *rctxp)
 {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c212a7f..5138fea 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -147,7 +147,8 @@ int trace_event_raw_init(struct ftrace_event_call *call)
 }
 EXPORT_SYMBOL_GPL(trace_event_raw_init);
 
-int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
+int ftrace_event_reg(struct ftrace_event_call *call,
+		     enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -170,6 +171,9 @@ int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
 					    call->class->perf_probe,
 					    call);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
@@ -209,7 +213,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_stop_cmdline_record();
 				call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			call->class->reg(call, TRACE_REG_UNREGISTER);
+			call->class->reg(call, TRACE_REG_UNREGISTER, NULL);
 		}
 		break;
 	case 1:
@@ -218,7 +222,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_start_cmdline_record();
 				call->flags |= TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			ret = call->class->reg(call, TRACE_REG_REGISTER);
+			ret = call->class->reg(call, TRACE_REG_REGISTER, NULL);
 			if (ret) {
 				tracing_stop_cmdline_record();
 				pr_info("event trace: Could not enable event "
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 00d527c..5667f89 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1892,7 +1892,8 @@ static __kprobes void kretprobe_perf_func(struct kretprobe_instance *ri,
 #endif	/* CONFIG_PERF_EVENTS */
 
 static __kprobes
-int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
+int kprobe_register(struct ftrace_event_call *event,
+		    enum trace_reg type, void *data)
 {
 	struct trace_probe *tp = (struct trace_probe *)event->data;
 
@@ -1909,6 +1910,9 @@ int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
 	case TRACE_REG_PERF_UNREGISTER:
 		disable_trace_probe(tp, TP_FLAG_PROFILE);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index cb65454..6916b0d 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -17,9 +17,9 @@ static DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
 static DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 
 static int syscall_enter_define_fields(struct ftrace_event_call *call);
 static int syscall_exit_define_fields(struct ftrace_event_call *call);
@@ -649,7 +649,7 @@ void perf_sysexit_disable(struct ftrace_event_call *call)
 #endif /* CONFIG_PERF_EVENTS */
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -664,13 +664,16 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysenter_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
 }
 
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -685,6 +688,9 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysexit_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 4/7] ftrace, perf: Add add/del tracepoint perf registration actions
  2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
                               ` (2 preceding siblings ...)
  2012-01-28 18:43             ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
@ 2012-01-28 18:43             ` Jiri Olsa
  2012-02-02 17:42               ` Frederic Weisbecker
  2012-01-28 18:43             ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
                               ` (3 subsequent siblings)
  7 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-28 18:43 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
perf event scheduling in/out actions.

The add action is invoked when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h    |    2 ++
 kernel/trace/trace_event_perf.c |    4 +++-
 kernel/trace/trace_events.c     |    2 ++
 kernel/trace/trace_kprobe.c     |    2 ++
 kernel/trace/trace_syscalls.c   |    4 ++++
 5 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 195e360..2bf677c 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -148,6 +148,8 @@ enum trace_reg {
 	TRACE_REG_PERF_UNREGISTER,
 	TRACE_REG_PERF_OPEN,
 	TRACE_REG_PERF_CLOSE,
+	TRACE_REG_PERF_ADD,
+	TRACE_REG_PERF_DEL,
 };
 
 struct ftrace_event_call;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0cfcc37..d72af0b 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -211,12 +211,14 @@ int perf_trace_add(struct perf_event *p_event, int flags)
 	list = this_cpu_ptr(pcpu_list);
 	hlist_add_head_rcu(&p_event->hlist_entry, list);
 
-	return 0;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_ADD, p_event);
 }
 
 void perf_trace_del(struct perf_event *p_event, int flags)
 {
+	struct ftrace_event_call *tp_event = p_event->tp_event;
 	hlist_del_rcu(&p_event->hlist_entry);
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_DEL, p_event);
 }
 
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5138fea..079a93a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -173,6 +173,8 @@ int ftrace_event_reg(struct ftrace_event_call *call,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5667f89..580a05e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1912,6 +1912,8 @@ int kprobe_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 6916b0d..dbdd804 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -666,6 +666,8 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
@@ -690,6 +692,8 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
-- 
1.7.1



* [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
                               ` (3 preceding siblings ...)
  2012-01-28 18:43             ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
@ 2012-01-28 18:43             ` Jiri Olsa
  2012-02-02 18:14               ` Frederic Weisbecker
  2012-01-28 18:43             ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
                               ` (2 subsequent siblings)
  7 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-28 18:43 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding perf registration support for the ftrace function event,
so it is now possible to register it via the perf interface.

The perf_event struct statically contains ftrace_ops as a handle
for function tracer. The function tracer is registered/unregistered
in open/close actions.

To be efficient, we enable/disable the ftrace_ops each time the traced
process is scheduled in/out (via TRACE_REG_PERF_(ADD|DEL) handlers).
This way tracing is enabled only when the process is running.
This approach is used intentionally instead of the event's hw state
PERF_HES_STOPPED, which would not disable the ftrace_ops.

It is now possible to use function trace within perf commands
like:

  perf record -e ftrace:function ls
  perf stat -e ftrace:function ls

Allowed only for root.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h      |    3 +
 kernel/trace/trace.h            |    2 +
 kernel/trace/trace_event_perf.c |   90 +++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_export.c     |   28 ++++++++++++
 4 files changed, 123 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 412b790..92a056f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -859,6 +859,9 @@ struct perf_event {
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
+#ifdef CONFIG_FUNCTION_TRACER
+	struct ftrace_ops               ftrace_ops;
+#endif
 #endif
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 0eaf077..8ff7324 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -591,6 +591,8 @@ static inline int ftrace_trace_task(struct task_struct *task)
 static inline int ftrace_is_dead(void) { return 0; }
 #endif
 
+int ftrace_event_is_function(struct ftrace_event_call *call);
+
 /*
  * struct trace_parser - servers for reading the user input separated by spaces
  * @cont: set if the input is not complete - no final space char was found
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index d72af0b..8ee0461 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -24,6 +24,11 @@ static int	total_ref_count;
 static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
+	/* The ftrace function trace is allowed only for root. */
+	if (ftrace_event_is_function(tp_event) &&
+	    perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	/* No tracing, just counting, so no obvious leak */
 	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
 		return 0;
@@ -250,3 +255,88 @@ __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 	return raw_data;
 }
 EXPORT_SYMBOL_GPL(perf_trace_buf_prepare);
+
+
+static void
+perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_entry *entry;
+	struct hlist_head *head;
+	struct pt_regs regs;
+	int rctx;
+
+#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
+		    sizeof(u64)) - sizeof(u32))
+
+	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
+
+	perf_fetch_caller_regs(&regs);
+
+	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
+	if (!entry)
+		return;
+
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	head = this_cpu_ptr(event_function.perf_events);
+	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
+			      1, &regs, head);
+
+#undef ENTRY_SIZE
+}
+
+static int perf_ftrace_function_register(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+
+	ops->flags |= FTRACE_OPS_FL_CONTROL;
+	ops->func = perf_ftrace_function_call;
+	return register_ftrace_function(ops);
+}
+
+static int perf_ftrace_function_unregister(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	return unregister_ftrace_function(ops);
+}
+
+static void perf_ftrace_function_enable(struct perf_event *event)
+{
+	ftrace_function_local_enable(&event->ftrace_ops);
+}
+
+static void perf_ftrace_function_disable(struct perf_event *event)
+{
+	ftrace_function_local_disable(&event->ftrace_ops);
+}
+
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data)
+{
+	int etype = call->event.type;
+
+	if (etype != TRACE_FN)
+		return -EINVAL;
+
+	switch (type) {
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+		return perf_ftrace_function_register(data);
+	case TRACE_REG_PERF_CLOSE:
+		return perf_ftrace_function_unregister(data);
+	case TRACE_REG_PERF_ADD:
+		perf_ftrace_function_enable(data);
+		return 0;
+	case TRACE_REG_PERF_DEL:
+		perf_ftrace_function_disable(data);
+		return 0;
+	}
+
+	return -EINVAL;
+}
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index bbeec31..867653c 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 
 #include "trace_entries.h"
 
+static int ftrace_event_class_register(struct ftrace_event_call *call,
+				       enum trace_reg type, void *data)
+{
+	switch (type) {
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
+#ifdef CONFIG_PERF_EVENTS
+		return perf_ftrace_event_register(call, type, data);
+#endif
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	}
+
+	return -EINVAL;
+}
+
 #undef __entry
 #define __entry REC
 
@@ -159,6 +181,7 @@ struct ftrace_event_class event_class_ftrace_##call = {			\
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
+	.reg			= ftrace_event_class_register,		\
 };									\
 									\
 struct ftrace_event_call __used event_##call = {			\
@@ -170,4 +193,9 @@ struct ftrace_event_call __used event_##call = {			\
 struct ftrace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
+int ftrace_event_is_function(struct ftrace_event_call *call)
+{
+	return call == &event_function;
+}
+
 #include "trace_entries.h"
-- 
1.7.1



* [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type
  2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
                               ` (4 preceding siblings ...)
  2012-01-28 18:43             ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
@ 2012-01-28 18:43             ` Jiri Olsa
  2012-02-03 14:16               ` Steven Rostedt
  2012-01-28 18:43             ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
  7 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-28 18:43 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding the FILTER_TRACE_FN event field type for the function tracepoint
event, so it can be properly recognized within the filtering code.

Currently all fields of ftrace subsystem events share the common
field type FILTER_OTHER. Since the function trace fields need special
care within the filtering code, we need to recognize them properly,
hence the new FILTER_TRACE_FN event type.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h       |    1 +
 kernel/trace/trace_events_filter.c |    7 ++++++-
 kernel/trace/trace_export.c        |   25 ++++++++++++++++++++-----
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 2bf677c..dd478fc 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -245,6 +245,7 @@ enum {
 	FILTER_STATIC_STRING,
 	FILTER_DYN_STRING,
 	FILTER_PTR_STRING,
+	FILTER_TRACE_FN,
 };
 
 #define EVENT_STORAGE_SIZE 128
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 24aee71..eb04a2a 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
 	return FILTER_OTHER;
 }
 
+static bool is_function_field(struct ftrace_event_field *field)
+{
+	return field->filter_type == FILTER_TRACE_FN;
+}
+
 static bool is_string_field(struct ftrace_event_field *field)
 {
 	return field->filter_type == FILTER_DYN_STRING ||
@@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else {
+	} else if (!is_function_field(field)) {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 867653c..46c35e2 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -67,7 +67,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -77,7 +77,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -91,7 +91,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 		ret = trace_define_field(event_call, event_storage, #item, \
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 		mutex_unlock(&event_storage_mutex);			\
 		if (ret)						\
 			return ret;					\
@@ -104,7 +104,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -112,10 +112,24 @@ static void __always_unused ____ftrace_check_##name(void)	\
 #define __dynamic_array(type, item)					\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), FILTER_OTHER);\
+				 0, is_signed_type(type), filter_type);\
 	if (ret)							\
 		return ret;
 
+#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
+#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
+#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
+#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
+#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
+#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
+#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
+#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
+#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
+#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
+#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
+#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
+
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 int									\
@@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 {									\
 	struct struct_name field;					\
 	int ret;							\
+	int filter_type = FILTER_TYPE(id);				\
 									\
 	tstruct;							\
 									\
-- 
1.7.1



* [PATCH 7/7] ftrace, perf: Add filter support for function trace event
  2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
                               ` (5 preceding siblings ...)
  2012-01-28 18:43             ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
@ 2012-01-28 18:43             ` Jiri Olsa
  2012-02-07  0:20               ` Jiri Olsa
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
  7 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-28 18:43 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding support to filter the function trace event via the perf
interface. It is now possible to use the filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only, and
the following operators are accepted: '==', '!=' and '||',
ending up with filter strings like:

  ip == f1[, ]f2 ... || ip != f3[, ]f4 ...

with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"'.

The '==' operator adds trace filter with same effect as would
be added via set_ftrace_filter file.

The '!=' operator adds trace filter with same effect as would
be added via set_ftrace_notrace file.

The right side of the '!=' and '==' operators is a list of functions
or regexps to be added to the filter, separated by spaces.

The '||' operator is used to connect multiple filter definitions
together. It is possible to have more than one '==' and '!='
operator within one filter string.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h             |    1 +
 kernel/trace/ftrace.c              |    6 ++
 kernel/trace/trace.h               |    2 -
 kernel/trace/trace_event_perf.c    |    4 +-
 kernel/trace/trace_events_filter.c |  169 +++++++++++++++++++++++++++++++++---
 5 files changed, 168 insertions(+), 14 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 5cb3a51..1699f46 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -249,6 +249,7 @@ int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
+void ftrace_free_filter(struct ftrace_ops *ops);
 
 int register_ftrace_command(struct ftrace_func_command *cmd);
 int unregister_ftrace_command(struct ftrace_func_command *cmd);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c8d2af2..239b94a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1186,6 +1186,12 @@ static void free_ftrace_hash_rcu(struct ftrace_hash *hash)
 	call_rcu_sched(&hash->rcu, __free_ftrace_hash_rcu);
 }
 
+void ftrace_free_filter(struct ftrace_ops *ops)
+{
+	free_ftrace_hash(ops->filter_hash);
+	free_ftrace_hash(ops->notrace_hash);
+}
+
 static struct ftrace_hash *alloc_ftrace_hash(int size_bits)
 {
 	struct ftrace_hash *hash;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8ff7324..c2a3242 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -770,9 +770,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 8ee0461..aff37d9 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -298,7 +298,9 @@ static int perf_ftrace_function_register(struct perf_event *event)
 static int perf_ftrace_function_unregister(struct perf_event *event)
 {
 	struct ftrace_ops *ops = &event->ftrace_ops;
-	return unregister_ftrace_function(ops);
+	int ret = unregister_ftrace_function(ops);
+	ftrace_free_filter(ops);
+	return ret;
 }
 
 static void perf_ftrace_function_enable(struct perf_event *event)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index eb04a2a..c8a64ec 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -54,6 +54,13 @@ struct filter_op {
 	int precedence;
 };
 
+static struct filter_op filter_ftrace_ops[] = {
+	{ OP_OR,	"||",		1 },
+	{ OP_NE,	"!=",		2 },
+	{ OP_EQ,	"==",		2 },
+	{ OP_NONE,	"OP_NONE",	0 },
+};
+
 static struct filter_op filter_ops[] = {
 	{ OP_OR,	"||",		1 },
 	{ OP_AND,	"&&",		2 },
@@ -81,6 +88,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +104,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +1001,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1353,7 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1743,8 +1754,8 @@ static int replace_system_preds(struct event_subsystem *system,
 	return -ENOMEM;
 }
 
-static int create_filter_start(char *filter_str, bool set_str,
-			       struct filter_parse_state **psp,
+static int create_filter_start(char *filter_str, struct filter_op *fops,
+			       bool set_str, struct filter_parse_state **psp,
 			       struct event_filter **filterp)
 {
 	struct event_filter *filter;
@@ -1770,7 +1781,7 @@ static int create_filter_start(char *filter_str, bool set_str,
 	*filterp = filter;
 	*psp = ps;
 
-	parse_init(ps, filter_ops, filter_str);
+	parse_init(ps, fops, filter_str);
 	err = filter_parse(ps);
 	if (err && set_str)
 		append_filter_err(ps, filter);
@@ -1808,9 +1819,13 @@ static int create_filter(struct ftrace_event_call *call,
 {
 	struct event_filter *filter = NULL;
 	struct filter_parse_state *ps = NULL;
+	struct filter_op *fops = filter_ops;
 	int err;
 
-	err = create_filter_start(filter_str, set_str, &ps, &filter);
+	if (ftrace_event_is_function(call))
+		fops = filter_ftrace_ops;
+
+	err = create_filter_start(filter_str, fops, set_str, &ps, &filter);
 	if (!err) {
 		err = replace_preds(call, filter, ps, filter_str, false);
 		if (err && set_str)
@@ -1838,7 +1853,7 @@ static int create_system_filter(struct event_subsystem *system,
 	struct filter_parse_state *ps = NULL;
 	int err;
 
-	err = create_filter_start(filter_str, true, &ps, &filter);
+	err = create_filter_start(filter_str, filter_ops, true, &ps, &filter);
 	if (!err) {
 		err = replace_system_preds(system, ps, filter_str);
 		if (!err) {
@@ -1955,6 +1970,131 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+static char **
+ftrace_function_filter_re(char *buf, int len, int *count)
+{
+	char *str, *sep, **re;
+
+	str = kstrndup(buf, len, GFP_KERNEL);
+	if (!str)
+		return NULL;
+
+	/*
+	 * The argv_split function takes white space
+	 * as a separator, so convert ',' into spaces.
+	 */
+	while ((sep = strchr(str, ',')))
+		*sep = ' ';
+
+	re = argv_split(GFP_KERNEL, str, count);
+	kfree(str);
+	return re;
+}
+
+static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
+				      int reset, char *re, int len)
+{
+	int ret;
+
+	if (filter)
+		ret = ftrace_set_filter(ops, re, len, reset);
+	else
+		ret = ftrace_set_notrace(ops, re, len, reset);
+
+	return ret;
+}
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int i, re_cnt, ret;
+	int *reset;
+	char **re;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	/*
+	 * The 'ip' field could have multiple filters set, separated
+	 * either by space or comma. We first cut the filter and apply
+	 * all pieces separately.
+	 */
+	re = ftrace_function_filter_re(buf, len, &re_cnt);
+	if (!re)
+		return -EINVAL;
+
+	for (i = 0; i < re_cnt; i++) {
+		ret = ftrace_function_set_regexp(data->ops, filter, *reset,
+						 re[i], strlen(re[i]));
+		if (ret)
+			break;
+
+		if (*reset)
+			*reset = 0;
+	}
+
+	argv_free(re);
+	return ret;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	/*
+	 * Check the predicate for function trace, verify:
+	 *  - only '==' and '!=' is used
+	 *  - the 'ip' field is used
+	 */
+	if (WARN((pred->op != OP_EQ) && (pred->op != OP_NE),
+		 "wrong operator for function filter: %d\n", pred->op))
+		return -EINVAL;
+
+	if (strcmp(field->name, "ip"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID))
+		return WALK_PRED_DEFAULT;
+
+	/* Double checking the predicate is valid for function trace. */
+	*err = ftrace_function_check_pred(pred);
+	if (*err)
+		return WALK_PRED_ABORT;
+
+	*err = __ftrace_function_set_filter(pred->op == OP_EQ,
+					    pred->regex.pattern,
+					    pred->regex.len,
+					    data);
+
+	return (*err) ? WALK_PRED_ABORT : WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1975,9 +2115,16 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 		goto out_unlock;
 
 	err = create_filter(call, filter_str, false, &filter);
-	if (!err)
-		event->filter = filter;
+	if (err)
+		goto free_filter;
+
+	if (ftrace_event_is_function(call))
+		err = ftrace_function_set_filter(event, filter);
 	else
+		event->filter = filter;
+
+free_filter:
+	if (err ||  ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
-- 
1.7.1



* Re: [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code
  2012-01-28 18:43             ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2012-01-30  5:42               ` Frederic Weisbecker
  0 siblings, 0 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-30  5:42 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Sat, Jan 28, 2012 at 07:43:23PM +0100, Jiri Olsa wrote:
> Currently the ftrace_set_filter and ftrace_set_notrace functions
> do not return any return code. So there's no way for ftrace_ops
> user to tell whether the filter was correctly applied.
> 
> The set_ftrace_filter interface returns error in case the filter
> did not match:
> 
>   # echo krava > set_ftrace_filter
>   bash: echo: write error: Invalid argument
> 
> Changing both ftrace_set_filter and ftrace_set_notrace functions
> to return zero if the filter was applied correctly or -E* values
> in case of error.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>


* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-28 18:43             ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2012-01-30  5:59               ` Frederic Weisbecker
  2012-01-30  9:18                 ` Jiri Olsa
  2012-02-03 13:40               ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Steven Rostedt
  1 sibling, 1 reply; 186+ messages in thread
From: Frederic Weisbecker @ 2012-01-30  5:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Sat, Jan 28, 2012 at 07:43:24PM +0100, Jiri Olsa wrote:
> Adding a way to temporarily enable/disable ftrace_ops. The change
> follows the same way as 'global' ftrace_ops are done.
> 
> Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
> which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL
> flag. In addition new per cpu flag called 'disabled' is also added to
> ftrace_ops to provide the control information for each cpu.
> 
> When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
> set as disabled for all cpus.
> 
> The ftrace_control_list contains all the registered 'control' ftrace_ops.
> The control_ops provides function which iterates ftrace_control_list
> and does the check for 'disabled' flag on current cpu.
> 
> Adding 3 inline functions:
>   ftrace_function_local_disable/ftrace_function_local_enable
>   - enable/disable the ftrace_ops on current cpu
>   ftrace_function_local_disabled
>   - get disabled ftrace_ops::disabled value for current cpu
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/ftrace.h |   65 ++++++++++++++++++++++++++++
>  kernel/trace/ftrace.c  |  111 +++++++++++++++++++++++++++++++++++++++++++++---
>  kernel/trace/trace.h   |    2 +
>  3 files changed, 171 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index f33fb3b..5cb3a51 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -31,16 +31,32 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
>  
>  typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
>  
> +/*
> + * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
> + * set in the flags member.
> + *
> + * ENABLED - set/unset when ftrace_ops is registered/unregistered
> + * GLOBAL  - set manually by ftrace_ops user to denote the ftrace_ops
> + *           is part of the global tracers sharing the same filter
> + *           via set_ftrace_* debugfs files.
> + * DYNAMIC - set when ftrace_ops is registered to denote dynamically
> + *           allocated ftrace_ops which need special care
> + * CONTROL - set manually by ftrace_ops user to denote the ftrace_ops
> + *           could be controlled by following calls:
> + *           ftrace_function_enable, ftrace_function_disable

Should be ftrace_function_local_enable.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-30  5:59               ` Frederic Weisbecker
@ 2012-01-30  9:18                 ` Jiri Olsa
  2012-02-03 13:42                   ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-01-30  9:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Mon, Jan 30, 2012 at 06:59:08AM +0100, Frederic Weisbecker wrote:
> On Sat, Jan 28, 2012 at 07:43:24PM +0100, Jiri Olsa wrote:
> > Adding a way to temporarily enable/disable ftrace_ops. The change
> > follows the same way as 'global' ftrace_ops are done.
> > 
> > Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
> > which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL
> > flag. In addition new per cpu flag called 'disabled' is also added to
> > ftrace_ops to provide the control information for each cpu.
> > 
> > When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
> > set as disabled for all cpus.
> > 
> > The ftrace_control_list contains all the registered 'control' ftrace_ops.
> > The control_ops provides function which iterates ftrace_control_list
> > and does the check for 'disabled' flag on current cpu.
> > 
> > Adding 3 inline functions:
> >   ftrace_function_local_disable/ftrace_function_local_enable
> >   - enable/disable the ftrace_ops on current cpu
> >   ftrace_function_local_disabled
> >   - get disabled ftrace_ops::disabled value for current cpu
> > 
> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> > ---
> >  include/linux/ftrace.h |   65 ++++++++++++++++++++++++++++
> >  kernel/trace/ftrace.c  |  111 +++++++++++++++++++++++++++++++++++++++++++++---
> >  kernel/trace/trace.h   |    2 +
> >  3 files changed, 171 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > index f33fb3b..5cb3a51 100644
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -31,16 +31,32 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
> >  
> >  typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
> >  
> > +/*
> > + * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
> > + * set in the flags member.
> > + *
> > + * ENABLED - set/unset when ftrace_ops is registered/unregistered
> > + * GLOBAL  - set manualy by ftrace_ops user to denote the ftrace_ops
> > + *           is part of the global tracers sharing the same filter
> > + *           via set_ftrace_* debugfs files.
> > + * DYNAMIC - set when ftrace_ops is registered to denote dynamically
> > + *           allocated ftrace_ops which need special care
> > + * CONTROL - set manualy by ftrace_ops user to denote the ftrace_ops
> > + *           could be controled by following calls:
> > + *           ftrace_function_enable, ftrace_function_disable
> 
> Should be ftrace_function_local_enable.
> 
> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

attached new patch :)

thanks,
jirka


---
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index f33fb3b..64a309d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -31,16 +31,33 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
 
+/*
+ * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
+ * set in the flags member.
+ *
+ * ENABLED - set/unset when ftrace_ops is registered/unregistered
+ * GLOBAL  - set manually by ftrace_ops user to denote the ftrace_ops
+ *           is part of the global tracers sharing the same filter
+ *           via set_ftrace_* debugfs files.
+ * DYNAMIC - set when ftrace_ops is registered to denote dynamically
+ *           allocated ftrace_ops which need special care
+ * CONTROL - set manually by ftrace_ops user to denote the ftrace_ops
+ *           could be controlled by the following calls:
+ *             ftrace_function_local_enable
+ *             ftrace_function_local_disable
+ */
 enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	int __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +114,55 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+/**
+ * ftrace_function_local_enable - enable controlled ftrace_ops on current cpu
+ *
+ * This function enables tracing on current cpu by decreasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline void ftrace_function_local_enable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))--;
+}
+
+/**
+ * ftrace_function_local_disable - disable controlled ftrace_ops on current cpu
+ *
+ * This function disables tracing on current cpu by increasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline void ftrace_function_local_disable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))++;
+}
+
+/**
+ * ftrace_function_local_disabled - returns ftrace_ops disabled value
+ *                                  on current cpu
+ *
+ * This function returns value of ftrace_ops::disabled on current cpu.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline int ftrace_function_local_disabled(struct ftrace_ops *ops)
+{
+	WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL));
+	return *this_cpu_ptr(ops->disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e2e0597..c8d2af2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -62,6 +62,8 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+#define FL_GLOBAL_CONTROL_MASK (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
@@ -89,12 +91,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -168,6 +172,32 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		*per_cpu_ptr(ops->disabled, cpu) = 1;
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	int __percpu *disabled;
+
+	disabled = alloc_percpu(int);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -259,6 +289,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_list_ops(struct ftrace_ops **list,
+				struct ftrace_ops *main_ops,
+				struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	add_ftrace_ops(list, ops);
+	if (first)
+		add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_list_ops(struct ftrace_ops **list,
+				  struct ftrace_ops *main_ops,
+				  struct ftrace_ops *ops)
+{
+	int ret = remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -270,15 +320,20 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+	/* We don't support both control and global flags set. */
+	if ((ops->flags & FL_GLOBAL_CONTROL_MASK) == FL_GLOBAL_CONTROL_MASK)
+		return -EINVAL;
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -302,11 +357,23 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_list_ops(&ftrace_global_list,
+					     &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_list_ops(&ftrace_control_list,
+					     &control_ops, ops);
+		if (!ret) {
+			/*
+			 * The ftrace_ops is now removed from the list,
+			 * so there'll be no new users. We must ensure
+			 * all current users are done before we free
+			 * the control data.
+			 */
+			synchronize_sched();
+			control_ops_free(ops);
+		}
 	} else
 		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -3874,6 +3941,36 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!ftrace_function_local_disabled(op) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	}
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b93ecba..55c6ea0 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function

^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions
  2012-01-28 18:43             ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
@ 2012-02-02 17:35               ` Frederic Weisbecker
  2012-02-03 10:23                 ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Frederic Weisbecker @ 2012-02-02 17:35 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Sat, Jan 28, 2012 at 07:43:25PM +0100, Jiri Olsa wrote:
> Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
> register/unregister from open/close actions.
> 
> The register/unregister actions are invoked for the first/last
> tracepoint user when opening/closing the event.
> 
> The open/close actions are invoked for each tracepoint user when
> opening/closing the event.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/ftrace_event.h    |    6 +-
>  kernel/trace/trace.h            |    5 ++
>  kernel/trace/trace_event_perf.c |  116 +++++++++++++++++++++++++--------------
>  kernel/trace/trace_events.c     |   10 ++-
>  kernel/trace/trace_kprobe.c     |    6 ++-
>  kernel/trace/trace_syscalls.c   |   14 +++-
>  6 files changed, 106 insertions(+), 51 deletions(-)
> 
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index c3da42d..195e360 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -146,6 +146,8 @@ enum trace_reg {
>  	TRACE_REG_UNREGISTER,
>  	TRACE_REG_PERF_REGISTER,
>  	TRACE_REG_PERF_UNREGISTER,
> +	TRACE_REG_PERF_OPEN,
> +	TRACE_REG_PERF_CLOSE,
>  };
>  
>  struct ftrace_event_call;
> @@ -157,7 +159,7 @@ struct ftrace_event_class {
>  	void			*perf_probe;
>  #endif
>  	int			(*reg)(struct ftrace_event_call *event,
> -				       enum trace_reg type);
> +				       enum trace_reg type, void *data);
>  	int			(*define_fields)(struct ftrace_event_call *);
>  	struct list_head	*(*get_fields)(struct ftrace_event_call *);
>  	struct list_head	fields;
> @@ -165,7 +167,7 @@ struct ftrace_event_class {
>  };
>  
>  extern int ftrace_event_reg(struct ftrace_event_call *event,
> -			    enum trace_reg type);
> +			    enum trace_reg type, void *data);
>  
>  enum {
>  	TRACE_EVENT_FL_ENABLED_BIT,
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 55c6ea0..0eaf077 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -828,4 +828,9 @@ extern const char *__stop___trace_bprintk_fmt[];
>  	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
>  #include "trace_entries.h"
>  
> +#ifdef CONFIG_PERF_EVENTS
> +int perf_ftrace_event_register(struct ftrace_event_call *call,
> +			       enum trace_reg type, void *data);
> +#endif

Seem to belong to a further patch. Nevermind.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 4/7] ftrace, perf: Add add/del tracepoint perf registration actions
  2012-01-28 18:43             ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
@ 2012-02-02 17:42               ` Frederic Weisbecker
  0 siblings, 0 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2012-02-02 17:42 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Sat, Jan 28, 2012 at 07:43:26PM +0100, Jiri Olsa wrote:
> Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
> perf event schedule in/out actions.
> 
> The add action is invoked for when the perf event is scheduled in,
> while the del action is invoked when the event is scheduled out.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2012-01-28 18:43             ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
@ 2012-02-02 18:14               ` Frederic Weisbecker
  2012-02-03 12:54                 ` Jiri Olsa
  2012-02-03 13:53                 ` Steven Rostedt
  0 siblings, 2 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2012-02-02 18:14 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Sat, Jan 28, 2012 at 07:43:27PM +0100, Jiri Olsa wrote:
> Adding perf registration support for the ftrace function event,
> so it is now possible to register it via perf interface.
> 
> The perf_event struct statically contains ftrace_ops as a handle
> for function tracer. The function tracer is registered/unregistered
> in open/close actions.
> 
> To be efficient, we enable/disable ftrace_ops each time the traced
> process is scheduled in/out (via TRACE_REG_PERF_(ADD|DEL) handlers).
> This way tracing is enabled only when the process is running.
> Intentionally using this way instead of the event's hw state
> PERF_HES_STOPPED, which would not disable the ftrace_ops.
> 
> It is now possible to use function trace within perf commands
> like:
> 
>   perf record -e ftrace:function ls
>   perf stat -e ftrace:function ls
> 
> Allowed only for root.

Good idea. We probably don't want to leak the rate of calls of a kernel
function to userspace.

[...]
> +static void
> +perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip)
> +{
> +	struct ftrace_entry *entry;
> +	struct hlist_head *head;
> +	struct pt_regs regs;
> +	int rctx;
> +
> +#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
> +		    sizeof(u64)) - sizeof(u32))
> +
> +	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
> +
> +	perf_fetch_caller_regs(&regs);
> +
> +	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
> +	if (!entry)
> +		return;
> +
> +	entry->ip = ip;
> +	entry->parent_ip = parent_ip;
> +
> +	head = this_cpu_ptr(event_function.perf_events);
> +	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
> +			      1, &regs, head);
> +
> +#undef ENTRY_SIZE
> +}
> +
> +static int perf_ftrace_function_register(struct perf_event *event)
> +{
> +	struct ftrace_ops *ops = &event->ftrace_ops;
> +
> +	ops->flags |= FTRACE_OPS_FL_CONTROL;
> +	ops->func = perf_ftrace_function_call;
> +	return register_ftrace_function(ops);
> +}
> +
> +static int perf_ftrace_function_unregister(struct perf_event *event)
> +{
> +	struct ftrace_ops *ops = &event->ftrace_ops;
> +	return unregister_ftrace_function(ops);
> +}
> +
> +static void perf_ftrace_function_enable(struct perf_event *event)
> +{
> +	ftrace_function_local_enable(&event->ftrace_ops);
> +}
> +
> +static void perf_ftrace_function_disable(struct perf_event *event)
> +{
> +	ftrace_function_local_disable(&event->ftrace_ops);
> +}
> +
> +int perf_ftrace_event_register(struct ftrace_event_call *call,
> +			       enum trace_reg type, void *data)
> +{
> +	int etype = call->event.type;
> +
> +	if (etype != TRACE_FN)
> +		return -EINVAL;
> +
> +	switch (type) {
> +	case TRACE_REG_REGISTER:
> +	case TRACE_REG_UNREGISTER:
> +		break;
> +	case TRACE_REG_PERF_REGISTER:
> +	case TRACE_REG_PERF_UNREGISTER:
> +		return 0;
> +	case TRACE_REG_PERF_OPEN:
> +		return perf_ftrace_function_register(data);
> +	case TRACE_REG_PERF_CLOSE:
> +		return perf_ftrace_function_unregister(data);
> +	case TRACE_REG_PERF_ADD:
> +		perf_ftrace_function_enable(data);
> +		return 0;
> +	case TRACE_REG_PERF_DEL:
> +		perf_ftrace_function_disable(data);
> +		return 0;
> +	}
> +
> +	return -EINVAL;
> +}

All the above from perf_ftrace_function_call() to here should perhaps
go to trace_function.c.

> diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> index bbeec31..867653c 100644
> --- a/kernel/trace/trace_export.c
> +++ b/kernel/trace/trace_export.c
> @@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
>  
>  #include "trace_entries.h"
>  
> +static int ftrace_event_class_register(struct ftrace_event_call *call,
> +				       enum trace_reg type, void *data)
> +{
> +	switch (type) {
> +	case TRACE_REG_PERF_REGISTER:
> +	case TRACE_REG_PERF_UNREGISTER:
> +		return 0;
> +	case TRACE_REG_PERF_OPEN:
> +	case TRACE_REG_PERF_CLOSE:
> +	case TRACE_REG_PERF_ADD:
> +	case TRACE_REG_PERF_DEL:
> +#ifdef CONFIG_PERF_EVENTS
> +		return perf_ftrace_event_register(call, type, data);
> +#endif
> +	case TRACE_REG_REGISTER:
> +	case TRACE_REG_UNREGISTER:
> +		break;
> +	}
> +
> +	return -EINVAL;
> +}

Hmm, one day we'll need to demux here. What about adding an argument to
FTRACE_ENTRY() to add the pointer to .reg ?

> +
>  #undef __entry
>  #define __entry REC
>  
> @@ -159,6 +181,7 @@ struct ftrace_event_class event_class_ftrace_##call = {			\
>  	.system			= __stringify(TRACE_SYSTEM),		\
>  	.define_fields		= ftrace_define_fields_##call,		\
>  	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
> +	.reg			= ftrace_event_class_register,		\
>  };									\
>  									\
>  struct ftrace_event_call __used event_##call = {			\
> @@ -170,4 +193,9 @@ struct ftrace_event_call __used event_##call = {			\
>  struct ftrace_event_call __used						\
>  __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
>  
> +int ftrace_event_is_function(struct ftrace_event_call *call)
> +{
> +	return call == &event_function;
> +}
> +
>  #include "trace_entries.h"
> -- 
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions
  2012-02-02 17:35               ` Frederic Weisbecker
@ 2012-02-03 10:23                 ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-03 10:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Thu, Feb 02, 2012 at 06:35:06PM +0100, Frederic Weisbecker wrote:
> On Sat, Jan 28, 2012 at 07:43:25PM +0100, Jiri Olsa wrote:
> > Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
> > register/unregister from open/close actions.
> > 
> > The register/unregister actions are invoked for the first/last
> > tracepoint user when opening/closing the event.
> > 
> > The open/close actions are invoked for each tracepoint user when
> > opening/closing the event.
> > 
> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> > ---
> >  include/linux/ftrace_event.h    |    6 +-
> >  kernel/trace/trace.h            |    5 ++
> >  kernel/trace/trace_event_perf.c |  116 +++++++++++++++++++++++++--------------
> >  kernel/trace/trace_events.c     |   10 ++-
> >  kernel/trace/trace_kprobe.c     |    6 ++-
> >  kernel/trace/trace_syscalls.c   |   14 +++-
> >  6 files changed, 106 insertions(+), 51 deletions(-)
> > 
> > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > index c3da42d..195e360 100644
> > --- a/include/linux/ftrace_event.h
> > +++ b/include/linux/ftrace_event.h
> > @@ -146,6 +146,8 @@ enum trace_reg {
> >  	TRACE_REG_UNREGISTER,
> >  	TRACE_REG_PERF_REGISTER,
> >  	TRACE_REG_PERF_UNREGISTER,
> > +	TRACE_REG_PERF_OPEN,
> > +	TRACE_REG_PERF_CLOSE,
> >  };
> >  
> >  struct ftrace_event_call;
> > @@ -157,7 +159,7 @@ struct ftrace_event_class {
> >  	void			*perf_probe;
> >  #endif
> >  	int			(*reg)(struct ftrace_event_call *event,
> > -				       enum trace_reg type);
> > +				       enum trace_reg type, void *data);
> >  	int			(*define_fields)(struct ftrace_event_call *);
> >  	struct list_head	*(*get_fields)(struct ftrace_event_call *);
> >  	struct list_head	fields;
> > @@ -165,7 +167,7 @@ struct ftrace_event_class {
> >  };
> >  
> >  extern int ftrace_event_reg(struct ftrace_event_call *event,
> > -			    enum trace_reg type);
> > +			    enum trace_reg type, void *data);
> >  
> >  enum {
> >  	TRACE_EVENT_FL_ENABLED_BIT,
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index 55c6ea0..0eaf077 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -828,4 +828,9 @@ extern const char *__stop___trace_bprintk_fmt[];
> >  	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
> >  #include "trace_entries.h"
> >  
> > +#ifdef CONFIG_PERF_EVENTS
> > +int perf_ftrace_event_register(struct ftrace_event_call *call,
> > +			       enum trace_reg type, void *data);
> > +#endif
> 
> Seem to belong to a further patch. Nevermind.
> 
> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

right, I'll make the change in new version

thanks,
jirka

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2012-02-02 18:14               ` Frederic Weisbecker
@ 2012-02-03 12:54                 ` Jiri Olsa
  2012-02-03 13:00                   ` Jiri Olsa
  2012-02-04 13:21                   ` Frederic Weisbecker
  2012-02-03 13:53                 ` Steven Rostedt
  1 sibling, 2 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-03 12:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Thu, Feb 02, 2012 at 07:14:12PM +0100, Frederic Weisbecker wrote:
> On Sat, Jan 28, 2012 at 07:43:27PM +0100, Jiri Olsa wrote:
> > Adding perf registration support for the ftrace function event,
> > so it is now possible to register it via perf interface.
> > 
> > The perf_event struct statically contains ftrace_ops as a handle
> > for function tracer. The function tracer is registered/unregistered
> > in open/close actions.
> > 
> > To be efficient, we enable/disable ftrace_ops each time the traced

SNIP

> > +
> > +	return -EINVAL;
> > +}
> 
> All the above from perf_ftrace_function_call() to here should perhaps
> go to trace_function.c.

hm, I'd rather call it trace_perf_function.c

> 
> > diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> > index bbeec31..867653c 100644
> > --- a/kernel/trace/trace_export.c
> > +++ b/kernel/trace/trace_export.c
> > @@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
> >  
> >  #include "trace_entries.h"
> >  
> > +static int ftrace_event_class_register(struct ftrace_event_call *call,
> > +				       enum trace_reg type, void *data)
> > +{
> > +	switch (type) {
> > +	case TRACE_REG_PERF_REGISTER:
> > +	case TRACE_REG_PERF_UNREGISTER:
> > +		return 0;
> > +	case TRACE_REG_PERF_OPEN:
> > +	case TRACE_REG_PERF_CLOSE:
> > +	case TRACE_REG_PERF_ADD:
> > +	case TRACE_REG_PERF_DEL:
> > +#ifdef CONFIG_PERF_EVENTS
> > +		return perf_ftrace_event_register(call, type, data);
> > +#endif
> > +	case TRACE_REG_REGISTER:
> > +	case TRACE_REG_UNREGISTER:
> > +		break;
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> 
> Hmm, one day we'll need to demux here. What about adding an argument to
> FTRACE_ENTRY() to add the pointer to .reg ?

ok, would something like the attached change be ok?

thanks,
jirka


---
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55c6ea0..638476a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -68,6 +68,10 @@ enum trace_type {
 #undef FTRACE_ENTRY_DUP
 #define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)
 
+#undef FTRACE_ENTRY_REG
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
+
 #include "trace_entries.h"
 
 /*
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index bbeec31..f74de86 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -18,6 +18,14 @@
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM	ftrace
 
+/*
+ * The FTRACE_ENTRY_REG macro allows an ftrace entry to define a register
+ * function and thus become accessible via perf.
+ */
+#undef FTRACE_ENTRY_REG
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
+
 /* not needed for this file */
 #undef __field_struct
 #define __field_struct(type, item)
@@ -152,13 +160,14 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 #undef F_printk
 #define F_printk(fmt, args...) #fmt ", "  __stringify(args)
 
-#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
+#undef FTRACE_ENTRY_REG
+#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn)\
 									\
 struct ftrace_event_class event_class_ftrace_##call = {			\
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
+	.reg			= regfn,				\
 };									\
 									\
 struct ftrace_event_call __used event_##call = {			\
@@ -170,4 +179,9 @@ struct ftrace_event_call __used event_##call = {			\
 struct ftrace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
+#undef FTRACE_ENTRY
+#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
+	FTRACE_ENTRY_REG(call, struct_name, etype,			\
+			 PARAMS(tstruct), PARAMS(print), NULL)
+
 #include "trace_entries.h"

^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2012-02-03 12:54                 ` Jiri Olsa
@ 2012-02-03 13:00                   ` Jiri Olsa
  2012-02-03 14:07                     ` Steven Rostedt
  2012-02-04 13:21                   ` Frederic Weisbecker
  1 sibling, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-02-03 13:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Feb 03, 2012 at 01:54:13PM +0100, Jiri Olsa wrote:
> On Thu, Feb 02, 2012 at 07:14:12PM +0100, Frederic Weisbecker wrote:
> > On Sat, Jan 28, 2012 at 07:43:27PM +0100, Jiri Olsa wrote:
> > > Adding perf registration support for the ftrace function event,
> > > so it is now possible to register it via the perf interface.
> > > 
> > > The perf_event struct statically contains ftrace_ops as a handle
> > > for function tracer. The function tracer is registered/unregistered
> > > in open/close actions.
> > > 
> > > To be efficient, we enable/disable ftrace_ops each time the traced
> 
> SNIP
> 
> > > +
> > > +	return -EINVAL;
> > > +}
> > 
> > All the above from perf_ftrace_function_call() to here should perhaps
> > go to trace_function.c.
> 
> hm, I'd rather call it trace_perf_function.c
> 
> > 
> > > diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> > > index bbeec31..867653c 100644
> > > --- a/kernel/trace/trace_export.c
> > > +++ b/kernel/trace/trace_export.c
> > > @@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
> > >  
> > >  #include "trace_entries.h"
> > >  
> > > +static int ftrace_event_class_register(struct ftrace_event_call *call,
> > > +				       enum trace_reg type, void *data)
> > > +{
> > > +	switch (type) {
> > > +	case TRACE_REG_PERF_REGISTER:
> > > +	case TRACE_REG_PERF_UNREGISTER:
> > > +		return 0;
> > > +	case TRACE_REG_PERF_OPEN:
> > > +	case TRACE_REG_PERF_CLOSE:
> > > +	case TRACE_REG_PERF_ADD:
> > > +	case TRACE_REG_PERF_DEL:
> > > +#ifdef CONFIG_PERF_EVENTS
> > > +		return perf_ftrace_event_register(call, type, data);
> > > +#endif
> > > +	case TRACE_REG_REGISTER:
> > > +	case TRACE_REG_UNREGISTER:
> > > +		break;
> > > +	}
> > > +
> > > +	return -EINVAL;
> > > +}
> > 
> > Hmm, one day we'll need to demux here. What about adding an argument to
> > FTRACE_ENTRY() to add the pointer to .reg ?
> 
> ok, would something like the attached change be ok?
> 
> thanks,
> jirka

and use FTRACE_ENTRY_REG for the ftrace function event like this:

FTRACE_ENTRY_REG(function, ftrace_entry,
        TRACE_FN,
        F_STRUCT(
                __field(        unsigned long,  ip              )
                __field(        unsigned long,  parent_ip       )
        ),
        F_printk(" %lx <-- %lx", __entry->ip, __entry->parent_ip),
        perf_ftrace_event_register
);

jirka

> 
> 
> ---
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 55c6ea0..638476a 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -68,6 +68,10 @@ enum trace_type {
>  #undef FTRACE_ENTRY_DUP
>  #define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)
>  
> +#undef FTRACE_ENTRY_REG
> +#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
> +	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
> +
>  #include "trace_entries.h"
>  
>  /*
> diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> index bbeec31..f74de86 100644
> --- a/kernel/trace/trace_export.c
> +++ b/kernel/trace/trace_export.c
> @@ -18,6 +18,14 @@
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM	ftrace
>  
> +/*
> + * The FTRACE_ENTRY_REG macro allows an ftrace entry to define a register
> + * function and thus become accessible via perf.
> + */
> +#undef FTRACE_ENTRY_REG
> +#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
> +	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
> +
>  /* not needed for this file */
>  #undef __field_struct
>  #define __field_struct(type, item)
> @@ -152,13 +160,14 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
>  #undef F_printk
>  #define F_printk(fmt, args...) #fmt ", "  __stringify(args)
>  
> -#undef FTRACE_ENTRY
> -#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
> +#undef FTRACE_ENTRY_REG
> +#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn)\
>  									\
>  struct ftrace_event_class event_class_ftrace_##call = {			\
>  	.system			= __stringify(TRACE_SYSTEM),		\
>  	.define_fields		= ftrace_define_fields_##call,		\
>  	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
> +	.reg			= regfn,				\
>  };									\
>  									\
>  struct ftrace_event_call __used event_##call = {			\
> @@ -170,4 +179,9 @@ struct ftrace_event_call __used event_##call = {			\
>  struct ftrace_event_call __used						\
>  __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
>  
> +#undef FTRACE_ENTRY
> +#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
> +	FTRACE_ENTRY_REG(call, struct_name, etype,			\
> +			 PARAMS(tstruct), PARAMS(print), NULL)
> +
>  #include "trace_entries.h"

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-28 18:43             ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
  2012-01-30  5:59               ` Frederic Weisbecker
@ 2012-02-03 13:40               ` Steven Rostedt
  1 sibling, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2012-02-03 13:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Sat, 2012-01-28 at 19:43 +0100, Jiri Olsa wrote:
>  
> +/*
> + * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
> + * set in the flags member.
> + *
> + * ENABLED - set/unset when ftrace_ops is registered/unregistered
> + * GLOBAL  - set manually by the ftrace_ops user to denote the ftrace_ops
> + *           is part of the global tracers sharing the same filter
> + *           via set_ftrace_* debugfs files.
> + * DYNAMIC - set when ftrace_ops is registered to denote dynamically
> + *           allocated ftrace_ops which need special care
> + * CONTROL - set manually by the ftrace_ops user to denote the ftrace_ops
> + *           can be controlled by the following calls:
> + *           ftrace_function_enable, ftrace_function_disable
> + */
>  enum {
>  	FTRACE_OPS_FL_ENABLED		= 1 << 0,
>  	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
>  	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
> +	FTRACE_OPS_FL_CONTROL		= 1 << 3,
>  };
>  

Nicely written :-)


-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-01-30  9:18                 ` Jiri Olsa
@ 2012-02-03 13:42                   ` Steven Rostedt
  2012-02-03 13:50                     ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2012-02-03 13:42 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Frederic Weisbecker, mingo, paulus, acme, a.p.zijlstra,
	linux-kernel, aarapov

On Mon, 2012-01-30 at 10:18 +0100, Jiri Olsa wrote:

> attached new patch :)

When attaching a new patch, please also include the change log and the
Signed-off-by. I actually can't take this patch without the SOB.

Thanks,

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-02-03 13:42                   ` Steven Rostedt
@ 2012-02-03 13:50                     ` Jiri Olsa
  2012-02-03 14:08                       ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-02-03 13:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, mingo, paulus, acme, a.p.zijlstra,
	linux-kernel, aarapov

On Fri, Feb 03, 2012 at 08:42:40AM -0500, Steven Rostedt wrote:
> On Mon, 2012-01-30 at 10:18 +0100, Jiri Olsa wrote:
> 
> > attached new patch :)
> 
> When attaching a new patch, please also include the change log and the
> Signed-off-by. I actually can't take this patch without the SOB.

ok, will do next time

If you want to take 1/7 and 2/7 now I can resend 2/7 right away,
but there's going to be a new version for the rest based on Frederic's comments.

jirka

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2012-02-02 18:14               ` Frederic Weisbecker
  2012-02-03 12:54                 ` Jiri Olsa
@ 2012-02-03 13:53                 ` Steven Rostedt
  1 sibling, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2012-02-03 13:53 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Thu, 2012-02-02 at 19:14 +0100, Frederic Weisbecker wrote:

> All the above from perf_ftrace_function_call() to here should perhaps
> go to trace_function.c.

Please no. This is a perf specific call into the function tracer.
Unless it is rewritten to be a completely generic function. As with the
normal rationale on "generic" vs "specific" in the kernel: when there's
only one user, then keep it "specific". When we get two or more users,
then it can be converted to be generic.

Please do not put this into trace_function.c, as the only user of it is
here.

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2012-02-03 13:00                   ` Jiri Olsa
@ 2012-02-03 14:07                     ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2012-02-03 14:07 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Frederic Weisbecker, mingo, paulus, acme, a.p.zijlstra,
	linux-kernel, aarapov

On Fri, 2012-02-03 at 14:00 +0100, Jiri Olsa wrote:

> and use FTRACE_ENTRY_REG for the ftrace function event like this:
> 
> FTRACE_ENTRY_REG(function, ftrace_entry,
>         TRACE_FN,
>         F_STRUCT(
>                 __field(        unsigned long,  ip              )
>                 __field(        unsigned long,  parent_ip       )
>         ),
>         F_printk(" %lx <-- %lx", __entry->ip, __entry->parent_ip),
>         perf_ftrace_event_register
> );

This should be fine.

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface
  2012-02-03 13:50                     ` Jiri Olsa
@ 2012-02-03 14:08                       ` Steven Rostedt
  2012-02-03 14:22                         ` [PATCHv8 0/2] first 2 patches passed review Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2012-02-03 14:08 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Frederic Weisbecker, mingo, paulus, acme, a.p.zijlstra,
	linux-kernel, aarapov

On Fri, 2012-02-03 at 14:50 +0100, Jiri Olsa wrote:

> If you want to take 1/7 and 2/7 now I can resend 2/7 right away,
> but there's going to be new version for the rest based on Frederic's comments.. 

Yeah, you can send the 2 patches as a separate series, and I'll start
running them through my tests.

Thanks,

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type
  2012-01-28 18:43             ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
@ 2012-02-03 14:16               ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2012-02-03 14:16 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Sat, 2012-01-28 at 19:43 +0100, Jiri Olsa wrote:
>  
> +#define FILTER_TYPE_TRACE_FN           FILTER_TRACE_FN
> +#define FILTER_TYPE_TRACE_GRAPH_ENT    FILTER_OTHER
> +#define FILTER_TYPE_TRACE_GRAPH_RET    FILTER_OTHER
> +#define FILTER_TYPE_TRACE_CTX          FILTER_OTHER
> +#define FILTER_TYPE_TRACE_WAKE         FILTER_OTHER
> +#define FILTER_TYPE_TRACE_STACK                FILTER_OTHER
> +#define FILTER_TYPE_TRACE_USER_STACK   FILTER_OTHER
> +#define FILTER_TYPE_TRACE_BPRINT       FILTER_OTHER
> +#define FILTER_TYPE_TRACE_PRINT                FILTER_OTHER
> +#define FILTER_TYPE_TRACE_MMIO_RW      FILTER_OTHER
> +#define FILTER_TYPE_TRACE_MMIO_MAP     FILTER_OTHER
> +#define FILTER_TYPE_TRACE_BRANCH       FILTER_OTHER
> +#define FILTER_TYPE(arg)               FILTER_TYPE_##arg
> +
>  #undef FTRACE_ENTRY
>  #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\

If every FTRACE_ENTRY needs a filter defined (as you did with the #defines
above), then we should just add a FILTER field to FTRACE_ENTRY(). The
defines are just ugly, and will be a pain if we ever add or remove a
type. If we remove a type, we'll probably forget to remove the define
for it.

#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)

	int filter_type = filter;

And then the trace entries can have the type of filter.

	F_STRUCT(
		...
	),

	FILTER_OTHER

That would be much cleaner.

-- Steve

>  int									\
> @@ -123,6 +137,7 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
>  {									\
>  	struct struct_name field;					\
>  	int ret;							\
> +	int filter_type = FILTER_TYPE(id);				\
>  									\
>  	tstruct;							\
>  									\



^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCHv8 0/2] first 2 patches passed review
  2012-02-03 14:08                       ` Steven Rostedt
@ 2012-02-03 14:22                         ` Jiri Olsa
  2012-02-03 14:22                           ` [PATCH 1/2] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
                                             ` (2 more replies)
  0 siblings, 3 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-03 14:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

ok, sending the first 2 patches separately

thanks,
jirka
---
 include/linux/ftrace.h |   70 ++++++++++++++++++++++++++-
 kernel/trace/ftrace.c  |  126 +++++++++++++++++++++++++++++++++++++++++++-----
 kernel/trace/trace.h   |    2 +
 3 files changed, 183 insertions(+), 15 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCH 1/2] ftrace: Change filter/notrace set functions to return exit code
  2012-02-03 14:22                         ` [PATCHv8 0/2] first 2 patches passed review Jiri Olsa
@ 2012-02-03 14:22                           ` Jiri Olsa
  2012-02-03 14:22                           ` [PATCH 2/2] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
  2012-02-04 13:24                           ` [PATCHv8 0/2] first 2 patches passed review Frederic Weisbecker
  2 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-03 14:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any value, so there's no way for an ftrace_ops
user to tell whether the filter was correctly applied.

The set_ftrace_filter interface returns error in case the filter
did not match:

  # echo krava > set_ftrace_filter
  bash: echo: write error: Invalid argument

Changing both the ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly, or -E* values
in case of error.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |    4 ++--
 kernel/trace/ftrace.c  |   15 +++++++++------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 028e26f..f33fb3b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -178,9 +178,9 @@ struct dyn_ftrace {
 };
 
 int ftrace_force_update(void);
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 683d559..e2e0597 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3146,8 +3146,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	mutex_lock(&ftrace_regex_lock);
 	if (reset)
 		ftrace_filter_reset(hash);
-	if (buf)
-		ftrace_match_records(hash, buf, len);
+	if (buf && !ftrace_match_records(hash, buf, len)) {
+		ret = -EINVAL;
+		goto out_regex_unlock;
+	}
 
 	mutex_lock(&ftrace_lock);
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -3157,6 +3159,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 
 	mutex_unlock(&ftrace_lock);
 
+ out_regex_unlock:
 	mutex_unlock(&ftrace_regex_lock);
 
 	free_ftrace_hash(hash);
@@ -3173,10 +3176,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
  * Filters denote which functions should be enabled when tracing is enabled.
  * If @buf is NULL and reset is set, all functions will be enabled for tracing.
  */
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 1);
+	return ftrace_set_regex(ops, buf, len, reset, 1);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_filter);
 
@@ -3191,10 +3194,10 @@ EXPORT_SYMBOL_GPL(ftrace_set_filter);
  * is enabled. If @buf is NULL and reset is set, all functions will be enabled
  * for tracing.
  */
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 0);
+	return ftrace_set_regex(ops, buf, len, reset, 0);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_notrace);
 /**
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 2/2] ftrace: Add enable/disable ftrace_ops control interface
  2012-02-03 14:22                         ` [PATCHv8 0/2] first 2 patches passed review Jiri Olsa
  2012-02-03 14:22                           ` [PATCH 1/2] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2012-02-03 14:22                           ` Jiri Olsa
  2012-02-04 13:24                           ` [PATCHv8 0/2] first 2 patches passed review Frederic Weisbecker
  2 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-03 14:22 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same pattern used for the 'global' ftrace_ops.

Introducing 2 globals - control_ops and ftrace_control_list - which
take over all ftrace_ops registered with the FTRACE_OPS_FL_CONTROL
flag. In addition, a new per cpu variable called 'disabled' is added
to ftrace_ops to provide the control information for each cpu.

When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.

The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides a function which iterates ftrace_control_list
and checks the 'disabled' flag on the current cpu.

Adding 3 inline functions:
  ftrace_function_local_disable/ftrace_function_local_enable
  - enable/disable the ftrace_ops on current cpu
  ftrace_function_local_disabled
  - get the ftrace_ops::disabled value for the current cpu

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h |   66 ++++++++++++++++++++++++++++
 kernel/trace/ftrace.c  |  111 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/trace/trace.h   |    2 +
 3 files changed, 172 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index f33fb3b..64a309d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -31,16 +31,33 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
 
+/*
+ * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
+ * set in the flags member.
+ *
+ * ENABLED - set/unset when ftrace_ops is registered/unregistered
+ * GLOBAL  - set manually by the ftrace_ops user to denote the ftrace_ops
+ *           is part of the global tracers sharing the same filter
+ *           via set_ftrace_* debugfs files.
+ * DYNAMIC - set when ftrace_ops is registered to denote dynamically
+ *           allocated ftrace_ops which need special care
+ * CONTROL - set manually by the ftrace_ops user to denote the ftrace_ops
+ *           can be controlled by the following calls:
+ *             ftrace_function_local_enable
+ *             ftrace_function_local_disable
+ */
 enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	int __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +114,55 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+/**
+ * ftrace_function_local_enable - enable controlled ftrace_ops on current cpu
+ *
+ * This function enables tracing on the current cpu by decreasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline void ftrace_function_local_enable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))--;
+}
+
+/**
+ * ftrace_function_local_disable - disable controlled ftrace_ops on current cpu
+ *
+ * This function disables tracing on the current cpu by increasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline void ftrace_function_local_disable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))++;
+}
+
+/**
+ * ftrace_function_local_disabled - returns ftrace_ops disabled value
+ *                                  on current cpu
+ *
+ * This function returns the value of ftrace_ops::disabled on the current cpu.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline int ftrace_function_local_disabled(struct ftrace_ops *ops)
+{
+	WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL));
+	return *this_cpu_ptr(ops->disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e2e0597..c8d2af2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -62,6 +62,8 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+#define FL_GLOBAL_CONTROL_MASK (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
@@ -89,12 +91,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -168,6 +172,32 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		*per_cpu_ptr(ops->disabled, cpu) = 1;
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	int __percpu *disabled;
+
+	disabled = alloc_percpu(int);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -259,6 +289,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_list_ops(struct ftrace_ops **list,
+				struct ftrace_ops *main_ops,
+				struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	add_ftrace_ops(list, ops);
+	if (first)
+		add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_list_ops(struct ftrace_ops **list,
+				  struct ftrace_ops *main_ops,
+				  struct ftrace_ops *ops)
+{
+	int ret = remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -270,15 +320,20 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+	/* We don't support both control and global flags set. */
+	if ((ops->flags & FL_GLOBAL_CONTROL_MASK) == FL_GLOBAL_CONTROL_MASK)
+		return -EINVAL;
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -302,11 +357,23 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_list_ops(&ftrace_global_list,
+					     &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_list_ops(&ftrace_control_list,
+					     &control_ops, ops);
+		if (!ret) {
+			/*
+			 * The ftrace_ops is now removed from the list,
+			 * so there'll be no new users. We must ensure
+			 * all current users are done before we free
+			 * the control data.
+			 */
+			synchronize_sched();
+			control_ops_free(ops);
+		}
 	} else
 		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -3874,6 +3941,36 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!ftrace_function_local_disabled(op) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	};
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b93ecba..55c6ea0 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2012-02-03 12:54                 ` Jiri Olsa
  2012-02-03 13:00                   ` Jiri Olsa
@ 2012-02-04 13:21                   ` Frederic Weisbecker
  2012-02-06 19:35                     ` Steven Rostedt
  1 sibling, 1 reply; 186+ messages in thread
From: Frederic Weisbecker @ 2012-02-04 13:21 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Feb 03, 2012 at 01:54:13PM +0100, Jiri Olsa wrote:
> On Thu, Feb 02, 2012 at 07:14:12PM +0100, Frederic Weisbecker wrote:
> > On Sat, Jan 28, 2012 at 07:43:27PM +0100, Jiri Olsa wrote:
> > > Adding perf registration support for the ftrace function event,
> > > so it is now possible to register it via the perf interface.
> > > 
> > > The perf_event struct statically contains ftrace_ops as a handle
> > > for function tracer. The function tracer is registered/unregistered
> > > in open/close actions.
> > > 
> > > To be efficient, we enable/disable ftrace_ops each time the traced
> 
> SNIP
> 
> > > +
> > > +	return -EINVAL;
> > > +}
> > 
> > All the above from perf_ftrace_function_call() to here should perhaps
> > go to trace_function.c.
> 
> hm, I'd rather call it trace_perf_function.c

Yeah looks fine.

> 
> > 
> > > diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> > > index bbeec31..867653c 100644
> > > --- a/kernel/trace/trace_export.c
> > > +++ b/kernel/trace/trace_export.c
> > > @@ -131,6 +131,28 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
> > >  
> > >  #include "trace_entries.h"
> > >  
> > > +static int ftrace_event_class_register(struct ftrace_event_call *call,
> > > +				       enum trace_reg type, void *data)
> > > +{
> > > +	switch (type) {
> > > +	case TRACE_REG_PERF_REGISTER:
> > > +	case TRACE_REG_PERF_UNREGISTER:
> > > +		return 0;
> > > +	case TRACE_REG_PERF_OPEN:
> > > +	case TRACE_REG_PERF_CLOSE:
> > > +	case TRACE_REG_PERF_ADD:
> > > +	case TRACE_REG_PERF_DEL:
> > > +#ifdef CONFIG_PERF_EVENTS
> > > +		return perf_ftrace_event_register(call, type, data);
> > > +#endif
> > > +	case TRACE_REG_REGISTER:
> > > +	case TRACE_REG_UNREGISTER:
> > > +		break;
> > > +	}
> > > +
> > > +	return -EINVAL;
> > > +}
> > 
> > Hmm, one day we'll need to demux here. What about adding an argument to
> > FTRACE_ENTRY() to add the pointer to .reg ?
> 
> ok, would something like the attached change be ok?
> 
> thanks,
> jirka
> 
> 
> ---
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 55c6ea0..638476a 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -68,6 +68,10 @@ enum trace_type {
>  #undef FTRACE_ENTRY_DUP
>  #define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)
>  
> +#undef FTRACE_ENTRY_REG
> +#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
> +	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
> +
>  #include "trace_entries.h"
>  
>  /*
> diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> index bbeec31..f74de86 100644
> --- a/kernel/trace/trace_export.c
> +++ b/kernel/trace/trace_export.c
> @@ -18,6 +18,14 @@
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM	ftrace
>  
> +/*
> > + * The FTRACE_ENTRY_REG macro allows an ftrace entry to define a register
> > + * function and thus become accessible via perf.
> + */
> +#undef FTRACE_ENTRY_REG
> +#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
> +	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
> +
>  /* not needed for this file */
>  #undef __field_struct
>  #define __field_struct(type, item)
> @@ -152,13 +160,14 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
>  #undef F_printk
>  #define F_printk(fmt, args...) #fmt ", "  __stringify(args)
>  
> -#undef FTRACE_ENTRY
> -#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
> +#undef FTRACE_ENTRY_REG
> +#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn)\
>  									\
>  struct ftrace_event_class event_class_ftrace_##call = {			\
>  	.system			= __stringify(TRACE_SYSTEM),		\
>  	.define_fields		= ftrace_define_fields_##call,		\
>  	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
> +	.reg			= regfn,				\
>  };									\
>  									\
>  struct ftrace_event_call __used event_##call = {			\
> @@ -170,4 +179,9 @@ struct ftrace_event_call __used event_##call = {			\
>  struct ftrace_event_call __used						\
>  __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
>  
> +#undef FTRACE_ENTRY
> +#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
> +	FTRACE_ENTRY_REG(call, struct_name, etype,			\
> +			 PARAMS(tstruct), PARAMS(print), NULL)
> +
>  #include "trace_entries.h"


Yeah looks good. I wouldn't mind having only FTRACE_ENTRY() with one
more parameter but I'm fine with the two macros as well.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv8 0/2] first 2 patches passed review
  2012-02-03 14:22                         ` [PATCHv8 0/2] first 2 patches passed review Jiri Olsa
  2012-02-03 14:22                           ` [PATCH 1/2] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
  2012-02-03 14:22                           ` [PATCH 2/2] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2012-02-04 13:24                           ` Frederic Weisbecker
  2 siblings, 0 replies; 186+ messages in thread
From: Frederic Weisbecker @ 2012-02-04 13:24 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Feb 03, 2012 at 03:22:37PM +0100, Jiri Olsa wrote:
> ok, sending 1st 2 patches separately
> 
> thanks,
> jirka

Please also keep the acks, unless you do some noticeable change :)

> ---
>  include/linux/ftrace.h |   70 ++++++++++++++++++++++++++-
>  kernel/trace/ftrace.c  |  126 +++++++++++++++++++++++++++++++++++++++++++-----
>  kernel/trace/trace.h   |    2 +
>  3 files changed, 183 insertions(+), 15 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf
  2012-02-04 13:21                   ` Frederic Weisbecker
@ 2012-02-06 19:35                     ` Steven Rostedt
  0 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2012-02-06 19:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Sat, 2012-02-04 at 14:21 +0100, Frederic Weisbecker wrote:
>   struct ftrace_event_call __used event_##call = {			\
> > @@ -170,4 +179,9 @@ struct ftrace_event_call __used event_##call = {			\
> >  struct ftrace_event_call __used						\
> >  __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
> >  
> > +#undef FTRACE_ENTRY
> > +#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
> > +	FTRACE_ENTRY_REG(call, struct_name, etype,			\
> > +			 PARAMS(tstruct), PARAMS(print), NULL)
> > +
> >  #include "trace_entries.h"
> 
> 
> Yeah looks good. I wouldn't mind having only FTRACE_ENTRY() with one
> more parameter but I'm fine with the two macros as well.

I prefer the two macros. It's cleaner.

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCH 7/7] ftrace, perf: Add filter support for function trace event
  2012-01-28 18:43             ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2012-02-07  0:20               ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-07  0:20 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

On Sat, Jan 28, 2012 at 07:43:29PM +0100, Jiri Olsa wrote:
> Adding support to filter the function trace event via the perf
> interface. It is now possible to use the filter interface
> in the perf tool like:
> 
>   perf record -e ftrace:function --filter="(ip == mm_*)" ls
> 
> The filter syntax is restricted to the the 'ip' field only,
> and following operators are accepted '==' '!=' '||', ending
> up with the filter strings like:
> 
>   ip == f1[, ]f2 ... || ip != f3[, ]f4 ...
> 
> with comma ',' or space ' ' as a function separator. If the
> space ' ' is used as a separator, the right side of the
> assignment needs to be enclosed in double quotes '"'.
> 
> The '==' operator adds trace filter with same effect as would
> be added via set_ftrace_filter file.
> 
> The '!=' operator adds trace filter with same effect as would
> be added via set_ftrace_notrace file.
> 
> The right side of the '!=', '==' operators is a list of functions
> or regexps to be added to the filter, separated by spaces.
> 
> The '||' operator is used for connecting multiple filter definitions
> together. It is possible to have more than one '==' and '!='
> operators within one filter string.
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>

SNIP

>  static void perf_ftrace_function_enable(struct perf_event *event)
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index eb04a2a..c8a64ec 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -54,6 +54,13 @@ struct filter_op {
>  	int precedence;
>  };
>  
> +static struct filter_op filter_ftrace_ops[] = {
> +	{ OP_OR,	"||",		1 },
> +	{ OP_NE,	"!=",		2 },
> +	{ OP_EQ,	"==",		2 },
> +	{ OP_NONE,	"OP_NONE",	0 },
> +};

ugh.. just found I cannot define filter_ftrace_ops like this,
will send a fix with the new version.

thanks,
jirka

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCHv8 0/8] ftrace, perf: Adding support to use function trace
  2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
                               ` (6 preceding siblings ...)
  2012-01-28 18:43             ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2012-02-07 19:44             ` Jiri Olsa
  2012-02-07 19:44               ` [PATCH 1/8] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
                                 ` (9 more replies)
  7 siblings, 10 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-07 19:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov

hi,
here's another version of perf support for function trace
with filter. 

attached patches:
 - 1/8 ftrace: Change filter/notrace set functions to return exit code
 - 2/8 ftrace: Add enable/disable ftrace_ops control interface
 - 3/8 ftrace, perf: Add open/close tracepoint perf registration actions
 - 4/8 ftrace, perf: Add add/del tracepoint perf registration actions
 - 5/8 ftrace: Add FTRACE_ENTRY_REG macro to allow event registration
 - 6/8 ftrace, perf: Add support to use function tracepoint in perf
 - 7/8 ftrace: Allow to specify filter field type for ftrace events
 - 8/8 ftrace, perf: Add filter support for function trace event

v8 changes:
  1/8 - acked
  2/8 - acked
  3/8 - acked
  4/8 - acked
  5/8 - new patch - added FTRACE_ENTRY_REG macro
  6/8 - using FTRACE_ENTRY_REG macro to define ftrace event
  7/8 - adding filter parameter to FTRACE_ENTRY macro
  8/8 - using only walk_pred_tree to check the filter
        instead using different special filter_ops,

v7 changes:
  2/7 - using int instead of atomic_t for ftrace_ops::disable
      - using this_cpu_ptr to touch ftrace_ops::disable
      - renamed ftrace_ops:disable API
          void ftrace_function_local_enable(struct ftrace_ops *ops)
          void ftrace_function_local_disable(struct ftrace_ops *ops)
          int  ftrace_function_local_disabled(struct ftrace_ops *ops)

v6 changes:
  2/7 - added comments to FTRACE_OPS_FL_* bits enum
  5/7 - added more info to the change log regarding ftrace_ops enable/disable
  7/7 - rebased to the latest filter changes

v5 changes:
  7/7 - fixed to properly support ',' in filter expressions

v4 changes:
  2/7 - FL_GLOBAL_CONTROL changed to FL_GLOBAL_CONTROL_MASK
      - changed WARN_ON_ONCE() to include the !preempt_count()
      - changed this_cpu_ptr to per_cpu_ptr

  omitted Fix possible NULL dereferencing in __ftrace_hash_rec_update
  (2/8 in v3)

v3 changes:
  3/8 - renamed __add/remove_ftrace_ops
      - fixed preempt_enable/recursion_clear order in ftrace_ops_control_func
      - renamed/commented API functions -  enable/disable_ftrace_function
  
  omitted graph tracer workaround patch 10/10

v2 changes:
 01/10 - keeping the old fix instead of adding hash_has_contents func
         I'll send a separate patchset for this
 02/10 - using different way to avoid the issue (3/9 in v1)
 03/10 - using the way proposed by Steven for controlling ftrace_ops
         (4/9 in v1)
 06/10 - added check ensuring the ftrace:function event could be used by
         root only (7/9 in v1)
 08/10 - added more description (8/9 in v1)
 09/10 - changed '&&' operator to '||' which seems more suitable
         in this case (9/9 in v1)

thanks,
jirka
---
 include/linux/ftrace.h             |   71 ++++++++++++-
 include/linux/ftrace_event.h       |    9 ++-
 include/linux/perf_event.h         |    3 +
 kernel/trace/ftrace.c              |  132 +++++++++++++++++++++---
 kernel/trace/trace.h               |   34 ++++--
 kernel/trace/trace_entries.h       |   54 +++++++---
 kernel/trace/trace_event_perf.c    |  206 ++++++++++++++++++++++++++++--------
 kernel/trace/trace_events.c        |   12 ++-
 kernel/trace/trace_events_filter.c |  160 +++++++++++++++++++++++++++-
 kernel/trace/trace_export.c        |   64 ++++++++----
 kernel/trace/trace_kprobe.c        |    8 ++-
 kernel/trace/trace_syscalls.c      |   18 +++-
 12 files changed, 654 insertions(+), 117 deletions(-)

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [PATCH 1/8] ftrace: Change filter/notrace set functions to return exit code
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
@ 2012-02-07 19:44               ` Jiri Olsa
  2012-02-07 19:44               ` [PATCH 2/8] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
                                 ` (8 subsequent siblings)
  9 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-07 19:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any return code. So there's no way for an ftrace_ops
user to tell whether the filter was correctly applied.

The set_ftrace_filter interface returns error in case the filter
did not match:

  # echo krava > set_ftrace_filter
  bash: echo: write error: Invalid argument

Changing both ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly or -E* values
in case of error.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/ftrace.h |    4 ++--
 kernel/trace/ftrace.c  |   15 +++++++++------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 028e26f..f33fb3b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -178,9 +178,9 @@ struct dyn_ftrace {
 };
 
 int ftrace_force_update(void);
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 683d559..e2e0597 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3146,8 +3146,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	mutex_lock(&ftrace_regex_lock);
 	if (reset)
 		ftrace_filter_reset(hash);
-	if (buf)
-		ftrace_match_records(hash, buf, len);
+	if (buf && !ftrace_match_records(hash, buf, len)) {
+		ret = -EINVAL;
+		goto out_regex_unlock;
+	}
 
 	mutex_lock(&ftrace_lock);
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -3157,6 +3159,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 
 	mutex_unlock(&ftrace_lock);
 
+ out_regex_unlock:
 	mutex_unlock(&ftrace_regex_lock);
 
 	free_ftrace_hash(hash);
@@ -3173,10 +3176,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
  * Filters denote which functions should be enabled when tracing is enabled.
  * If @buf is NULL and reset is set, all functions will be enabled for tracing.
  */
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 1);
+	return ftrace_set_regex(ops, buf, len, reset, 1);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_filter);
 
@@ -3191,10 +3194,10 @@ EXPORT_SYMBOL_GPL(ftrace_set_filter);
  * is enabled. If @buf is NULL and reset is set, all functions will be enabled
  * for tracing.
  */
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 0);
+	return ftrace_set_regex(ops, buf, len, reset, 0);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_notrace);
 /**
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 2/8] ftrace: Add enable/disable ftrace_ops control interface
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
  2012-02-07 19:44               ` [PATCH 1/8] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2012-02-07 19:44               ` Jiri Olsa
  2012-02-07 19:44               ` [PATCH 3/8] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
                                 ` (7 subsequent siblings)
  9 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-07 19:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same approach used for the 'global' ftrace_ops.

Introducing 2 globals - control_ops and ftrace_control_list - which
take over all ftrace_ops registered with the FTRACE_OPS_FL_CONTROL
flag. In addition, a new per cpu counter called 'disabled' is added to
ftrace_ops to provide the control information for each cpu.

When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.

The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides a function which iterates ftrace_control_list
and checks the 'disabled' counter on the current cpu.

Adding 3 inline functions:
  ftrace_function_local_enable/ftrace_function_local_disable
  - enable/disable the ftrace_ops on the current cpu
  ftrace_function_local_disabled
  - get the ftrace_ops::disabled value for the current cpu

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/ftrace.h |   66 ++++++++++++++++++++++++++++
 kernel/trace/ftrace.c  |  111 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/trace/trace.h   |    2 +
 3 files changed, 172 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index f33fb3b..64a309d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -31,16 +31,33 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
 
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
 
+/*
+ * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
+ * set in the flags member.
+ *
+ * ENABLED - set/unset when ftrace_ops is registered/unregistered
+ * GLOBAL  - set manually by ftrace_ops user to denote the ftrace_ops
+ *           is part of the global tracers sharing the same filter
+ *           via set_ftrace_* debugfs files.
+ * DYNAMIC - set when ftrace_ops is registered to denote dynamically
+ *           allocated ftrace_ops which need special care
+ * CONTROL - set manually by ftrace_ops user to denote the ftrace_ops
+ *           could be controlled by the following calls:
+ *             ftrace_function_local_enable
+ *             ftrace_function_local_disable
+ */
 enum {
 	FTRACE_OPS_FL_ENABLED		= 1 << 0,
 	FTRACE_OPS_FL_GLOBAL		= 1 << 1,
 	FTRACE_OPS_FL_DYNAMIC		= 1 << 2,
+	FTRACE_OPS_FL_CONTROL		= 1 << 3,
 };
 
 struct ftrace_ops {
 	ftrace_func_t			func;
 	struct ftrace_ops		*next;
 	unsigned long			flags;
+	int __percpu			*disabled;
 #ifdef CONFIG_DYNAMIC_FTRACE
 	struct ftrace_hash		*notrace_hash;
 	struct ftrace_hash		*filter_hash;
@@ -97,6 +114,55 @@ int register_ftrace_function(struct ftrace_ops *ops);
 int unregister_ftrace_function(struct ftrace_ops *ops);
 void clear_ftrace_function(void);
 
+/**
+ * ftrace_function_local_enable - enable controlled ftrace_ops on current cpu
+ *
+ * This function enables tracing on current cpu by decreasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline void ftrace_function_local_enable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))--;
+}
+
+/**
+ * ftrace_function_local_disable - disable controlled ftrace_ops on current cpu
+ *
+ * This function disables tracing on current cpu by increasing
+ * the per cpu control variable.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline void ftrace_function_local_disable(struct ftrace_ops *ops)
+{
+	if (WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL)))
+		return;
+
+	(*this_cpu_ptr(ops->disabled))++;
+}
+
+/**
+ * ftrace_function_local_disabled - returns ftrace_ops disabled value
+ *                                  on current cpu
+ *
+ * This function returns value of ftrace_ops::disabled on current cpu.
+ * It must be called with preemption disabled and only on ftrace_ops
+ * registered with FTRACE_OPS_FL_CONTROL. If called without preemption
+ * disabled, this_cpu_ptr will complain when CONFIG_DEBUG_PREEMPT is enabled.
+ */
+static inline int ftrace_function_local_disabled(struct ftrace_ops *ops)
+{
+	WARN_ON_ONCE(!(ops->flags & FTRACE_OPS_FL_CONTROL));
+	return *this_cpu_ptr(ops->disabled);
+}
+
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
 #else /* !CONFIG_FUNCTION_TRACER */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e2e0597..c8d2af2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -62,6 +62,8 @@
 #define FTRACE_HASH_DEFAULT_BITS 10
 #define FTRACE_HASH_MAX_BITS 12
 
+#define FL_GLOBAL_CONTROL_MASK (FTRACE_OPS_FL_GLOBAL | FTRACE_OPS_FL_CONTROL)
+
 /* ftrace_enabled is a method to turn ftrace on or off */
 int ftrace_enabled __read_mostly;
 static int last_ftrace_enabled;
@@ -89,12 +91,14 @@ static struct ftrace_ops ftrace_list_end __read_mostly = {
 };
 
 static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_control_list __read_mostly = &ftrace_list_end;
 static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 static ftrace_func_t __ftrace_trace_function_delay __read_mostly = ftrace_stub;
 ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
 ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static struct ftrace_ops global_ops;
+static struct ftrace_ops control_ops;
 
 static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
@@ -168,6 +172,32 @@ static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
 }
 #endif
 
+static void control_ops_disable_all(struct ftrace_ops *ops)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		*per_cpu_ptr(ops->disabled, cpu) = 1;
+}
+
+static int control_ops_alloc(struct ftrace_ops *ops)
+{
+	int __percpu *disabled;
+
+	disabled = alloc_percpu(int);
+	if (!disabled)
+		return -ENOMEM;
+
+	ops->disabled = disabled;
+	control_ops_disable_all(ops);
+	return 0;
+}
+
+static void control_ops_free(struct ftrace_ops *ops)
+{
+	free_percpu(ops->disabled);
+}
+
 static void update_global_ops(void)
 {
 	ftrace_func_t func;
@@ -259,6 +289,26 @@ static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
 	return 0;
 }
 
+static void add_ftrace_list_ops(struct ftrace_ops **list,
+				struct ftrace_ops *main_ops,
+				struct ftrace_ops *ops)
+{
+	int first = *list == &ftrace_list_end;
+	add_ftrace_ops(list, ops);
+	if (first)
+		add_ftrace_ops(&ftrace_ops_list, main_ops);
+}
+
+static int remove_ftrace_list_ops(struct ftrace_ops **list,
+				  struct ftrace_ops *main_ops,
+				  struct ftrace_ops *ops)
+{
+	int ret = remove_ftrace_ops(list, ops);
+	if (!ret && *list == &ftrace_list_end)
+		ret = remove_ftrace_ops(&ftrace_ops_list, main_ops);
+	return ret;
+}
+
 static int __register_ftrace_function(struct ftrace_ops *ops)
 {
 	if (ftrace_disabled)
@@ -270,15 +320,20 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
 		return -EBUSY;
 
+	/* We don't support both control and global flags set. */
+	if ((ops->flags & FL_GLOBAL_CONTROL_MASK) == FL_GLOBAL_CONTROL_MASK)
+		return -EINVAL;
+
 	if (!core_kernel_data((unsigned long)ops))
 		ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		int first = ftrace_global_list == &ftrace_list_end;
-		add_ftrace_ops(&ftrace_global_list, ops);
+		add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
 		ops->flags |= FTRACE_OPS_FL_ENABLED;
-		if (first)
-			add_ftrace_ops(&ftrace_ops_list, &global_ops);
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		if (control_ops_alloc(ops))
+			return -ENOMEM;
+		add_ftrace_list_ops(&ftrace_control_list, &control_ops, ops);
 	} else
 		add_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -302,11 +357,23 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
 		return -EINVAL;
 
 	if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
-		ret = remove_ftrace_ops(&ftrace_global_list, ops);
-		if (!ret && ftrace_global_list == &ftrace_list_end)
-			ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+		ret = remove_ftrace_list_ops(&ftrace_global_list,
+					     &global_ops, ops);
 		if (!ret)
 			ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
+		ret = remove_ftrace_list_ops(&ftrace_control_list,
+					     &control_ops, ops);
+		if (!ret) {
+			/*
+			 * The ftrace_ops is now removed from the list,
+			 * so there'll be no new users. We must ensure
+			 * all current users are done before we free
+			 * the control data.
+			 */
+			synchronize_sched();
+			control_ops_free(ops);
+		}
 	} else
 		ret = remove_ftrace_ops(&ftrace_ops_list, ops);
 
@@ -3874,6 +3941,36 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 static void
+ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_ops *op;
+
+	if (unlikely(trace_recursion_test(TRACE_CONTROL_BIT)))
+		return;
+
+	/*
+	 * Some of the ops may be dynamically allocated,
+	 * they must be freed after a synchronize_sched().
+	 */
+	preempt_disable_notrace();
+	trace_recursion_set(TRACE_CONTROL_BIT);
+	op = rcu_dereference_raw(ftrace_control_list);
+	while (op != &ftrace_list_end) {
+		if (!ftrace_function_local_disabled(op) &&
+		    ftrace_ops_test(op, ip))
+			op->func(ip, parent_ip);
+
+		op = rcu_dereference_raw(op->next);
+	}
+	trace_recursion_clear(TRACE_CONTROL_BIT);
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops control_ops = {
+	.func = ftrace_ops_control_func,
+};
+
+static void
 ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
 {
 	struct ftrace_ops *op;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b93ecba..55c6ea0 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ struct tracer {
 /* for function tracing recursion */
 #define TRACE_INTERNAL_BIT		(1<<11)
 #define TRACE_GLOBAL_BIT		(1<<12)
+#define TRACE_CONTROL_BIT		(1<<13)
+
 /*
  * Abuse of the trace_recursion.
  * As we need a way to maintain state if we are tracing the function
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 3/8] ftrace, perf: Add open/close tracepoint perf registration actions
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
  2012-02-07 19:44               ` [PATCH 1/8] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
  2012-02-07 19:44               ` [PATCH 2/8] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
@ 2012-02-07 19:44               ` Jiri Olsa
  2012-02-07 19:44               ` [PATCH 4/8] ftrace, perf: Add add/del " Jiri Olsa
                                 ` (6 subsequent siblings)
  9 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-07 19:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.

The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.

The open/close actions are invoked for each tracepoint user when
opening/closing the event.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/ftrace_event.h    |    6 +-
 kernel/trace/trace_event_perf.c |  116 +++++++++++++++++++++++++--------------
 kernel/trace/trace_events.c     |   10 ++-
 kernel/trace/trace_kprobe.c     |    6 ++-
 kernel/trace/trace_syscalls.c   |   14 +++-
 5 files changed, 101 insertions(+), 51 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c3da42d..195e360 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -146,6 +146,8 @@ enum trace_reg {
 	TRACE_REG_UNREGISTER,
 	TRACE_REG_PERF_REGISTER,
 	TRACE_REG_PERF_UNREGISTER,
+	TRACE_REG_PERF_OPEN,
+	TRACE_REG_PERF_CLOSE,
 };
 
 struct ftrace_event_call;
@@ -157,7 +159,7 @@ struct ftrace_event_class {
 	void			*perf_probe;
 #endif
 	int			(*reg)(struct ftrace_event_call *event,
-				       enum trace_reg type);
+				       enum trace_reg type, void *data);
 	int			(*define_fields)(struct ftrace_event_call *);
 	struct list_head	*(*get_fields)(struct ftrace_event_call *);
 	struct list_head	fields;
@@ -165,7 +167,7 @@ struct ftrace_event_class {
 };
 
 extern int ftrace_event_reg(struct ftrace_event_call *event,
-			    enum trace_reg type);
+			    enum trace_reg type, void *data);
 
 enum {
 	TRACE_EVENT_FL_ENABLED_BIT,
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 19a359d..0cfcc37 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -44,23 +44,17 @@ static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 	return 0;
 }
 
-static int perf_trace_event_init(struct ftrace_event_call *tp_event,
-				 struct perf_event *p_event)
+static int perf_trace_event_reg(struct ftrace_event_call *tp_event,
+				struct perf_event *p_event)
 {
 	struct hlist_head __percpu *list;
-	int ret;
+	int ret = -ENOMEM;
 	int cpu;
 
-	ret = perf_trace_event_perm(tp_event, p_event);
-	if (ret)
-		return ret;
-
 	p_event->tp_event = tp_event;
 	if (tp_event->perf_refcount++ > 0)
 		return 0;
 
-	ret = -ENOMEM;
-
 	list = alloc_percpu(struct hlist_head);
 	if (!list)
 		goto fail;
@@ -83,7 +77,7 @@ static int perf_trace_event_init(struct ftrace_event_call *tp_event,
 		}
 	}
 
-	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER);
+	ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER, NULL);
 	if (ret)
 		goto fail;
 
@@ -108,6 +102,69 @@ fail:
 	return ret;
 }
 
+static void perf_trace_event_unreg(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	int i;
+
+	if (--tp_event->perf_refcount > 0)
+		goto out;
+
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER, NULL);
+
+	/*
+	 * Ensure our callback won't be called anymore. The buffers
+	 * will be freed after that.
+	 */
+	tracepoint_synchronize_unregister();
+
+	free_percpu(tp_event->perf_events);
+	tp_event->perf_events = NULL;
+
+	if (!--total_ref_count) {
+		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
+			free_percpu(perf_trace_buf[i]);
+			perf_trace_buf[i] = NULL;
+		}
+	}
+out:
+	module_put(tp_event->mod);
+}
+
+static int perf_trace_event_open(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_OPEN, p_event);
+}
+
+static void perf_trace_event_close(struct perf_event *p_event)
+{
+	struct ftrace_event_call *tp_event = p_event->tp_event;
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
+}
+
+static int perf_trace_event_init(struct ftrace_event_call *tp_event,
+				 struct perf_event *p_event)
+{
+	int ret;
+
+	ret = perf_trace_event_perm(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_reg(tp_event, p_event);
+	if (ret)
+		return ret;
+
+	ret = perf_trace_event_open(p_event);
+	if (ret) {
+		perf_trace_event_unreg(p_event);
+		return ret;
+	}
+
+	return 0;
+}
+
 int perf_trace_init(struct perf_event *p_event)
 {
 	struct ftrace_event_call *tp_event;
@@ -130,6 +187,14 @@ int perf_trace_init(struct perf_event *p_event)
 	return ret;
 }
 
+void perf_trace_destroy(struct perf_event *p_event)
+{
+	mutex_lock(&event_mutex);
+	perf_trace_event_close(p_event);
+	perf_trace_event_unreg(p_event);
+	mutex_unlock(&event_mutex);
+}
+
 int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct ftrace_event_call *tp_event = p_event->tp_event;
@@ -154,37 +219,6 @@ void perf_trace_del(struct perf_event *p_event, int flags)
 	hlist_del_rcu(&p_event->hlist_entry);
 }
 
-void perf_trace_destroy(struct perf_event *p_event)
-{
-	struct ftrace_event_call *tp_event = p_event->tp_event;
-	int i;
-
-	mutex_lock(&event_mutex);
-	if (--tp_event->perf_refcount > 0)
-		goto out;
-
-	tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER);
-
-	/*
-	 * Ensure our callback won't be called anymore. The buffers
-	 * will be freed after that.
-	 */
-	tracepoint_synchronize_unregister();
-
-	free_percpu(tp_event->perf_events);
-	tp_event->perf_events = NULL;
-
-	if (!--total_ref_count) {
-		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
-			free_percpu(perf_trace_buf[i]);
-			perf_trace_buf[i] = NULL;
-		}
-	}
-out:
-	module_put(tp_event->mod);
-	mutex_unlock(&event_mutex);
-}
-
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 				       struct pt_regs *regs, int *rctxp)
 {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c212a7f..5138fea 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -147,7 +147,8 @@ int trace_event_raw_init(struct ftrace_event_call *call)
 }
 EXPORT_SYMBOL_GPL(trace_event_raw_init);
 
-int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
+int ftrace_event_reg(struct ftrace_event_call *call,
+		     enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -170,6 +171,9 @@ int ftrace_event_reg(struct ftrace_event_call *call, enum trace_reg type)
 					    call->class->perf_probe,
 					    call);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
@@ -209,7 +213,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_stop_cmdline_record();
 				call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			call->class->reg(call, TRACE_REG_UNREGISTER);
+			call->class->reg(call, TRACE_REG_UNREGISTER, NULL);
 		}
 		break;
 	case 1:
@@ -218,7 +222,7 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
 				tracing_start_cmdline_record();
 				call->flags |= TRACE_EVENT_FL_RECORDED_CMD;
 			}
-			ret = call->class->reg(call, TRACE_REG_REGISTER);
+			ret = call->class->reg(call, TRACE_REG_REGISTER, NULL);
 			if (ret) {
 				tracing_stop_cmdline_record();
 				pr_info("event trace: Could not enable event "
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 00d527c..5667f89 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1892,7 +1892,8 @@ static __kprobes void kretprobe_perf_func(struct kretprobe_instance *ri,
 #endif	/* CONFIG_PERF_EVENTS */
 
 static __kprobes
-int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
+int kprobe_register(struct ftrace_event_call *event,
+		    enum trace_reg type, void *data)
 {
 	struct trace_probe *tp = (struct trace_probe *)event->data;
 
@@ -1909,6 +1910,9 @@ int kprobe_register(struct ftrace_event_call *event, enum trace_reg type)
 	case TRACE_REG_PERF_UNREGISTER:
 		disable_trace_probe(tp, TP_FLAG_PROFILE);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index cb65454..6916b0d 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -17,9 +17,9 @@ static DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
 static DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type);
+				 enum trace_reg type, void *data);
 
 static int syscall_enter_define_fields(struct ftrace_event_call *call);
 static int syscall_exit_define_fields(struct ftrace_event_call *call);
@@ -649,7 +649,7 @@ void perf_sysexit_disable(struct ftrace_event_call *call)
 #endif /* CONFIG_PERF_EVENTS */
 
 static int syscall_enter_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -664,13 +664,16 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysenter_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
 }
 
 static int syscall_exit_register(struct ftrace_event_call *event,
-				 enum trace_reg type)
+				 enum trace_reg type, void *data)
 {
 	switch (type) {
 	case TRACE_REG_REGISTER:
@@ -685,6 +688,9 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 	case TRACE_REG_PERF_UNREGISTER:
 		perf_sysexit_disable(event);
 		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+		return 0;
 #endif
 	}
 	return 0;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 4/8] ftrace, perf: Add add/del tracepoint perf registration actions
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                                 ` (2 preceding siblings ...)
  2012-02-07 19:44               ` [PATCH 3/8] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
@ 2012-02-07 19:44               ` Jiri Olsa
  2012-02-07 19:44               ` [PATCH 5/8] ftrace: Add FTRACE_ENTRY_REG macro to allow event registration Jiri Olsa
                                 ` (5 subsequent siblings)
  9 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-07 19:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle the
perf event sched in/out actions.

The add action is invoked when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/ftrace_event.h    |    2 ++
 kernel/trace/trace_event_perf.c |    4 +++-
 kernel/trace/trace_events.c     |    2 ++
 kernel/trace/trace_kprobe.c     |    2 ++
 kernel/trace/trace_syscalls.c   |    4 ++++
 5 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 195e360..2bf677c 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -148,6 +148,8 @@ enum trace_reg {
 	TRACE_REG_PERF_UNREGISTER,
 	TRACE_REG_PERF_OPEN,
 	TRACE_REG_PERF_CLOSE,
+	TRACE_REG_PERF_ADD,
+	TRACE_REG_PERF_DEL,
 };
 
 struct ftrace_event_call;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0cfcc37..d72af0b 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -211,12 +211,14 @@ int perf_trace_add(struct perf_event *p_event, int flags)
 	list = this_cpu_ptr(pcpu_list);
 	hlist_add_head_rcu(&p_event->hlist_entry, list);
 
-	return 0;
+	return tp_event->class->reg(tp_event, TRACE_REG_PERF_ADD, p_event);
 }
 
 void perf_trace_del(struct perf_event *p_event, int flags)
 {
+	struct ftrace_event_call *tp_event = p_event->tp_event;
 	hlist_del_rcu(&p_event->hlist_entry);
+	tp_event->class->reg(tp_event, TRACE_REG_PERF_DEL, p_event);
 }
 
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5138fea..079a93a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -173,6 +173,8 @@ int ftrace_event_reg(struct ftrace_event_call *call,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5667f89..580a05e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1912,6 +1912,8 @@ int kprobe_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 6916b0d..dbdd804 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -666,6 +666,8 @@ static int syscall_enter_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
@@ -690,6 +692,8 @@ static int syscall_exit_register(struct ftrace_event_call *event,
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
 		return 0;
 #endif
 	}
-- 
1.7.1



* [PATCH 5/8] ftrace: Add FTRACE_ENTRY_REG macro to allow event registration
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                                 ` (3 preceding siblings ...)
  2012-02-07 19:44               ` [PATCH 4/8] ftrace, perf: Add add/del " Jiri Olsa
@ 2012-02-07 19:44               ` Jiri Olsa
  2012-02-07 19:44               ` [PATCH 6/8] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
                                 ` (4 subsequent siblings)
  9 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-07 19:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding the FTRACE_ENTRY_REG macro so that particular ftrace entries
can specify a registration function and thus become accessible
via perf.

This will be used in an upcoming patch for the function trace event.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/trace/trace.h        |    4 ++++
 kernel/trace/trace_export.c |   18 ++++++++++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55c6ea0..638476a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -68,6 +68,10 @@ enum trace_type {
 #undef FTRACE_ENTRY_DUP
 #define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)
 
+#undef FTRACE_ENTRY_REG
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
+
 #include "trace_entries.h"
 
 /*
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index bbeec31..f74de86 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -18,6 +18,14 @@
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM	ftrace
 
+/*
+ * The FTRACE_ENTRY_REG macro allows ftrace entry to define register
+ * function and thus become accesible via perf.
+ */
+#undef FTRACE_ENTRY_REG
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
+
 /* not needed for this file */
 #undef __field_struct
 #define __field_struct(type, item)
@@ -152,13 +160,14 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 #undef F_printk
 #define F_printk(fmt, args...) #fmt ", "  __stringify(args)
 
-#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
+#undef FTRACE_ENTRY_REG
+#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn)\
 									\
 struct ftrace_event_class event_class_ftrace_##call = {			\
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
+	.reg			= regfn,				\
 };									\
 									\
 struct ftrace_event_call __used event_##call = {			\
@@ -170,4 +179,9 @@ struct ftrace_event_call __used event_##call = {			\
 struct ftrace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
+#undef FTRACE_ENTRY
+#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
+	FTRACE_ENTRY_REG(call, struct_name, etype,			\
+			 PARAMS(tstruct), PARAMS(print), NULL)
+
 #include "trace_entries.h"
-- 
1.7.1



* [PATCH 6/8] ftrace, perf: Add support to use function tracepoint in perf
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                                 ` (4 preceding siblings ...)
  2012-02-07 19:44               ` [PATCH 5/8] ftrace: Add FTRACE_ENTRY_REG macro to allow event registration Jiri Olsa
@ 2012-02-07 19:44               ` Jiri Olsa
  2012-02-07 19:44               ` [PATCH 7/8] ftrace: Allow to specify filter field type for ftrace events Jiri Olsa
                                 ` (3 subsequent siblings)
  9 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-07 19:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding perf registration support for the ftrace function event,
so it is now possible to register it via the perf interface.

The perf_event struct statically contains ftrace_ops as a handle
for the function tracer. The function tracer is registered/unregistered
in the open/close actions.

To be efficient, we enable/disable ftrace_ops each time the traced
process is scheduled in/out (via the TRACE_REG_PERF_(ADD|DEL) handlers).
This way tracing is enabled only when the process is running.
We intentionally use this approach instead of the event's hw state
PERF_HES_STOPPED, which would not disable the ftrace_ops.

It is now possible to use function trace within perf commands
like:

  perf record -e ftrace:function ls
  perf stat -e ftrace:function ls

Allowed only for root.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h      |    3 +
 kernel/trace/trace.h            |    7 +++
 kernel/trace/trace_entries.h    |    6 ++-
 kernel/trace/trace_event_perf.c |   84 +++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_export.c     |    5 ++
 5 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 412b790..92a056f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -859,6 +859,9 @@ struct perf_event {
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
+#ifdef CONFIG_FUNCTION_TRACER
+	struct ftrace_ops               ftrace_ops;
+#endif
 #endif
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 638476a..61bc283 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -595,6 +595,8 @@ static inline int ftrace_trace_task(struct task_struct *task)
 static inline int ftrace_is_dead(void) { return 0; }
 #endif
 
+int ftrace_event_is_function(struct ftrace_event_call *call);
+
 /*
  * struct trace_parser - servers for reading the user input separated by spaces
  * @cont: set if the input is not complete - no final space char was found
@@ -832,4 +834,9 @@ extern const char *__stop___trace_bprintk_fmt[];
 	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
 #include "trace_entries.h"
 
+#ifdef CONFIG_PERF_EVENTS
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data);
+#endif
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index 9336590..47db7ed 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -55,7 +55,7 @@
 /*
  * Function trace entry - function address and parent function address:
  */
-FTRACE_ENTRY(function, ftrace_entry,
+FTRACE_ENTRY_REG(function, ftrace_entry,
 
 	TRACE_FN,
 
@@ -64,7 +64,9 @@ FTRACE_ENTRY(function, ftrace_entry,
 		__field(	unsigned long,	parent_ip	)
 	),
 
-	F_printk(" %lx <-- %lx", __entry->ip, __entry->parent_ip)
+	F_printk(" %lx <-- %lx", __entry->ip, __entry->parent_ip),
+
+	perf_ftrace_event_register
 );
 
 /* Function call entry */
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index d72af0b..32f8806 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -24,6 +24,11 @@ static int	total_ref_count;
 static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
+	/* The ftrace function trace is allowed only for root. */
+	if (ftrace_event_is_function(tp_event) &&
+	    perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	/* No tracing, just counting, so no obvious leak */
 	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
 		return 0;
@@ -250,3 +255,82 @@ __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 	return raw_data;
 }
 EXPORT_SYMBOL_GPL(perf_trace_buf_prepare);
+
+static void
+perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct ftrace_entry *entry;
+	struct hlist_head *head;
+	struct pt_regs regs;
+	int rctx;
+
+#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
+		    sizeof(u64)) - sizeof(u32))
+
+	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
+
+	perf_fetch_caller_regs(&regs);
+
+	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
+	if (!entry)
+		return;
+
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	head = this_cpu_ptr(event_function.perf_events);
+	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
+			      1, &regs, head);
+
+#undef ENTRY_SIZE
+}
+
+static int perf_ftrace_function_register(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+
+	ops->flags |= FTRACE_OPS_FL_CONTROL;
+	ops->func = perf_ftrace_function_call;
+	return register_ftrace_function(ops);
+}
+
+static int perf_ftrace_function_unregister(struct perf_event *event)
+{
+	struct ftrace_ops *ops = &event->ftrace_ops;
+	return unregister_ftrace_function(ops);
+}
+
+static void perf_ftrace_function_enable(struct perf_event *event)
+{
+	ftrace_function_local_enable(&event->ftrace_ops);
+}
+
+static void perf_ftrace_function_disable(struct perf_event *event)
+{
+	ftrace_function_local_disable(&event->ftrace_ops);
+}
+
+int perf_ftrace_event_register(struct ftrace_event_call *call,
+			       enum trace_reg type, void *data)
+{
+	switch (type) {
+	case TRACE_REG_REGISTER:
+	case TRACE_REG_UNREGISTER:
+		break;
+	case TRACE_REG_PERF_REGISTER:
+	case TRACE_REG_PERF_UNREGISTER:
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+		return perf_ftrace_function_register(data);
+	case TRACE_REG_PERF_CLOSE:
+		return perf_ftrace_function_unregister(data);
+	case TRACE_REG_PERF_ADD:
+		perf_ftrace_function_enable(data);
+		return 0;
+	case TRACE_REG_PERF_DEL:
+		perf_ftrace_function_disable(data);
+		return 0;
+	}
+
+	return -EINVAL;
+}
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index f74de86..a3dbee6 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -184,4 +184,9 @@ __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 	FTRACE_ENTRY_REG(call, struct_name, etype,			\
 			 PARAMS(tstruct), PARAMS(print), NULL)
 
+int ftrace_event_is_function(struct ftrace_event_call *call)
+{
+	return call == &event_function;
+}
+
 #include "trace_entries.h"
-- 
1.7.1



* [PATCH 7/8] ftrace: Allow to specify filter field type for ftrace events
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                                 ` (5 preceding siblings ...)
  2012-02-07 19:44               ` [PATCH 6/8] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
@ 2012-02-07 19:44               ` Jiri Olsa
  2012-02-07 19:44               ` [PATCH 8/8] ftrace, perf: Add filter support for function trace event Jiri Olsa
                                 ` (2 subsequent siblings)
  9 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-07 19:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding the FILTER_TRACE_FN event field type for the function
tracepoint event, so it can be properly recognized within the
filtering code.

Currently all fields of ftrace subsystem events share the common
field type FILTER_OTHER. Since the function trace fields need
special care within the filtering code, we need to recognize them
properly, hence the new FILTER_TRACE_FN field type.

Adding a filter parameter to the FTRACE_ENTRY macro to specify the
filter field type for the event.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace_event.h       |    1 +
 kernel/trace/trace.h               |   23 +++++++++-------
 kernel/trace/trace_entries.h       |   48 ++++++++++++++++++++++++---------
 kernel/trace/trace_events_filter.c |    7 ++++-
 kernel/trace/trace_export.c        |   51 +++++++++++++++++++----------------
 5 files changed, 83 insertions(+), 47 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 2bf677c..dd478fc 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -245,6 +245,7 @@ enum {
 	FILTER_STATIC_STRING,
 	FILTER_DYN_STRING,
 	FILTER_PTR_STRING,
+	FILTER_TRACE_FN,
 };
 
 #define EVENT_STORAGE_SIZE 128
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 61bc283..6ed56cc 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -56,21 +56,23 @@ enum trace_type {
 #define F_STRUCT(args...)		args
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)	\
-	struct struct_name {					\
-		struct trace_entry	ent;			\
-		tstruct						\
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
+	struct struct_name {						\
+		struct trace_entry	ent;				\
+		tstruct							\
 	}
 
 #undef TP_ARGS
 #define TP_ARGS(args...)	args
 
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)
+#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk, filter)
 
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print,	\
+			 filter, regfn) \
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
+		     filter)
 
 #include "trace_entries.h"
 
@@ -826,12 +828,13 @@ extern const char *__start___trace_bprintk_fmt[];
 extern const char *__stop___trace_bprintk_fmt[];
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, id, tstruct, print)		\
+#define FTRACE_ENTRY(call, struct_name, id, tstruct, print, filter)	\
 	extern struct ftrace_event_call					\
 	__attribute__((__aligned__(4))) event_##call;
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print)		\
-	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
+#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print, filter)	\
+	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
+		     filter)
 #include "trace_entries.h"
 
 #ifdef CONFIG_PERF_EVENTS
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index 47db7ed..d91eb05 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -66,6 +66,8 @@ FTRACE_ENTRY_REG(function, ftrace_entry,
 
 	F_printk(" %lx <-- %lx", __entry->ip, __entry->parent_ip),
 
+	FILTER_TRACE_FN,
+
 	perf_ftrace_event_register
 );
 
@@ -80,7 +82,9 @@ FTRACE_ENTRY(funcgraph_entry, ftrace_graph_ent_entry,
 		__field_desc(	int,		graph_ent,	depth		)
 	),
 
-	F_printk("--> %lx (%d)", __entry->func, __entry->depth)
+	F_printk("--> %lx (%d)", __entry->func, __entry->depth),
+
+	FILTER_OTHER
 );
 
 /* Function return entry */
@@ -100,7 +104,9 @@ FTRACE_ENTRY(funcgraph_exit, ftrace_graph_ret_entry,
 	F_printk("<-- %lx (%d) (start: %llx  end: %llx) over: %d",
 		 __entry->func, __entry->depth,
 		 __entry->calltime, __entry->rettime,
-		 __entry->depth)
+		 __entry->depth),
+
+	FILTER_OTHER
 );
 
 /*
@@ -129,8 +135,9 @@ FTRACE_ENTRY(context_switch, ctx_switch_entry,
 	F_printk("%u:%u:%u  ==> %u:%u:%u [%03u]",
 		 __entry->prev_pid, __entry->prev_prio, __entry->prev_state,
 		 __entry->next_pid, __entry->next_prio, __entry->next_state,
-		 __entry->next_cpu
-		)
+		 __entry->next_cpu),
+
+	FILTER_OTHER
 );
 
 /*
@@ -148,8 +155,9 @@ FTRACE_ENTRY_DUP(wakeup, ctx_switch_entry,
 	F_printk("%u:%u:%u  ==+ %u:%u:%u [%03u]",
 		 __entry->prev_pid, __entry->prev_prio, __entry->prev_state,
 		 __entry->next_pid, __entry->next_prio, __entry->next_state,
-		 __entry->next_cpu
-		)
+		 __entry->next_cpu),
+
+	FILTER_OTHER
 );
 
 /*
@@ -171,7 +179,9 @@ FTRACE_ENTRY(kernel_stack, stack_entry,
 		 "\t=> (%08lx)\n\t=> (%08lx)\n\t=> (%08lx)\n\t=> (%08lx)\n",
 		 __entry->caller[0], __entry->caller[1], __entry->caller[2],
 		 __entry->caller[3], __entry->caller[4], __entry->caller[5],
-		 __entry->caller[6], __entry->caller[7])
+		 __entry->caller[6], __entry->caller[7]),
+
+	FILTER_OTHER
 );
 
 FTRACE_ENTRY(user_stack, userstack_entry,
@@ -187,7 +197,9 @@ FTRACE_ENTRY(user_stack, userstack_entry,
 		 "\t=> (%08lx)\n\t=> (%08lx)\n\t=> (%08lx)\n\t=> (%08lx)\n",
 		 __entry->caller[0], __entry->caller[1], __entry->caller[2],
 		 __entry->caller[3], __entry->caller[4], __entry->caller[5],
-		 __entry->caller[6], __entry->caller[7])
+		 __entry->caller[6], __entry->caller[7]),
+
+	FILTER_OTHER
 );
 
 /*
@@ -204,7 +216,9 @@ FTRACE_ENTRY(bprint, bprint_entry,
 	),
 
 	F_printk("%08lx fmt:%p",
-		 __entry->ip, __entry->fmt)
+		 __entry->ip, __entry->fmt),
+
+	FILTER_OTHER
 );
 
 FTRACE_ENTRY(print, print_entry,
@@ -217,7 +231,9 @@ FTRACE_ENTRY(print, print_entry,
 	),
 
 	F_printk("%08lx %s",
-		 __entry->ip, __entry->buf)
+		 __entry->ip, __entry->buf),
+
+	FILTER_OTHER
 );
 
 FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
@@ -236,7 +252,9 @@ FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
 
 	F_printk("%lx %lx %lx %d %x %x",
 		 (unsigned long)__entry->phys, __entry->value, __entry->pc,
-		 __entry->map_id, __entry->opcode, __entry->width)
+		 __entry->map_id, __entry->opcode, __entry->width),
+
+	FILTER_OTHER
 );
 
 FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
@@ -254,7 +272,9 @@ FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
 
 	F_printk("%lx %lx %lx %d %x",
 		 (unsigned long)__entry->phys, __entry->virt, __entry->len,
-		 __entry->map_id, __entry->opcode)
+		 __entry->map_id, __entry->opcode),
+
+	FILTER_OTHER
 );
 
 
@@ -274,6 +294,8 @@ FTRACE_ENTRY(branch, trace_branch,
 
 	F_printk("%u:%s:%s (%u)",
 		 __entry->line,
-		 __entry->func, __entry->file, __entry->correct)
+		 __entry->func, __entry->file, __entry->correct),
+
+	FILTER_OTHER
 );
 
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 24aee71..eb04a2a 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -900,6 +900,11 @@ int filter_assign_type(const char *type)
 	return FILTER_OTHER;
 }
 
+static bool is_function_field(struct ftrace_event_field *field)
+{
+	return field->filter_type == FILTER_TRACE_FN;
+}
+
 static bool is_string_field(struct ftrace_event_field *field)
 {
 	return field->filter_type == FILTER_DYN_STRING ||
@@ -987,7 +992,7 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else {
+	} else if (!is_function_field(field)) {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index a3dbee6..7b46c9b 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -23,8 +23,10 @@
  * function and thus become accesible via perf.
  */
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, \
+			 filter, regfn) \
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
+		     filter)
 
 /* not needed for this file */
 #undef __field_struct
@@ -52,21 +54,22 @@
 #define F_printk(fmt, args...) fmt, args
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)	\
-struct ____ftrace_##name {					\
-	tstruct							\
-};								\
-static void __always_unused ____ftrace_check_##name(void)	\
-{								\
-	struct ____ftrace_##name *__entry = NULL;		\
-								\
-	/* force compile-time check on F_printk() */		\
-	printk(print);						\
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
+struct ____ftrace_##name {						\
+	tstruct								\
+};									\
+static void __always_unused ____ftrace_check_##name(void)		\
+{									\
+	struct ____ftrace_##name *__entry = NULL;			\
+									\
+	/* force compile-time check on F_printk() */			\
+	printk(print);							\
 }
 
 #undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print)	\
-	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))
+#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print, filter)	\
+	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
+		     filter)
 
 #include "trace_entries.h"
 
@@ -75,7 +78,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -85,7 +88,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -99,7 +102,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 		ret = trace_define_field(event_call, event_storage, #item, \
 				 offsetof(typeof(field), item),		\
 				 sizeof(field.item),			\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 		mutex_unlock(&event_storage_mutex);			\
 		if (ret)						\
 			return ret;					\
@@ -112,7 +115,7 @@ static void __always_unused ____ftrace_check_##name(void)	\
 				 offsetof(typeof(field),		\
 					  container.item),		\
 				 sizeof(field.container.item),		\
-				 is_signed_type(type), FILTER_OTHER);	\
+				 is_signed_type(type), filter_type);	\
 	if (ret)							\
 		return ret;
 
@@ -120,17 +123,18 @@ static void __always_unused ____ftrace_check_##name(void)	\
 #define __dynamic_array(type, item)					\
 	ret = trace_define_field(event_call, #type, #item,		\
 				 offsetof(typeof(field), item),		\
-				 0, is_signed_type(type), FILTER_OTHER);\
+				 0, is_signed_type(type), filter_type);\
 	if (ret)							\
 		return ret;
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
 int									\
 ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 {									\
 	struct struct_name field;					\
 	int ret;							\
+	int filter_type = filter;					\
 									\
 	tstruct;							\
 									\
@@ -161,7 +165,8 @@ ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 #define F_printk(fmt, args...) #fmt ", "  __stringify(args)
 
 #undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn)\
+#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, filter,\
+			 regfn)						\
 									\
 struct ftrace_event_class event_class_ftrace_##call = {			\
 	.system			= __stringify(TRACE_SYSTEM),		\
@@ -180,9 +185,9 @@ struct ftrace_event_call __used						\
 __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
 
 #undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
+#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print, filter)	\
 	FTRACE_ENTRY_REG(call, struct_name, etype,			\
-			 PARAMS(tstruct), PARAMS(print), NULL)
+			 PARAMS(tstruct), PARAMS(print), filter, NULL)
 
 int ftrace_event_is_function(struct ftrace_event_call *call)
 {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 8/8] ftrace, perf: Add filter support for function trace event
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                                 ` (6 preceding siblings ...)
  2012-02-07 19:44               ` [PATCH 7/8] ftrace: Allow to specify filter field type for ftrace events Jiri Olsa
@ 2012-02-07 19:44               ` Jiri Olsa
  2012-02-10 13:27               ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Steven Rostedt
  2012-02-13 18:02               ` Steven Rostedt
  9 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-07 19:44 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding support to filter function trace event via perf
interface. It is now possible to use filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only, and
the following operators are accepted: '==', '!=' and '||',
ending up with filter strings like:

  ip == f1[, ]f2 ... || ip != f3[, ]f4 ...

with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"', e.g.:

  perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
  perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
  perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls

The '==' operator adds a trace filter with the same effect as
one added via the set_ftrace_filter file.

The '!=' operator adds a trace filter with the same effect as
one added via the set_ftrace_notrace file.

The right side of the '==' and '!=' operators is a list of functions
or regexps to be added to the filter, separated by spaces.

The '||' operator connects multiple filter definitions together.
It is possible to have more than one '==' and '!=' operator
within one filter string.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h             |    1 +
 kernel/trace/ftrace.c              |    6 ++
 kernel/trace/trace.h               |    2 -
 kernel/trace/trace_event_perf.c    |    4 +-
 kernel/trace/trace_events_filter.c |  157 ++++++++++++++++++++++++++++++++++--
 5 files changed, 161 insertions(+), 9 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 64a309d..76f6c49 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -250,6 +250,7 @@ int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
+void ftrace_free_filter(struct ftrace_ops *ops);
 
 int register_ftrace_command(struct ftrace_func_command *cmd);
 int unregister_ftrace_command(struct ftrace_func_command *cmd);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c8d2af2..239b94a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1186,6 +1186,12 @@ static void free_ftrace_hash_rcu(struct ftrace_hash *hash)
 	call_rcu_sched(&hash->rcu, __free_ftrace_hash_rcu);
 }
 
+void ftrace_free_filter(struct ftrace_ops *ops)
+{
+	free_ftrace_hash(ops->filter_hash);
+	free_ftrace_hash(ops->notrace_hash);
+}
+
 static struct ftrace_hash *alloc_ftrace_hash(int size_bits)
 {
 	struct ftrace_hash *hash;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 6ed56cc..a338bf2 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -776,9 +776,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 32f8806..f8e5739 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -297,7 +297,9 @@ static int perf_ftrace_function_register(struct perf_event *event)
 static int perf_ftrace_function_unregister(struct perf_event *event)
 {
 	struct ftrace_ops *ops = &event->ftrace_ops;
-	return unregister_ftrace_function(ops);
+	int ret = unregister_ftrace_function(ops);
+	ftrace_free_filter(ops);
+	return ret;
 }
 
 static void perf_ftrace_function_enable(struct perf_event *event)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index eb04a2a..54f4ea1 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -81,6 +81,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +97,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +994,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1346,7 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1955,6 +1959,140 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+static char **
+ftrace_function_filter_re(char *buf, int len, int *count)
+{
+	char *str, *sep, **re;
+
+	str = kstrndup(buf, len, GFP_KERNEL);
+	if (!str)
+		return NULL;
+
+	/*
+	 * The argv_split function takes white space
+	 * as a separator, so convert ',' into spaces.
+	 */
+	while ((sep = strchr(str, ',')))
+		*sep = ' ';
+
+	re = argv_split(GFP_KERNEL, str, count);
+	kfree(str);
+	return re;
+}
+
+static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
+				      int reset, char *re, int len)
+{
+	int ret;
+
+	if (filter)
+		ret = ftrace_set_filter(ops, re, len, reset);
+	else
+		ret = ftrace_set_notrace(ops, re, len, reset);
+
+	return ret;
+}
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int i, re_cnt, ret;
+	int *reset;
+	char **re;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	/*
+	 * The 'ip' field could have multiple filters set, separated
+	 * either by space or comma. We first split the filter and
+	 * apply each piece separately.
+	 */
+	re = ftrace_function_filter_re(buf, len, &re_cnt);
+	if (!re)
+		return -EINVAL;
+
+	for (i = 0; i < re_cnt; i++) {
+		ret = ftrace_function_set_regexp(data->ops, filter, *reset,
+						 re[i], strlen(re[i]));
+		if (ret)
+			break;
+
+		if (*reset)
+			*reset = 0;
+	}
+
+	argv_free(re);
+	return ret;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred, int leaf)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	if (leaf) {
+		/*
+		 * Check the leaf predicate for function trace, verify:
+		 *  - only '==' and '!=' are used
+		 *  - the 'ip' field is used
+		 */
+		if ((pred->op != OP_EQ) && (pred->op != OP_NE))
+			return -EINVAL;
+
+		if (strcmp(field->name, "ip"))
+			return -EINVAL;
+	} else {
+		/*
+		 * Check the non leaf predicate for function trace, verify:
+		 *  - only '||' is used
+		 */
+		if (pred->op != OP_OR)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	/* Checking the node is valid for function trace. */
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID)) {
+		*err = ftrace_function_check_pred(pred, 0);
+	} else {
+		*err = ftrace_function_check_pred(pred, 1);
+		if (*err)
+			return WALK_PRED_ABORT;
+
+		*err = __ftrace_function_set_filter(pred->op == OP_EQ,
+						    pred->regex.pattern,
+						    pred->regex.len,
+						    data);
+	}
+
+	return (*err) ? WALK_PRED_ABORT : WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1975,9 +2113,16 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 		goto out_unlock;
 
 	err = create_filter(call, filter_str, false, &filter);
-	if (!err)
-		event->filter = filter;
+	if (err)
+		goto free_filter;
+
+	if (ftrace_event_is_function(call))
+		err = ftrace_function_set_filter(event, filter);
 	else
+		event->filter = filter;
+
+free_filter:
+	if (err || ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 186+ messages in thread

* Re: [PATCHv8 0/8] ftrace, perf: Adding support to use function trace
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                                 ` (7 preceding siblings ...)
  2012-02-07 19:44               ` [PATCH 8/8] ftrace, perf: Add filter support for function trace event Jiri Olsa
@ 2012-02-10 13:27               ` Steven Rostedt
  2012-02-10 14:45                 ` Steven Rostedt
  2012-02-13 18:02               ` Steven Rostedt
  9 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2012-02-10 13:27 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Tue, 2012-02-07 at 20:44 +0100, Jiri Olsa wrote:
> hi,
> here's another version of perf support for function trace
> with filter. 

OK, I've applied all of these and I'm now going to run them through my
test suite before sending them to Ingo. I'll let you know if anything
goes wrong.

Also, in the future, PLEASE do not send a new version of the patches as
a reply to a previous version. It's OK to do so for a single patch, if
you only modified one patch of a series. But don't make a full series a
reply to the previous series. It just screws up my inbox.

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv8 0/8] ftrace, perf: Adding support to use function trace
  2012-02-10 13:27               ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Steven Rostedt
@ 2012-02-10 14:45                 ` Steven Rostedt
  2012-02-10 16:07                   ` Jiri Olsa
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2012-02-10 14:45 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, 2012-02-10 at 08:27 -0500, Steven Rostedt wrote:
> On Tue, 2012-02-07 at 20:44 +0100, Jiri Olsa wrote:
> > hi,
> > here's another version of perf support for function trace
> > with filter. 
> 
> OK, I've applied all of these and I'm now going to run them through my
> test suite before sending them to Ingo. I'll let you know if anything
> goes wrong.

OK, I built, booted, and installed a kernel with your patches. I also
installed the latest perf code from linus/master. Then I ran the
following:

# perf record -e ftrace:function ls

and then ran

# perf report

It went into an infinite loop of "Processing time ordered events".

Is this expected?

Although perf script works fine.

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv8 0/8] ftrace, perf: Adding support to use function trace
  2012-02-10 14:45                 ` Steven Rostedt
@ 2012-02-10 16:07                   ` Jiri Olsa
  2012-02-10 16:48                     ` Frederic Weisbecker
  0 siblings, 1 reply; 186+ messages in thread
From: Jiri Olsa @ 2012-02-10 16:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Feb 10, 2012 at 09:45:20AM -0500, Steven Rostedt wrote:
> On Fri, 2012-02-10 at 08:27 -0500, Steven Rostedt wrote:
> > On Tue, 2012-02-07 at 20:44 +0100, Jiri Olsa wrote:
> > > hi,
> > > here's another version of perf support for function trace
> > > with filter. 
> > 
> > OK, I've applied all of these and I'm now going to run them through my
> > test suite before sending them to Ingo. I'll let you know if anything
> > goes wrong.
> 
> OK, I build booted and installed a kernel with your patches. I also
> installed the latest perf code from linus/master. Then I ran the
> following:
> 
> # perf record -e ftrace:function ls
> 
> and then ran
> 
> # perf report
> 
> It went into an infinite loop of "Processing time ordered events".
> 
> Is this expected?
> 
> Although perf script works fine.

yep, I was using perf script for testing.
I get the same error for perf report; also, if you run it like:

[root@dhcp-26-214 perf]# ./perf report --stdio
Warning:
Processed 82392 events and lost 1 chunks!

Check IO/CPU overload!

# ========
# captured on: Fri Feb 10 19:15:37 2012
# hostname : dhcp-26-214.brq.redhat.com
# os release : 3.3.0-rc3+
# perf version : 3.3.rc3.1274.g135d69
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz
# cpuid : GenuineIntel,6,23,10
# total memory : 1969896 kB
# cmdline : /home/jolsa/tip/tools/perf/perf record -e ftrace:function ls 
# event : name = ftrace:function, type = 2, config = 0x1, config1 = 0x0,
# config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 3, 4 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# ========
#
Warning: Timestamp below last timeslice flush
# Events: 78K ftrace:function
#
# Overhead  Command      Shared Object                   Symbol
# ........  .......  .................  .......................
#
   100.00%       ls  [kernel.kallsyms]  [k] ftrace_ops_control_func

#
# (For a higher level overview, try: perf report --sort comm,dso)



I'll check what's going on, thanks
jirka

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv8 0/8] ftrace, perf: Adding support to use function trace
  2012-02-10 16:07                   ` Jiri Olsa
@ 2012-02-10 16:48                     ` Frederic Weisbecker
  2012-02-10 18:00                       ` Steven Rostedt
  0 siblings, 1 reply; 186+ messages in thread
From: Frederic Weisbecker @ 2012-02-10 16:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Feb 10, 2012 at 05:07:56PM +0100, Jiri Olsa wrote:
> On Fri, Feb 10, 2012 at 09:45:20AM -0500, Steven Rostedt wrote:
> > On Fri, 2012-02-10 at 08:27 -0500, Steven Rostedt wrote:
> > > On Tue, 2012-02-07 at 20:44 +0100, Jiri Olsa wrote:
> > > > hi,
> > > > here's another version of perf support for function trace
> > > > with filter. 
> > > 
> > > OK, I've applied all of these and I'm now going to run them through my
> > > test suite before sending them to Ingo. I'll let you know if anything
> > > goes wrong.
> > 
> > OK, I build booted and installed a kernel with your patches. I also
> > installed the latest perf code from linus/master. Then I ran the
> > following:
> > 
> > # perf record -e ftrace:function ls
> > 
> > and then ran
> > 
> > # perf report
> > 
> > It went into an infinite loop of "Processing time ordered events".
> > 
> > Is this expected?
> > 
> > Although perf script works fine.
> 
> yep, I was using perf script for testing
> I get the same error for perf report, also if you run it like:
> 
> [root@dhcp-26-214 perf]# ./perf report --stdio
> Warning:
> Processed 82392 events and lost 1 chunks!
> 
> Check IO/CPU overload!
> 
> # ========
> # captured on: Fri Feb 10 19:15:37 2012
> # hostname : dhcp-26-214.brq.redhat.com
> # os release : 3.3.0-rc3+
> # perf version : 3.3.rc3.1274.g135d69
> # arch : x86_64
> # nrcpus online : 2
> # nrcpus avail : 2
> # cpudesc : Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz
> # cpuid : GenuineIntel,6,23,10
> # total memory : 1969896 kB
> # cmdline : /home/jolsa/tip/tools/perf/perf record -e ftrace:function ls 
> # event : name = ftrace:function, type = 2, config = 0x1, config1 = 0x0,
> # config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 3, 4 }
> # HEADER_CPU_TOPOLOGY info available, use -I to display
> # ========
> #
> Warning: Timestamp below last timeslice flush

Stephane Eranian has reported a similar problem but perhaps not with
the same cause.

I need to have a look.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv8 0/8] ftrace, perf: Adding support to use function trace
  2012-02-10 16:48                     ` Frederic Weisbecker
@ 2012-02-10 18:00                       ` Steven Rostedt
  2012-02-10 18:05                         ` Frederic Weisbecker
  0 siblings, 1 reply; 186+ messages in thread
From: Steven Rostedt @ 2012-02-10 18:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, 2012-02-10 at 17:48 +0100, Frederic Weisbecker wrote:

> Stephane Eranian has reported a similar problem but perhaps not with
> the same cause.
> 
> I need to have a look.

Not sure if this is related, but the "script" output of the problematic
perf.dat file had this warning:

Warning: Timestamp below last timeslice flush

-- Steve



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv8 0/8] ftrace, perf: Adding support to use function trace
  2012-02-10 18:00                       ` Steven Rostedt
@ 2012-02-10 18:05                         ` Frederic Weisbecker
  2012-02-10 18:23                           ` David Ahern
  0 siblings, 1 reply; 186+ messages in thread
From: Frederic Weisbecker @ 2012-02-10 18:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Fri, Feb 10, 2012 at 01:00:15PM -0500, Steven Rostedt wrote:
> On Fri, 2012-02-10 at 17:48 +0100, Frederic Weisbecker wrote:
> 
> > Stephane Eranian has reported a similar problem but perhaps not with
> > the same cause.
> > 
> > I need to have a look.
> 
> Not sure if this is related, but the "script" output of the problem
> perf.dat file had this warning:
> 
> Warning: Timestamp below last timeslice flush

Yup, that's what I'm looking at.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv8 0/8] ftrace, perf: Adding support to use function trace
  2012-02-10 18:05                         ` Frederic Weisbecker
@ 2012-02-10 18:23                           ` David Ahern
  0 siblings, 0 replies; 186+ messages in thread
From: David Ahern @ 2012-02-10 18:23 UTC (permalink / raw)
  To: Frederic Weisbecker, Steven Rostedt
  Cc: Jiri Olsa, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov



On 02/10/2012 11:05 AM, Frederic Weisbecker wrote:
> On Fri, Feb 10, 2012 at 01:00:15PM -0500, Steven Rostedt wrote:
>> On Fri, 2012-02-10 at 17:48 +0100, Frederic Weisbecker wrote:
>>
>>> Stephane Eranian has reported a similar problem but perhaps not with
>>> the same cause.
>>>
>>> I need to have a look.
>>
>> Not sure if this is related, but the "script" output of the problem
>> perf.dat file had this warning:
>>
>> Warning: Timestamp below last timeslice flush
> 
> Yup, that's what I'm looking at.

Does increasing MAX_SAMPLE_BUFFER in util/session.c help?

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [PATCHv8 0/8] ftrace, perf: Adding support to use function trace
  2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
                                 ` (8 preceding siblings ...)
  2012-02-10 13:27               ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Steven Rostedt
@ 2012-02-13 18:02               ` Steven Rostedt
  9 siblings, 0 replies; 186+ messages in thread
From: Steven Rostedt @ 2012-02-13 18:02 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: fweisbec, mingo, paulus, acme, a.p.zijlstra, linux-kernel, aarapov

On Tue, 2012-02-07 at 20:44 +0100, Jiri Olsa wrote:
> hi,
> here's another version of perf support for function trace
> with filter. 
> 
> attached patches:
>  - 1/8 ftrace: Change filter/notrace set functions to return exit code
>  - 2/8 ftrace: Add enable/disable ftrace_ops control interface
>  - 3/8 ftrace, perf: Add open/close tracepoint perf registration actions
>  - 4/8 ftrace, perf: Add add/del tracepoint perf registration actions
>  - 5/8 ftrace: Add FTRACE_ENTRY_REG macro to allow event registration
>  - 6/8 ftrace, perf: Add support to use function tracepoint in perf
>  - 7/8 ftrace: Allow to specify filter field type for ftrace events
>  - 8/8 ftrace, perf: Add filter support for function trace event

OK, these break on a lot of configs.

Make sure they build with the following before sending another series.
I'll give you some of the fixes I added already.

Please test: allnoconfig, allmodconfig

http://rostedt.homelinux.com/private/configs/config-ftrace-patchcheck
http://rostedt.homelinux.com/private/configs/config-ftrace-static-patchcheck
http://rostedt.homelinux.com/private/configs/config-nofunc-patchcheck
http://rostedt.homelinux.com/private/configs/config-notrace-patchcheck

Thanks!

-- Steve

Note: these need to be defines and not static inlines, because
struct ftrace_ops is not defined in some cases.


diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 76f6c49..d984c08 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -381,9 +381,6 @@ extern void ftrace_enable_daemon(void);
 #else
 static inline int skip_trace(unsigned long ip) { return 0; }
 static inline int ftrace_force_update(void) { return 0; }
-static inline void ftrace_set_filter(unsigned char *buf, int len, int reset)
-{
-}
 static inline void ftrace_disable_daemon(void) { }
 static inline void ftrace_enable_daemon(void) { }
 static inline void ftrace_release_mod(struct module *mod) {}
@@ -407,6 +404,9 @@ static inline int ftrace_text_reserved(void *start, void *end)
  */
 #define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; })
 #define ftrace_set_early_filter(ops, buf, enable) do { } while (0)
+#define ftrace_free_filter(ops) do { } while (0)
+#define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; })
+#define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
 
 static inline ssize_t ftrace_filter_write(struct file *file, const char __user *ubuf,
 			    size_t cnt, loff_t *ppos) { return -ENODEV; }




^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [tip:perf/core] ftrace: Change filter/ notrace set functions to return exit code
  2012-01-02  9:04         ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
@ 2012-02-17 13:46           ` tip-bot for Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: tip-bot for Jiri Olsa @ 2012-02-17 13:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, fweisbec, rostedt, tglx, jolsa

Commit-ID:  ac483c446b67870444c9eeaf8325d3d2af9b91bc
Gitweb:     http://git.kernel.org/tip/ac483c446b67870444c9eeaf8325d3d2af9b91bc
Author:     Jiri Olsa <jolsa@redhat.com>
AuthorDate: Mon, 2 Jan 2012 10:04:14 +0100
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Fri, 3 Feb 2012 09:48:18 -0500

ftrace: Change filter/notrace set functions to return exit code

Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any value, so there is no way for an ftrace_ops
user to tell whether the filter was correctly applied.

The set_ftrace_filter interface returns error in case the filter
did not match:

  # echo krava > set_ftrace_filter
  bash: echo: write error: Invalid argument

Change both the ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly, or a negative
-E* value in case of error.

Link: http://lkml.kernel.org/r/1325495060-6402-2-git-send-email-jolsa@redhat.com

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace.h |    4 ++--
 kernel/trace/ftrace.c  |   15 +++++++++------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 028e26f..f33fb3b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -178,9 +178,9 @@ struct dyn_ftrace {
 };
 
 int ftrace_force_update(void);
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 683d559..e2e0597 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3146,8 +3146,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 	mutex_lock(&ftrace_regex_lock);
 	if (reset)
 		ftrace_filter_reset(hash);
-	if (buf)
-		ftrace_match_records(hash, buf, len);
+	if (buf && !ftrace_match_records(hash, buf, len)) {
+		ret = -EINVAL;
+		goto out_regex_unlock;
+	}
 
 	mutex_lock(&ftrace_lock);
 	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -3157,6 +3159,7 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
 
 	mutex_unlock(&ftrace_lock);
 
+ out_regex_unlock:
 	mutex_unlock(&ftrace_regex_lock);
 
 	free_ftrace_hash(hash);
@@ -3173,10 +3176,10 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
  * Filters denote which functions should be enabled when tracing is enabled.
  * If @buf is NULL and reset is set, all functions will be enabled for tracing.
  */
-void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 1);
+	return ftrace_set_regex(ops, buf, len, reset, 1);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_filter);
 
@@ -3191,10 +3194,10 @@ EXPORT_SYMBOL_GPL(ftrace_set_filter);
  * is enabled. If @buf is NULL and reset is set, all functions will be enabled
  * for tracing.
  */
-void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset)
 {
-	ftrace_set_regex(ops, buf, len, reset, 0);
+	return ftrace_set_regex(ops, buf, len, reset, 0);
 }
 EXPORT_SYMBOL_GPL(ftrace_set_notrace);
 /**

^ permalink raw reply related	[flat|nested] 186+ messages in thread

* [PATCH 7/7] ftrace, perf: Add filter support for function trace event
  2012-02-15 14:51 [PATCHv9 0/7] " Jiri Olsa
@ 2012-02-15 14:51 ` Jiri Olsa
  0 siblings, 0 replies; 186+ messages in thread
From: Jiri Olsa @ 2012-02-15 14:51 UTC (permalink / raw)
  To: rostedt, fweisbec, mingo, paulus, acme, a.p.zijlstra
  Cc: linux-kernel, aarapov, Jiri Olsa

Adding support to filter function trace event via perf
interface. It is now possible to use filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the 'ip' field only, and
the following operators are accepted: '==', '!=' and '||',
ending up with filter strings like:

  ip == f1[, ]f2 ... || ip != f3[, ]f4 ...

with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"', e.g.:

  perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
  perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
  perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls

The '==' operator adds a trace filter with the same effect as
one added via the set_ftrace_filter file.

The '!=' operator adds a trace filter with the same effect as
one added via the set_ftrace_notrace file.

The right side of the '==' and '!=' operators is a list of functions
or regexps to be added to the filter, separated by spaces.

The '||' operator connects multiple filter definitions together.
It is possible to have more than one '==' and '!=' operator
within one filter string.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/ftrace.h             |    7 +-
 kernel/trace/ftrace.c              |    6 ++
 kernel/trace/trace.h               |    2 -
 kernel/trace/trace_event_perf.c    |    4 +-
 kernel/trace/trace_events_filter.c |  165 ++++++++++++++++++++++++++++++++++--
 5 files changed, 172 insertions(+), 12 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 64a309d..72a6cab 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -250,6 +250,7 @@ int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
 			int len, int reset);
 void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
 void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
+void ftrace_free_filter(struct ftrace_ops *ops);
 
 int register_ftrace_command(struct ftrace_func_command *cmd);
 int unregister_ftrace_command(struct ftrace_func_command *cmd);
@@ -380,9 +381,6 @@ extern void ftrace_enable_daemon(void);
 #else
 static inline int skip_trace(unsigned long ip) { return 0; }
 static inline int ftrace_force_update(void) { return 0; }
-static inline void ftrace_set_filter(unsigned char *buf, int len, int reset)
-{
-}
 static inline void ftrace_disable_daemon(void) { }
 static inline void ftrace_enable_daemon(void) { }
 static inline void ftrace_release_mod(struct module *mod) {}
@@ -406,6 +404,9 @@ static inline int ftrace_text_reserved(void *start, void *end)
  */
 #define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; })
 #define ftrace_set_early_filter(ops, buf, enable) do { } while (0)
+#define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; })
+#define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
+#define ftrace_free_filter(ops) do { } while (0)
 
 static inline ssize_t ftrace_filter_write(struct file *file, const char __user *ubuf,
 			    size_t cnt, loff_t *ppos) { return -ENODEV; }
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c8d2af2..239b94a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1186,6 +1186,12 @@ static void free_ftrace_hash_rcu(struct ftrace_hash *hash)
 	call_rcu_sched(&hash->rcu, __free_ftrace_hash_rcu);
 }
 
+void ftrace_free_filter(struct ftrace_ops *ops)
+{
+	free_ftrace_hash(ops->filter_hash);
+	free_ftrace_hash(ops->notrace_hash);
+}
+
 static struct ftrace_hash *alloc_ftrace_hash(int size_bits)
 {
 	struct ftrace_hash *hash;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 29f93cd..54faec7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -776,9 +776,7 @@ struct filter_pred {
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	struct ftrace_event_field *field;
-#endif
 	int 			offset;
 	int 			not;
 	int 			op;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index fdeeb5c..fee3752 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -298,7 +298,9 @@ static int perf_ftrace_function_register(struct perf_event *event)
 static int perf_ftrace_function_unregister(struct perf_event *event)
 {
 	struct ftrace_ops *ops = &event->ftrace_ops;
-	return unregister_ftrace_function(ops);
+	int ret = unregister_ftrace_function(ops);
+	ftrace_free_filter(ops);
+	return ret;
 }
 
 static void perf_ftrace_function_enable(struct perf_event *event)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index eb04a2a..4d7ac74 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -81,6 +81,7 @@ enum {
 	FILT_ERR_TOO_MANY_PREDS,
 	FILT_ERR_MISSING_FIELD,
 	FILT_ERR_INVALID_FILTER,
+	FILT_ERR_IP_FIELD_ONLY,
 };
 
 static char *err_text[] = {
@@ -96,6 +97,7 @@ static char *err_text[] = {
 	"Too many terms in predicate expression",
 	"Missing field name and/or value",
 	"Meaningless filter expression",
+	"Only 'ip' field is supported for function trace",
 };
 
 struct opstack_op {
@@ -992,7 +994,12 @@ static int init_pred(struct filter_parse_state *ps,
 			fn = filter_pred_strloc;
 		else
 			fn = filter_pred_pchar;
-	} else if (!is_function_field(field)) {
+	} else if (is_function_field(field)) {
+		if (strcmp(field->name, "ip")) {
+			parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0);
+			return -EINVAL;
+		}
+	} else {
 		if (field->is_signed)
 			ret = strict_strtoll(pred->regex.pattern, 0, &val);
 		else
@@ -1339,10 +1346,7 @@ static struct filter_pred *create_pred(struct filter_parse_state *ps,
 
 	strcpy(pred.regex.pattern, operand2);
 	pred.regex.len = strlen(pred.regex.pattern);
-
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 	pred.field = field;
-#endif
 	return init_pred(ps, field, &pred) ? NULL : &pred;
 }
 
@@ -1955,6 +1959,148 @@ void ftrace_profile_free_filter(struct perf_event *event)
 	__free_filter(filter);
 }
 
+struct function_filter_data {
+	struct ftrace_ops *ops;
+	int first_filter;
+	int first_notrace;
+};
+
+#ifdef CONFIG_FUNCTION_TRACER
+static char **
+ftrace_function_filter_re(char *buf, int len, int *count)
+{
+	char *str, *sep, **re;
+
+	str = kstrndup(buf, len, GFP_KERNEL);
+	if (!str)
+		return NULL;
+
+	/*
+	 * The argv_split function takes white space
+	 * as a separator, so convert ',' into spaces.
+	 */
+	while ((sep = strchr(str, ',')))
+		*sep = ' ';
+
+	re = argv_split(GFP_KERNEL, str, count);
+	kfree(str);
+	return re;
+}
+
+static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
+				      int reset, char *re, int len)
+{
+	int ret;
+
+	if (filter)
+		ret = ftrace_set_filter(ops, re, len, reset);
+	else
+		ret = ftrace_set_notrace(ops, re, len, reset);
+
+	return ret;
+}
+
+static int __ftrace_function_set_filter(int filter, char *buf, int len,
+					struct function_filter_data *data)
+{
+	int i, re_cnt, ret;
+	int *reset;
+	char **re;
+
+	reset = filter ? &data->first_filter : &data->first_notrace;
+
+	/*
+	 * The 'ip' field could have multiple filters set, separated
+	 * either by space or comma. We first cut the filter and apply
+	 * all pieces separately.
+	 */
+	re = ftrace_function_filter_re(buf, len, &re_cnt);
+	if (!re)
+		return -EINVAL;
+
+	for (i = 0; i < re_cnt; i++) {
+		ret = ftrace_function_set_regexp(data->ops, filter, *reset,
+						 re[i], strlen(re[i]));
+		if (ret)
+			break;
+
+		if (*reset)
+			*reset = 0;
+	}
+
+	argv_free(re);
+	return ret;
+}
+
+static int ftrace_function_check_pred(struct filter_pred *pred, int leaf)
+{
+	struct ftrace_event_field *field = pred->field;
+
+	if (leaf) {
+		/*
+		 * Check the leaf predicate for function trace, verify:
+		 *  - only '==' and '!=' is used
+		 *  - the 'ip' field is used
+		 */
+		if ((pred->op != OP_EQ) && (pred->op != OP_NE))
+			return -EINVAL;
+
+		if (strcmp(field->name, "ip"))
+			return -EINVAL;
+	} else {
+		/*
+		 * Check the non leaf predicate for function trace, verify:
+		 *  - only '||' is used
+		*/
+		if (pred->op != OP_OR)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ftrace_function_set_filter_cb(enum move_type move,
+					 struct filter_pred *pred,
+					 int *err, void *data)
+{
+	/* Checking the node is valid for function trace. */
+	if ((move != MOVE_DOWN) ||
+	    (pred->left != FILTER_PRED_INVALID)) {
+		*err = ftrace_function_check_pred(pred, 0);
+	} else {
+		*err = ftrace_function_check_pred(pred, 1);
+		if (*err)
+			return WALK_PRED_ABORT;
+
+		*err = __ftrace_function_set_filter(pred->op == OP_EQ,
+						    pred->regex.pattern,
+						    pred->regex.len,
+						    data);
+	}
+
+	return (*err) ? WALK_PRED_ABORT : WALK_PRED_DEFAULT;
+}
+
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	struct function_filter_data data = {
+		.first_filter  = 1,
+		.first_notrace = 1,
+		.ops           = &event->ftrace_ops,
+	};
+
+	return walk_pred_tree(filter->preds, filter->root,
+			      ftrace_function_set_filter_cb, &data);
+}
+#else
+static int ftrace_function_set_filter(struct perf_event *event,
+				      struct event_filter *filter)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_FUNCTION_TRACER */
+
 int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 			      char *filter_str)
 {
@@ -1975,9 +2121,16 @@ int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 		goto out_unlock;
 
 	err = create_filter(call, filter_str, false, &filter);
-	if (!err)
-		event->filter = filter;
+	if (err)
+		goto free_filter;
+
+	if (ftrace_event_is_function(call))
+		err = ftrace_function_set_filter(event, filter);
 	else
+		event->filter = filter;
+
+free_filter:
+	if (err || ftrace_event_is_function(call))
 		__free_filter(filter);
 
 out_unlock:
-- 
1.7.1




Thread overview: 186+ messages
2011-11-27 18:04 [RFC] ftrace, perf: Adding support to use function trace Jiri Olsa
2011-11-27 18:04 ` [PATCH 1/9] trace: Fix uninitialized variable compiler warning Jiri Olsa
2011-11-28 16:19   ` Steven Rostedt
2011-11-28 16:25     ` Jiri Olsa
2011-11-28 19:34       ` Steven Rostedt
2011-11-27 18:04 ` [PATCH 2/9] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
2011-11-28 16:24   ` Steven Rostedt
2011-11-27 18:04 ` [PATCH 3/9] ftrace: Fix shutdown to disable calls properly Jiri Olsa
2011-11-28 19:18   ` Steven Rostedt
2011-11-29 11:21     ` Jiri Olsa
2011-11-27 18:04 ` [PATCH 4/9] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
2011-11-28 19:26   ` Steven Rostedt
2011-11-28 20:02     ` Peter Zijlstra
2011-11-28 20:05       ` Peter Zijlstra
2011-11-28 20:14         ` Steven Rostedt
2011-11-28 20:20           ` Peter Zijlstra
2011-11-28 20:12       ` Steven Rostedt
2011-11-28 20:15         ` Peter Zijlstra
2011-11-28 20:24           ` Steven Rostedt
2011-11-28 20:21   ` Steven Rostedt
2011-11-29 10:07     ` Jiri Olsa
2011-11-27 18:04 ` [PATCH 5/9] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
2011-11-27 18:04 ` [PATCH 6/9] ftrace, perf: Add add/del " Jiri Olsa
2011-11-27 18:04 ` [PATCH 7/9] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
2011-11-28 19:58   ` Steven Rostedt
2011-11-28 20:03     ` Peter Zijlstra
2011-11-28 20:13       ` Steven Rostedt
2011-11-29 10:10         ` Jiri Olsa
2011-11-28 20:08     ` Peter Zijlstra
2011-11-28 20:10       ` Peter Zijlstra
2011-11-28 20:16         ` Steven Rostedt
2011-11-28 20:18           ` Peter Zijlstra
2011-11-27 18:04 ` [PATCH 8/9] ftrace, perf: Add FILTER_TRACE_FN event field type Jiri Olsa
2011-11-28 20:01   ` Steven Rostedt
2011-11-29 10:14     ` Jiri Olsa
2011-11-29 11:22     ` Jiri Olsa
2011-11-29 11:51       ` Peter Zijlstra
2011-11-29 12:21         ` Jiri Olsa
2011-11-27 18:04 ` [PATCH 9/9] ftrace, perf: Add filter support for function trace event Jiri Olsa
2011-11-28 20:07   ` Steven Rostedt
2011-12-05 17:22 ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
2011-12-05 17:22   ` [PATCHv2 01/10] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
2011-12-05 17:22   ` [PATCHv2 02/10] ftrace: Change mcount call replacement logic Jiri Olsa
2011-12-19 19:03     ` Steven Rostedt
2011-12-20 13:10       ` Jiri Olsa
2011-12-20 16:33         ` Steven Rostedt
2011-12-20 19:39     ` Steven Rostedt
2011-12-21  9:57       ` Jiri Olsa
2011-12-21 11:34         ` Steven Rostedt
2011-12-21 11:35           ` Steven Rostedt
2011-12-21 11:40             ` Jiri Olsa
2012-01-08  9:13     ` [tip:perf/core] ftrace: Fix unregister ftrace_ops accounting tip-bot for Jiri Olsa
2011-12-05 17:22   ` [PATCHv2 03/10] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
2011-12-19 19:19     ` Steven Rostedt
2011-12-19 19:35     ` Steven Rostedt
2011-12-20 14:57       ` Jiri Olsa
2011-12-20 15:25         ` Steven Rostedt
2011-12-20 15:35           ` Jiri Olsa
2011-12-05 17:22   ` [PATCHv2 04/10] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
2011-12-05 17:22   ` [PATCHv2 05/10] ftrace, perf: Add add/del " Jiri Olsa
2011-12-05 17:22   ` [PATCHv2 06/10] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
2011-12-05 17:22   ` [PATCHv2 07/10] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
2011-12-19 19:22     ` Steven Rostedt
2011-12-05 17:22   ` [PATCHv2 08/10] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
2011-12-05 17:22   ` [PATCHv2 09/10] ftrace, perf: Add filter support for function trace event Jiri Olsa
2011-12-05 17:22   ` [PATCHv2 10/10] ftrace, graph: Add global_ops filter callback for graph tracing Jiri Olsa
2011-12-19 19:27     ` Steven Rostedt
2011-12-19 13:40   ` [RFCv2] ftrace, perf: Adding support to use function trace Jiri Olsa
2011-12-19 16:45     ` Steven Rostedt
2011-12-19 16:58     ` Frederic Weisbecker
2011-12-21 11:48   ` [PATCHv3 0/8] " Jiri Olsa
2011-12-21 11:48     ` [PATCH 1/8] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
2011-12-21 11:48     ` [PATCH 2/8] ftrace: Fix possible NULL dereferencing in __ftrace_hash_rec_update Jiri Olsa
2011-12-21 15:23       ` Steven Rostedt
2011-12-21 11:48     ` [PATCH 3/8] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
2011-12-21 16:01       ` Steven Rostedt
2011-12-21 16:43         ` Jiri Olsa
2011-12-21 16:55           ` Steven Rostedt
2012-01-24  1:26         ` Frederic Weisbecker
2011-12-21 11:48     ` [PATCH 4/8] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
2011-12-21 11:48     ` [PATCH 5/8] ftrace, perf: Add add/del " Jiri Olsa
2011-12-21 11:48     ` [PATCH 6/8] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
2011-12-21 11:48     ` [PATCH 7/8] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
2011-12-21 11:48     ` [PATCH 8/8] ftrace, perf: Add filter support for function trace event Jiri Olsa
2011-12-21 18:56     ` [PATCHv4 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
2011-12-21 18:56       ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
2011-12-22  0:12         ` Steven Rostedt
2011-12-22  8:01           ` [PATCHv5 " Jiri Olsa
2011-12-21 18:56       ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
2011-12-21 18:56       ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
2011-12-21 18:56       ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
2011-12-21 18:56       ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
2011-12-21 18:56       ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
2011-12-21 18:56       ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
2011-12-21 22:07         ` Frederic Weisbecker
2011-12-22 12:55           ` Jiri Olsa
2011-12-22 15:26             ` [PATCHvFIXED " Jiri Olsa
2011-12-24  2:35               ` Frederic Weisbecker
2011-12-21 19:02       ` [PATCHv4 0/7] ftrace, perf: Adding support to use function trace Jiri Olsa
2012-01-02  9:04       ` [PATCHv5 " Jiri Olsa
2012-01-02  9:04         ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
2012-02-17 13:46           ` [tip:perf/core] ftrace: Change filter/ notrace " tip-bot for Jiri Olsa
2012-01-02  9:04         ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
2012-01-17  1:42           ` Frederic Weisbecker
2012-01-17  2:07             ` Steven Rostedt
2012-01-17  2:29               ` Frederic Weisbecker
2012-01-18 13:59             ` Jiri Olsa
2012-01-02  9:04         ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
2012-01-02  9:04         ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
2012-01-02  9:04         ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
2012-01-02  9:04         ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
2012-01-02  9:04         ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
2012-01-16 23:59           ` Steven Rostedt
2012-01-18 13:45             ` Jiri Olsa
2012-01-16  8:57         ` [PATCHv5 0/7] ftrace, perf: Adding support to use function trace Jiri Olsa
2012-01-16 16:17           ` Steven Rostedt
2012-01-18 18:44         ` [PATCHv6 " Jiri Olsa
2012-01-18 18:44           ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
2012-01-19 16:31             ` Frederic Weisbecker
2012-01-18 18:44           ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
2012-01-20 17:02             ` Frederic Weisbecker
2012-01-25 23:13               ` Steven Rostedt
2012-01-26  2:37                 ` Frederic Weisbecker
2012-01-27 10:37                   ` Jiri Olsa
2012-01-27 10:38                     ` Jiri Olsa
2012-01-27 16:40                     ` Frederic Weisbecker
2012-01-27 16:54                       ` Jiri Olsa
2012-01-27 17:02                         ` Frederic Weisbecker
2012-01-27 17:20                           ` Jiri Olsa
2012-01-28 16:39                             ` Frederic Weisbecker
2012-01-27 17:21                         ` Steven Rostedt
2012-01-18 18:44           ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
2012-01-18 18:44           ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
2012-01-18 18:44           ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
2012-01-18 18:44           ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
2012-01-18 18:44           ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
2012-01-18 21:43           ` [PATCHv6 0/7] ftrace, perf: Adding support to use function trace Steven Rostedt
2012-01-28 18:43           ` [PATCHv7 " Jiri Olsa
2012-01-28 18:43             ` [PATCH 1/7] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
2012-01-30  5:42               ` Frederic Weisbecker
2012-01-28 18:43             ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
2012-01-30  5:59               ` Frederic Weisbecker
2012-01-30  9:18                 ` Jiri Olsa
2012-02-03 13:42                   ` Steven Rostedt
2012-02-03 13:50                     ` Jiri Olsa
2012-02-03 14:08                       ` Steven Rostedt
2012-02-03 14:22                         ` [PATCHv8 0/2] first 2 patches passed review Jiri Olsa
2012-02-03 14:22                           ` [PATCH 1/2] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
2012-02-03 14:22                           ` [PATCH 2/2] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
2012-02-04 13:24                           ` [PATCHv8 0/2] first 2 patches passed review Frederic Weisbecker
2012-02-03 13:40               ` [PATCH 2/7] ftrace: Add enable/disable ftrace_ops control interface Steven Rostedt
2012-01-28 18:43             ` [PATCH 3/7] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
2012-02-02 17:35               ` Frederic Weisbecker
2012-02-03 10:23                 ` Jiri Olsa
2012-01-28 18:43             ` [PATCH 4/7] ftrace, perf: Add add/del " Jiri Olsa
2012-02-02 17:42               ` Frederic Weisbecker
2012-01-28 18:43             ` [PATCH 5/7] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
2012-02-02 18:14               ` Frederic Weisbecker
2012-02-03 12:54                 ` Jiri Olsa
2012-02-03 13:00                   ` Jiri Olsa
2012-02-03 14:07                     ` Steven Rostedt
2012-02-04 13:21                   ` Frederic Weisbecker
2012-02-06 19:35                     ` Steven Rostedt
2012-02-03 13:53                 ` Steven Rostedt
2012-01-28 18:43             ` [PATCH 6/7] ftrace, perf: Distinguish ftrace function event field type Jiri Olsa
2012-02-03 14:16               ` Steven Rostedt
2012-01-28 18:43             ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
2012-02-07  0:20               ` Jiri Olsa
2012-02-07 19:44             ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Jiri Olsa
2012-02-07 19:44               ` [PATCH 1/8] ftrace: Change filter/notrace set functions to return exit code Jiri Olsa
2012-02-07 19:44               ` [PATCH 2/8] ftrace: Add enable/disable ftrace_ops control interface Jiri Olsa
2012-02-07 19:44               ` [PATCH 3/8] ftrace, perf: Add open/close tracepoint perf registration actions Jiri Olsa
2012-02-07 19:44               ` [PATCH 4/8] ftrace, perf: Add add/del " Jiri Olsa
2012-02-07 19:44               ` [PATCH 5/8] ftrace: Add FTRACE_ENTRY_REG macro to allow event registration Jiri Olsa
2012-02-07 19:44               ` [PATCH 6/8] ftrace, perf: Add support to use function tracepoint in perf Jiri Olsa
2012-02-07 19:44               ` [PATCH 7/8] ftrace: Allow to specify filter field type for ftrace events Jiri Olsa
2012-02-07 19:44               ` [PATCH 8/8] ftrace, perf: Add filter support for function trace event Jiri Olsa
2012-02-10 13:27               ` [PATCHv8 0/8] ftrace, perf: Adding support to use function trace Steven Rostedt
2012-02-10 14:45                 ` Steven Rostedt
2012-02-10 16:07                   ` Jiri Olsa
2012-02-10 16:48                     ` Frederic Weisbecker
2012-02-10 18:00                       ` Steven Rostedt
2012-02-10 18:05                         ` Frederic Weisbecker
2012-02-10 18:23                           ` David Ahern
2012-02-13 18:02               ` Steven Rostedt
2012-02-15 14:51 [PATCHv9 0/7] " Jiri Olsa
2012-02-15 14:51 ` [PATCH 7/7] ftrace, perf: Add filter support for function trace event Jiri Olsa
