* [PATCH 0/2] tracing: Tracing event profiling updates
@ 2009-09-18  1:33 Frederic Weisbecker
  2009-09-18  1:33 ` [PATCH 1/2] tracing: Factorize the events profile accounting Frederic Weisbecker
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-18  1:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Peter Zijlstra,
	Li Zefan, Jason Baron, Masami Hiramatsu


Ingo,

Please pull the latest tracing/core branch which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	tracing/core

Thanks,

	Frederic.


Frederic Weisbecker (2):
      tracing: Factorize the events profile accounting
      tracing: Allocate the ftrace event profile buffer dynamically

 include/linux/ftrace_event.h       |    9 ++-
 include/linux/syscalls.h           |   24 +++-----
 include/trace/ftrace.h             |  111 ++++++++++++++++++++---------------
 kernel/trace/trace_event_profile.c |   68 +++++++++++++++++++++-
 kernel/trace/trace_syscalls.c      |  103 +++++++++++++++++++++++++++------
 5 files changed, 228 insertions(+), 87 deletions(-)


* [PATCH 1/2] tracing: Factorize the events profile accounting
  2009-09-18  1:33 [PATCH 0/2] tracing: Tracing event profiling updates Frederic Weisbecker
@ 2009-09-18  1:33 ` Frederic Weisbecker
  2009-09-18  2:15   ` Steven Rostedt
  2009-09-18  3:45   ` Li Zefan
  2009-09-18  1:33 ` [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically Frederic Weisbecker
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-18  1:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Peter Zijlstra,
	Li Zefan, Jason Baron, Masami Hiramatsu, Ingo Molnar

Factorize the event profiling enable/disable accounting into a common
tracing core helper. This reduces the size of the profile_enable() and
profile_disable() callbacks for each trace event.
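
For readers unfamiliar with the idiom, here is a minimal sketch of the
first-user/last-user accounting the helper factorizes (illustrative
only, not part of the patch; it assumes the counter starts at -1, which
is how the event definition macros initialize profile_count):

        /* Minimal sketch of the refcounting idiom, kernel-style C */
        static atomic_t profile_count = ATOMIC_INIT(-1);

        static int example_profile_enable(int (*do_register)(void))
        {
                /* atomic_inc_return() hitting 0 means we are the first user */
                if (atomic_inc_return(&profile_count))
                        return 0;

                return do_register();
        }

        static void example_profile_disable(void (*do_unregister)(void))
        {
                /* the count going negative means the last user just left */
                if (atomic_add_negative(-1, &profile_count))
                        do_unregister();
        }

The same pattern, keyed on each event's own profile_count, is what
ftrace_profile_enable_event() and ftrace_profile_disable_event()
implement in the diff below.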

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/ftrace_event.h       |    4 ++--
 include/linux/syscalls.h           |   24 ++++++++----------------
 include/trace/ftrace.h             |   28 ++++++++--------------------
 kernel/trace/trace_event_profile.c |   20 ++++++++++++++++++--
 4 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index bd099ba..bc103d7 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -130,8 +130,8 @@ struct ftrace_event_call {
 	void			*data;
 
 	atomic_t		profile_count;
-	int			(*profile_enable)(struct ftrace_event_call *);
-	void			(*profile_disable)(struct ftrace_event_call *);
+	int			(*profile_enable)(void);
+	void			(*profile_disable)(void);
 };
 
 #define MAX_FILTER_PRED		32
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a8e3782..7d9803c 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -100,33 +100,25 @@ struct perf_counter_attr;
 
 #ifdef CONFIG_EVENT_PROFILE
 #define TRACE_SYS_ENTER_PROFILE(sname)					       \
-static int prof_sysenter_enable_##sname(struct ftrace_event_call *event_call)  \
+static int prof_sysenter_enable_##sname(void)				       \
 {									       \
-	int ret = 0;							       \
-	if (!atomic_inc_return(&event_enter_##sname.profile_count))	       \
-		ret = reg_prof_syscall_enter("sys"#sname);		       \
-	return ret;							       \
+	return reg_prof_syscall_enter("sys"#sname);			       \
 }									       \
 									       \
-static void prof_sysenter_disable_##sname(struct ftrace_event_call *event_call)\
+static void prof_sysenter_disable_##sname(void)				       \
 {									       \
-	if (atomic_add_negative(-1, &event_enter_##sname.profile_count))       \
-		unreg_prof_syscall_enter("sys"#sname);			       \
+	unreg_prof_syscall_enter("sys"#sname);				       \
 }
 
 #define TRACE_SYS_EXIT_PROFILE(sname)					       \
-static int prof_sysexit_enable_##sname(struct ftrace_event_call *event_call)   \
+static int prof_sysexit_enable_##sname(void)				       \
 {									       \
-	int ret = 0;							       \
-	if (!atomic_inc_return(&event_exit_##sname.profile_count))	       \
-		ret = reg_prof_syscall_exit("sys"#sname);		       \
-	return ret;							       \
+	return reg_prof_syscall_exit("sys"#sname);			       \
 }									       \
 									       \
-static void prof_sysexit_disable_##sname(struct ftrace_event_call *event_call) \
+static void prof_sysexit_disable_##sname(void)				       \
 {                                                                              \
-	if (atomic_add_negative(-1, &event_exit_##sname.profile_count))	       \
-		unreg_prof_syscall_exit("sys"#sname);			       \
+	unreg_prof_syscall_exit("sys"#sname);				       \
 }
 
 #define TRACE_SYS_ENTER_PROFILE_INIT(sname)				       \
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 72a3b43..a822087 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -382,20 +382,14 @@ static inline int ftrace_get_offsets_##call(				\
  *
  * NOTE: The insertion profile callback (ftrace_profile_<call>) is defined later
  *
- * static int ftrace_profile_enable_<call>(struct ftrace_event_call *event_call)
+ * static int ftrace_profile_enable_<call>(void)
  * {
- * 	int ret = 0;
- *
- * 	if (!atomic_inc_return(&event_call->profile_count))
- * 		ret = register_trace_<call>(ftrace_profile_<call>);
- *
- * 	return ret;
+ * 	return register_trace_<call>(ftrace_profile_<call>);
  * }
  *
- * static void ftrace_profile_disable_<call>(struct ftrace_event_call *event_call)
+ * static void ftrace_profile_disable_<call>(void)
  * {
- * 	if (atomic_add_negative(-1, &event->call->profile_count))
- * 		unregister_trace_<call>(ftrace_profile_<call>);
+ * 	unregister_trace_<call>(ftrace_profile_<call>);
  * }
  *
  */
@@ -405,20 +399,14 @@ static inline int ftrace_get_offsets_##call(				\
 									\
 static void ftrace_profile_##call(proto);				\
 									\
-static int ftrace_profile_enable_##call(struct ftrace_event_call *event_call) \
+static int ftrace_profile_enable_##call(void)				\
 {									\
-	int ret = 0;							\
-									\
-	if (!atomic_inc_return(&event_call->profile_count))		\
-		ret = register_trace_##call(ftrace_profile_##call);	\
-									\
-	return ret;							\
+	return register_trace_##call(ftrace_profile_##call);		\
 }									\
 									\
-static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
+static void ftrace_profile_disable_##call(void)				\
 {									\
-	if (atomic_add_negative(-1, &event_call->profile_count))	\
-		unregister_trace_##call(ftrace_profile_##call);		\
+	unregister_trace_##call(ftrace_profile_##call);			\
 }
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index 55a25c9..df4a74e 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -8,6 +8,14 @@
 #include <linux/module.h>
 #include "trace.h"
 
+static int ftrace_profile_enable_event(struct ftrace_event_call *event)
+{
+	if (atomic_inc_return(&event->profile_count))
+		return 0;
+
+	return event->profile_enable();
+}
+
 int ftrace_profile_enable(int event_id)
 {
 	struct ftrace_event_call *event;
@@ -17,7 +25,7 @@ int ftrace_profile_enable(int event_id)
 	list_for_each_entry(event, &ftrace_events, list) {
 		if (event->id == event_id && event->profile_enable &&
 		    try_module_get(event->mod)) {
-			ret = event->profile_enable(event);
+			ret = ftrace_profile_enable_event(event);
 			break;
 		}
 	}
@@ -26,6 +34,14 @@ int ftrace_profile_enable(int event_id)
 	return ret;
 }
 
+static void ftrace_profile_disable_event(struct ftrace_event_call *event)
+{
+	if (!atomic_add_negative(-1, &event->profile_count))
+		return;
+
+	event->profile_disable();
+}
+
 void ftrace_profile_disable(int event_id)
 {
 	struct ftrace_event_call *event;
@@ -33,7 +49,7 @@ void ftrace_profile_disable(int event_id)
 	mutex_lock(&event_mutex);
 	list_for_each_entry(event, &ftrace_events, list) {
 		if (event->id == event_id) {
-			event->profile_disable(event);
+			ftrace_profile_disable_event(event);
 			module_put(event->mod);
 			break;
 		}
-- 
1.6.2.3



* [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically
  2009-09-18  1:33 [PATCH 0/2] tracing: Tracing event profiling updates Frederic Weisbecker
  2009-09-18  1:33 ` [PATCH 1/2] tracing: Factorize the events profile accounting Frederic Weisbecker
@ 2009-09-18  1:33 ` Frederic Weisbecker
  2009-09-18  2:15   ` Steven Rostedt
  2009-09-18  4:24 ` [PATCH 0/2 v2] tracing: Tracing event profiling updates Frederic Weisbecker
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-18  1:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Peter Zijlstra,
	Li Zefan, Jason Baron, Masami Hiramatsu, Ingo Molnar

Currently the trace event profile buffer is allocated on the stack. But
this may be too much for the stack, as events can have a large
statically defined field size and can also grow with dynamic arrays.

Allocate two per-cpu buffers for each profiled event. The first buffer
is used to host every non-NMI context trace. It is protected by
disabling interrupts while writing and committing the trace.

The second buffer is reserved for NMI context, so that NMIs do not race
with the first buffer.

The whole write/commit section is RCU protected because these buffers
are released when a trace event is deactivated for profiling.
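
For clarity, the write path generated for each event boils down to the
following access pattern (a simplified sketch of the macro expansion
shown in the diff, not new code; profile_buf and profile_buf_nmi are
the per-event fields added by this patch):

        char *raw_data;
        unsigned long flags;
        int cpu;

        /* The irq-off section doubles as the RCU read side for the buffers */
        local_irq_save(flags);
        cpu = smp_processor_id();

        raw_data = in_nmi() ? rcu_dereference(event_call->profile_buf_nmi)
                            : rcu_dereference(event_call->profile_buf);
        if (raw_data) {
                raw_data = per_cpu_ptr(raw_data, cpu);
                /*
                 * Build the trace record in raw_data, then hand it to
                 * perf_tpcounter_event().
                 */
        }

        local_irq_restore(flags);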

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/ftrace_event.h       |    5 ++
 include/trace/ftrace.h             |   83 +++++++++++++++++++----------
 kernel/trace/trace_event_profile.c |   50 +++++++++++++++++-
 kernel/trace/trace_syscalls.c      |  103 +++++++++++++++++++++++++++++-------
 4 files changed, 193 insertions(+), 48 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index bc103d7..497eb37 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -4,6 +4,7 @@
 #include <linux/ring_buffer.h>
 #include <linux/trace_seq.h>
 #include <linux/percpu.h>
+#include <linux/hardirq.h>
 
 struct trace_array;
 struct tracer;
@@ -132,8 +133,12 @@ struct ftrace_event_call {
 	atomic_t		profile_count;
 	int			(*profile_enable)(void);
 	void			(*profile_disable)(void);
+	char			*profile_buf;
+	char			*profile_buf_nmi;
 };
 
+#define FTRACE_MAX_PROFILE_SIZE	2048
+
 #define MAX_FILTER_PRED		32
 #define MAX_FILTER_STR_VAL	256	/* Should handle KSYM_SYMBOL_LEN */
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index a822087..80864cf 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -648,11 +648,12 @@ __attribute__((section("_ftrace_events"))) event_##call = {		\
  *	struct ftrace_raw_##call *entry;
  *	u64 __addr = 0, __count = 1;
  *	unsigned long irq_flags;
+ *	struct trace_entry *ent;
  *	int __entry_size;
  *	int __data_size;
+ *	int __cpu
  *	int pc;
  *
- *	local_save_flags(irq_flags);
  *	pc = preempt_count();
  *
  *	__data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
@@ -663,25 +664,34 @@ __attribute__((section("_ftrace_events"))) event_##call = {		\
  *			     sizeof(u64));
  *	__entry_size -= sizeof(u32);
  *
- *	do {
- *		char raw_data[__entry_size]; <- allocate our sample in the stack
- *		struct trace_entry *ent;
+ *	// Protect the non nmi buffer
+ *	// This also protects the rcu read side
+ *	local_irq_save(irq_flags);
+ *	__cpu = smp_processor_id();
+ *
+ *	if (in_nmi())
+ *		raw_data = rcu_dereference(event_call->profile_buf_nmi);
+ *	else
+ *		raw_data = rcu_dereference(event_call->profile_buf);
+ *
+ *	if (!raw_data)
+ *		goto end;
  *
- *		zero dead bytes from alignment to avoid stack leak to userspace:
+ *	raw_data = per_cpu_ptr(raw_data, __cpu);
  *
- *		*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;
- *		entry = (struct ftrace_raw_<call> *)raw_data;
- *		ent = &entry->ent;
- *		tracing_generic_entry_update(ent, irq_flags, pc);
- *		ent->type = event_call->id;
+ *	//zero dead bytes from alignment to avoid stack leak to userspace:
+ *	*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;
+ *	entry = (struct ftrace_raw_<call> *)raw_data;
+ *	ent = &entry->ent;
+ *	tracing_generic_entry_update(ent, irq_flags, pc);
+ *	ent->type = event_call->id;
  *
- *		<tstruct> <- do some jobs with dynamic arrays
+ *	<tstruct> <- do some jobs with dynamic arrays
  *
- *		<assign>  <- affect our values
+ *	<assign>  <- affect our values
  *
- *		perf_tpcounter_event(event_call->id, __addr, __count, entry,
- *			     __entry_size);  <- submit them to perf counter
- *	} while (0);
+ *	perf_tpcounter_event(event_call->id, __addr, __count, entry,
+ *		     __entry_size);  <- submit them to perf counter
  *
  * }
  */
@@ -704,11 +714,13 @@ static void ftrace_profile_##call(proto)				\
 	struct ftrace_raw_##call *entry;				\
 	u64 __addr = 0, __count = 1;					\
 	unsigned long irq_flags;					\
+	struct trace_entry *ent;					\
 	int __entry_size;						\
 	int __data_size;						\
+	char *raw_data;							\
+	int __cpu;							\
 	int pc;								\
 									\
-	local_save_flags(irq_flags);					\
 	pc = preempt_count();						\
 									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
@@ -716,23 +728,38 @@ static void ftrace_profile_##call(proto)				\
 			     sizeof(u64));				\
 	__entry_size -= sizeof(u32);					\
 									\
-	do {								\
-		char raw_data[__entry_size];				\
-		struct trace_entry *ent;				\
+	if (WARN_ONCE(__entry_size > FTRACE_MAX_PROFILE_SIZE,		\
+		      "profile buffer not large enough"))		\
+		return;							\
+									\
+	local_irq_save(irq_flags);					\
+	__cpu = smp_processor_id();					\
 									\
-		*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;	\
-		entry = (struct ftrace_raw_##call *)raw_data;		\
-		ent = &entry->ent;					\
-		tracing_generic_entry_update(ent, irq_flags, pc);	\
-		ent->type = event_call->id;				\
+	if (in_nmi())							\
+		raw_data = rcu_dereference(event_call->profile_buf_nmi);\
+	else								\
+		raw_data = rcu_dereference(event_call->profile_buf);	\
 									\
-		tstruct							\
+	if (!raw_data)							\
+		goto end;						\
 									\
-		{ assign; }						\
+	raw_data = per_cpu_ptr(raw_data, __cpu);			\
 									\
-		perf_tpcounter_event(event_call->id, __addr, __count, entry,\
+	*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;		\
+	entry = (struct ftrace_raw_##call *)raw_data;			\
+	ent = &entry->ent;						\
+	tracing_generic_entry_update(ent, irq_flags, pc);		\
+	ent->type = event_call->id;					\
+									\
+	tstruct								\
+									\
+	{ assign; }							\
+									\
+	perf_tpcounter_event(event_call->id, __addr, __count, entry,	\
 			     __entry_size);				\
-	} while (0);							\
+									\
+end:									\
+	local_irq_restore(irq_flags);					\
 									\
 }
 
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index df4a74e..48d9aec 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -8,12 +8,43 @@
 #include <linux/module.h>
 #include "trace.h"
 
+/*
+ * We can't use a size but a type in alloc_percpu()
+ * So let's create a dummy type that matches the desired size
+ */
+typedef struct {char buf[FTRACE_MAX_PROFILE_SIZE];} profile_buf_t;
+
 static int ftrace_profile_enable_event(struct ftrace_event_call *event)
 {
+	char *buf;
+	int ret;
+
 	if (atomic_inc_return(&event->profile_count))
 		return 0;
 
-	return event->profile_enable();
+	buf = (char *)alloc_percpu(profile_buf_t);
+	if (!buf)
+		return -ENOMEM;
+
+	rcu_assign_pointer(event->profile_buf, buf);
+
+	buf = (char *)alloc_percpu(profile_buf_t);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto fail_nmi;
+	}
+
+	rcu_assign_pointer(event->profile_buf_nmi, buf);
+
+	ret = event->profile_enable();
+	if (!ret)
+		return 0;
+
+	kfree(event->profile_buf_nmi);
+fail_nmi:
+	kfree(event->profile_buf);
+
+	return ret;
 }
 
 int ftrace_profile_enable(int event_id)
@@ -36,10 +67,27 @@ int ftrace_profile_enable(int event_id)
 
 static void ftrace_profile_disable_event(struct ftrace_event_call *event)
 {
+	char *buf, *nmi_buf;
+
 	if (!atomic_add_negative(-1, &event->profile_count))
 		return;
 
 	event->profile_disable();
+
+	buf = event->profile_buf;
+	rcu_assign_pointer(event->profile_buf, NULL);
+
+	nmi_buf = event->profile_buf_nmi;
+	rcu_assign_pointer(event->profile_buf_nmi, NULL);
+
+	/*
+	 * Ensure every events in profiling have finished before
+	 * releasing the buffers
+	 */
+	synchronize_sched();
+
+	free_percpu(buf);
+	free_percpu(nmi_buf);
 }
 
 void ftrace_profile_disable(int event_id)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8712ce3..a7ecc34 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -384,10 +384,13 @@ static int sys_prof_refcount_exit;
 
 static void prof_syscall_enter(struct pt_regs *regs, long id)
 {
-	struct syscall_trace_enter *rec;
 	struct syscall_metadata *sys_data;
+	struct ftrace_event_call *event;
+	struct syscall_trace_enter *rec;
+	char *raw_data;
 	int syscall_nr;
 	int size;
+	int cpu;
 
 	syscall_nr = syscall_get_nr(current, regs);
 	if (!test_bit(syscall_nr, enabled_prof_enter_syscalls))
@@ -397,25 +400,46 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
 	if (!sys_data)
 		return;
 
+	event = sys_data->enter_event;
+
 	/* get the size after alignment with the u32 buffer size field */
 	size = sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec);
 	size = ALIGN(size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	do {
-		char raw_data[size];
+	if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
+		      "profile buffer not large enough"))
+		return;
 
-		/* zero the dead bytes from align to not leak stack to user */
-		*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
+	/*
+	 * We are not in nmi. Also we can't be preempted by a sysenter event
+	 * from interrupt, so we can safely take this buffer without
+	 * masking interrupts. But we still need rcu_read_lock to protect
+	 * against concurrent buffer release.
+	 */
+	rcu_read_lock();
+
+	cpu = smp_processor_id();
+	raw_data = rcu_dereference(event->profile_buf);
+
+	if (!raw_data)
+		goto end;
 
-		rec = (struct syscall_trace_enter *) raw_data;
-		tracing_generic_entry_update(&rec->ent, 0, 0);
-		rec->ent.type = sys_data->enter_id;
-		rec->nr = syscall_nr;
-		syscall_get_arguments(current, regs, 0, sys_data->nb_args,
-				       (unsigned long *)&rec->args);
-		perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
-	} while(0);
+	raw_data = per_cpu_ptr(raw_data, cpu);
+
+	/* zero the dead bytes from align to not leak stack to user */
+	*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
+
+	rec = (struct syscall_trace_enter *) raw_data;
+	tracing_generic_entry_update(&rec->ent, 0, 0);
+	rec->ent.type = sys_data->enter_id;
+	rec->nr = syscall_nr;
+	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
+			       (unsigned long *)&rec->args);
+	perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
+
+end:
+	rcu_read_unlock();
 }
 
 int reg_prof_syscall_enter(char *name)
@@ -460,8 +484,12 @@ void unreg_prof_syscall_enter(char *name)
 static void prof_syscall_exit(struct pt_regs *regs, long ret)
 {
 	struct syscall_metadata *sys_data;
-	struct syscall_trace_exit rec;
+	struct ftrace_event_call *event;
+	struct syscall_trace_exit *rec;
 	int syscall_nr;
+	char *raw_data;
+	int size;
+	int cpu;
 
 	syscall_nr = syscall_get_nr(current, regs);
 	if (!test_bit(syscall_nr, enabled_prof_exit_syscalls))
@@ -471,12 +499,49 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
 	if (!sys_data)
 		return;
 
-	tracing_generic_entry_update(&rec.ent, 0, 0);
-	rec.ent.type = sys_data->exit_id;
-	rec.nr = syscall_nr;
-	rec.ret = syscall_get_return_value(current, regs);
+	event = sys_data->exit_event;
+
+	/* We can probably do that at build time */
+	size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64));
+	size -= sizeof(u32);
+
+	/*
+	 * Impossible, but be paranoid with the future
+	 * How to put this check outside runtime?
+	 */
+	if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
+		"exit event has grown above profile buffer size"))
+		return;
+
+	/*
+	 * This buffer is safe against interrupt.
+	 * But we still need rcu_read_lock to protect
+	 * against concurrent buffer release.
+	 */
+	rcu_read_lock();
+
+	cpu = smp_processor_id();
+	raw_data = rcu_dereference(event->profile_buf);
+
+	if (!raw_data)
+		goto end;
 
-	perf_tpcounter_event(sys_data->exit_id, 0, 1, &rec, sizeof(rec));
+	raw_data = per_cpu_ptr(raw_data, cpu);
+
+	/* zero the dead bytes from align to not leak stack to user */
+	*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
+
+	rec = (struct syscall_trace_exit *)raw_data;
+
+	tracing_generic_entry_update(&rec->ent, 0, 0);
+	rec->ent.type = sys_data->exit_id;
+	rec->nr = syscall_nr;
+	rec->ret = syscall_get_return_value(current, regs);
+
+	perf_tpcounter_event(sys_data->exit_id, 0, 1, rec, size);
+
+end:
+	rcu_read_unlock();
 }
 
 int reg_prof_syscall_exit(char *name)
-- 
1.6.2.3



* Re: [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically
  2009-09-18  1:33 ` [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically Frederic Weisbecker
@ 2009-09-18  2:15   ` Steven Rostedt
  2009-09-18  2:23     ` Frederic Weisbecker
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2009-09-18  2:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Li Zefan, Jason Baron,
	Masami Hiramatsu

On Fri, 2009-09-18 at 03:33 +0200, Frederic Weisbecker wrote:
> Currently the trace event profile buffer is allocated on the stack. But
> this may be too much for the stack, as events can have a large
> statically defined field size and can also grow with dynamic arrays.
> 
> Allocate two per-cpu buffers for each profiled event. The first buffer
> is used to host every non-NMI context trace. It is protected by
> disabling interrupts while writing and committing the trace.
> 
> The second buffer is reserved for NMI context, so that NMIs do not race
> with the first buffer.
> 
> The whole write/commit section is RCU protected because these buffers
> are released when a trace event is deactivated for profiling.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>
> Cc: Jason Baron <jbaron@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@redhat.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> ---
>  include/linux/ftrace_event.h       |    5 ++
>  include/trace/ftrace.h             |   83 +++++++++++++++++++----------
>  kernel/trace/trace_event_profile.c |   50 +++++++++++++++++-
>  kernel/trace/trace_syscalls.c      |  103 +++++++++++++++++++++++++++++-------
>  4 files changed, 193 insertions(+), 48 deletions(-)
> 
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index bc103d7..497eb37 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -4,6 +4,7 @@
>  #include <linux/ring_buffer.h>
>  #include <linux/trace_seq.h>
>  #include <linux/percpu.h>
> +#include <linux/hardirq.h>
>  
>  struct trace_array;
>  struct tracer;
> @@ -132,8 +133,12 @@ struct ftrace_event_call {
>  	atomic_t		profile_count;
>  	int			(*profile_enable)(void);
>  	void			(*profile_disable)(void);
> +	char			*profile_buf;
> +	char			*profile_buf_nmi;
>  };
>  
> +#define FTRACE_MAX_PROFILE_SIZE	2048
> +
>  #define MAX_FILTER_PRED		32
>  #define MAX_FILTER_STR_VAL	256	/* Should handle KSYM_SYMBOL_LEN */
>  
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index a822087..80864cf 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -648,11 +648,12 @@ __attribute__((section("_ftrace_events"))) event_##call = {		\
>   *	struct ftrace_raw_##call *entry;
>   *	u64 __addr = 0, __count = 1;
>   *	unsigned long irq_flags;
> + *	struct trace_entry *ent;
>   *	int __entry_size;
>   *	int __data_size;
> + *	int __cpu
>   *	int pc;
>   *
> - *	local_save_flags(irq_flags);
>   *	pc = preempt_count();
>   *
>   *	__data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
> @@ -663,25 +664,34 @@ __attribute__((section("_ftrace_events"))) event_##call = {		\
>   *			     sizeof(u64));
>   *	__entry_size -= sizeof(u32);
>   *
> - *	do {
> - *		char raw_data[__entry_size]; <- allocate our sample in the stack
> - *		struct trace_entry *ent;
> + *	// Protect the non nmi buffer
> + *	// This also protects the rcu read side
> + *	local_irq_save(irq_flags);
> + *	__cpu = smp_processor_id();
> + *
> + *	if (in_nmi())
> + *		raw_data = rcu_dereference(event_call->profile_buf_nmi);
> + *	else
> + *		raw_data = rcu_dereference(event_call->profile_buf);
> + *
> + *	if (!raw_data)
> + *		goto end;
>   *
> - *		zero dead bytes from alignment to avoid stack leak to userspace:
> + *	raw_data = per_cpu_ptr(raw_data, __cpu);
>   *
> - *		*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;
> - *		entry = (struct ftrace_raw_<call> *)raw_data;
> - *		ent = &entry->ent;
> - *		tracing_generic_entry_update(ent, irq_flags, pc);
> - *		ent->type = event_call->id;
> + *	//zero dead bytes from alignment to avoid stack leak to userspace:
> + *	*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;
> + *	entry = (struct ftrace_raw_<call> *)raw_data;
> + *	ent = &entry->ent;
> + *	tracing_generic_entry_update(ent, irq_flags, pc);
> + *	ent->type = event_call->id;
>   *
> - *		<tstruct> <- do some jobs with dynamic arrays
> + *	<tstruct> <- do some jobs with dynamic arrays
>   *
> - *		<assign>  <- affect our values
> + *	<assign>  <- affect our values
>   *
> - *		perf_tpcounter_event(event_call->id, __addr, __count, entry,
> - *			     __entry_size);  <- submit them to perf counter
> - *	} while (0);
> + *	perf_tpcounter_event(event_call->id, __addr, __count, entry,
> + *		     __entry_size);  <- submit them to perf counter
>   *
>   * }
>   */
> @@ -704,11 +714,13 @@ static void ftrace_profile_##call(proto)				\
>  	struct ftrace_raw_##call *entry;				\
>  	u64 __addr = 0, __count = 1;					\
>  	unsigned long irq_flags;					\
> +	struct trace_entry *ent;					\
>  	int __entry_size;						\
>  	int __data_size;						\
> +	char *raw_data;							\
> +	int __cpu;							\
>  	int pc;								\
>  									\
> -	local_save_flags(irq_flags);					\
>  	pc = preempt_count();						\
>  									\
>  	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> @@ -716,23 +728,38 @@ static void ftrace_profile_##call(proto)				\
>  			     sizeof(u64));				\
>  	__entry_size -= sizeof(u32);					\
>  									\
> -	do {								\
> -		char raw_data[__entry_size];				\
> -		struct trace_entry *ent;				\
> +	if (WARN_ONCE(__entry_size > FTRACE_MAX_PROFILE_SIZE,		\
> +		      "profile buffer not large enough"))		\
> +		return;							\
> +									\
> +	local_irq_save(irq_flags);					\
> +	__cpu = smp_processor_id();					\
>  									\
> -		*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;	\
> -		entry = (struct ftrace_raw_##call *)raw_data;		\
> -		ent = &entry->ent;					\
> -		tracing_generic_entry_update(ent, irq_flags, pc);	\
> -		ent->type = event_call->id;				\
> +	if (in_nmi())							\
> +		raw_data = rcu_dereference(event_call->profile_buf_nmi);\
> +	else								\
> +		raw_data = rcu_dereference(event_call->profile_buf);	\
>  									\
> -		tstruct							\
> +	if (!raw_data)							\
> +		goto end;						\
>  									\
> -		{ assign; }						\
> +	raw_data = per_cpu_ptr(raw_data, __cpu);			\
>  									\
> -		perf_tpcounter_event(event_call->id, __addr, __count, entry,\
> +	*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;		\
> +	entry = (struct ftrace_raw_##call *)raw_data;			\
> +	ent = &entry->ent;						\
> +	tracing_generic_entry_update(ent, irq_flags, pc);		\
> +	ent->type = event_call->id;					\
> +									\
> +	tstruct								\
> +									\
> +	{ assign; }							\
> +									\
> +	perf_tpcounter_event(event_call->id, __addr, __count, entry,	\
>  			     __entry_size);				\
> -	} while (0);							\
> +									\
> +end:									\
> +	local_irq_restore(irq_flags);					\
>  									\
>  }
>  
> diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
> index df4a74e..48d9aec 100644
> --- a/kernel/trace/trace_event_profile.c
> +++ b/kernel/trace/trace_event_profile.c
> @@ -8,12 +8,43 @@
>  #include <linux/module.h>
>  #include "trace.h"
>  
> +/*
> + * We can't use a size but a type in alloc_percpu()
> + * So let's create a dummy type that matches the desired size
> + */
> +typedef struct {char buf[FTRACE_MAX_PROFILE_SIZE];} profile_buf_t;
> +
>  static int ftrace_profile_enable_event(struct ftrace_event_call *event)
>  {
> +	char *buf;
> +	int ret;
> +
>  	if (atomic_inc_return(&event->profile_count))
>  		return 0;
>  
> -	return event->profile_enable();
> +	buf = (char *)alloc_percpu(profile_buf_t);

Is this allocating a buffer for every event? Why not just have one? All
events can share the same buffer. It's only a temp buffer, and can only
be used by one event at a time per cpu.

-- Steve

> +	if (!buf)
> +		return -ENOMEM;
> +
> +	rcu_assign_pointer(event->profile_buf, buf);
> +
> +	buf = (char *)alloc_percpu(profile_buf_t);
> +	if (!buf) {
> +		ret = -ENOMEM;
> +		goto fail_nmi;
> +	}
> +
> +	rcu_assign_pointer(event->profile_buf_nmi, buf);
> +
> +	ret = event->profile_enable();
> +	if (!ret)
> +		return 0;
> +
> +	kfree(event->profile_buf_nmi);
> +fail_nmi:
> +	kfree(event->profile_buf);
> +
> +	return ret;
>  }
>  
>  int ftrace_profile_enable(int event_id)
> @@ -36,10 +67,27 @@ int ftrace_profile_enable(int event_id)
>  
>  static void ftrace_profile_disable_event(struct ftrace_event_call *event)
>  {
> +	char *buf, *nmi_buf;
> +
>  	if (!atomic_add_negative(-1, &event->profile_count))
>  		return;
>  
>  	event->profile_disable();
> +
> +	buf = event->profile_buf;
> +	rcu_assign_pointer(event->profile_buf, NULL);
> +
> +	nmi_buf = event->profile_buf_nmi;
> +	rcu_assign_pointer(event->profile_buf_nmi, NULL);
> +
> +	/*
> +	 * Ensure every events in profiling have finished before
> +	 * releasing the buffers
> +	 */
> +	synchronize_sched();
> +
> +	free_percpu(buf);
> +	free_percpu(nmi_buf);
>  }
>  
>  void ftrace_profile_disable(int event_id)
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 8712ce3..a7ecc34 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -384,10 +384,13 @@ static int sys_prof_refcount_exit;
>  
>  static void prof_syscall_enter(struct pt_regs *regs, long id)
>  {
> -	struct syscall_trace_enter *rec;
>  	struct syscall_metadata *sys_data;
> +	struct ftrace_event_call *event;
> +	struct syscall_trace_enter *rec;
> +	char *raw_data;
>  	int syscall_nr;
>  	int size;
> +	int cpu;
>  
>  	syscall_nr = syscall_get_nr(current, regs);
>  	if (!test_bit(syscall_nr, enabled_prof_enter_syscalls))
> @@ -397,25 +400,46 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
>  	if (!sys_data)
>  		return;
>  
> +	event = sys_data->enter_event;
> +
>  	/* get the size after alignment with the u32 buffer size field */
>  	size = sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec);
>  	size = ALIGN(size + sizeof(u32), sizeof(u64));
>  	size -= sizeof(u32);
>  
> -	do {
> -		char raw_data[size];
> +	if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
> +		      "profile buffer not large enough"))
> +		return;
>  
> -		/* zero the dead bytes from align to not leak stack to user */
> -		*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
> +	/*
> +	 * We are not in nmi. Also we can't be preempted by a sysenter event
> +	 * from interrupt, so we can safely take this buffer without
> +	 * masking interrupts. But we still need rcu_read_lock to protect
> +	 * against concurrent buffer release.
> +	 */
> +	rcu_read_lock();
> +
> +	cpu = smp_processor_id();
> +	raw_data = rcu_dereference(event->profile_buf);
> +
> +	if (!raw_data)
> +		goto end;
>  
> -		rec = (struct syscall_trace_enter *) raw_data;
> -		tracing_generic_entry_update(&rec->ent, 0, 0);
> -		rec->ent.type = sys_data->enter_id;
> -		rec->nr = syscall_nr;
> -		syscall_get_arguments(current, regs, 0, sys_data->nb_args,
> -				       (unsigned long *)&rec->args);
> -		perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
> -	} while(0);
> +	raw_data = per_cpu_ptr(raw_data, cpu);
> +
> +	/* zero the dead bytes from align to not leak stack to user */
> +	*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
> +
> +	rec = (struct syscall_trace_enter *) raw_data;
> +	tracing_generic_entry_update(&rec->ent, 0, 0);
> +	rec->ent.type = sys_data->enter_id;
> +	rec->nr = syscall_nr;
> +	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
> +			       (unsigned long *)&rec->args);
> +	perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
> +
> +end:
> +	rcu_read_unlock();
>  }
>  
>  int reg_prof_syscall_enter(char *name)
> @@ -460,8 +484,12 @@ void unreg_prof_syscall_enter(char *name)
>  static void prof_syscall_exit(struct pt_regs *regs, long ret)
>  {
>  	struct syscall_metadata *sys_data;
> -	struct syscall_trace_exit rec;
> +	struct ftrace_event_call *event;
> +	struct syscall_trace_exit *rec;
>  	int syscall_nr;
> +	char *raw_data;
> +	int size;
> +	int cpu;
>  
>  	syscall_nr = syscall_get_nr(current, regs);
>  	if (!test_bit(syscall_nr, enabled_prof_exit_syscalls))
> @@ -471,12 +499,49 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
>  	if (!sys_data)
>  		return;
>  
> -	tracing_generic_entry_update(&rec.ent, 0, 0);
> -	rec.ent.type = sys_data->exit_id;
> -	rec.nr = syscall_nr;
> -	rec.ret = syscall_get_return_value(current, regs);
> +	event = sys_data->exit_event;
> +
> +	/* We can probably do that at build time */
> +	size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64));
> +	size -= sizeof(u32);
> +
> +	/*
> +	 * Impossible, but be paranoid with the future
> +	 * How to put this check outside runtime?
> +	 */
> +	if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
> +		"exit event has grown above profile buffer size"))
> +		return;
> +
> +	/*
> +	 * This buffer is safe against interrupt.
> +	 * But we still need rcu_read_lock to protect
> +	 * against concurrent buffer release.
> +	 */
> +	rcu_read_lock();
> +
> +	cpu = smp_processor_id();
> +	raw_data = rcu_dereference(event->profile_buf);
> +
> +	if (!raw_data)
> +		goto end;
>  
> -	perf_tpcounter_event(sys_data->exit_id, 0, 1, &rec, sizeof(rec));
> +	raw_data = per_cpu_ptr(raw_data, cpu);
> +
> +	/* zero the dead bytes from align to not leak stack to user */
> +	*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
> +
> +	rec = (struct syscall_trace_exit *)raw_data;
> +
> +	tracing_generic_entry_update(&rec->ent, 0, 0);
> +	rec->ent.type = sys_data->exit_id;
> +	rec->nr = syscall_nr;
> +	rec->ret = syscall_get_return_value(current, regs);
> +
> +	perf_tpcounter_event(sys_data->exit_id, 0, 1, rec, size);
> +
> +end:
> +	rcu_read_unlock();
>  }
>  
>  int reg_prof_syscall_exit(char *name)



* Re: [PATCH 1/2] tracing: Factorize the events profile accounting
  2009-09-18  1:33 ` [PATCH 1/2] tracing: Factorize the events profile accounting Frederic Weisbecker
@ 2009-09-18  2:15   ` Steven Rostedt
  2009-09-18  3:45   ` Li Zefan
  1 sibling, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2009-09-18  2:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Li Zefan, Jason Baron,
	Masami Hiramatsu

On Fri, 2009-09-18 at 03:33 +0200, Frederic Weisbecker wrote:
> Factorize the event profiling enable/disable accounting into a common
> tracing core helper. This reduces the size of the profile_enable() and
> profile_disable() callbacks for each trace event.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>
> Cc: Jason Baron <jbaron@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@redhat.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> ---
>  include/linux/ftrace_event.h       |    4 ++--
>  include/linux/syscalls.h           |   24 ++++++++----------------
>  include/trace/ftrace.h             |   28 ++++++++--------------------
>  kernel/trace/trace_event_profile.c |   20 ++++++++++++++++++--
>  4 files changed, 36 insertions(+), 40 deletions(-)




* Re: [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically
  2009-09-18  2:15   ` Steven Rostedt
@ 2009-09-18  2:23     ` Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-18  2:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Li Zefan, Jason Baron,
	Masami Hiramatsu

On Thu, Sep 17, 2009 at 10:15:03PM -0400, Steven Rostedt wrote:
> 
> Is this allocating a buffer for every event? Why not just have one? All
> events can share the same buffer. It's only a temp buffer, and can only
> be used by one event at a time per cpu.
> 
> -- Steve
> 


Oh you're right. That was so obvious though...

The v2 is coming soon.

Thanks.



* Re: [PATCH 1/2] tracing: Factorize the events profile accounting
  2009-09-18  1:33 ` [PATCH 1/2] tracing: Factorize the events profile accounting Frederic Weisbecker
  2009-09-18  2:15   ` Steven Rostedt
@ 2009-09-18  3:45   ` Li Zefan
  1 sibling, 0 replies; 15+ messages in thread
From: Li Zefan @ 2009-09-18  3:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Steven Rostedt, Peter Zijlstra, Jason Baron,
	Masami Hiramatsu

Frederic Weisbecker wrote:
> Factorize the event profiling enable/disable accounting into a common
> tracing core helper. This reduces the size of the profile_enable() and
> profile_disable() callbacks for each trace event.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>



* [PATCH 0/2 v2] tracing: Tracing event profiling updates
  2009-09-18  1:33 [PATCH 0/2] tracing: Tracing event profiling updates Frederic Weisbecker
  2009-09-18  1:33 ` [PATCH 1/2] tracing: Factorize the events profile accounting Frederic Weisbecker
  2009-09-18  1:33 ` [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically Frederic Weisbecker
@ 2009-09-18  4:24 ` Frederic Weisbecker
  2009-09-18  5:36   ` [PATCH 0/2 v3] " Frederic Weisbecker
  2009-09-18  5:36   ` [PATCH 2/2 v3] tracing: Allocate the ftrace event profile buffer dynamically Frederic Weisbecker
  2009-09-18  4:24 ` [PATCH 1/2] tracing: Factorize the events profile accounting Frederic Weisbecker
  2009-09-18  4:24 ` [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically Frederic Weisbecker
  4 siblings, 2 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-18  4:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML, Frederic Weisbecker

Ingo,

Please forget the previous pull request. This new one addresses
Steve's review.

Thanks,
	Frederic.

  git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	tracing/core-v2

Frederic Weisbecker (2):
      tracing: Factorize the events profile accounting
      tracing: Allocate the ftrace event profile buffer dynamically

 include/linux/ftrace_event.h       |   10 +++-
 include/linux/syscalls.h           |   24 +++-----
 include/trace/ftrace.h             |  111 ++++++++++++++++++++---------------
 kernel/trace/trace_event_profile.c |   79 +++++++++++++++++++++++++-
 kernel/trace/trace_syscalls.c      |   97 +++++++++++++++++++++++++------
 5 files changed, 234 insertions(+), 87 deletions(-)


* [PATCH 1/2] tracing: Factorize the events profile accounting
  2009-09-18  1:33 [PATCH 0/2] tracing: Tracing event profiling updates Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2009-09-18  4:24 ` [PATCH 0/2 v2] tracing: Tracing event profiling updates Frederic Weisbecker
@ 2009-09-18  4:24 ` Frederic Weisbecker
  2009-09-18  4:24 ` [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically Frederic Weisbecker
  4 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-18  4:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Jason Baron,
	Masami Hiramatsu, Ingo Molnar

Factorize the event profiling enable/disable accounting into a common
tracing core helper. This reduces the size of the profile_enable() and
profile_disable() callbacks for each trace event.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/ftrace_event.h       |    4 ++--
 include/linux/syscalls.h           |   24 ++++++++----------------
 include/trace/ftrace.h             |   28 ++++++++--------------------
 kernel/trace/trace_event_profile.c |   20 ++++++++++++++++++--
 4 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index bd099ba..bc103d7 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -130,8 +130,8 @@ struct ftrace_event_call {
 	void			*data;
 
 	atomic_t		profile_count;
-	int			(*profile_enable)(struct ftrace_event_call *);
-	void			(*profile_disable)(struct ftrace_event_call *);
+	int			(*profile_enable)(void);
+	void			(*profile_disable)(void);
 };
 
 #define MAX_FILTER_PRED		32
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a8e3782..7d9803c 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -100,33 +100,25 @@ struct perf_counter_attr;
 
 #ifdef CONFIG_EVENT_PROFILE
 #define TRACE_SYS_ENTER_PROFILE(sname)					       \
-static int prof_sysenter_enable_##sname(struct ftrace_event_call *event_call)  \
+static int prof_sysenter_enable_##sname(void)				       \
 {									       \
-	int ret = 0;							       \
-	if (!atomic_inc_return(&event_enter_##sname.profile_count))	       \
-		ret = reg_prof_syscall_enter("sys"#sname);		       \
-	return ret;							       \
+	return reg_prof_syscall_enter("sys"#sname);			       \
 }									       \
 									       \
-static void prof_sysenter_disable_##sname(struct ftrace_event_call *event_call)\
+static void prof_sysenter_disable_##sname(void)				       \
 {									       \
-	if (atomic_add_negative(-1, &event_enter_##sname.profile_count))       \
-		unreg_prof_syscall_enter("sys"#sname);			       \
+	unreg_prof_syscall_enter("sys"#sname);				       \
 }
 
 #define TRACE_SYS_EXIT_PROFILE(sname)					       \
-static int prof_sysexit_enable_##sname(struct ftrace_event_call *event_call)   \
+static int prof_sysexit_enable_##sname(void)				       \
 {									       \
-	int ret = 0;							       \
-	if (!atomic_inc_return(&event_exit_##sname.profile_count))	       \
-		ret = reg_prof_syscall_exit("sys"#sname);		       \
-	return ret;							       \
+	return reg_prof_syscall_exit("sys"#sname);			       \
 }									       \
 									       \
-static void prof_sysexit_disable_##sname(struct ftrace_event_call *event_call) \
+static void prof_sysexit_disable_##sname(void)				       \
 {                                                                              \
-	if (atomic_add_negative(-1, &event_exit_##sname.profile_count))	       \
-		unreg_prof_syscall_exit("sys"#sname);			       \
+	unreg_prof_syscall_exit("sys"#sname);				       \
 }
 
 #define TRACE_SYS_ENTER_PROFILE_INIT(sname)				       \
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 72a3b43..a822087 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -382,20 +382,14 @@ static inline int ftrace_get_offsets_##call(				\
  *
  * NOTE: The insertion profile callback (ftrace_profile_<call>) is defined later
  *
- * static int ftrace_profile_enable_<call>(struct ftrace_event_call *event_call)
+ * static int ftrace_profile_enable_<call>(void)
  * {
- * 	int ret = 0;
- *
- * 	if (!atomic_inc_return(&event_call->profile_count))
- * 		ret = register_trace_<call>(ftrace_profile_<call>);
- *
- * 	return ret;
+ * 	return register_trace_<call>(ftrace_profile_<call>);
  * }
  *
- * static void ftrace_profile_disable_<call>(struct ftrace_event_call *event_call)
+ * static void ftrace_profile_disable_<call>(void)
  * {
- * 	if (atomic_add_negative(-1, &event->call->profile_count))
- * 		unregister_trace_<call>(ftrace_profile_<call>);
+ * 	unregister_trace_<call>(ftrace_profile_<call>);
  * }
  *
  */
@@ -405,20 +399,14 @@ static inline int ftrace_get_offsets_##call(				\
 									\
 static void ftrace_profile_##call(proto);				\
 									\
-static int ftrace_profile_enable_##call(struct ftrace_event_call *event_call) \
+static int ftrace_profile_enable_##call(void)				\
 {									\
-	int ret = 0;							\
-									\
-	if (!atomic_inc_return(&event_call->profile_count))		\
-		ret = register_trace_##call(ftrace_profile_##call);	\
-									\
-	return ret;							\
+	return register_trace_##call(ftrace_profile_##call);		\
 }									\
 									\
-static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
+static void ftrace_profile_disable_##call(void)				\
 {									\
-	if (atomic_add_negative(-1, &event_call->profile_count))	\
-		unregister_trace_##call(ftrace_profile_##call);		\
+	unregister_trace_##call(ftrace_profile_##call);			\
 }
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index 55a25c9..df4a74e 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -8,6 +8,14 @@
 #include <linux/module.h>
 #include "trace.h"
 
+static int ftrace_profile_enable_event(struct ftrace_event_call *event)
+{
+	if (atomic_inc_return(&event->profile_count))
+		return 0;
+
+	return event->profile_enable();
+}
+
 int ftrace_profile_enable(int event_id)
 {
 	struct ftrace_event_call *event;
@@ -17,7 +25,7 @@ int ftrace_profile_enable(int event_id)
 	list_for_each_entry(event, &ftrace_events, list) {
 		if (event->id == event_id && event->profile_enable &&
 		    try_module_get(event->mod)) {
-			ret = event->profile_enable(event);
+			ret = ftrace_profile_enable_event(event);
 			break;
 		}
 	}
@@ -26,6 +34,14 @@ int ftrace_profile_enable(int event_id)
 	return ret;
 }
 
+static void ftrace_profile_disable_event(struct ftrace_event_call *event)
+{
+	if (!atomic_add_negative(-1, &event->profile_count))
+		return;
+
+	event->profile_disable();
+}
+
 void ftrace_profile_disable(int event_id)
 {
 	struct ftrace_event_call *event;
@@ -33,7 +49,7 @@ void ftrace_profile_disable(int event_id)
 	mutex_lock(&event_mutex);
 	list_for_each_entry(event, &ftrace_events, list) {
 		if (event->id == event_id) {
-			event->profile_disable(event);
+			ftrace_profile_disable_event(event);
 			module_put(event->mod);
 			break;
 		}
-- 
1.6.2.3



* [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically
  2009-09-18  1:33 [PATCH 0/2] tracing: Tracing event profiling updates Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2009-09-18  4:24 ` [PATCH 1/2] tracing: Factorize the events profile accounting Frederic Weisbecker
@ 2009-09-18  4:24 ` Frederic Weisbecker
  2009-09-18  4:31   ` Frederic Weisbecker
  4 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-18  4:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Peter Zijlstra,
	Li Zefan, Jason Baron, Masami Hiramatsu, Ingo Molnar

Currently the trace event profile buffer is allocated on the stack. But
this may be too much for the stack, as events can have a large
statically defined field size and can also grow with dynamic arrays.

Allocate two per-cpu buffers shared by all profiled events. The first
buffer is used to host every non-NMI context trace. It is protected by
disabling interrupts while writing and committing the trace.

The second buffer is reserved for NMI context, so that NMIs do not race
with the first buffer.

The whole write/commit section is RCU protected because these buffers
are released when the last profiled trace event is deactivated.

v2: Move the buffers out of struct ftrace_event_call and make them
    global, as pointed out by Steven Rostedt.
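
The buffer lifetime in this version can be summarized by the following
sketch (illustrative only, not the exact patch code; trace_profile_buf,
trace_profile_buf_nmi and total_profile_count are the globals
introduced in the diff below):

        /*
         * Free path: retract the published pointers, wait for every
         * writer (they all run with irqs disabled or under
         * rcu_read_lock()), then release the per-cpu memory.
         */
        static void example_release_profile_buffers(void)
        {
                char *buf = trace_profile_buf;
                char *nmi_buf = trace_profile_buf_nmi;

                rcu_assign_pointer(trace_profile_buf, NULL);
                rcu_assign_pointer(trace_profile_buf_nmi, NULL);

                synchronize_sched();

                free_percpu(buf);
                free_percpu(nmi_buf);
        }

This only runs when total_profile_count drops back to zero, mirroring
the allocation performed when the first profiled event is enabled.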

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/ftrace_event.h       |    6 ++
 include/trace/ftrace.h             |   83 ++++++++++++++++++++----------
 kernel/trace/trace_event_profile.c |   61 ++++++++++++++++++++++-
 kernel/trace/trace_syscalls.c      |   97 +++++++++++++++++++++++++++++-------
 4 files changed, 199 insertions(+), 48 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index bc103d7..4ec5e67 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -4,6 +4,7 @@
 #include <linux/ring_buffer.h>
 #include <linux/trace_seq.h>
 #include <linux/percpu.h>
+#include <linux/hardirq.h>
 
 struct trace_array;
 struct tracer;
@@ -134,6 +135,11 @@ struct ftrace_event_call {
 	void			(*profile_disable)(void);
 };
 
+#define FTRACE_MAX_PROFILE_SIZE	2048
+
+extern char			*trace_profile_buf;
+extern char			*trace_profile_buf_nmi;
+
 #define MAX_FILTER_PRED		32
 #define MAX_FILTER_STR_VAL	256	/* Should handle KSYM_SYMBOL_LEN */
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index a822087..a0361cb 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -648,11 +648,12 @@ __attribute__((section("_ftrace_events"))) event_##call = {		\
  *	struct ftrace_raw_##call *entry;
  *	u64 __addr = 0, __count = 1;
  *	unsigned long irq_flags;
+ *	struct trace_entry *ent;
  *	int __entry_size;
  *	int __data_size;
+ *	int __cpu
  *	int pc;
  *
- *	local_save_flags(irq_flags);
  *	pc = preempt_count();
  *
  *	__data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
@@ -663,25 +664,34 @@ __attribute__((section("_ftrace_events"))) event_##call = {		\
  *			     sizeof(u64));
  *	__entry_size -= sizeof(u32);
  *
- *	do {
- *		char raw_data[__entry_size]; <- allocate our sample in the stack
- *		struct trace_entry *ent;
+ *	// Protect the non nmi buffer
+ *	// This also protects the rcu read side
+ *	local_irq_save(irq_flags);
+ *	__cpu = smp_processor_id();
+ *
+ *	if (in_nmi())
+ *		raw_data = rcu_dereference(trace_profile_buf_nmi);
+ *	else
+ *		raw_data = rcu_dereference(trace_profile_buf);
+ *
+ *	if (!raw_data)
+ *		goto end;
  *
- *		zero dead bytes from alignment to avoid stack leak to userspace:
+ *	raw_data = per_cpu_ptr(raw_data, __cpu);
  *
- *		*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;
- *		entry = (struct ftrace_raw_<call> *)raw_data;
- *		ent = &entry->ent;
- *		tracing_generic_entry_update(ent, irq_flags, pc);
- *		ent->type = event_call->id;
+ *	//zero dead bytes from alignment to avoid stack leak to userspace:
+ *	*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;
+ *	entry = (struct ftrace_raw_<call> *)raw_data;
+ *	ent = &entry->ent;
+ *	tracing_generic_entry_update(ent, irq_flags, pc);
+ *	ent->type = event_call->id;
  *
- *		<tstruct> <- do some jobs with dynamic arrays
+ *	<tstruct> <- do some jobs with dynamic arrays
  *
- *		<assign>  <- affect our values
+ *	<assign>  <- affect our values
  *
- *		perf_tpcounter_event(event_call->id, __addr, __count, entry,
- *			     __entry_size);  <- submit them to perf counter
- *	} while (0);
+ *	perf_tpcounter_event(event_call->id, __addr, __count, entry,
+ *		     __entry_size);  <- submit them to perf counter
  *
  * }
  */
@@ -704,11 +714,13 @@ static void ftrace_profile_##call(proto)				\
 	struct ftrace_raw_##call *entry;				\
 	u64 __addr = 0, __count = 1;					\
 	unsigned long irq_flags;					\
+	struct trace_entry *ent;					\
 	int __entry_size;						\
 	int __data_size;						\
+	char *raw_data;							\
+	int __cpu;							\
 	int pc;								\
 									\
-	local_save_flags(irq_flags);					\
 	pc = preempt_count();						\
 									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
@@ -716,23 +728,38 @@ static void ftrace_profile_##call(proto)				\
 			     sizeof(u64));				\
 	__entry_size -= sizeof(u32);					\
 									\
-	do {								\
-		char raw_data[__entry_size];				\
-		struct trace_entry *ent;				\
+	if (WARN_ONCE(__entry_size > FTRACE_MAX_PROFILE_SIZE,		\
+		      "profile buffer not large enough"))		\
+		return;							\
+									\
+	local_irq_save(irq_flags);					\
+	__cpu = smp_processor_id();					\
 									\
-		*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;	\
-		entry = (struct ftrace_raw_##call *)raw_data;		\
-		ent = &entry->ent;					\
-		tracing_generic_entry_update(ent, irq_flags, pc);	\
-		ent->type = event_call->id;				\
+	if (in_nmi())							\
+		raw_data = rcu_dereference(trace_profile_buf_nmi);		\
+	else								\
+		raw_data = rcu_dereference(trace_profile_buf);		\
 									\
-		tstruct							\
+	if (!raw_data)							\
+		goto end;						\
 									\
-		{ assign; }						\
+	raw_data = per_cpu_ptr(raw_data, __cpu);			\
 									\
-		perf_tpcounter_event(event_call->id, __addr, __count, entry,\
+	*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;		\
+	entry = (struct ftrace_raw_##call *)raw_data;			\
+	ent = &entry->ent;						\
+	tracing_generic_entry_update(ent, irq_flags, pc);		\
+	ent->type = event_call->id;					\
+									\
+	tstruct								\
+									\
+	{ assign; }							\
+									\
+	perf_tpcounter_event(event_call->id, __addr, __count, entry,	\
 			     __entry_size);				\
-	} while (0);							\
+									\
+end:									\
+	local_irq_restore(irq_flags);					\
 									\
 }
 
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index df4a74e..3aaa77c 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -8,12 +8,52 @@
 #include <linux/module.h>
 #include "trace.h"
 
+/*
+ * We can't use a size but a type in alloc_percpu()
+ * So let's create a dummy type that matches the desired size
+ */
+typedef struct {char buf[FTRACE_MAX_PROFILE_SIZE];} profile_buf_t;
+
+char		*trace_profile_buf;
+char 		*trace_profile_buf_nmi;
+
+/* Count the events in use (per event id, not per instance) */
+static int	total_profile_count;
+
 static int ftrace_profile_enable_event(struct ftrace_event_call *event)
 {
+	char *buf;
+	int ret = -ENOMEM;
+
 	if (atomic_inc_return(&event->profile_count))
 		return 0;
 
-	return event->profile_enable();
+	if (!total_profile_count++) {
+		buf = (char *)alloc_percpu(profile_buf_t);
+		if (!buf)
+			goto fail_buf;
+
+		rcu_assign_pointer(trace_profile_buf, buf);
+
+		buf = (char *)alloc_percpu(profile_buf_t);
+		if (!buf)
+			goto fail_buf_nmi;
+
+		rcu_assign_pointer(trace_profile_buf_nmi, buf);
+	}
+
+	ret = event->profile_enable();
+	if (!ret)
+		return 0;
+
+	kfree(trace_profile_buf_nmi);
+fail_buf_nmi:
+	kfree(trace_profile_buf);
+fail_buf:
+	total_profile_count--;
+	atomic_dec(&event->profile_count);
+
+	return ret;
 }
 
 int ftrace_profile_enable(int event_id)
@@ -36,10 +76,29 @@ int ftrace_profile_enable(int event_id)
 
 static void ftrace_profile_disable_event(struct ftrace_event_call *event)
 {
+	char *buf, *nmi_buf;
+
 	if (!atomic_add_negative(-1, &event->profile_count))
 		return;
 
 	event->profile_disable();
+
+	if (!--total_profile_count) {
+		buf = trace_profile_buf;
+		rcu_assign_pointer(trace_profile_buf, NULL);
+
+		nmi_buf = trace_profile_buf_nmi;
+		rcu_assign_pointer(trace_profile_buf_nmi, NULL);
+
+		/*
+		 * Ensure every events in profiling have finished before
+		 * releasing the buffers
+		 */
+		synchronize_sched();
+
+		free_percpu(buf);
+		free_percpu(nmi_buf);
+	}
 }
 
 void ftrace_profile_disable(int event_id)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8712ce3..1db1fb5 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -384,10 +384,12 @@ static int sys_prof_refcount_exit;
 
 static void prof_syscall_enter(struct pt_regs *regs, long id)
 {
-	struct syscall_trace_enter *rec;
 	struct syscall_metadata *sys_data;
+	struct syscall_trace_enter *rec;
+	char *raw_data;
 	int syscall_nr;
 	int size;
+	int cpu;
 
 	syscall_nr = syscall_get_nr(current, regs);
 	if (!test_bit(syscall_nr, enabled_prof_enter_syscalls))
@@ -402,20 +404,39 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
 	size = ALIGN(size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	do {
-		char raw_data[size];
+	if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
+		      "profile buffer not large enough"))
+		return;
+
+	/*
+	 * We are not in nmi. Also we can't be preempted by a sysenter event
+	 * from interrupt, so we can safely take this buffer without
+	 * masking interrupts. But we still need rcu_read_lock to protect
+	 * against concurrent buffer release.
+	 */
+	rcu_read_lock();
+
+	cpu = smp_processor_id();
+	raw_data = rcu_dereference(trace_profile_buf);
+
+	if (!raw_data)
+		goto end;
+
+	raw_data = per_cpu_ptr(raw_data, cpu);
+
+	/* zero the dead bytes from align to not leak stack to user */
+	*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
 
-		/* zero the dead bytes from align to not leak stack to user */
-		*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
+	rec = (struct syscall_trace_enter *) raw_data;
+	tracing_generic_entry_update(&rec->ent, 0, 0);
+	rec->ent.type = sys_data->enter_id;
+	rec->nr = syscall_nr;
+	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
+			       (unsigned long *)&rec->args);
+	perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
 
-		rec = (struct syscall_trace_enter *) raw_data;
-		tracing_generic_entry_update(&rec->ent, 0, 0);
-		rec->ent.type = sys_data->enter_id;
-		rec->nr = syscall_nr;
-		syscall_get_arguments(current, regs, 0, sys_data->nb_args,
-				       (unsigned long *)&rec->args);
-		perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
-	} while(0);
+end:
+	rcu_read_unlock();
 }
 
 int reg_prof_syscall_enter(char *name)
@@ -460,8 +481,11 @@ void unreg_prof_syscall_enter(char *name)
 static void prof_syscall_exit(struct pt_regs *regs, long ret)
 {
 	struct syscall_metadata *sys_data;
-	struct syscall_trace_exit rec;
+	struct syscall_trace_exit *rec;
 	int syscall_nr;
+	char *raw_data;
+	int size;
+	int cpu;
 
 	syscall_nr = syscall_get_nr(current, regs);
 	if (!test_bit(syscall_nr, enabled_prof_exit_syscalls))
@@ -471,12 +495,47 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
 	if (!sys_data)
 		return;
 
-	tracing_generic_entry_update(&rec.ent, 0, 0);
-	rec.ent.type = sys_data->exit_id;
-	rec.nr = syscall_nr;
-	rec.ret = syscall_get_return_value(current, regs);
+	/* We can probably do that at build time */
+	size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64));
+	size -= sizeof(u32);
+
+	/*
+	 * Impossible, but be paranoid with the future
+	 * How to put this check outside runtime?
+	 */
+	if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
+		"exit event has grown above profile buffer size"))
+		return;
+
+	/*
+	 * This buffer is safe against interrupt.
+	 * But we still need rcu_read_lock to protect
+	 * against concurrent buffer release.
+	 */
+	rcu_read_lock();
+
+	cpu = smp_processor_id();
+	raw_data = rcu_dereference(trace_profile_buf);
+
+	if (!raw_data)
+		goto end;
+
+	raw_data = per_cpu_ptr(raw_data, cpu);
+
+	/* zero the dead bytes from align to not leak stack to user */
+	*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
+
+	rec = (struct syscall_trace_exit *)raw_data;
 
-	perf_tpcounter_event(sys_data->exit_id, 0, 1, &rec, sizeof(rec));
+	tracing_generic_entry_update(&rec->ent, 0, 0);
+	rec->ent.type = sys_data->exit_id;
+	rec->nr = syscall_nr;
+	rec->ret = syscall_get_return_value(current, regs);
+
+	perf_tpcounter_event(sys_data->exit_id, 0, 1, rec, size);
+
+end:
+	rcu_read_unlock();
 }
 
 int reg_prof_syscall_exit(char *name)
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically
  2009-09-18  4:24 ` [PATCH 2/2] tracing: Allocate the ftrace event profile buffer dynamically Frederic Weisbecker
@ 2009-09-18  4:31   ` Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-18  4:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Steven Rostedt, Peter Zijlstra, Li Zefan, Jason Baron,
	Masami Hiramatsu

On Fri, Sep 18, 2009 at 06:24:14AM +0200, Frederic Weisbecker wrote:
>  static void prof_syscall_enter(struct pt_regs *regs, long id)
>  {
> -	struct syscall_trace_enter *rec;
>  	struct syscall_metadata *sys_data;
> +	struct syscall_trace_enter *rec;
> +	char *raw_data;
>  	int syscall_nr;
>  	int size;
> +	int cpu;
>  
>  	syscall_nr = syscall_get_nr(current, regs);
>  	if (!test_bit(syscall_nr, enabled_prof_enter_syscalls))
> @@ -402,20 +404,39 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
>  	size = ALIGN(size + sizeof(u32), sizeof(u64));
>  	size -= sizeof(u32);
>  
> -	do {
> -		char raw_data[size];
> +	if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
> +		      "profile buffer not large enough"))
> +		return;
> +
> +	/*
> +	 * We are not in nmi. Also we can't be preempted by a sysenter event
> +	 * from interrupt, so we can safely take this buffer without
> +	 * masking interrupts. But we still need rcu_read_lock to protect
> +	 * against concurrent buffer release.
> +	 */
> +	rcu_read_lock();



Oh... now that we use a global buffer, the buffer is no longer safe against
interrupts or NMIs.

Looks like I will need a third iteration.

Sorry for the noise...
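
To make the race concrete, here is a hypothetical sketch (illustration
only, not code from either patch version; sketch_write_event() is an
invented name) of a writer that relies on rcu_read_lock() alone, as the
syscall handlers did in v2:

#include <linux/ftrace_event.h>	/* trace_profile_buf, added by this series */
#include <linux/percpu.h>
#include <linux/rcupdate.h>
#include <linux/smp.h>
#include <linux/string.h>

/* Hypothetical helper, illustration only. */
static void sketch_write_event(const void *payload, int size)
{
	char *buf;

	rcu_read_lock();		/* keeps the buffer allocated...       */
	buf = rcu_dereference(trace_profile_buf);
	if (!buf)
		goto out;
	buf = per_cpu_ptr(buf, smp_processor_id());

	memcpy(buf, payload, size);	/* ...but an interrupt firing here can */
					/* trace into the same per-cpu slot    */
					/* and corrupt the half-built record,  */
					/* hence the local_irq_save() and the  */
					/* separate NMI buffer in v3.          */
out:
	rcu_read_unlock();
}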


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 0/2 v3] tracing: Tracing event profiling updates
  2009-09-18  4:24 ` [PATCH 0/2 v2] tracing: Tracing event profiling updates Frederic Weisbecker
@ 2009-09-18  5:36   ` Frederic Weisbecker
  2009-09-19  7:34     ` Ingo Molnar
  2009-09-18  5:36   ` [PATCH 2/2 v3] tracing: Allocate the ftrace event profile buffer dynamically Frederic Weisbecker
  1 sibling, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-18  5:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Peter Zijlstra,
	Li Zefan, Jason Baron, Masami Hiramatsu


Ingo,

Hopefully this is my last attempt.
This new iteration fixes the syscall events to handle the buffer correctly.
In the previous version, they did not guard against interrupts.

I only resend the second patch, as it is the only one that has changed since v2.

The new branch is in:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	tracing/core-v3

Thanks,
	Frederic.

Frederic Weisbecker (2):
      tracing: Factorize the events profile accounting
      tracing: Allocate the ftrace event profile buffer dynamically

 include/linux/ftrace_event.h       |   10 +++-
 include/linux/syscalls.h           |   24 +++-----
 include/trace/ftrace.h             |  111 ++++++++++++++++++++---------------
 kernel/trace/trace_event_profile.c |   79 +++++++++++++++++++++++++-
 kernel/trace/trace_syscalls.c      |   97 +++++++++++++++++++++++++------
 5 files changed, 234 insertions(+), 87 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/2 v3] tracing: Allocate the ftrace event profile buffer dynamically
  2009-09-18  4:24 ` [PATCH 0/2 v2] tracing: Tracing event profiling updates Frederic Weisbecker
  2009-09-18  5:36   ` [PATCH 0/2 v3] " Frederic Weisbecker
@ 2009-09-18  5:36   ` Frederic Weisbecker
  1 sibling, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-18  5:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Peter Zijlstra,
	Li Zefan, Jason Baron, Masami Hiramatsu, Ingo Molnar

Currently the trace event profile buffer is allocated on the stack. But
this may be too much for the stack, as events can have large statically
defined field sizes and can also grow with dynamic arrays.

Allocate two per-cpu buffers shared by all profiled events. The first
buffer hosts every non-NMI context trace and is protected by disabling
interrupts while writing and committing the trace.

The second buffer is reserved for NMIs, so that they cannot race with
writers of the first buffer.

The whole write/commit section is RCU protected because we release
these buffers when the last profiled trace event is deactivated.

v2: Move the buffers from trace_event to be global, as pointed out by
    Steven Rostedt.

v3: Fix the syscall events to handle the profiling buffer races
    by disabling interrupts, now that the buffers are global.
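
For clarity, a rough de-macroized sketch of the write path that the
generated handlers follow after this change (sketch_profile_event() is
an illustrative name, not the literal generated code; a memcpy() stands
in for the in-place record construction):

#include <linux/ftrace_event.h>	/* FTRACE_MAX_PROFILE_SIZE, buffer pointers (this patch) */
#include <linux/hardirq.h>
#include <linux/irqflags.h>
#include <linux/kernel.h>
#include <linux/percpu.h>
#include <linux/perf_counter.h>	/* perf_tpcounter_event(), in this era */
#include <linux/rcupdate.h>
#include <linux/smp.h>
#include <linux/string.h>

/*
 * Illustration only: the real handlers build the record in place with
 * tracing_generic_entry_update()/tstruct/assign instead of memcpy().
 */
static void sketch_profile_event(int id, const void *record, int size)
{
	unsigned long flags;
	char *raw_data;

	if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
		      "profile buffer not large enough"))
		return;

	/*
	 * Masking irqs protects the non-NMI buffer and doubles as the
	 * RCU read side (paired with synchronize_sched() on release).
	 */
	local_irq_save(flags);

	/* NMIs cannot be masked, so they get their own buffer. */
	if (in_nmi())
		raw_data = rcu_dereference(trace_profile_buf_nmi);
	else
		raw_data = rcu_dereference(trace_profile_buf);
	if (!raw_data)
		goto end;	/* buffers already released: drop the event */

	raw_data = per_cpu_ptr(raw_data, smp_processor_id());
	memcpy(raw_data, record, size);
	perf_tpcounter_event(id, 0, 1, raw_data, size);
end:
	local_irq_restore(flags);
}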

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/ftrace_event.h       |    6 ++
 include/trace/ftrace.h             |   83 ++++++++++++++++++++----------
 kernel/trace/trace_event_profile.c |   61 ++++++++++++++++++++++-
 kernel/trace/trace_syscalls.c      |   97 +++++++++++++++++++++++++++++-------
 4 files changed, 199 insertions(+), 48 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index bc103d7..4ec5e67 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -4,6 +4,7 @@
 #include <linux/ring_buffer.h>
 #include <linux/trace_seq.h>
 #include <linux/percpu.h>
+#include <linux/hardirq.h>
 
 struct trace_array;
 struct tracer;
@@ -134,6 +135,11 @@ struct ftrace_event_call {
 	void			(*profile_disable)(void);
 };
 
+#define FTRACE_MAX_PROFILE_SIZE	2048
+
+extern char			*trace_profile_buf;
+extern char			*trace_profile_buf_nmi;
+
 #define MAX_FILTER_PRED		32
 #define MAX_FILTER_STR_VAL	256	/* Should handle KSYM_SYMBOL_LEN */
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index a822087..a0361cb 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -648,11 +648,12 @@ __attribute__((section("_ftrace_events"))) event_##call = {		\
  *	struct ftrace_raw_##call *entry;
  *	u64 __addr = 0, __count = 1;
  *	unsigned long irq_flags;
+ *	struct trace_entry *ent;
  *	int __entry_size;
  *	int __data_size;
+ *	int __cpu
  *	int pc;
  *
- *	local_save_flags(irq_flags);
  *	pc = preempt_count();
  *
  *	__data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
@@ -663,25 +664,34 @@ __attribute__((section("_ftrace_events"))) event_##call = {		\
  *			     sizeof(u64));
  *	__entry_size -= sizeof(u32);
  *
- *	do {
- *		char raw_data[__entry_size]; <- allocate our sample in the stack
- *		struct trace_entry *ent;
+ *	// Protect the non nmi buffer
+ *	// This also protects the rcu read side
+ *	local_irq_save(irq_flags);
+ *	__cpu = smp_processor_id();
+ *
+ *	if (in_nmi())
+ *		raw_data = rcu_dereference(trace_profile_buf_nmi);
+ *	else
+ *		raw_data = rcu_dereference(trace_profile_buf);
+ *
+ *	if (!raw_data)
+ *		goto end;
  *
- *		zero dead bytes from alignment to avoid stack leak to userspace:
+ *	raw_data = per_cpu_ptr(raw_data, __cpu);
  *
- *		*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;
- *		entry = (struct ftrace_raw_<call> *)raw_data;
- *		ent = &entry->ent;
- *		tracing_generic_entry_update(ent, irq_flags, pc);
- *		ent->type = event_call->id;
+ *	//zero dead bytes from alignment to avoid stack leak to userspace:
+ *	*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;
+ *	entry = (struct ftrace_raw_<call> *)raw_data;
+ *	ent = &entry->ent;
+ *	tracing_generic_entry_update(ent, irq_flags, pc);
+ *	ent->type = event_call->id;
  *
- *		<tstruct> <- do some jobs with dynamic arrays
+ *	<tstruct> <- do some jobs with dynamic arrays
  *
- *		<assign>  <- affect our values
+ *	<assign>  <- affect our values
  *
- *		perf_tpcounter_event(event_call->id, __addr, __count, entry,
- *			     __entry_size);  <- submit them to perf counter
- *	} while (0);
+ *	perf_tpcounter_event(event_call->id, __addr, __count, entry,
+ *		     __entry_size);  <- submit them to perf counter
  *
  * }
  */
@@ -704,11 +714,13 @@ static void ftrace_profile_##call(proto)				\
 	struct ftrace_raw_##call *entry;				\
 	u64 __addr = 0, __count = 1;					\
 	unsigned long irq_flags;					\
+	struct trace_entry *ent;					\
 	int __entry_size;						\
 	int __data_size;						\
+	char *raw_data;							\
+	int __cpu;							\
 	int pc;								\
 									\
-	local_save_flags(irq_flags);					\
 	pc = preempt_count();						\
 									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
@@ -716,23 +728,38 @@ static void ftrace_profile_##call(proto)				\
 			     sizeof(u64));				\
 	__entry_size -= sizeof(u32);					\
 									\
-	do {								\
-		char raw_data[__entry_size];				\
-		struct trace_entry *ent;				\
+	if (WARN_ONCE(__entry_size > FTRACE_MAX_PROFILE_SIZE,		\
+		      "profile buffer not large enough"))		\
+		return;							\
+									\
+	local_irq_save(irq_flags);					\
+	__cpu = smp_processor_id();					\
 									\
-		*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;	\
-		entry = (struct ftrace_raw_##call *)raw_data;		\
-		ent = &entry->ent;					\
-		tracing_generic_entry_update(ent, irq_flags, pc);	\
-		ent->type = event_call->id;				\
+	if (in_nmi())							\
+		raw_data = rcu_dereference(trace_profile_buf_nmi);		\
+	else								\
+		raw_data = rcu_dereference(trace_profile_buf);		\
 									\
-		tstruct							\
+	if (!raw_data)							\
+		goto end;						\
 									\
-		{ assign; }						\
+	raw_data = per_cpu_ptr(raw_data, __cpu);			\
 									\
-		perf_tpcounter_event(event_call->id, __addr, __count, entry,\
+	*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;		\
+	entry = (struct ftrace_raw_##call *)raw_data;			\
+	ent = &entry->ent;						\
+	tracing_generic_entry_update(ent, irq_flags, pc);		\
+	ent->type = event_call->id;					\
+									\
+	tstruct								\
+									\
+	{ assign; }							\
+									\
+	perf_tpcounter_event(event_call->id, __addr, __count, entry,	\
 			     __entry_size);				\
-	} while (0);							\
+									\
+end:									\
+	local_irq_restore(irq_flags);					\
 									\
 }
 
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index df4a74e..3aaa77c 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -8,12 +8,52 @@
 #include <linux/module.h>
 #include "trace.h"
 
+/*
+ * We can't use a size but a type in alloc_percpu()
+ * So let's create a dummy type that matches the desired size
+ */
+typedef struct {char buf[FTRACE_MAX_PROFILE_SIZE];} profile_buf_t;
+
+char		*trace_profile_buf;
+char 		*trace_profile_buf_nmi;
+
+/* Count the events in use (per event id, not per instance) */
+static int	total_profile_count;
+
 static int ftrace_profile_enable_event(struct ftrace_event_call *event)
 {
+	char *buf;
+	int ret = -ENOMEM;
+
 	if (atomic_inc_return(&event->profile_count))
 		return 0;
 
-	return event->profile_enable();
+	if (!total_profile_count++) {
+		buf = (char *)alloc_percpu(profile_buf_t);
+		if (!buf)
+			goto fail_buf;
+
+		rcu_assign_pointer(trace_profile_buf, buf);
+
+		buf = (char *)alloc_percpu(profile_buf_t);
+		if (!buf)
+			goto fail_buf_nmi;
+
+		rcu_assign_pointer(trace_profile_buf_nmi, buf);
+	}
+
+	ret = event->profile_enable();
+	if (!ret)
+		return 0;
+
+	kfree(trace_profile_buf_nmi);
+fail_buf_nmi:
+	kfree(trace_profile_buf);
+fail_buf:
+	total_profile_count--;
+	atomic_dec(&event->profile_count);
+
+	return ret;
 }
 
 int ftrace_profile_enable(int event_id)
@@ -36,10 +76,29 @@ int ftrace_profile_enable(int event_id)
 
 static void ftrace_profile_disable_event(struct ftrace_event_call *event)
 {
+	char *buf, *nmi_buf;
+
 	if (!atomic_add_negative(-1, &event->profile_count))
 		return;
 
 	event->profile_disable();
+
+	if (!--total_profile_count) {
+		buf = trace_profile_buf;
+		rcu_assign_pointer(trace_profile_buf, NULL);
+
+		nmi_buf = trace_profile_buf_nmi;
+		rcu_assign_pointer(trace_profile_buf_nmi, NULL);
+
+		/*
+		 * Ensure every events in profiling have finished before
+		 * releasing the buffers
+		 */
+		synchronize_sched();
+
+		free_percpu(buf);
+		free_percpu(nmi_buf);
+	}
 }
 
 void ftrace_profile_disable(int event_id)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8712ce3..7a3550c 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -384,10 +384,13 @@ static int sys_prof_refcount_exit;
 
 static void prof_syscall_enter(struct pt_regs *regs, long id)
 {
-	struct syscall_trace_enter *rec;
 	struct syscall_metadata *sys_data;
+	struct syscall_trace_enter *rec;
+	unsigned long flags;
+	char *raw_data;
 	int syscall_nr;
 	int size;
+	int cpu;
 
 	syscall_nr = syscall_get_nr(current, regs);
 	if (!test_bit(syscall_nr, enabled_prof_enter_syscalls))
@@ -402,20 +405,38 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
 	size = ALIGN(size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	do {
-		char raw_data[size];
+	if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
+		      "profile buffer not large enough"))
+		return;
+
+	/* Protect the per cpu buffer, begin the rcu read side */
+	local_irq_save(flags);
 
-		/* zero the dead bytes from align to not leak stack to user */
-		*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
+	cpu = smp_processor_id();
+
+	if (in_nmi())
+		raw_data = rcu_dereference(trace_profile_buf_nmi);
+	else
+		raw_data = rcu_dereference(trace_profile_buf);
+
+	if (!raw_data)
+		goto end;
 
-		rec = (struct syscall_trace_enter *) raw_data;
-		tracing_generic_entry_update(&rec->ent, 0, 0);
-		rec->ent.type = sys_data->enter_id;
-		rec->nr = syscall_nr;
-		syscall_get_arguments(current, regs, 0, sys_data->nb_args,
-				       (unsigned long *)&rec->args);
-		perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
-	} while(0);
+	raw_data = per_cpu_ptr(raw_data, cpu);
+
+	/* zero the dead bytes from align to not leak stack to user */
+	*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
+
+	rec = (struct syscall_trace_enter *) raw_data;
+	tracing_generic_entry_update(&rec->ent, 0, 0);
+	rec->ent.type = sys_data->enter_id;
+	rec->nr = syscall_nr;
+	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
+			       (unsigned long *)&rec->args);
+	perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
+
+end:
+	local_irq_restore(flags);
 }
 
 int reg_prof_syscall_enter(char *name)
@@ -460,8 +481,12 @@ void unreg_prof_syscall_enter(char *name)
 static void prof_syscall_exit(struct pt_regs *regs, long ret)
 {
 	struct syscall_metadata *sys_data;
-	struct syscall_trace_exit rec;
+	struct syscall_trace_exit *rec;
+	unsigned long flags;
 	int syscall_nr;
+	char *raw_data;
+	int size;
+	int cpu;
 
 	syscall_nr = syscall_get_nr(current, regs);
 	if (!test_bit(syscall_nr, enabled_prof_exit_syscalls))
@@ -471,12 +496,46 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
 	if (!sys_data)
 		return;
 
-	tracing_generic_entry_update(&rec.ent, 0, 0);
-	rec.ent.type = sys_data->exit_id;
-	rec.nr = syscall_nr;
-	rec.ret = syscall_get_return_value(current, regs);
+	/* We can probably do that at build time */
+	size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64));
+	size -= sizeof(u32);
 
-	perf_tpcounter_event(sys_data->exit_id, 0, 1, &rec, sizeof(rec));
+	/*
+	 * Impossible, but be paranoid with the future
+	 * How to put this check outside runtime?
+	 */
+	if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
+		"exit event has grown above profile buffer size"))
+		return;
+
+	/* Protect the per cpu buffer, begin the rcu read side */
+	local_irq_save(flags);
+	cpu = smp_processor_id();
+
+	if (in_nmi())
+		raw_data = rcu_dereference(trace_profile_buf_nmi);
+	else
+		raw_data = rcu_dereference(trace_profile_buf);
+
+	if (!raw_data)
+		goto end;
+
+	raw_data = per_cpu_ptr(raw_data, cpu);
+
+	/* zero the dead bytes from align to not leak stack to user */
+	*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
+
+	rec = (struct syscall_trace_exit *)raw_data;
+
+	tracing_generic_entry_update(&rec->ent, 0, 0);
+	rec->ent.type = sys_data->exit_id;
+	rec->nr = syscall_nr;
+	rec->ret = syscall_get_return_value(current, regs);
+
+	perf_tpcounter_event(sys_data->exit_id, 0, 1, rec, size);
+
+end:
+	local_irq_restore(flags);
 }
 
 int reg_prof_syscall_exit(char *name)
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2 v3] tracing: Tracing event profiling updates
  2009-09-18  5:36   ` [PATCH 0/2 v3] " Frederic Weisbecker
@ 2009-09-19  7:34     ` Ingo Molnar
  2009-09-19  8:03       ` Frederic Weisbecker
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2009-09-19  7:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Peter Zijlstra, Li Zefan, Jason Baron,
	Masami Hiramatsu


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> 
> Ingo,
> 
> Hopefully this is my last attempt.
> This new iteration fixes the syscall events to handle the buffer correctly.
> In the previous version, they did not guard against interrupts.
> 
> I only resend the second patch, as it is the only one that has changed since v2.
> 
> The new branch is in:
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> 	tracing/core-v3
> 
> Thanks,
> 	Frederic.
> 
> Frederic Weisbecker (2):
>       tracing: Factorize the events profile accounting
>       tracing: Allocate the ftrace event profile buffer dynamically
> 
>  include/linux/ftrace_event.h       |   10 +++-
>  include/linux/syscalls.h           |   24 +++-----
>  include/trace/ftrace.h             |  111 ++++++++++++++++++++---------------
>  kernel/trace/trace_event_profile.c |   79 +++++++++++++++++++++++++-
>  kernel/trace/trace_syscalls.c      |   97 +++++++++++++++++++++++++------
>  5 files changed, 234 insertions(+), 87 deletions(-)

Hm, the naming is quite confusing here i think:

  -132,8 +133,12 @@ struct ftrace_event_call {
         atomic_t                profile_count;
         int                     (*profile_enable)(void);
         void                    (*profile_disable)(void);
 +       char                    *profile_buf;
 +       char                    *profile_buf_nmi;

These are generic events, not just 'profiling' histograms.

Generic events can have _many_ output modi:

 - SVGs                   (perf timeline)
 - histograms             (perf report)
 - traces                 (perf trace)
 - summaries / maximums   (perf sched lat)
 - maps                   (perf sched map)
 - graphs                 (perf report --call-graph)

So it's quite a misnomer to talk just about profiling here. This is an 
event record buffer.

Also, what is the current maximum possible size of ->profile_buf? The
max size of an event record? The new codepath looks a bit heavy, with
rcu lock/unlock and other bits put in between - and this is now in the
event-sending critical path. Can't we use a permanent buffer that needs
no extra locking/reference protection?

Is the whole thing even justified? I mean, we keep the size of records
low anyway. It's a _lot_ easier to handle on-stack records; they are the
ideal (and very fast) dynamic allocator, which is NMI and IRQ safe, etc.

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2 v3] tracing: Tracing event profiling updates
  2009-09-19  7:34     ` Ingo Molnar
@ 2009-09-19  8:03       ` Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-19  8:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Steven Rostedt, Peter Zijlstra, Li Zefan, Jason Baron,
	Masami Hiramatsu

On Sat, Sep 19, 2009 at 09:34:00AM +0200, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > 
> > Ingo,
> > 
> > Hopefully this is my last attempt.
> > This new iteration fixes the syscall events to handle the buffer correctly.
> > In the previous version, they did not guard against interrupts.
> > 
> > I only resend the second patch, as it is the only one that has changed since v2.
> > 
> > The new branch is in:
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> > 	tracing/core-v3
> > 
> > Thanks,
> > 	Frederic.
> > 
> > Frederic Weisbecker (2):
> >       tracing: Factorize the events profile accounting
> >       tracing: Allocate the ftrace event profile buffer dynamically
> > 
> >  include/linux/ftrace_event.h       |   10 +++-
> >  include/linux/syscalls.h           |   24 +++-----
> >  include/trace/ftrace.h             |  111 ++++++++++++++++++++---------------
> >  kernel/trace/trace_event_profile.c |   79 +++++++++++++++++++++++++-
> >  kernel/trace/trace_syscalls.c      |   97 +++++++++++++++++++++++++------
> >  5 files changed, 234 insertions(+), 87 deletions(-)
> 
> Hm, the naming is quite confusing here i think:
> 
>   -132,8 +133,12 @@ struct ftrace_event_call {
>          atomic_t                profile_count;
>          int                     (*profile_enable)(void);
>          void                    (*profile_disable)(void);
>  +       char                    *profile_buf;
>  +       char                    *profile_buf_nmi;
> 
> These are generic events, not just 'profiling' histograms.
> 
> Generic events can have _many_ output modi:
> 
>  - SVGs                   (perf timeline)
>  - histograms             (perf report)
>  - traces                 (perf trace)
>  - summaries / maximums   (perf sched lat)
>  - maps                   (perf sched map)
>  - graphs                 (perf report --call-graph)
> 
> So it's quite a misnomer to talk just about profiling here. This is an 
> event record buffer.



Agreed, I guess we can call them perf_event_buf/perf_event_nmi.
Also, maybe profile_enable/profile_disable should follow the same
renaming logic.



> Also, what is the current maximum possible size of ->profile_buf? The
> max size of an event record? The new codepath looks a bit heavy, with
> rcu lock/unlock and other bits put in between - and this is now in the
> event-sending critical path. Can't we use a permanent buffer that needs
> no extra locking/reference protection?
> 
> Is the whole thing even justified? I mean, we keep the size of records
> low anyway. It's a _lot_ easier to handle on-stack records; they are the
> ideal (and very fast) dynamic allocator, which is NMI and IRQ safe, etc.
> 
> 	Ingo


The max size of an event is undefined once it uses either
a __dynamic_array() or __string() field. (The latter is a subset
of the former anyway.)

Those are special fields that can carry dynamically sized arrays,
which gives such events an unpredictable size each time they are
triggered.
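
To illustrate, here is a hypothetical event definition (sample_dyn_event
is an invented name, not one from the kernel tree, and the usual
TRACE_SYSTEM/define_trace boilerplate is omitted) where the __string()
field makes the record size depend on the filename passed at trigger
time; __dynamic_array(type, name, len) behaves the same way with an
arbitrary length:

#include <linux/sched.h>		/* struct task_struct, TASK_COMM_LEN */
#include <linux/tracepoint.h>

TRACE_EVENT(sample_dyn_event,		/* hypothetical event */

	TP_PROTO(struct task_struct *tsk, const char *filename),

	TP_ARGS(tsk, filename),

	TP_STRUCT__entry(
		__array(	char,	comm,	TASK_COMM_LEN	)
		__string(	filename,	filename	)
		__field(	pid_t,	pid			)
	),

	TP_fast_assign(
		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
		__assign_str(filename, filename);
		__entry->pid = tsk->pid;
	),

	TP_printk("comm=%s pid=%d filename=%s",
		  __entry->comm, __entry->pid, __get_str(filename))
);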

That said, we are currently using a stack-based buffer. That, coupled
with the unpredictable event size, really must be fixed. I mean, we don't
want to overflow the stack once an event gets a long string, once a new
large event is added, or once an event happens to be triggered in a path
where the stack is already deep.

I did this easy stack-based version as a first shot at supporting ftrace
raw events in perf, but now it has become something that needs to be
fixed IMO.

I made this RCU-based scheme to avoid wasting the buffer memory
(for each cpu) while we are not profiling the raw events.

I could drop that and keep these buffers static, but that seems
wasteful with respect to the memory footprint while profiling/tracing
is inactive.
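
For illustration, a minimal sketch of that static alternative (invented
names, not proposed code): the buffers would simply exist for the
lifetime of the kernel, trading 2 * FTRACE_MAX_PROFILE_SIZE bytes per
CPU for a write path with no RCU or refcounting at all:

#include <linux/ftrace_event.h>	/* FTRACE_MAX_PROFILE_SIZE (this series) */
#include <linux/hardirq.h>
#include <linux/percpu.h>

/* Always-present per-cpu buffers: no allocation, release or RCU needed. */
static DEFINE_PER_CPU(char, sketch_profile_buf[FTRACE_MAX_PROFILE_SIZE]);
static DEFINE_PER_CPU(char, sketch_profile_buf_nmi[FTRACE_MAX_PROFILE_SIZE]);

/* Caller runs with irqs disabled in the non-NMI case, as in the patch. */
static char *sketch_get_buf(void)
{
	if (in_nmi())
		return __get_cpu_var(sketch_profile_buf_nmi);
	return __get_cpu_var(sketch_profile_buf);
}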


^ permalink raw reply	[flat|nested] 15+ messages in thread
