* [PATCH 0/7] [GIT PULL] tracing/ring-buffer: more updates for tip
@ 2009-05-07  3:13 Steven Rostedt
  2009-05-07  3:13 ` [PATCH 1/7] ring-buffer: remove unneeded conditional in rb_reserve_next Steven Rostedt
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07  3:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig


Ingo,

Please pull the latest tip/tracing/ftrace tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/ftrace


Steven Rostedt (7):
      ring-buffer: remove unneeded conditional in rb_reserve_next
      ring-buffer: check for failed allocation in ring buffer benchmark
      ring-buffer: make moving the tail page a separate function
      ring-buffer: change test to be more latency friendly
      tracing: update sample with TRACE_INCLUDE_FILE
      tracing: reset ring buffer when removing modules with events
      tracing: add hierarchical enabling of events

----
 kernel/trace/ring_buffer.c                 |   97 ++++++++++---------
 kernel/trace/ring_buffer_benchmark.c       |   34 +++++++
 kernel/trace/trace.c                       |   10 ++
 kernel/trace/trace.h                       |    2 +
 kernel/trace/trace_events.c                |  149 ++++++++++++++++++++++++++++
 samples/trace_events/trace-events-sample.h |    7 +-
 6 files changed, 253 insertions(+), 46 deletions(-)
-- 


* [PATCH 1/7] ring-buffer: remove unneeded conditional in rb_reserve_next
  2009-05-07  3:13 [PATCH 0/7] [GIT PULL] tracing/ring-buffer: more updates for tip Steven Rostedt
@ 2009-05-07  3:13 ` Steven Rostedt
  2009-05-07  8:23   ` Ingo Molnar
  2009-05-07  3:13 ` [PATCH 2/7] ring-buffer: check for failed allocation in ring buffer benchmark Steven Rostedt
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07  3:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig

[-- Attachment #1: 0001-ring-buffer-remove-unneeded-conditional-in-rb_reser.patch --]
[-- Type: text/plain, Size: 1482 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The code in __rb_reserve_next checks on page overflow if it is the
original committer and then resets the page back to the original
setting.  Although this is fine, and the code is correct, it is
a bit fragil. Some experimental work I did breaks it easily.

The better and more robust solution is to have all committers that
overflow the page, simply subtract what they added.
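
A minimal sketch of the idea (schematic user-space C, not the actual
ring buffer code; the page size and the nesting pattern are
illustrative):

	#include <stdio.h>

	#define BUF_PAGE_SIZE 4096

	/* Stand-ins for local_add_return()/local_sub() on tail_page->write. */
	static long reserve(long *write, long len)
	{
		return *write += len;	/* returns the new end offset */
	}

	static void discard(long *write, long len)
	{
		*write -= len;		/* give back only our own length */
	}

	int main(void)
	{
		long write = 4090;	/* near the end of the page */

		long a = reserve(&write, 100);	/* writer A: ends at 4190 */
		long b = reserve(&write, 50);	/* nested writer B: 4240  */

		/*
		 * Each overflowing writer subtracts its own length; nobody
		 * needs to know whether it was the original committer.
		 */
		if (b > BUF_PAGE_SIZE)
			discard(&write, 50);
		if (a > BUF_PAGE_SIZE)
			discard(&write, 100);

		printf("write = %ld\n", write);	/* back to 4090 */
		return 0;
	}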

[ Impact: more robust ring buffer account management ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 424129e..03ed52b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1290,9 +1290,8 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 		rb_event_set_padding(event);
 	}
 
-	if (tail <= BUF_PAGE_SIZE)
-		/* Set the write back to the previous setting */
-		local_set(&tail_page->write, tail);
+	/* Set the write back to the previous setting */
+	local_sub(length, &tail_page->write);
 
 	/*
 	 * If this was a commit entry that failed,
@@ -1311,8 +1310,7 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 
  out_reset:
 	/* reset write */
-	if (tail <= BUF_PAGE_SIZE)
-		local_set(&tail_page->write, tail);
+	local_sub(length, &tail_page->write);
 
 	if (likely(lock_taken))
 		__raw_spin_unlock(&cpu_buffer->lock);
-- 
1.6.2.4

-- 


* [PATCH 2/7] ring-buffer: check for failed allocation in ring buffer benchmark
  2009-05-07  3:13 [PATCH 0/7] [GIT PULL] tracing/ring-buffer: more updates for tip Steven Rostedt
  2009-05-07  3:13 ` [PATCH 1/7] ring-buffer: remove unneeded conditional in rb_reserve_next Steven Rostedt
@ 2009-05-07  3:13 ` Steven Rostedt
  2009-05-07  3:13 ` [PATCH 3/7] ring-buffer: make moving the tail page a separate function Steven Rostedt
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07  3:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig

[-- Attachment #1: 0002-ring-buffer-check-for-failed-allocation-in-ring-buf.patch --]
[-- Type: text/plain, Size: 900 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The allocation of the ring buffer read page in the ring buffer
benchmark is never checked to see if a page was actually allocated.
This patch fixes that.

[ Impact: avoid NULL dereference ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer_benchmark.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 747244a..dcd75e9 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -84,6 +84,9 @@ static enum event_status read_page(int cpu)
 	int i;
 
 	bpage = ring_buffer_alloc_read_page(buffer);
+	if (!bpage)
+		return EVENT_DROPPED;
+
 	ret = ring_buffer_read_page(buffer, &bpage, PAGE_SIZE, cpu, 1);
 	if (ret >= 0) {
 		rpage = bpage;
-- 
1.6.2.4

-- 


* [PATCH 3/7] ring-buffer: make moving the tail page a separate function
  2009-05-07  3:13 [PATCH 0/7] [GIT PULL] tracing/ring-buffer: more updates for tip Steven Rostedt
  2009-05-07  3:13 ` [PATCH 1/7] ring-buffer: remove unneeded conditional in rb_reserve_next Steven Rostedt
  2009-05-07  3:13 ` [PATCH 2/7] ring-buffer: check for failed allocation in ring buffer benchmark Steven Rostedt
@ 2009-05-07  3:13 ` Steven Rostedt
  2009-05-07  8:27   ` Ingo Molnar
  2009-05-07  3:13 ` [PATCH 4/7] ring-buffer: change test to be more latency friendly Steven Rostedt
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07  3:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig

[-- Attachment #1: 0003-ring-buffer-make-moving-the-tail-page-a-separate-fu.patch --]
[-- Type: text/plain, Size: 3821 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Ingo Molnar thought the code would be cleaner if we used a function call
instead of a goto for moving the tail page. After implementing this,
it seems that gcc still inlines the result and the output is pretty much
the same. Since this is considered a cleaner approach, might as well
implement it.

[ Impact: code clean up ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |   89 ++++++++++++++++++++++++--------------------
 1 files changed, 49 insertions(+), 40 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 03ed52b..3ae5ccf 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1154,51 +1154,18 @@ static unsigned rb_calculate_event_length(unsigned length)
 	return length;
 }
 
+
 static struct ring_buffer_event *
-__rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
-		  unsigned type, unsigned long length, u64 *ts)
+rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
+	     unsigned long length, unsigned long tail,
+	     struct buffer_page *commit_page,
+	     struct buffer_page *tail_page, u64 *ts)
 {
-	struct buffer_page *tail_page, *head_page, *reader_page, *commit_page;
-	struct buffer_page *next_page;
-	unsigned long tail, write;
+	struct buffer_page *next_page, *head_page, *reader_page;
 	struct ring_buffer *buffer = cpu_buffer->buffer;
 	struct ring_buffer_event *event;
-	unsigned long flags;
 	bool lock_taken = false;
-
-	commit_page = cpu_buffer->commit_page;
-	/* we just need to protect against interrupts */
-	barrier();
-	tail_page = cpu_buffer->tail_page;
-	write = local_add_return(length, &tail_page->write);
-	tail = write - length;
-
-	/* See if we shot pass the end of this buffer page */
-	if (write > BUF_PAGE_SIZE)
-		goto next_page;
-
-	/* We reserved something on the buffer */
-
-	if (RB_WARN_ON(cpu_buffer, write > BUF_PAGE_SIZE))
-		return NULL;
-
-	event = __rb_page_index(tail_page, tail);
-	rb_update_event(event, type, length);
-
-	/* The passed in type is zero for DATA */
-	if (likely(!type))
-		local_inc(&tail_page->entries);
-
-	/*
-	 * If this is a commit and the tail is zero, then update
-	 * this page's time stamp.
-	 */
-	if (!tail && rb_is_commit(cpu_buffer, event))
-		cpu_buffer->commit_page->page->time_stamp = *ts;
-
-	return event;
-
- next_page:
+	unsigned long flags;
 
 	next_page = tail_page;
 
@@ -1318,6 +1285,48 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 	return NULL;
 }
 
+static struct ring_buffer_event *
+__rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
+		  unsigned type, unsigned long length, u64 *ts)
+{
+	struct buffer_page *tail_page, *commit_page;
+	struct ring_buffer_event *event;
+	unsigned long tail, write;
+
+	commit_page = cpu_buffer->commit_page;
+	/* we just need to protect against interrupts */
+	barrier();
+	tail_page = cpu_buffer->tail_page;
+	write = local_add_return(length, &tail_page->write);
+	tail = write - length;
+
+	/* See if we shot pass the end of this buffer page */
+	if (write > BUF_PAGE_SIZE)
+		return rb_move_tail(cpu_buffer, length, tail,
+				    commit_page, tail_page, ts);
+
+	/* We reserved something on the buffer */
+
+	if (RB_WARN_ON(cpu_buffer, write > BUF_PAGE_SIZE))
+		return NULL;
+
+	event = __rb_page_index(tail_page, tail);
+	rb_update_event(event, type, length);
+
+	/* The passed in type is zero for DATA */
+	if (likely(!type))
+		local_inc(&tail_page->entries);
+
+	/*
+	 * If this is a commit and the tail is zero, then update
+	 * this page's time stamp.
+	 */
+	if (!tail && rb_is_commit(cpu_buffer, event))
+		cpu_buffer->commit_page->page->time_stamp = *ts;
+
+	return event;
+}
+
 static int
 rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
 		  u64 *ts, u64 *delta)
-- 
1.6.2.4

-- 


* [PATCH 4/7] ring-buffer: change test to be more latency friendly
  2009-05-07  3:13 [PATCH 0/7] [GIT PULL] tracing/ring-buffer: more updates for tip Steven Rostedt
                   ` (2 preceding siblings ...)
  2009-05-07  3:13 ` [PATCH 3/7] ring-buffer: make moving the tail page a separate function Steven Rostedt
@ 2009-05-07  3:13 ` Steven Rostedt
  2009-05-07  8:31   ` Ingo Molnar
  2009-05-07  3:13 ` [PATCH 5/7] tracing: update sample with TRACE_INCLUDE_FILE Steven Rostedt
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07  3:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig

[-- Attachment #1: 0004-ring-buffer-change-test-to-be-more-latency-friendly.patch --]
[-- Type: text/plain, Size: 2522 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The ring buffer benchmark/test runs a producer for 10 seconds.
This is done with preemption and interrupts enabled. But if the kernel
is not compiled with CONFIG_PREEMPT, it basically stops everything
but interrupts for 10 seconds.

Although this is just a test and is not for production, this attribute
can be quite annoying. It can also spawn badness elsewhere.

This patch solves the issues by calling "cond_resched" when the system
is not compiled with CONFIG_PREEMPT. It also keeps track of the time
spent to call cond_resched such that it does not go against the
time calculations. That is, if the task schedules away, the time scheduled
out is removed from the test data. Note, this only works for non PREEMPT
because we do not know when the task is scheduled out if we have PREEMPT
enabled.
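
A worked sketch of the borrow/carry arithmetic the patch performs on
struct timeval (schematic user-space C; the helpers tv_sub/tv_add are
illustrative, not from the patch):

	#include <stdio.h>
	#include <sys/time.h>

	/* Normalize (a - b) into a valid timeval, borrowing a second. */
	static struct timeval tv_sub(struct timeval a, struct timeval b)
	{
		struct timeval d;

		d.tv_sec  = a.tv_sec  - b.tv_sec;
		d.tv_usec = a.tv_usec - b.tv_usec;
		if (d.tv_usec < 0) {
			d.tv_usec += 1000000;
			d.tv_sec--;
		}
		return d;
	}

	/* Add a delta, carrying overflow out of tv_usec. */
	static struct timeval tv_add(struct timeval a, struct timeval d)
	{
		a.tv_sec  += d.tv_sec;
		a.tv_usec += d.tv_usec;
		if (a.tv_usec >= 1000000) {
			a.tv_usec -= 1000000;
			a.tv_sec++;
		}
		return a;
	}

	int main(void)
	{
		struct timeval start = { 100, 900000 }; /* test start        */
		struct timeval end   = { 105, 950000 }; /* before reschedule */
		struct timeval now   = { 107, 100000 }; /* after reschedule  */

		/* Credit the time lost to the reschedule back to start. */
		start = tv_add(start, tv_sub(now, end));

		/* prints start = 102.050000 */
		printf("start = %ld.%06ld\n",
		       (long)start.tv_sec, (long)start.tv_usec);
		return 0;
	}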

[ Impact: prevent test from stopping the world for 10 seconds ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer_benchmark.c |   31 +++++++++++++++++++++++++++++++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index dcd75e9..a26fc67 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -185,6 +185,35 @@ static void ring_buffer_consumer(void)
 	complete(&read_done);
 }
 
+/*
+ * If we are a non preempt kernel, the 10 second run will
+ * stop everything while it runs. Instead, we will call cond_resched
+ * and also add any time that was lost by a reschedule.
+ */
+#ifdef CONFIG_PREEMPT
+static void sched_if_needed(struct timeval *start_tv, struct timeval *end_tv)
+{
+}
+#else
+static void sched_if_needed(struct timeval *start_tv, struct timeval *end_tv)
+{
+	struct timeval tv;
+
+	cond_resched();
+	do_gettimeofday(&tv);
+	if (tv.tv_usec < end_tv->tv_usec) {
+		tv.tv_usec += 1000000;
+		tv.tv_sec--;
+	}
+	start_tv->tv_sec += tv.tv_sec - end_tv->tv_sec;
+	start_tv->tv_usec += tv.tv_usec - end_tv->tv_usec;
+	if (start_tv->tv_usec > 1000000) {
+		start_tv->tv_usec -= 1000000;
+		start_tv->tv_sec++;
+	}
+}
+#endif
+
 static void ring_buffer_producer(void)
 {
 	struct timeval start_tv;
@@ -221,6 +250,8 @@ static void ring_buffer_producer(void)
 		if (consumer && !(++cnt % wakeup_interval))
 			wake_up_process(consumer);
 
+		sched_if_needed(&start_tv, &end_tv);
+
 	} while (end_tv.tv_sec < (start_tv.tv_sec + RUN_TIME) && !kill_test);
 	pr_info("End ring buffer hammer\n");
 
-- 
1.6.2.4

-- 


* [PATCH 5/7] tracing: update sample with TRACE_INCLUDE_FILE
  2009-05-07  3:13 [PATCH 0/7] [GIT PULL] tracing/ring-buffer: more updates for tip Steven Rostedt
                   ` (3 preceding siblings ...)
  2009-05-07  3:13 ` [PATCH 4/7] ring-buffer: change test to be more latency friendly Steven Rostedt
@ 2009-05-07  3:13 ` Steven Rostedt
  2009-05-07  3:13 ` [PATCH 6/7] tracing: reset ring buffer when removing modules with events Steven Rostedt
  2009-05-07  3:13 ` [PATCH 7/7] tracing: add hierarchical enabling of events Steven Rostedt
  6 siblings, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07  3:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig

[-- Attachment #1: 0005-tracing-update-sample-with-TRACE_INCLUDE_FILE.patch --]
[-- Type: text/plain, Size: 1904 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

When creating trace events for ftrace, the header file with the TRACE_EVENT
macros must also have a macro called TRACE_SYSTEM. This macro describes
the name of the system the TRACE_EVENTS are defined for. It also doubles
as a way for the define_trace.h file to include the file that included
it.

For example:

in irq.h

 #define TRACE_SYSTEM irq

[...]

 #include <trace/define_trace.h>

The define_trace.h file will use TRACE_SYSTEM to include irq.h. But if
the name of the trace system does not match the name of the trace header
file, one can override it with:

 #define TRACE_INCLUDE_FILE foo_trace

which will change define_trace.h to include foo_trace.h instead of foo.h.

The sample comments on this, but people who use the sample code will more
likely copy the code and not read the comments. This patch changes the
sample code to use TRACE_INCLUDE_FILE to better show developers how to
use it.
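
For instance, a header named foo_trace.h would look roughly like this
(a minimal sketch; "foo" and the file name are hypothetical, and the
TRACE_EVENT definitions are elided):

	#undef TRACE_SYSTEM
	#define TRACE_SYSTEM foo

	#if !defined(_TRACE_FOO_H) || defined(TRACE_HEADER_MULTI_READ)
	#define _TRACE_FOO_H

	#include <linux/tracepoint.h>

	/* TRACE_EVENT(...) definitions for the foo subsystem go here. */

	#endif /* _TRACE_FOO_H */

	/* The file is foo_trace.h, not foo.h, so tell define_trace.h
	 * which header to re-include: */
	#undef TRACE_INCLUDE_FILE
	#define TRACE_INCLUDE_FILE foo_trace
	#include <trace/define_trace.h>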

[ Impact: make sample less confusing to developers ]

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 samples/trace_events/trace-events-sample.h |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/samples/trace_events/trace-events-sample.h b/samples/trace_events/trace-events-sample.h
index eab4644..128a897 100644
--- a/samples/trace_events/trace-events-sample.h
+++ b/samples/trace_events/trace-events-sample.h
@@ -31,7 +31,7 @@
  *
  */
 #undef TRACE_SYSTEM
-#define TRACE_SYSTEM trace-events-sample
+#define TRACE_SYSTEM sample
 
 /*
  * The TRACE_EVENT macro is broken up into 5 parts.
@@ -120,5 +120,10 @@ TRACE_EVENT(foo_bar,
  * result.
  */
 #undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
 #define TRACE_INCLUDE_PATH .
+/*
+ * TRACE_INCLUDE_FILE is not needed if the filename and TRACE_SYSTEM are equal
+ */
+#define TRACE_INCLUDE_FILE trace-events-sample
 #include <trace/define_trace.h>
-- 
1.6.2.4

-- 


* [PATCH 6/7] tracing: reset ring buffer when removing modules with events
  2009-05-07  3:13 [PATCH 0/7] [GIT PULL] tracing/ring-buffer: more updates for tip Steven Rostedt
                   ` (4 preceding siblings ...)
  2009-05-07  3:13 ` [PATCH 5/7] tracing: update sample with TRACE_INCLUDE_FILE Steven Rostedt
@ 2009-05-07  3:13 ` Steven Rostedt
  2009-05-07  3:51   ` Li Zefan
  2009-05-07 16:24   ` Frederic Weisbecker
  2009-05-07  3:13 ` [PATCH 7/7] tracing: add hierarchical enabling of events Steven Rostedt
  6 siblings, 2 replies; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07  3:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig

[-- Attachment #1: 0006-tracing-reset-ring-buffer-when-removing-modules-wit.patch --]
[-- Type: text/plain, Size: 3418 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Li Zefan found that there's a race using the event ids of events and
modules. When a module is loaded, an event id is incremented. We only
have 16 bits for event ids (65536) and there is a possible (but highly
unlikely) race that we could load and unload a module that registers
events so many times that the event id counter overflows.

When it overflows, it then restarts and goes looking for available
ids. An id is available if it was added by a module and released.

The race is if one module adds an id and is then removed.
Another loaded module can use that same event id. But if the old module
still had events in the ring buffer, the new module's callback would
get bogus data.  At best (and most likely) the output would just be
garbage. But if the module for some reason used pointers (not recommended)
then this could potentially crash.

The safest thing to do is just reset the ring buffer if a module that
registered events is removed.

[ Impact: prevent unpredictable results of event id overflows ]

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <49FEAFD0.30106@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace.c        |   10 ++++++++++
 kernel/trace/trace.h        |    2 ++
 kernel/trace/trace_events.c |    9 +++++++++
 3 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4164a34..dd40d23 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -639,6 +639,16 @@ void tracing_reset_online_cpus(struct trace_array *tr)
 		tracing_reset(tr, cpu);
 }
 
+void tracing_reset_current(int cpu)
+{
+	tracing_reset(&global_trace, cpu);
+}
+
+void tracing_reset_current_online_cpus(void)
+{
+	tracing_reset_online_cpus(&global_trace);
+}
+
 #define SAVED_CMDLINES 128
 #define NO_CMDLINE_MAP UINT_MAX
 static unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1];
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 777c6c3..ba25793 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -409,6 +409,8 @@ int tracing_is_enabled(void);
 void trace_wake_up(void);
 void tracing_reset(struct trace_array *tr, int cpu);
 void tracing_reset_online_cpus(struct trace_array *tr);
+void tracing_reset_current(int cpu);
+void tracing_reset_current_online_cpus(void);
 int tracing_open_generic(struct inode *inode, struct file *filp);
 struct dentry *trace_create_file(const char *name,
 				 mode_t mode,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 8d579ff..6d2c842 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -932,9 +932,11 @@ static void trace_module_remove_events(struct module *mod)
 {
 	struct ftrace_module_file_ops *file_ops;
 	struct ftrace_event_call *call, *p;
+	bool found = false;
 
 	list_for_each_entry_safe(call, p, &ftrace_events, list) {
 		if (call->mod == mod) {
+			found = true;
 			if (call->enabled) {
 				call->enabled = 0;
 				call->unregfunc();
@@ -957,6 +959,13 @@ static void trace_module_remove_events(struct module *mod)
 		list_del(&file_ops->list);
 		kfree(file_ops);
 	}
+
+	/*
+	 * It is safest to reset the ring buffer if the module being unloaded
+	 * registered any events.
+	 */
+	if (found)
+		tracing_reset_current_online_cpus();
 }
 
 static int trace_module_notify(struct notifier_block *self,
-- 
1.6.2.4

-- 


* [PATCH 7/7] tracing: add hierarchical enabling of events
  2009-05-07  3:13 [PATCH 0/7] [GIT PULL] tracing/ring-buffer: more updates for tip Steven Rostedt
                   ` (5 preceding siblings ...)
  2009-05-07  3:13 ` [PATCH 6/7] tracing: reset ring buffer when removing modules with events Steven Rostedt
@ 2009-05-07  3:13 ` Steven Rostedt
  2009-05-07  3:51   ` Li Zefan
  2009-05-07 16:28   ` Frederic Weisbecker
  6 siblings, 2 replies; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07  3:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig

[-- Attachment #1: 0007-tracing-add-hierarchical-enabling-of-events.patch --]
[-- Type: text/plain, Size: 4690 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

With the current event directory, you can only enable individual events.
The file debugfs/tracing/set_event can be used to enable or
disable several events at once. But that can still be awkward.

This patch adds hierarchical enabling of events. That is, each directory
in debugfs/tracing/events has an "enable" file. This file can enable
or disable all events within the directory and below.

 # echo 1 > /debugfs/tracing/events/enable

will enable all events.

 # echo 1 > /debugfs/tracing/events/sched/enable

will enable all events in the sched subsystem.

 # echo 1 > /debugfs/tracing/events/enable
 # echo 0 > /debugfs/tracing/events/irq/enable

will enable all events, but then disable just the irq subsystem events.

When reading one of these enable files, there are four results:

 0 - all events this file affects are disabled
 1 - all events this file affects are enabled
 X - there is a mixture of events enabled and disabled
 ? - this file does not affect any event
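
For example, reading back a partially enabled subsystem (illustrative
output; assumes some but not all sched events are on):

 # cat /debugfs/tracing/events/sched/enable
 X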

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_events.c |  140 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 140 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 6d2c842..87feb01 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -400,6 +400,133 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
 	return cnt;
 }
 
+static ssize_t
+system_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
+		   loff_t *ppos)
+{
+	const char *system = filp->private_data;
+	struct ftrace_event_call *call;
+	char buf[2];
+	int set = -1;
+	int all = 0;
+	int ret;
+
+	if (system[0] == '*')
+		all = 1;
+
+	mutex_lock(&event_mutex);
+	list_for_each_entry(call, &ftrace_events, list) {
+		if (!call->name || !call->regfunc)
+			continue;
+
+		if (!all && strcmp(call->system, system) != 0)
+			continue;
+
+		/*
+		 * We need to find out if all the events are set
+		 * or if all events are cleared, or if we have
+		 * a mixture.
+		 */
+		if (call->enabled) {
+			switch (set) {
+			case -1:
+				set = 1;
+				break;
+			case 0:
+				set = 2;
+				break;
+			}
+		} else {
+			switch (set) {
+			case -1:
+				set = 0;
+				break;
+			case 1:
+				set = 2;
+				break;
+			}
+		}
+		/*
+		 * If we have a mixture, no need to look further.
+		 */
+		if (set == 2)
+			break;
+	}
+	mutex_unlock(&event_mutex);
+
+	buf[1] = '\n';
+	switch (set) {
+	case 0:
+		buf[0] = '0';
+		break;
+	case 1:
+		buf[0] = '1';
+		break;
+	case 2:
+		buf[0] = 'X';
+		break;
+	default:
+		buf[0] = '?';
+	}
+
+	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, 2);
+
+	return ret;
+}
+
+static ssize_t
+system_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
+		    loff_t *ppos)
+{
+	const char *system = filp->private_data;
+	unsigned long val;
+	char *command;
+	char buf[64];
+	ssize_t ret;
+
+	if (cnt >= sizeof(buf))
+		return -EINVAL;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	buf[cnt] = 0;
+
+	ret = strict_strtoul(buf, 10, &val);
+	if (ret < 0)
+		return ret;
+
+	ret = tracing_update_buffers();
+	if (ret < 0)
+		return ret;
+
+	switch (val) {
+	case 0:
+	case 1:
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	command = kstrdup(system, GFP_KERNEL);
+	if (!command)
+		return -ENOMEM;
+
+	ret = ftrace_set_clr_event(command, val);
+	if (ret)
+		goto out_free;
+
+	ret = cnt;
+
+ out_free:
+	kfree(command);
+
+	*ppos += cnt;
+
+	return ret;
+}
+
 extern char *__bad_type_size(void);
 
 #undef FIELD
@@ -686,6 +813,12 @@ static const struct file_operations ftrace_subsystem_filter_fops = {
 	.write = subsystem_filter_write,
 };
 
+static const struct file_operations ftrace_system_enable_fops = {
+	.open = tracing_open_generic,
+	.read = system_enable_read,
+	.write = system_enable_write,
+};
+
 static const struct file_operations ftrace_show_header_fops = {
 	.open = tracing_open_generic,
 	.read = show_header,
@@ -768,6 +901,10 @@ event_subsystem_dir(const char *name, struct dentry *d_events)
 			   "'%s/filter' entry\n", name);
 	}
 
+	entry = trace_create_file("enable", 0644, system->entry,
+				  (void *)system->name,
+				  &ftrace_system_enable_fops);
+
 	return system->entry;
 }
 
@@ -1041,6 +1178,9 @@ static __init int event_trace_init(void)
 			  ring_buffer_print_entry_header,
 			  &ftrace_show_header_fops);
 
+	trace_create_file("enable", 0644, d_events,
+			  "*:*", &ftrace_system_enable_fops);
+
 	for_each_event(call, __start_ftrace_events, __stop_ftrace_events) {
 		/* The linker may leave blanks */
 		if (!call->name)
-- 
1.6.2.4

-- 


* Re: [PATCH 7/7] tracing: add hierarchical enabling of events
  2009-05-07  3:13 ` [PATCH 7/7] tracing: add hierarchical enabling of events Steven Rostedt
@ 2009-05-07  3:51   ` Li Zefan
  2009-05-07 13:21     ` Steven Rostedt
  2009-05-07 16:28   ` Frederic Weisbecker
  1 sibling, 1 reply; 23+ messages in thread
From: Li Zefan @ 2009-05-07  3:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Frederic Weisbecker,
	Christoph Hellwig

> With the current event directory, you can only enable individual events.
> The file debugfs/tracing/set_event can be used to enable or
> disable several events at once. But that can still be awkward.
> 
> This patch adds hierarchical enabling of events. That is, each directory
> in debugfs/tracing/events has an "enable" file. This file can enable
> or disable all events within the directory and below.
> 
>  # echo 1 > /debugfs/tracing/events/enable
> 
> will enable all events.
> 
>  # echo 1 > /debugfs/tracing/events/sched/enable
> 
> will enable all events in the sched subsystem.
> 
>  # echo 1 > /debugfs/tracing/events/enable
>  # echo 0 > /debugfs/tracing/events/irq/enable
> 
> will enable all events, but then disable just the irq subsystem events.
> 
> When reading one of these enable files, there are four results:
> 
>  0 - all events this file affects are disabled
>  1 - all events this file affects are enabled
>  X - there is a mixture of events enabled and disabled
>  ? - this file does not affect any event

I would expect that reading an enable file would let me know exactly
which events are disabled and which are enabled.

I think this is useful especially for events/system/enable.

Like this:

$ cat events/irq/enable
0 irq_handler_entry
0 irq_handler_exit
1 softirq_entry
1 softirq_exit

> +static ssize_t
> +system_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
> +		   loff_t *ppos)
> +{
> +	const char *system = filp->private_data;
> +	struct ftrace_event_call *call;
> +	char buf[2];
> +	int set = -1;
> +	int all = 0;
> +	int ret;
> +
> +	if (system[0] == '*')
> +		all = 1;
> +
> +	mutex_lock(&event_mutex);
> +	list_for_each_entry(call, &ftrace_events, list) {
> +		if (!call->name || !call->regfunc)
> +			continue;
> +
> +		if (!all && strcmp(call->system, system) != 0)
> +			continue;
> +
> +		/*
> +		 * We need to find out if all the events are set
> +		 * or if all events are cleared, or if we have
> +		 * a mixture.
> +		 */
> +		if (call->enabled) {
> +			switch (set) {
> +			case -1:
> +				set = 1;
> +				break;
> +			case 0:
> +				set = 2;
> +				break;
> +			}
> +		} else {
> +			switch (set) {
> +			case -1:
> +				set = 0;
> +				break;
> +			case 1:
> +				set = 2;
> +				break;
> +			}
> +		}
> +		/*
> +		 * If we have a mixture, no need to look further.
> +		 */
> +		if (set == 2)
> +			break;

How about:

int set = 0;

...
set |= (1 << call->enabled);
...

set == 0: '?'
set == 1: '0'
set == 2: '1'
set == 3: 'X'

Will this make the code simpler? :)

Or we can go even further:

char result[4] = { '?', '0', '1', 'X' };
...
buf[0] = result[set];
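
Put together, a minimal self-contained sketch of this encoding
(user-space C with mocked-up per-event flags; the !! guards against
enabled values other than 0/1):

	#include <stdio.h>

	int main(void)
	{
		int enabled[] = { 0, 0, 1, 1 };	/* mocked per-event flags */
		static const char result[4] = { '?', '0', '1', 'X' };
		unsigned int set = 0;
		unsigned int i;

		for (i = 0; i < sizeof(enabled) / sizeof(enabled[0]); i++)
			set |= 1 << !!enabled[i];

		/* 0: no events '?', 1: all off '0', 2: all on '1', 3: mix 'X' */
		printf("%c\n", result[set]);	/* prints X for the mix above */
		return 0;
	}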

> +	}
> +	mutex_unlock(&event_mutex);
> +
> +	buf[1] = '\n';
> +	switch (set) {
> +	case 0:
> +		buf[0] = '0';
> +		break;
> +	case 1:
> +		buf[0] = '1';
> +		break;
> +	case 2:
> +		buf[0] = 'X';
> +		break;
> +	default:
> +		buf[0] = '?';
> +	}
> +
> +	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, 2);
> +
> +	return ret;
> +}
> +
> +static ssize_t
> +system_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
> +		    loff_t *ppos)
> +{
> +	const char *system = filp->private_data;
> +	unsigned long val;
> +	char *command;
> +	char buf[64];
> +	ssize_t ret;
> +
> +	if (cnt >= sizeof(buf))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&buf, ubuf, cnt))
> +		return -EFAULT;
> +
> +	buf[cnt] = 0;
> +
> +	ret = strict_strtoul(buf, 10, &val);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = tracing_update_buffers();
> +	if (ret < 0)
> +		return ret;
> +
> +	switch (val) {
> +	case 0:
> +	case 1:
> +		break;
> +
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	command = kstrdup(system, GFP_KERNEL);
> +	if (!command)
> +		return -ENOMEM;
> +
> +	ret = ftrace_set_clr_event(command, val);

I think we should pass "sched:" or "sched:*" instead of "sched"; see
the comment in ftrace_set_clr_event():

         *  <name> (no ':') means all events in a subsystem with
         *  the name <name> or any event that matches <name>
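
A sketch of what that could look like, replacing the kstrdup() above
(illustrative only; the top-level "enable" file already passes "*:*",
so only bare subsystem names get the suffix):

	/* build "sched:*" from "sched"; leave "*:*" alone */
	if (strchr(system, ':'))
		command = kstrdup(system, GFP_KERNEL);
	else
		command = kasprintf(GFP_KERNEL, "%s:*", system);
	if (!command)
		return -ENOMEM;

	ret = ftrace_set_clr_event(command, val);
	kfree(command);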

> +	if (ret)
> +		goto out_free;
> +
> +	ret = cnt;
> +
> + out_free:
> +	kfree(command);
> +
> +	*ppos += cnt;
> +
> +	return ret;
> +}
> +



* Re: [PATCH 6/7] tracing: reset ring buffer when removing modules with events
  2009-05-07  3:13 ` [PATCH 6/7] tracing: reset ring buffer when removing modules with events Steven Rostedt
@ 2009-05-07  3:51   ` Li Zefan
  2009-05-07 16:24   ` Frederic Weisbecker
  1 sibling, 0 replies; 23+ messages in thread
From: Li Zefan @ 2009-05-07  3:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Frederic Weisbecker,
	Christoph Hellwig

> From: Steven Rostedt <srostedt@redhat.com>
> 
> Li Zefan found that there's a race using the event ids of events and
> modules. When a module is loaded, an event id is incremented. We only
> have 16 bits for event ids (65536) and there is a possible (but highly
> unlikely) race that we could load and unload a module that registers
> events so many times that the event id counter overflows.
> 
> When it overflows, it then restarts and goes looking for available
> ids. An id is available if it was added by a module and released.
> 
> The race is if one module adds an id and is then removed.
> Another loaded module can use that same event id. But if the old module
> still had events in the ring buffer, the new module's callback would
> get bogus data.  At best (and most likely) the output would just be
> garbage. But if the module for some reason used pointers (not recommended)
> then this could potentially crash.
> 
> The safest thing to do is just reset the ring buffer if a module that
> registered events is removed.
> 
> [ Impact: prevent unpredictable results of event id overflows ]
> 
> Reported-by: Li Zefan <lizf@cn.fujitsu.com>
> LKML-Reference: <49FEAFD0.30106@cn.fujitsu.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>



* Re: [PATCH 1/7] ring-buffer: remove unneeded conditional in rb_reserve_next
  2009-05-07  3:13 ` [PATCH 1/7] ring-buffer: remove unneeded conditional in rb_reserve_next Steven Rostedt
@ 2009-05-07  8:23   ` Ingo Molnar
  0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2009-05-07  8:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig


* Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <srostedt@redhat.com>
> 
> The code in __rb_reserve_next checks on page overflow if it is the
> original committer and then resets the page back to the original
> setting.  Although this is fine, and the code is correct, it is
> a bit fragil. Some experimental work I did breaks it easily.

s/fragil/fragile

> The better and more robust solution is to have all committers that
> overflow the page, simply subtract what they added.
> 
> [ Impact: more robust ring buffer account management ]

it's a bit faster too (eliminates two branches) - so a 
micro-optimization as well.

	Ingo


* Re: [PATCH 3/7] ring-buffer: make moving the tail page a separate function
  2009-05-07  3:13 ` [PATCH 3/7] ring-buffer: make moving the tail page a separate function Steven Rostedt
@ 2009-05-07  8:27   ` Ingo Molnar
  2009-05-07 13:26     ` Steven Rostedt
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2009-05-07  8:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig


* Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <srostedt@redhat.com>
> 
> Ingo Molnar thought the code would be cleaner if we used a function call
> instead of a goto for moving the tail page. After implementing this,
> it seems that gcc still inlines the result and the output is pretty much
> the same. Since this is considered a cleaner approach, might as well
> implement it.
> 
> [ Impact: code clean up ]
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/ring_buffer.c |   89 ++++++++++++++++++++++++--------------------
>  1 files changed, 49 insertions(+), 40 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 03ed52b..3ae5ccf 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1154,51 +1154,18 @@ static unsigned rb_calculate_event_length(unsigned length)
>  	return length;
>  }
>  
> +
>  static struct ring_buffer_event *
> -__rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
> -		  unsigned type, unsigned long length, u64 *ts)
> +rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
> +	     unsigned long length, unsigned long tail,
> +	     struct buffer_page *commit_page,
> +	     struct buffer_page *tail_page, u64 *ts)
>  {
> -	struct buffer_page *tail_page, *head_page, *reader_page, *commit_page;
> -	struct buffer_page *next_page;
> -	unsigned long tail, write;
> +	struct buffer_page *next_page, *head_page, *reader_page;
>  	struct ring_buffer *buffer = cpu_buffer->buffer;
>  	struct ring_buffer_event *event;
> -	unsigned long flags;
>  	bool lock_taken = false;
> -
> -	commit_page = cpu_buffer->commit_page;
> -	/* we just need to protect against interrupts */
> -	barrier();
> -	tail_page = cpu_buffer->tail_page;
> -	write = local_add_return(length, &tail_page->write);
> -	tail = write - length;
> -
> -	/* See if we shot pass the end of this buffer page */
> -	if (write > BUF_PAGE_SIZE)
> -		goto next_page;
> -
> -	/* We reserved something on the buffer */
> -
> -	if (RB_WARN_ON(cpu_buffer, write > BUF_PAGE_SIZE))
> -		return NULL;
> -
> -	event = __rb_page_index(tail_page, tail);
> -	rb_update_event(event, type, length);
> -
> -	/* The passed in type is zero for DATA */
> -	if (likely(!type))
> -		local_inc(&tail_page->entries);
> -
> -	/*
> -	 * If this is a commit and the tail is zero, then update
> -	 * this page's time stamp.
> -	 */
> -	if (!tail && rb_is_commit(cpu_buffer, event))
> -		cpu_buffer->commit_page->page->time_stamp = *ts;
> -
> -	return event;
> -
> - next_page:
> +	unsigned long flags;
>  
>  	next_page = tail_page;
>  
> @@ -1318,6 +1285,48 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
>  	return NULL;
>  }
>  
> +static struct ring_buffer_event *
> +__rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
> +		  unsigned type, unsigned long length, u64 *ts)
> +{
> +	struct buffer_page *tail_page, *commit_page;
> +	struct ring_buffer_event *event;
> +	unsigned long tail, write;
> +
> +	commit_page = cpu_buffer->commit_page;
> +	/* we just need to protect against interrupts */
> +	barrier();
> +	tail_page = cpu_buffer->tail_page;
> +	write = local_add_return(length, &tail_page->write);
> +	tail = write - length;
> +
> +	/* See if we shot pass the end of this buffer page */
> +	if (write > BUF_PAGE_SIZE)
> +		return rb_move_tail(cpu_buffer, length, tail,
> +				    commit_page, tail_page, ts);

Nice! The __rb_reserve_next() fast-path logic became a lot clearer.

The above branch might be unlikely(), right? With usual record sizes 
of around 40 bytes, we'll have a 100 records for every page 
overflow. That's i think within the reach of unlikely().

Depends on how much of a mess GCC makes of it though.
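
Concretely, the suggested annotation (a sketch of the branch quoted
above):

	if (unlikely(write > BUF_PAGE_SIZE))
		return rb_move_tail(cpu_buffer, length, tail,
				    commit_page, tail_page, ts);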

	Ingo


* Re: [PATCH 4/7] ring-buffer: change test to be more latency friendly
  2009-05-07  3:13 ` [PATCH 4/7] ring-buffer: change test to be more latency friendly Steven Rostedt
@ 2009-05-07  8:31   ` Ingo Molnar
  2009-05-07  8:34     ` Ingo Molnar
  2009-05-07 13:51     ` Steven Rostedt
  0 siblings, 2 replies; 23+ messages in thread
From: Ingo Molnar @ 2009-05-07  8:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig


* Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <srostedt@redhat.com>
> 
> The ring buffer benchmark/test runs a producer for 10 seconds. 
> This is done with preemption and interrupts enabled. But if the 
> kernel is not compiled with CONFIG_PREEMPT, it basically stops 
> everything but interrupts for 10 seconds.
> 
> Although this is just a test and is not for production, this attribute
> can be quite annoying. It can also spawn badness elsewhere.

Yep, this probably explains that lockdep splat i got in a networking 
driver. Some functionality (a workqueue iirc) of the driver got 
starved and a time-out timer triggered - where lockdep caught 
locking badness.

> This patch solves the issues by calling "cond_resched" when the 
> system is not compiled with CONFIG_PREEMPT. It also keeps track of 
> the time spent to call cond_resched such that it does not go 
> against the time calculations. That is, if the task schedules 
> away, the time scheduled out is removed from the test data. Note, 
> this only works for non PREEMPT because we do not know when the 
> task is scheduled out if we have PREEMPT enabled.
> 
> [ Impact: prevent test from stopping the world for 10 seconds ]
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/ring_buffer_benchmark.c |   31 +++++++++++++++++++++++++++++++
>  1 files changed, 31 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
> index dcd75e9..a26fc67 100644
> --- a/kernel/trace/ring_buffer_benchmark.c
> +++ b/kernel/trace/ring_buffer_benchmark.c
> @@ -185,6 +185,35 @@ static void ring_buffer_consumer(void)
>  	complete(&read_done);
>  }
>  
> +/*
> + * If we are a non preempt kernel, the 10 second run will
> + * stop everything while it runs. Instead, we will call cond_resched
> + * and also add any time that was lost by a reschedule.
> + */
> +#ifdef CONFIG_PREEMPT
> +static void sched_if_needed(struct timeval *start_tv, struct timeval *end_tv)
> +{
> +}
> +#else
> +static void sched_if_needed(struct timeval *start_tv, struct timeval *end_tv)
> +{
> +	struct timeval tv;
> +
> +	cond_resched();
> +	do_gettimeofday(&tv);
> +	if (tv.tv_usec < end_tv->tv_usec) {
> +		tv.tv_usec += 1000000;
> +		tv.tv_sec--;
> +	}
> +	start_tv->tv_sec += tv.tv_sec - end_tv->tv_sec;
> +	start_tv->tv_usec += tv.tv_usec - end_tv->tv_usec;
> +	if (start_tv->tv_usec > 1000000) {
> +		start_tv->tv_usec -= 1000000;
> +		start_tv->tv_sec++;
> +	}
> +}
> +#endif

This is _way_ too ugly. Why not just add a cond_resched() to the 
inner loop and be done with it? cond_resched() is conditional 
already, so it will only schedule 'if needed'.

If the test's timing gets skewed, what's the big deal? If its being 
preempted there will be impact _anyway_. (due to cache footprint 
elimination, etc.) People obviously should only rely on the numbers 
if the system is idle.

	Ingo


* Re: [PATCH 4/7] ring-buffer: change test to be more latency friendly
  2009-05-07  8:31   ` Ingo Molnar
@ 2009-05-07  8:34     ` Ingo Molnar
  2009-05-07 13:51     ` Steven Rostedt
  1 sibling, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2009-05-07  8:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > From: Steven Rostedt <srostedt@redhat.com>
> > 
> > The ring buffer benchmark/test runs a producer for 10 seconds. 
> > This is done with preemption and interrupts enabled. But if the 
> > kernel is not compiled with CONFIG_PREEMPT, it basically stops 
> > everything but interrupts for 10 seconds.
> > 
> > Although this is just a test and is not for production, this attribute
> > can be quite annoying. It can also spawn badness elsewhere.
> 
> Yep, this probably explains that lockdep splat i got in a networking 
> driver. Some functionality (a workqueue iirc) of the driver got 
> starved and a time-out timer triggered - where lockdep caught 
> locking badness.
> 
> > This patch solves the issues by calling "cond_resched" when the 
> > system is not compiled with CONFIG_PREEMPT. It also keeps track of 
> > the time spent to call cond_resched such that it does not go 
> > against the time calculations. That is, if the task schedules 
> > away, the time scheduled out is removed from the test data. Note, 
> > this only works for non PREEMPT because we do not know when the 
> > task is scheduled out if we have PREEMPT enabled.
> > 
> > [ Impact: prevent test from stopping the world for 10 seconds ]
> > 
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  kernel/trace/ring_buffer_benchmark.c |   31 +++++++++++++++++++++++++++++++
> >  1 files changed, 31 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
> > index dcd75e9..a26fc67 100644
> > --- a/kernel/trace/ring_buffer_benchmark.c
> > +++ b/kernel/trace/ring_buffer_benchmark.c
> > @@ -185,6 +185,35 @@ static void ring_buffer_consumer(void)
> >  	complete(&read_done);
> >  }
> >  
> > +/*
> > + * If we are a non preempt kernel, the 10 second run will
> > + * stop everything while it runs. Instead, we will call cond_resched
> > + * and also add any time that was lost by a reschedule.
> > + */
> > +#ifdef CONFIG_PREEMPT
> > +static void sched_if_needed(struct timeval *start_tv, struct timeval *end_tv)
> > +{
> > +}
> > +#else
> > +static void sched_if_needed(struct timeval *start_tv, struct timeval *end_tv)
> > +{
> > +	struct timeval tv;
> > +
> > +	cond_resched();
> > +	do_gettimeofday(&tv);
> > +	if (tv.tv_usec < end_tv->tv_usec) {
> > +		tv.tv_usec += 1000000;
> > +		tv.tv_sec--;
> > +	}
> > +	start_tv->tv_sec += tv.tv_sec - end_tv->tv_sec;
> > +	start_tv->tv_usec += tv.tv_usec - end_tv->tv_usec;
> > +	if (start_tv->tv_usec > 1000000) {
> > +		start_tv->tv_usec -= 1000000;
> > +		start_tv->tv_sec++;
> > +	}
> > +}
> > +#endif
> 
> This is _way_ too ugly. Why not just add a cond_resched() to the 
> inner loop and be done with it? cond_resched() is conditional 
> already, so it will only schedule 'if needed'.
> 
> If the test's timing gets skewed, what's the big deal? If its 
> being preempted there will be impact _anyway_. (due to cache 
> footprint elimination, etc.) People obviously should only rely on 
> the numbers if the system is idle.

Note, i pulled, because the other bits are really nice - but it 
would be good to eliminate the code above in a followup patch.

	Ingo


* Re: [PATCH 7/7] tracing: add hierarchical enabling of events
  2009-05-07  3:51   ` Li Zefan
@ 2009-05-07 13:21     ` Steven Rostedt
  2009-05-08  1:11       ` Li Zefan
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07 13:21 UTC (permalink / raw)
  To: Li Zefan
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Frederic Weisbecker,
	Christoph Hellwig


On Thu, 7 May 2009, Li Zefan wrote:

> > With the current event directory, you can only enable individual events.
> > The file debugfs/tracing/set_event can be used to enable or
> > disable several events at once. But that can still be awkward.
> > 
> > This patch adds hierarchical enabling of events. That is, each directory
> > in debugfs/tracing/events has an "enable" file. This file can enable
> > or disable all events within the directory and below.
> > 
> >  # echo 1 > /debugfs/tracing/events/enable
> > 
> > will enable all events.
> > 
> >  # echo 1 > /debugfs/tracing/events/sched/enable
> > 
> > will enable all events in the sched subsystem.
> > 
> >  # echo 1 > /debugfs/tracing/events/enable
> >  # echo 0 > /debugfs/tracing/events/irq/enable
> > 
> > will enable all events, but then disable just the irq subsystem events.
> > 
> > When reading one of these enable files, there are four results:
> > 
> >  0 - all events this file affects are disabled
> >  1 - all events this file affects are enabled
> >  X - there is a mixture of events enabled and disabled
> >  ? - this file does not affect any event
> 
> I would expect reading an enable file will let me know exactly
> which events are disabled and which are enabled.
> 
> I think this is useful especially for events/system/enable.
> 
> Like this:
> 
> $ cat events/irq/enable
> 0 irq_handler_entry
> 0 irq_handler_exit
> 1 softirq_entry
> 1 softirq_exit

I thought about doing something like this, but this idea for the 
hierarchical enabling came to me around 11pm, and I had the code written 
by 11:15pm ;-)

Which means, I figured I would do it as simple as possible. We do have 
"set_event" that gives you a list of enabled events. My thought was still 
having a "1" or "0" if all are either enabled or disabled. And when it is 
a mixture, I would have a list of enabled events.

Though, it is useful. Maybe in the future. But really, the information is 
there, and I did not expect this to be a "what is enabled" file, but 
instead a "I want to enable/disable all these events". In other words, I 
was much more interested in the "write" ability than the read. But who 
knows, maybe this will change in the future.

> 
> > +static ssize_t
> > +system_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
> > +		   loff_t *ppos)
> > +{
> > +	const char *system = filp->private_data;
> > +	struct ftrace_event_call *call;
> > +	char buf[2];
> > +	int set = -1;
> > +	int all = 0;
> > +	int ret;
> > +
> > +	if (system[0] == '*')
> > +		all = 1;
> > +
> > +	mutex_lock(&event_mutex);
> > +	list_for_each_entry(call, &ftrace_events, list) {
> > +		if (!call->name || !call->regfunc)
> > +			continue;
> > +
> > +		if (!all && strcmp(call->system, system) != 0)
> > +			continue;
> > +
> > +		/*
> > +		 * We need to find out if all the events are set
> > +		 * or if all events are cleared, or if we have
> > +		 * a mixture.
> > +		 */
> > +		if (call->enabled) {
> > +			switch (set) {
> > +			case -1:
> > +				set = 1;
> > +				break;
> > +			case 0:
> > +				set = 2;
> > +				break;
> > +			}
> > +		} else {
> > +			switch (set) {
> > +			case -1:
> > +				set = 0;
> > +				break;
> > +			case 1:
> > +				set = 2;
> > +				break;
> > +			}
> > +		}
> > +		/*
> > +		 * If we have a mixture, no need to look further.
> > +		 */
> > +		if (set == 2)
> > +			break;
> 
> How about:
> 
> int set = 0;
> 
> ...
> set |= (1 << call->enabled);

* paranoid *

  set |= (1 << !!call->enabled);

> ...
> 
> set == 0: '?'
> set == 1: '0'
> set == 2: '1'
> set == 3: 'X'
> 
> Will this make the code simpler? :)
> 
> Or we can go even further:
> 
> char result[4] = { '?', '0', '1', 'X' };
> ...
> buf[0] = result[set];

cute, mind sending a patch ;-)


> 
> > +	}
> > +	mutex_unlock(&event_mutex);
> > +
> > +	buf[1] = '\n';
> > +	switch (set) {
> > +	case 0:
> > +		buf[0] = '0';
> > +		break;
> > +	case 1:
> > +		buf[0] = '1';
> > +		break;
> > +	case 2:
> > +		buf[0] = 'X';
> > +		break;
> > +	default:
> > +		buf[0] = '?';
> > +	}
> > +
> > +	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, 2);
> > +
> > +	return ret;
> > +}
> > +
> > +static ssize_t
> > +system_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
> > +		    loff_t *ppos)
> > +{
> > +	const char *system = filp->private_data;
> > +	unsigned long val;
> > +	char *command;
> > +	char buf[64];
> > +	ssize_t ret;
> > +
> > +	if (cnt >= sizeof(buf))
> > +		return -EINVAL;
> > +
> > +	if (copy_from_user(&buf, ubuf, cnt))
> > +		return -EFAULT;
> > +
> > +	buf[cnt] = 0;
> > +
> > +	ret = strict_strtoul(buf, 10, &val);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	ret = tracing_update_buffers();
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	switch (val) {
> > +	case 0:
> > +	case 1:
> > +		break;
> > +
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	command = kstrdup(system, GFP_KERNEL);
> > +	if (!command)
> > +		return -ENOMEM;
> > +
> > +	ret = ftrace_set_clr_event(command, val);
> 
> I think we should pass "sched:" or "sched:*", instead of "sched",
> the comment in ftrace_set_clr_event():
> 
>          *  <name> (no ':') means all events in a subsystem with
>          *  the name <name> or any event that matches <name>

Yeah, I thought about it too. But writing the patch in 15 minutes, I 
decided that a "kstrdup" was easier than adding a ":" ;-)

I will update it.

Thanks!

-- Steve

> 
> > +	if (ret)
> > +		goto out_free;
> > +
> > +	ret = cnt;
> > +
> > + out_free:
> > +	kfree(command);
> > +
> > +	*ppos += cnt;
> > +
> > +	return ret;
> > +}
> > +
> 
> 


* Re: [PATCH 3/7] ring-buffer: make moving the tail page a separate function
  2009-05-07  8:27   ` Ingo Molnar
@ 2009-05-07 13:26     ` Steven Rostedt
  2009-05-07 13:56       ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07 13:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig


On Thu, 7 May 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> >  
> > +static struct ring_buffer_event *
> > +__rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
> > +		  unsigned type, unsigned long length, u64 *ts)
> > +{
> > +	struct buffer_page *tail_page, *commit_page;
> > +	struct ring_buffer_event *event;
> > +	unsigned long tail, write;
> > +
> > +	commit_page = cpu_buffer->commit_page;
> > +	/* we just need to protect against interrupts */
> > +	barrier();
> > +	tail_page = cpu_buffer->tail_page;
> > +	write = local_add_return(length, &tail_page->write);
> > +	tail = write - length;
> > +
> > +	/* See if we shot pass the end of this buffer page */
> > +	if (write > BUF_PAGE_SIZE)
> > +		return rb_move_tail(cpu_buffer, length, tail,
> > +				    commit_page, tail_page, ts);
> 
> Nice! The __rb_reserve_next() fast-path logic became a lot clearer.

Thanks!

> 
> The above branch might be unlikely(), right? With usual record sizes 
> of around 40 bytes, we'll have a 100 records for every page 
> overflow. That's i think within the reach of unlikely().
> 
> Depends on how much of a mess GCC makes of it though.

I looked at the assembly that gcc generates, and it is fine. gcc inlines 
the function and puts it at the end, thus it already treats it as an 
unlikely. I would like to avoid adding unlikely annotations when possible.

-- Steve



* Re: [PATCH 4/7] ring-buffer: change test to be more latency friendly
  2009-05-07  8:31   ` Ingo Molnar
  2009-05-07  8:34     ` Ingo Molnar
@ 2009-05-07 13:51     ` Steven Rostedt
  1 sibling, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2009-05-07 13:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Andrew Morton, Frederic Weisbecker, Li Zefan, Christoph Hellwig


On Thu, 7 May 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > From: Steven Rostedt <srostedt@redhat.com>
> > 
> > The ring buffer benchmark/test runs a producer for 10 seconds. 
> > This is done with preemption and interrupts enabled. But if the 
> > kernel is not compiled with CONFIG_PREEMPT, it basically stops 
> > everything but interrupts for 10 seconds.
> > 
> > Although this is just a test and is not for production, this attribute
> > can be quite annoying. It can also spawn badness elsewhere.
> 
> Yep, this probably explains that lockdep splat i got in a networking 
> driver. Some functionality (a workqueue iirc) of the driver got 
> starved and a time-out timer triggered - where lockdep caught 
> locking badness.

We probably need to notify the network people about that.

> 
> > This patch solves the issues by calling "cond_resched" when the 
> > system is not compiled with CONFIG_PREEMPT. It also keeps track of 
> > the time spent to call cond_resched such that it does not go 
> > against the time calculations. That is, if the task schedules 
> > away, the time scheduled out is removed from the test data. Note, 
> > this only works for non PREEMPT because we do not know when the 
> > task is scheduled out if we have PREEMPT enabled.
> > 
> > [ Impact: prevent test from stopping the world for 10 seconds ]
> > 
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  kernel/trace/ring_buffer_benchmark.c |   31 +++++++++++++++++++++++++++++++
> >  1 files changed, 31 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
> > index dcd75e9..a26fc67 100644
> > --- a/kernel/trace/ring_buffer_benchmark.c
> > +++ b/kernel/trace/ring_buffer_benchmark.c
> > @@ -185,6 +185,35 @@ static void ring_buffer_consumer(void)
> >  	complete(&read_done);
> >  }
> >  
> > +/*
> > + * If we are a non preempt kernel, the 10 second run will
> > + * stop everything while it runs. Instead, we will call cond_resched
> > + * and also add any time that was lost by a reschedule.
> > + */
> > +#ifdef CONFIG_PREEMPT
> > +static void sched_if_needed(struct timeval *start_tv, struct timeval *end_tv)
> > +{
> > +}
> > +#else
> > +static void sched_if_needed(struct timeval *start_tv, struct timeval *end_tv)
> > +{
> > +	struct timeval tv;
> > +
> > +	cond_resched();
> > +	do_gettimeofday(&tv);
> > +	if (tv.tv_usec < end_tv->tv_usec) {
> > +		tv.tv_usec += 1000000;
> > +		tv.tv_sec--;
> > +	}
> > +	start_tv->tv_sec += tv.tv_sec - end_tv->tv_sec;
> > +	start_tv->tv_usec += tv.tv_usec - end_tv->tv_usec;
> > +	if (start_tv->tv_usec > 1000000) {
> > +		start_tv->tv_usec -= 1000000;
> > +		start_tv->tv_sec++;
> > +	}
> > +}
> > +#endif
> 
> This is _way_ too ugly. Why not just add a cond_resched() to the 
> inner loop and be done with it? cond_resched() is conditional 
> already, so it will only schedule 'if needed'.
> 
> If the test's timing gets skewed, what's the big deal? If it's being 
> preempted there will be impact _anyway_ (due to cache footprint 
> elimination, etc.). People obviously should only rely on the numbers 
> if the system is idle.

OK, I'll nuke it. I find that the writer is affected more by the reader. 
With this code, I get the same timings as if I run it with the reader 
disabled.

But it is more a way to see how a change may affect the buffering than to 
really time the buffering itself. If timing were the goal, I would have 
disabled preemption around the timings.
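
(For reference, Ingo's simpler approach amounts to roughly the
following — a sketch assuming a producer loop like the one in
ring_buffer_benchmark.c; the function name and loop bound are
illustrative, and linux/ring_buffer.h, linux/jiffies.h and
linux/sched.h are assumed to be included:)

	/*
	 * Sketch: yield inside the producer loop instead of
	 * compensating the timings; any resulting skew is accepted.
	 */
	static void rb_producer_loop(struct ring_buffer *buffer,
				     unsigned long end_jiffies)
	{
		while (time_before(jiffies, end_jiffies)) {
			struct ring_buffer_event *event;
			int *entry;

			event = ring_buffer_lock_reserve(buffer, sizeof(*entry));
			if (event) {
				/* reserve/commit pin the CPU, so this is safe */
				entry = ring_buffer_event_data(event);
				*entry = smp_processor_id();
				ring_buffer_unlock_commit(buffer, event);
			}
			/* only schedules away if something needs the CPU */
			cond_resched();
		}
	}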

-- Steve


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/7] ring-buffer: make moving the tail page a separate function
  2009-05-07 13:26     ` Steven Rostedt
@ 2009-05-07 13:56       ` Ingo Molnar
  0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2009-05-07 13:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker, Li Zefan,
	Christoph Hellwig


* Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> On Thu, 7 May 2009, Ingo Molnar wrote:
> 
> > 
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > >  
> > > +static struct ring_buffer_event *
> > > +__rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
> > > +		  unsigned type, unsigned long length, u64 *ts)
> > > +{
> > > +	struct buffer_page *tail_page, *commit_page;
> > > +	struct ring_buffer_event *event;
> > > +	unsigned long tail, write;
> > > +
> > > +	commit_page = cpu_buffer->commit_page;
> > > +	/* we just need to protect against interrupts */
> > > +	barrier();
> > > +	tail_page = cpu_buffer->tail_page;
> > > +	write = local_add_return(length, &tail_page->write);
> > > +	tail = write - length;
> > > +
> > > +	/* See if we shot past the end of this buffer page */
> > > +	if (write > BUF_PAGE_SIZE)
> > > +		return rb_move_tail(cpu_buffer, length, tail,
> > > +				    commit_page, tail_page, ts);
> > 
> > Nice! The __rb_reserve_next() fast-path logic became a lot clearer.
> 
> Thanks!
> 
> > 
> > The above branch might be unlikely(), right? With usual record sizes 
> > of around 40 bytes, we'll have ~100 records for every page 
> > overflow. That's, I think, within the reach of unlikely().
> > 
> > Depends on how much of a mess GCC makes of it though.
> 
> I looked at the assembly that gcc generates, and it is fine. gcc 
> inlines the function and places it at the end, so it already 
> treats the branch as unlikely. I would like to avoid adding 
> unlikely() annotations when possible.

Fair enough!

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/7] tracing: reset ring buffer when removing modules with events
  2009-05-07  3:13 ` [PATCH 6/7] tracing: reset ring buffer when removing modules with events Steven Rostedt
  2009-05-07  3:51   ` Li Zefan
@ 2009-05-07 16:24   ` Frederic Weisbecker
  1 sibling, 0 replies; 23+ messages in thread
From: Frederic Weisbecker @ 2009-05-07 16:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Li Zefan, Christoph Hellwig

On Wed, May 06, 2009 at 11:13:41PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
> 
> Li Zefan found that there's a race involving the event ids used by
> module events. When a module is loaded, the event id counter is
> incremented. We only have 16 bits for event ids (65536), and there is a
> possible (but highly unlikely) race where we load and unload a module
> that registers events so many times that the event id counter overflows.
> 
> When it overflows, it restarts and goes looking for available
> ids. An id is available if it was added by a module and then released.
> 
> The race occurs when one module adds an id and is then removed.
> Another module can then be loaded and use that same event id. But if the
> old module still had events in the ring buffer, the new module's callback
> would get bogus data.  At best (and most likely) the output would just be
> garbage. But if the module for some reason used pointers (not recommended)
> then this could potentially crash.
> 
> The safest thing to do is just reset the ring buffer if a module that
> registered events is removed.
> 
> [ Impact: prevent unpredictable results of event id overflows ]
> 
> Reported-by: Li Zefan <lizf@cn.fujitsu.com>
> LKML-Reference: <49FEAFD0.30106@cn.fujitsu.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/trace.c        |   10 ++++++++++
>  kernel/trace/trace.h        |    2 ++
>  kernel/trace/trace_events.c |    9 +++++++++
>  3 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 4164a34..dd40d23 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -639,6 +639,16 @@ void tracing_reset_online_cpus(struct trace_array *tr)
>  		tracing_reset(tr, cpu);
>  }
>  
> +void tracing_reset_current(int cpu)
> +{
> +	tracing_reset(&global_trace, cpu);
> +}
> +
> +void tracing_reset_current_online_cpus(void)
> +{
> +	tracing_reset_online_cpus(&global_trace);
> +}
> +
>  #define SAVED_CMDLINES 128
>  #define NO_CMDLINE_MAP UINT_MAX
>  static unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1];
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 777c6c3..ba25793 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -409,6 +409,8 @@ int tracing_is_enabled(void);
>  void trace_wake_up(void);
>  void tracing_reset(struct trace_array *tr, int cpu);
>  void tracing_reset_online_cpus(struct trace_array *tr);
> +void tracing_reset_current(int cpu);
> +void tracing_reset_current_online_cpus(void);
>  int tracing_open_generic(struct inode *inode, struct file *filp);
>  struct dentry *trace_create_file(const char *name,
>  				 mode_t mode,
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 8d579ff..6d2c842 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -932,9 +932,11 @@ static void trace_module_remove_events(struct module *mod)
>  {
>  	struct ftrace_module_file_ops *file_ops;
>  	struct ftrace_event_call *call, *p;
> +	bool found = false;
>  
>  	list_for_each_entry_safe(call, p, &ftrace_events, list) {
>  		if (call->mod == mod) {
> +			found = true;
>  			if (call->enabled) {
>  				call->enabled = 0;
>  				call->unregfunc();
> @@ -957,6 +959,13 @@ static void trace_module_remove_events(struct module *mod)
>  		list_del(&file_ops->list);
>  		kfree(file_ops);
>  	}
> +
> +	/*
> +	 * It is safest to reset the ring buffer if the module being unloaded
> +	 * registered any events.
> +	 */
> +	if (found)
> +		tracing_reset_current_online_cpus();
>  }



I would suggest sending a single trace_printk message after that
which says:

# current trace buffer content deleted due to module %s unloading

so that no one ends up stuck wondering why tracing during module
unloading (e.g. a module_exit callback of a driver) never produces
any trace :-)
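
(A minimal sketch of this suggestion, assuming it is appended to
trace_module_remove_events(); the message wording is illustrative:)

	if (found) {
		tracing_reset_current_online_cpus();
		/* the reset buffer is empty, so this note is what's left */
		trace_printk("buffer reset: module %s unloading\n",
			     mod->name);
	}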

Also there is still a narrow race:

tracing_read_pipe() {
  get_entry_from_buffer();
  tracer_output_callback() {
    ...				              module_unloading()
    dereference_freed_pointer_from_entry()

But at least this patch fixes most of the race window.

Frederic.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] tracing: add hierarchical enabling of events
  2009-05-07  3:13 ` [PATCH 7/7] tracing: add hierarchical enabling of events Steven Rostedt
  2009-05-07  3:51   ` Li Zefan
@ 2009-05-07 16:28   ` Frederic Weisbecker
  1 sibling, 0 replies; 23+ messages in thread
From: Frederic Weisbecker @ 2009-05-07 16:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Li Zefan, Christoph Hellwig

On Wed, May 06, 2009 at 11:13:42PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
> 
> With the current event directory, you can only enable individual events.
> The file debugfs/tracing/set_event can be used to enable or
> disable several events at once. But that can still be awkward.
> 
> This patch adds hierarchical enabling of events. That is, each directory
> in debugfs/tracing/events has an "enable" file. This file can enable
> or disable all events within the directory and below.
> 
>  # echo 1 > /debugfs/tracing/events/enable
> 
> will enable all events.
> 
>  # echo 1 > /debugfs/tracing/events/sched/enable
> 
> will enable all events in the sched subsystem.
> 
>  # echo 1 > /debugfs/tracing/events/enable
>  # echo 0 > /debugfs/tracing/events/irq/enable
> 
> will enable all events, but then disable just the irq subsystem events.
> 
> When reading one of these enable files, there are four results:
> 
>  0 - all events this file affects are disabled
>  1 - all events this file affects are enabled
>  X - there is a mixture of events enabled and disabled
>  ? - this file does not affect any event



Nice, I also planned to support hierarchical events but I'm too
slow, thanks!
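
(An illustrative session — the per-event enable path and the output
are hypothetical, showing the mixed 'X' state from the table above:)

 # echo 0 > /debugfs/tracing/events/enable
 # echo 1 > /debugfs/tracing/events/sched/sched_switch/enable
 # cat /debugfs/tracing/events/sched/enable
 X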


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] tracing: add hierarchical enabling of events
  2009-05-07 13:21     ` Steven Rostedt
@ 2009-05-08  1:11       ` Li Zefan
  2009-05-08  1:23         ` Steven Rostedt
  0 siblings, 1 reply; 23+ messages in thread
From: Li Zefan @ 2009-05-08  1:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Frederic Weisbecker,
	Christoph Hellwig

>> Like this:
>>
>> $ cat events/irq/enable
>> 0 irq_handler_entry
>> 0 irq_handler_exit
>> 1 softirq_entry
>> 1 softirq_exit
> 
> I thought about doing something like this, but this idea for the 
> hierarchical enabling came to me around 11pm, and I had the code written 
> by 11:15pm ;-)
> 
> Which means, I figured I would do it as simple as possible. We do have 
> "set_event" that gives you a list of enabled events. My thought was still 
> having a "1" or "0" if all are either enabled or disabled. And when it is 
> a mixture, I would have a list of enabled events.
> 
> Though, it is useful. Maybe in the future. But really, the information is 
> there, and I did not expect this to be a "what is enabled" file, but 
> instead a "I want to enable/disable all these events". In other words, I 
> was much more interested in the "write" ability than the read. But who 
> knows, maybe this will change in the future.
> 

I have no strong opinion on this. So I'm fine with it, if
no one else has objections.

>> How about:
>>
>> int set = 0;
>>
>> ...
>> set |= (1 << call->enabled);
> 
> * paranoid *
> 
>   set |= (1 << !!call->enabled);
> 
>> ...
>>
>> set == 0: '?'
>> set == 1: '0'
>> set == 2: '1'
>> set == 3: 'X'
>>
>> Will this make the code simpler? :)
>>
>> Or we can go even further:
>>
>> char result[4] = { '?', '0', '1', 'X' };
>> ...
>> buf[0] = result[set];
> 
> cute, mind sending a patch ;-)
> 

Sure. :)
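
(For completeness, a sketch of how the table trick could look in the
enable file's read handler; event_matches_this_file() is a hypothetical
stand-in for whatever filtering the real handler does, the rest follows
the snippets quoted above:)

	static const char result[4] = { '?', '0', '1', 'X' };
	struct ftrace_event_call *call;
	int set = 0;

	list_for_each_entry(call, &ftrace_events, list) {
		if (!event_matches_this_file(call))	/* hypothetical */
			continue;
		/* bit 0: some event disabled, bit 1: some event enabled */
		set |= 1 << !!call->enabled;
	}
	/* set: 0 -> '?', 1 -> '0', 2 -> '1', 3 -> 'X' */
	buf[0] = result[set];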

>>> +	ret = ftrace_set_clr_event(command, val);
>> I think we should pass "sched:" or "sched:*", instead of "sched",
>> the comment in ftrace_set_clr_event():
>>
>>          *  <name> (no ':') means all events in a subsystem with
>>          *  the name <name> or any event that matches <name>
> 
> Yeah, I thought about it too. But writing the patch in 15 minutes, I 
> decided that a "kstrdup" was easier than adding a ":" ;-)
> 

I think we can just avoid any kstrdup() or kmalloc(). I'll send a patch.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] tracing: add hierarchical enabling of events
  2009-05-08  1:11       ` Li Zefan
@ 2009-05-08  1:23         ` Steven Rostedt
  2009-05-08  1:24           ` Steven Rostedt
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2009-05-08  1:23 UTC (permalink / raw)
  To: Li Zefan
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Frederic Weisbecker,
	Christoph Hellwig


On Fri, 8 May 2009, Li Zefan wrote:

> >> Like this:
> >>
> >> $ cat events/irq/enable
> >> 0 irq_handler_entry
> >> 0 irq_handler_exit
> >> 1 softirq_entry
> >> 1 softirq_exit
> > 
> > I thought about doing something like this, but this idea for the 
> > hierarchical enabling came to me around 11pm, and I had the code written 
> > by 11:15pm ;-)
> > 
> > Which means, I figured I would do it as simple as possible. We do have 
> > "set_event" that gives you a list of enabled events. My thought was still 
> > having a "1" or "0" if all are either enabled or disabled. And when it is 
> > a mixture, I would have a list of enabled events.
> > 
> > Though, it is useful. Maybe in the future. But really, the information is 
> > there, and I did not expect this to be a "what is enabled" file, but 
> > instead a "I want to enable/disable all these events". In other words, I 
> > was much more interested in the "write" ability than the read. But who 
> > knows, maybe this will change in the future.
> > 
> 
> I have no strong opinion on this. So I'm fine with it, if
> no one else has objections.
> 
> >> How about:
> >>
> >> int set = 0;
> >>
> >> ...
> >> set |= (1 << call->enabled);
> > 
> > * paranoid *
> > 
> >   set |= (1 << !!call->enabled);
> > 
> >> ...
> >>
> >> set == 0: '?'
> >> set == 1: '0'
> >> set == 2: '1'
> >> set == 3: 'X'
> >>
> >> Will this make the code simpler? :)
> >>
> >> Or we can go even further:
> >>
> >> char result[4] = { '?', '0', '1', 'X' };
> >> ...
> >> buf[0] = result[set];
> > 
> > cute, mind sending a patch ;-)
> > 
> 
> Sure. :)

Great, looking forward to it.

> 
> >>> +	ret = ftrace_set_clr_event(command, val);
> >> I think we should pass "sched:" or "sched:*", instead of "sched",
> >> the comment in ftrace_set_clr_event():
> >>
> >>          *  <name> (no ':') means all events in a subsystem with
> >>          *  the name <name> or any event that matches <name>
> > 
> > Yeah, I thought about it too. But writing the patch in 15 minutes, I 
> > decided that a "kstrdup" was easier than adding a ":" ;-)
> > 
> 
> I think we can just avoid any kstrdup() or kmalloc(). I'll send a patch.

Hmm, careful. ftrace_set_clr_event() will modify the string passed in.
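
(The reason, presumably: ftrace_set_clr_event() tokenizes its argument
in place, so callers must hand it a writable buffer — a sketch:)

	char buf[] = "sched:*";	/* writable copy, not a string literal */
	ret = ftrace_set_clr_event(buf, val);	/* may write NULs into buf */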

-- Steve


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] tracing: add hierarchical enabling of events
  2009-05-08  1:23         ` Steven Rostedt
@ 2009-05-08  1:24           ` Steven Rostedt
  0 siblings, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2009-05-08  1:24 UTC (permalink / raw)
  To: Li Zefan
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Frederic Weisbecker,
	Christoph Hellwig


On Thu, 7 May 2009, Steven Rostedt wrote:
> 
> > 
> > >>> +	ret = ftrace_set_clr_event(command, val);
> > >> I think we should pass "sched:" or "sched:*", instead of "sched",
> > >> the comment in ftrace_set_clr_event():
> > >>
> > >>          *  <name> (no ':') means all events in a subsystem with
> > >>          *  the name <name> or any event that matches <name>
> > > 
> > > Yeah, I thought about it too. But writing the patch in 15 minutes, I 
> > > decided that a "kstrdup" was easier than adding a ":" ;-)
> > > 
> > 
> > I think we can just avoid any kstrdup() or kmalloc(). I'll send a patch.
> 
> Hmm, careful. The ftrace_set_clr_event will modify the string passed in.

Also note, I already updated it to use ":*". You should have received a 
notice from "tip-bot".

-- Steve


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread

Thread overview: 23+ messages
2009-05-07  3:13 [PATCH 0/7] [GIT PULL] tracing/ring-buffer: more updates for tip Steven Rostedt
2009-05-07  3:13 ` [PATCH 1/7] ring-buffer: remove unneeded conditional in rb_reserve_next Steven Rostedt
2009-05-07  8:23   ` Ingo Molnar
2009-05-07  3:13 ` [PATCH 2/7] ring-buffer: check for failed allocation in ring buffer benchmark Steven Rostedt
2009-05-07  3:13 ` [PATCH 3/7] ring-buffer: make moving the tail page a separate function Steven Rostedt
2009-05-07  8:27   ` Ingo Molnar
2009-05-07 13:26     ` Steven Rostedt
2009-05-07 13:56       ` Ingo Molnar
2009-05-07  3:13 ` [PATCH 4/7] ring-buffer: change test to be more latency friendly Steven Rostedt
2009-05-07  8:31   ` Ingo Molnar
2009-05-07  8:34     ` Ingo Molnar
2009-05-07 13:51     ` Steven Rostedt
2009-05-07  3:13 ` [PATCH 5/7] tracing: update sample with TRACE_INCLUDE_FILE Steven Rostedt
2009-05-07  3:13 ` [PATCH 6/7] tracing: reset ring buffer when removing modules with events Steven Rostedt
2009-05-07  3:51   ` Li Zefan
2009-05-07 16:24   ` Frederic Weisbecker
2009-05-07  3:13 ` [PATCH 7/7] tracing: add hierarchical enabling of events Steven Rostedt
2009-05-07  3:51   ` Li Zefan
2009-05-07 13:21     ` Steven Rostedt
2009-05-08  1:11       ` Li Zefan
2009-05-08  1:23         ` Steven Rostedt
2009-05-08  1:24           ` Steven Rostedt
2009-05-07 16:28   ` Frederic Weisbecker
