linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] [git pull] tip updates for 2.6.29
@ 2009-02-19  3:35 Steven Rostedt
  2009-02-19  3:35 ` [PATCH 1/2] tracing: disable tracing while testing ring buffer Steven Rostedt
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Steven Rostedt @ 2009-02-19  3:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker


Ingo,

I found the cause of the hard lock up you were seeing. It is one
of those cases where a new patch does not create a bug, but unveils
one. The change that showed the bug was:

e68746a: ftrace: enable filtering only when a function is filtered on

The bug was there all along, but his change revealed it. There were
two bugs actually.

1) The function tracer is useless without KALLSYMS. Without KALLSYMS
   you will only get hex values for your funtion traces.
   This also totally breaks the dynamic function tracer. It depends
   on having names to compare to select functions.

2) In the self test, there is a while loop that consumes the buffer
  and will not end until the buffer is empty. If we still have a
  producer present, this becomes an infinite loop.

The above two bugs are needed for the lock up, as well as the
mentioned patch.  Without the patch, the function filter is activated
whenever we pass in a filter, even if we do not select any function.
The patch changes that to only activate the filter if we succeed in
selecting a function.

Back to the bugs.

Without KALLSYMS, we never select a function, but we still activate
the filter. This causes all functions to be disabled from tracing.
The dynamic ftrace self test fails because it never sees the selected
function get traced.

With the patch and without KALLSYMS selected, we now do not activate
the filter, because no function was selected (all compares of a given
name to a NULL pointer will fail).  Now all functions are still enabled
to be traced.

So, what happens?  The dynamic function tracer self test will call
the test routine while the tracer is still on. The self test will
start consuming all the cpu ring buffers to test them, and will not
end until they are all finished. But you also have RCU_TORTURE selected.
The RCU torture test will run, filling up the ring buffer on other
CPUS. The consumer will never catch up, and we run forever!

Both of these are true bugs that have been in ftrace for a long time.
I think they are candidates for getting in 29, even this late in
the game. You never know what other config combination can hit these
bugs.

The fixes are simple. One is to simply disable the ring buffer
while the consumer runs. This prevents any producer from keeping
the consumer from finishing. The other is to make the function
tracer select KALLSYMS.

And yes, this was a bitch to debug. This was all I did today :-(

Please pull the latest tip/tracing/urgent tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/urgent


Steven Rostedt (2):
      tracing: disable tracing while testing ring buffer
      tracing: have function trace select kallsyms

----
 kernel/trace/Kconfig          |    2 ++
 kernel/trace/trace_selftest.c |    9 +++++++++
 2 files changed, 11 insertions(+), 0 deletions(-)
-- 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] tracing: disable tracing while testing ring buffer
  2009-02-19  3:35 [PATCH 0/2] [git pull] tip updates for 2.6.29 Steven Rostedt
@ 2009-02-19  3:35 ` Steven Rostedt
  2009-02-19  3:35 ` [PATCH 2/2] tracing: have function trace select kallsyms Steven Rostedt
  2009-02-19  9:19 ` [PATCH 0/2] [git pull] tip updates for 2.6.29 Ingo Molnar
  2 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2009-02-19  3:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Steven Rostedt

[-- Attachment #1: 0001-tracing-disable-tracing-while-testing-ring-buffer.patch --]
[-- Type: text/plain, Size: 1685 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Impact: fix to prevent hard lockup on self tests

If one of the tracers are broken and is constantly filling the ring
buffer while the test of the ring buffer is running, it will hang
the box. The reason is that the test is a consumer that will not
stop till the ring buffer is empty. But if the tracer is broken and
is constantly producing input to the buffer, this test will never
end. The result is a lockup of the box.

This happened when KALLSYMS was not defined and the dynamic ftrace
test constantly filled the ring buffer, because the filter failed
and all functions were being traced. Something was being called
that constantly filled the buffer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 kernel/trace/trace_selftest.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 88c8eb7..a7e0ef6 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -57,11 +57,20 @@ static int trace_test_buffer(struct trace_array *tr, unsigned long *count)
 
 	cnt = ring_buffer_entries(tr->buffer);
 
+	/*
+	 * The trace_test_buffer_cpu runs a while loop to consume all data.
+	 * If the calling tracer is broken, and is constantly filling
+	 * the buffer, this will run forever, and hard lock the box.
+	 * We disable the ring buffer while we do this test to prevent
+	 * a hard lock up.
+	 */
+	tracing_off();
 	for_each_possible_cpu(cpu) {
 		ret = trace_test_buffer_cpu(tr, cpu);
 		if (ret)
 			break;
 	}
+	tracing_on();
 	__raw_spin_unlock(&ftrace_max_lock);
 	local_irq_restore(flags);
 
-- 
1.5.6.5

-- 

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] tracing: have function trace select kallsyms
  2009-02-19  3:35 [PATCH 0/2] [git pull] tip updates for 2.6.29 Steven Rostedt
  2009-02-19  3:35 ` [PATCH 1/2] tracing: disable tracing while testing ring buffer Steven Rostedt
@ 2009-02-19  3:35 ` Steven Rostedt
  2009-02-19  9:19 ` [PATCH 0/2] [git pull] tip updates for 2.6.29 Ingo Molnar
  2 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2009-02-19  3:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Steven Rostedt

[-- Attachment #1: 0002-tracing-have-function-trace-select-kallsyms.patch --]
[-- Type: text/plain, Size: 1115 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Impact: fix output of function tracer to be useful

The function tracer is pretty useless if KALLSYMS is not configured.
Unless you are good at reading hex values, the function tracer should
select the KALLSYMS configuration.

Also, the dynamic function tracer will fail its self test if KALLSYMS
is not selected.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 kernel/trace/Kconfig |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 58a93fb..34e707e 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -52,6 +52,7 @@ config FUNCTION_TRACER
 	depends on HAVE_FUNCTION_TRACER
 	depends on DEBUG_KERNEL
 	select FRAME_POINTER
+	select KALLSYMS
 	select TRACING
 	select CONTEXT_SWITCH_TRACER
 	help
@@ -238,6 +239,7 @@ config STACK_TRACER
 	depends on DEBUG_KERNEL
 	select FUNCTION_TRACER
 	select STACKTRACE
+	select KALLSYMS
 	help
 	  This special tracer records the maximum stack footprint of the
 	  kernel and displays it in debugfs/tracing/stack_trace.
-- 
1.5.6.5

-- 

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] [git pull] tip updates for 2.6.29
  2009-02-19  3:35 [PATCH 0/2] [git pull] tip updates for 2.6.29 Steven Rostedt
  2009-02-19  3:35 ` [PATCH 1/2] tracing: disable tracing while testing ring buffer Steven Rostedt
  2009-02-19  3:35 ` [PATCH 2/2] tracing: have function trace select kallsyms Steven Rostedt
@ 2009-02-19  9:19 ` Ingo Molnar
  2009-02-19 18:07   ` Steven Rostedt
  2 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2009-02-19  9:19 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Andrew Morton, Frederic Weisbecker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Ingo,
> 
> I found the cause of the hard lock up you were seeing. It is one
> of those cases where a new patch does not create a bug, but unveils
> one. The change that showed the bug was:
> 
> e68746a: ftrace: enable filtering only when a function is filtered on
> 
> The bug was there all along, but his change revealed it. There were
> two bugs actually.
> 
> 1) The function tracer is useless without KALLSYMS. Without KALLSYMS
>    you will only get hex values for your funtion traces.
>    This also totally breaks the dynamic function tracer. It depends
>    on having names to compare to select functions.
> 
> 2) In the self test, there is a while loop that consumes the buffer
>   and will not end until the buffer is empty. If we still have a
>   producer present, this becomes an infinite loop.
> 
> The above two bugs are needed for the lock up, as well as the
> mentioned patch.  Without the patch, the function filter is activated
> whenever we pass in a filter, even if we do not select any function.
> The patch changes that to only activate the filter if we succeed in
> selecting a function.
> 
> Back to the bugs.
> 
> Without KALLSYMS, we never select a function, but we still activate
> the filter. This causes all functions to be disabled from tracing.
> The dynamic ftrace self test fails because it never sees the selected
> function get traced.
> 
> With the patch and without KALLSYMS selected, we now do not activate
> the filter, because no function was selected (all compares of a given
> name to a NULL pointer will fail).  Now all functions are still enabled
> to be traced.
> 
> So, what happens?  The dynamic function tracer self test will call
> the test routine while the tracer is still on. The self test will
> start consuming all the cpu ring buffers to test them, and will not
> end until they are all finished. But you also have RCU_TORTURE selected.
> The RCU torture test will run, filling up the ring buffer on other
> CPUS. The consumer will never catch up, and we run forever!

Can this problem hit other types of consumers - the 
/debug/tracing/ ones?

> Both of these are true bugs that have been in ftrace for a long time.
> I think they are candidates for getting in 29, even this late in
> the game. You never know what other config combination can hit these
> bugs.
> 
> The fixes are simple. One is to simply disable the ring buffer
> while the consumer runs. This prevents any producer from keeping
> the consumer from finishing. The other is to make the function
> tracer select KALLSYMS.
> 
> And yes, this was a bitch to debug. This was all I did today :-(

Looks quite subtle indeed.

It might be a safer approach to switch the self-test to 
excercise the actual /debug/tracing paths, instead of having its 
own home-brewn access methods. That way we debug all those 
facilities too - beyond having a self-test - and will avoid bugs 
like this too perhaps.

> Please pull the latest tip/tracing/urgent tree, which can be found at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/urgent
> 
> 
> Steven Rostedt (2):
>       tracing: disable tracing while testing ring buffer
>       tracing: have function trace select kallsyms
> 
> ----
>  kernel/trace/Kconfig          |    2 ++
>  kernel/trace/trace_selftest.c |    9 +++++++++
>  2 files changed, 11 insertions(+), 0 deletions(-)

Pulled into tip:tracing/urgent, thanks Steve!

	Ingo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] [git pull] tip updates for 2.6.29
  2009-02-19  9:19 ` [PATCH 0/2] [git pull] tip updates for 2.6.29 Ingo Molnar
@ 2009-02-19 18:07   ` Steven Rostedt
  0 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2009-02-19 18:07 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton, Frederic Weisbecker


On Thu, 19 Feb 2009, Ingo Molnar wrote:
> > 
> > So, what happens?  The dynamic function tracer self test will call
> > the test routine while the tracer is still on. The self test will
> > start consuming all the cpu ring buffers to test them, and will not
> > end until they are all finished. But you also have RCU_TORTURE selected.
> > The RCU torture test will run, filling up the ring buffer on other
> > CPUS. The consumer will never catch up, and we run forever!
> 
> Can this problem hit other types of consumers - the 
> /debug/tracing/ ones?

No, because the consumer is just a process, and can be preempted. In fact, 
this is normal producer/consumer behavior. This was a bug because it was 
during self tests. The self test is at boot up and will not continue boot 
up processing until it is finished. It's a special case because it is a 
"reader" in the boot up code.  All other readers are done by user tasks. 
The only other kernel reader is ftrace_dump, but it disables all of ftrace 
(ring buffers and all) before dumping, and only on crashes anyway.


> 
> > Both of these are true bugs that have been in ftrace for a long time.
> > I think they are candidates for getting in 29, even this late in
> > the game. You never know what other config combination can hit these
> > bugs.
> > 
> > The fixes are simple. One is to simply disable the ring buffer
> > while the consumer runs. This prevents any producer from keeping
> > the consumer from finishing. The other is to make the function
> > tracer select KALLSYMS.
> > 
> > And yes, this was a bitch to debug. This was all I did today :-(
> 
> Looks quite subtle indeed.
> 
> It might be a safer approach to switch the self-test to 
> excercise the actual /debug/tracing paths, instead of having its 
> own home-brewn access methods. That way we debug all those 
> facilities too - beyond having a self-test - and will avoid bugs 
> like this too perhaps.


We could think of ways to redesign the self tests. But for now, they have
helped ups find bugs. Again, I'm not sure if we can change things much,
because the bug that caused this was because the reading was running 
during boot up time, and expected to finish. Where as, the /debug/tracing 
code is accessed by user tasks that can run for as long as they want.

-- Steve


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-02-19 18:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-19  3:35 [PATCH 0/2] [git pull] tip updates for 2.6.29 Steven Rostedt
2009-02-19  3:35 ` [PATCH 1/2] tracing: disable tracing while testing ring buffer Steven Rostedt
2009-02-19  3:35 ` [PATCH 2/2] tracing: have function trace select kallsyms Steven Rostedt
2009-02-19  9:19 ` [PATCH 0/2] [git pull] tip updates for 2.6.29 Ingo Molnar
2009-02-19 18:07   ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).