All of lore.kernel.org
 help / color / mirror / Atom feed
* Question about ARM function graph tracing
@ 2011-02-03 23:42 Tim Bird
  2011-02-04  4:07 ` Rabin Vincent
  0 siblings, 1 reply; 6+ messages in thread
From: Tim Bird @ 2011-02-03 23:42 UTC (permalink / raw)
  To: linux-arm-kernel

Vincent,

I sheepishly admit that I haven't been following mainline very
well.  I was happily surprised to see you have mainlined ARM
function graph tracing.  I gave it a test on my old OMAP
OSK board, with the results shown below.

My quick questions, before I dive into this, are:
1) what targets did you test with?
2) Did you test with and without dynamic tracing?
3) what version of compiler did you use?

I'm just checking, to narrow down the search space for
debugging through this.

Thanks,
 -- Tim


Kernel is version 2.6.38-rc3

I compiled the kernel with:
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_RCU_TRACE is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_TRACEPOINTS=y
CONFIG_STACKTRACE=y
# CONFIG_BACKTRACE_SELF_TEST is not set
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_GENERIC_TRACER=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_PREEMPT_TRACER is not set
CONFIG_SCHED_TRACER=y
# CONFIG_STACK_TRACER is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_DYNAMIC_FTRACE is not set
# CONFIG_FTRACE_STARTUP_TEST is not set

The kernel compiles and runs.  Function tracing works OK

Function graph tracing hangs the kernel, when it is enabled:

/debug/tracing # echo function_graph >current_tracer
Unable to handle kernel NULL pointer dereference at virtual address 0000033c
pgd = c1a4e578
[0000033c] *pgd=0000033c
Unable to handle kernel paging request at virtual address b0000000
pgd = c1a4e578
[b0000000] *pgd=c1a51210
Unable to handle kernel paging request at virtual address 71a51000
pgd = c1a4e578
[71a51000] *pgd=00000001
Unable to handle kernel paging request at virtual address b0000144
pgd = c1a4e578
[b0000144] *pgd=c1a51210
Unable to handle kernel paging request at virtual address 71a51000
pgd = c1a4e578
[71a51000] *pgd=00000001
Unable to handle kernel paging request at virtual address b0000144
pgd = c1a4e578
...

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Network Entertainment
=============================

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Question about ARM function graph tracing
  2011-02-03 23:42 Question about ARM function graph tracing Tim Bird
@ 2011-02-04  4:07 ` Rabin Vincent
  2011-02-04 20:29   ` Tim Bird
  0 siblings, 1 reply; 6+ messages in thread
From: Rabin Vincent @ 2011-02-04  4:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 4, 2011 at 05:12, Tim Bird <tim.bird@am.sony.com> wrote:
> I sheepishly admit that I haven't been following mainline very
> well. ?I was happily surprised to see you have mainlined ARM
> function graph tracing. ?I gave it a test on my old OMAP
> OSK board, with the results shown below.

OMAP is missing a notrace annotation on omap_readl():

http://www.mail-archive.com/linux-omap at vger.kernel.org/msg38911.html

Part of that patch was merged through rmk's work, but the
omap_readl() annotation seems to have been missed.

Also, if this is OMAP1 and not latest mainline, I think you
will need  a notrace on omap_rev() also (called from inside
omap_readl() until recently).

I think this is most probably what is wrong, since IIRC I saw
crashes like this on BeagleBoard before I added the notrace on
omap_readl.  Could you please try with these changes?

Rabin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Question about ARM function graph tracing
  2011-02-04  4:07 ` Rabin Vincent
@ 2011-02-04 20:29   ` Tim Bird
  2011-02-16  1:39     ` Tim Bird
  0 siblings, 1 reply; 6+ messages in thread
From: Tim Bird @ 2011-02-04 20:29 UTC (permalink / raw)
  To: linux-arm-kernel

Rabin,

For a quick update, see below.

On 02/03/2011 08:07 PM, Rabin Vincent wrote:
> On Fri, Feb 4, 2011 at 05:12, Tim Bird <tim.bird@am.sony.com> wrote:
>> I sheepishly admit that I haven't been following mainline very
>> well. ?I was happily surprised to see you have mainlined ARMj
>> function graph tracing. ?I gave it a test on my old OMAP
>> OSK board, with the results shown below.
> 
> OMAP is missing a notrace annotation on omap_readl():
> 
> http://www.mail-archive.com/linux-omap at vger.kernel.org/msg38911.html
> 
> Part of that patch was merged through rmk's work, but the
> omap_readl() annotation seems to have been missed.
Yep.  It's missing in 2.6.38-rc3.

> Also, if this is OMAP1 and not latest mainline, I think you
> will need  a notrace on omap_rev() also (called from inside
> omap_readl() until recently).

I couldn't find that in the sched_clock code paths, but
I 'notrace'd it anyways.

> 
> I think this is most probably what is wrong, since IIRC I saw
> crashes like this on BeagleBoard before I added the notrace on
> omap_readl.  Could you please try with these changes?

I have sprinkled 'notrace's liberally throughout the sched_clock
code (including omap_readl() and omap_rev()), and I'm still
seeing problems.  I put a recursion guard in
prepare_ftrace_return, and I'm seeing lots of recursion.  So
there's still a notrace missing somewhere.

I'm sure this is the problem, so thanks for pointing me in the
right direction.  I'll keep trying to pin it down, and let
you know what I find.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Network Entertainment
=============================

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Question about ARM function graph tracing
  2011-02-04 20:29   ` Tim Bird
@ 2011-02-16  1:39     ` Tim Bird
  2011-02-16  2:05       ` Steven Rostedt
  0 siblings, 1 reply; 6+ messages in thread
From: Tim Bird @ 2011-02-16  1:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/04/2011 12:29 PM, Tim Bird wrote:
>> OMAP is missing a notrace annotation on omap_readl():
>>
>> http://www.mail-archive.com/linux-omap at vger.kernel.org/msg38911.html
>>
>> Part of that patch was merged through rmk's work, but the
>> omap_readl() annotation seems to have been missed.
> Yep.  It's missing in 2.6.38-rc3.
> 
>> Also, if this is OMAP1 and not latest mainline, I think you
>> will need  a notrace on omap_rev() also (called from inside
>> omap_readl() until recently).
> 
> I couldn't find that in the sched_clock code paths, but
> I 'notrace'd it anyways.
> 
>>
>> I think this is most probably what is wrong, since IIRC I saw
>> crashes like this on BeagleBoard before I added the notrace on
>> omap_readl.  Could you please try with these changes?
> 
> I have sprinkled 'notrace's liberally throughout the sched_clock
> code (including omap_readl() and omap_rev()), and I'm still
> seeing problems.  I put a recursion guard in
> prepare_ftrace_return, and I'm seeing lots of recursion.  So
> there's still a notrace missing somewhere.

OK.  The notrace was indeed missing fro omap_readl(), but fixing
this did not completely fix the problem.  I put the following
(crude) recursion guard in arch/arm/kernel/ftrace.c:prepare_ftrace_return:

diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index c0062ad..5872c25 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -209,6 +209,8 @@ int __init ftrace_dyn_arch_init(void *data)
 #endif /* CONFIG_DYNAMIC_FTRACE */

 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
+static int already_here = 0;
+
 void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
                           unsigned long frame_pointer)
 {
@@ -217,8 +219,16 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_a
        unsigned long old;
        int err;

-       if (unlikely(atomic_read(&current->tracing_graph_pause)))
+       if(unlikely(already_here)) {
                return;
+       } else {
+               already_here=1;
+       }
+
+       if (unlikely(atomic_read(&current->tracing_graph_pause))) {
+               already_here=0;
+               return;
+       }

        old = *parent;
        *parent = return_hooker;
@@ -227,6 +237,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_ad
                                       frame_pointer);
        if (err == -EBUSY) {
                *parent = old;
+               already_here=0;
                return;
        }

@@ -237,6 +248,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_ad
                current->curr_ret_stack--;
                *parent = old;
        }
+       already_here=0;
 }

 #ifdef CONFIG_DYNAMIC_FTRACE

Without this, the system hangs for long periods of time after starting
the function graph tracer (echo function_graph >/debug/tracing/current_tracer)
Sometimes the system comes back and sometimes it doesn't.  With the above
code, the system works, and I can get traces (but obviously I'm dumping
a bunch of tracepoints whenever the tracer nests.)

I put some extra instrumentation in, and MOST of the time, the item
causing nesting is asm_do_IRQ(), which is to be expected.  However,
I have seen a recursion from preempt_schedule(), which I didn't really
expect.  Does the tracer invoke anything that would cause a schedule
from inside the trace-recording code?

Should I try this with CONFIG_PREEMPT off? (It's currently on.)

Any ideas for debugging this would be appreciated.
 -- Tim


=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Network Entertainment
=============================

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Question about ARM function graph tracing
  2011-02-16  1:39     ` Tim Bird
@ 2011-02-16  2:05       ` Steven Rostedt
  2011-02-16  4:39         ` Frederic Weisbecker
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2011-02-16  2:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2011-02-15 at 17:39 -0800, Tim Bird wrote:
> On 02/04/2011 12:29 PM, Tim Bird wrote:
> >> OMAP is missing a notrace annotation on omap_readl():
> >>
> >> http://www.mail-archive.com/linux-omap at vger.kernel.org/msg38911.html
> >>
> >> Part of that patch was merged through rmk's work, but the
> >> omap_readl() annotation seems to have been missed.
> > Yep.  It's missing in 2.6.38-rc3.
> > 
> >> Also, if this is OMAP1 and not latest mainline, I think you
> >> will need  a notrace on omap_rev() also (called from inside
> >> omap_readl() until recently).
> > 
> > I couldn't find that in the sched_clock code paths, but
> > I 'notrace'd it anyways.
> > 
> >>
> >> I think this is most probably what is wrong, since IIRC I saw
> >> crashes like this on BeagleBoard before I added the notrace on
> >> omap_readl.  Could you please try with these changes?
> > 
> > I have sprinkled 'notrace's liberally throughout the sched_clock
> > code (including omap_readl() and omap_rev()), and I'm still
> > seeing problems.  I put a recursion guard in
> > prepare_ftrace_return, and I'm seeing lots of recursion.  So
> > there's still a notrace missing somewhere.

Note, if you find that you need to add "notrace" to most functions in a
file, you can simply make the entire file "notrace" from the make file:

CFLAGS_REMOVE_myfile.o = -pg


> 
> OK.  The notrace was indeed missing fro omap_readl(), but fixing
> this did not completely fix the problem.  I put the following
> (crude) recursion guard in arch/arm/kernel/ftrace.c:prepare_ftrace_return:
> 
> diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
> index c0062ad..5872c25 100644
> --- a/arch/arm/kernel/ftrace.c
> +++ b/arch/arm/kernel/ftrace.c
> @@ -209,6 +209,8 @@ int __init ftrace_dyn_arch_init(void *data)
>  #endif /* CONFIG_DYNAMIC_FTRACE */
> 
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +static int already_here = 0;
> +
>  void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
>                            unsigned long frame_pointer)
>  {
> @@ -217,8 +219,16 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_a
>         unsigned long old;
>         int err;
> 
> -       if (unlikely(atomic_read(&current->tracing_graph_pause)))
> +       if(unlikely(already_here)) {
>                 return;
> +       } else {
> +               already_here=1;
> +       }

Um, single cpu ?


> +
> +       if (unlikely(atomic_read(&current->tracing_graph_pause))) {
> +               already_here=0;
> +               return;
> +       }
> 
>         old = *parent;
>         *parent = return_hooker;
> @@ -227,6 +237,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_ad
>                                        frame_pointer);
>         if (err == -EBUSY) {
>                 *parent = old;
> +               already_here=0;
>                 return;
>         }
> 
> @@ -237,6 +248,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_ad
>                 current->curr_ret_stack--;
>                 *parent = old;
>         }
> +       already_here=0;
>  }
> 
>  #ifdef CONFIG_DYNAMIC_FTRACE
> 
> Without this, the system hangs for long periods of time after starting
> the function graph tracer (echo function_graph >/debug/tracing/current_tracer)
> Sometimes the system comes back and sometimes it doesn't.  With the above
> code, the system works, and I can get traces (but obviously I'm dumping
> a bunch of tracepoints whenever the tracer nests.)
> 
> I put some extra instrumentation in, and MOST of the time, the item
> causing nesting is asm_do_IRQ(), which is to be expected.  However,
> I have seen a recursion from preempt_schedule(), which I didn't really
> expect.  Does the tracer invoke anything that would cause a schedule
> from inside the trace-recording code?
> 
> Should I try this with CONFIG_PREEMPT off? (It's currently on.)
> 
> Any ideas for debugging this would be appreciated.

I've had to debug things that broke in the function graph tracer like
this before. Is dynamic ftrace available in arm?

Hmm, from arch/arm/Kconfig:

	select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL)


I guess so. If you are running something other than this XIP_KERNEL, I
would suggest using the dynamic functions to binary search the issue.

cat available_filter_functions > /tmp/funcs

Then copy half the funcs to /tmp/funcs1 and the other half
to /tmp/funcs2

cat /tmp/funcs1 > set_ftrace_filter

This may take a while, as setting the filter one by one is slow.

then run the function graph tracer. If it locks up, try it again
with /tmp/funcs2. If it locks up, you may have two issues or it could be
something wrong with the function graph tracer itself. But I'm betting
one will work and one wont.

Then just keep chopping these funcs files until you get down to a set of
functions that are causing issues. Then it should point out the problem.

This is the procedure I used to find out why a kvm guest was locking up
with function graph tracer.

See commit:

258af47479980d8238a04568b94a4e55aa1cb537

Something tells me you may have a similar issue ;)

-- Steve

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Question about ARM function graph tracing
  2011-02-16  2:05       ` Steven Rostedt
@ 2011-02-16  4:39         ` Frederic Weisbecker
  0 siblings, 0 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2011-02-16  4:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 15, 2011 at 09:05:45PM -0500, Steven Rostedt wrote:
> On Tue, 2011-02-15 at 17:39 -0800, Tim Bird wrote:
> I've had to debug things that broke in the function graph tracer like
> this before. Is dynamic ftrace available in arm?
> 
> Hmm, from arch/arm/Kconfig:
> 
> 	select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL)
> 
> 
> I guess so. If you are running something other than this XIP_KERNEL, I
> would suggest using the dynamic functions to binary search the issue.
> 
> cat available_filter_functions > /tmp/funcs
> 
> Then copy half the funcs to /tmp/funcs1 and the other half
> to /tmp/funcs2
> 
> cat /tmp/funcs1 > set_ftrace_filter
> 
> This may take a while, as setting the filter one by one is slow.
> 
> then run the function graph tracer. If it locks up, try it again
> with /tmp/funcs2. If it locks up, you may have two issues or it could be
> something wrong with the function graph tracer itself. But I'm betting
> one will work and one wont.
> 
> Then just keep chopping these funcs files until you get down to a set of
> functions that are causing issues. Then it should point out the problem.
> 
> This is the procedure I used to find out why a kvm guest was locking up
> with function graph tracer.
> 
> See commit:
> 
> 258af47479980d8238a04568b94a4e55aa1cb537
> 
> Something tells me you may have a similar issue ;)


That's a very nice idea to debug ftrace recursions!

As a last resort, in case the binary search doesn't work, you can try:

1) do a local_irq_save() in asm (to avoid possible recursions with common funcs)
2) test recursion, if so, printk_once() the traced func
3) exit recursion
3) local_irq_restore() in asm

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-02-16  4:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-03 23:42 Question about ARM function graph tracing Tim Bird
2011-02-04  4:07 ` Rabin Vincent
2011-02-04 20:29   ` Tim Bird
2011-02-16  1:39     ` Tim Bird
2011-02-16  2:05       ` Steven Rostedt
2011-02-16  4:39         ` Frederic Weisbecker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.