All of lore.kernel.org
 help / color / mirror / Atom feed
* ftrace function_graph causes system crash
@ 2016-09-20 13:10 ` Bean Huo (beanhuo)
  0 siblings, 0 replies; 14+ messages in thread
From: Bean Huo (beanhuo) @ 2016-09-20 13:10 UTC (permalink / raw)
  To: rostedt, mingo, will.deacon, catalin.marinas
  Cc: rfi, linux-kernel, linux-arm-kernel, Bean Huo (beanhuo),
	Zoltan Szubbocsev (zszubbocsev)

Hi, all 
I just use ftrace to do some latency study, found that function_graph can not
Work, as long as enable it, will cause kernel panic. I searched this online.
Found that there are also some cause the same as mine. I am a newer of ftrace. 
I want to know who know what root cause? Here is some partial log:


echo function_graph > current_tracer 
[    9.583813] Unable to handle kernel paging request at virtual address b0200083
[    9.590997] pgd = c0004000
[    9.593683] [b0200083] *pgd=00000000
[    9.597253] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[    9.602542] Modules linked in:
[    9.605586] CPU: 1 PID: 15 Comm: kworker/1:0 Not tainted 4.0.0-xilinx-00043-gc701690-dirty #515
[    9.614256] Hardware name: Xilinx Zynq Platform
[    9.618793] Workqueue:  0xe3a00001 (\x04 “å)
[    9.622765] task: df517500 ti: df518000 task.ti: df518000
[    9.628162] PC is at rb_update_write_stamp+0x18/0xa0
[    9.633100] LR is at rb_commit+0x34/0xdc
[    9.637005] pc : [<c00b6674>]    lr : [<c00b716c>]    psr: 60000093
[    9.637005] sp : df51a0f0  ip : df40b00c  fp : df51a104
[    9.648456] r10: 00000000  r9 : 00000000  r8 : 00bbff80
[    9.653666] r7 : c0edcf18  r6 : df51a190  r5 : df404100  r4 : df484680
[    9.660175] r3 : b020006b  r2 : df40b000  r1 : df40b2bc  r0 : df484680
[    9.666687] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[    9.674065] Control: 18c5387d  Table: 1ee7004a  DAC: 00000015
[    9.679792] Process kworker/1:0 (pid: 15, stack limit = 0xdf518210)
[    9.686041] Stack: (0xdf51a0f0 to 0xdf51a000)
[    9.690411] [<c00b6674>] (rb_update_write_stamp) from [<c00b716c>] (rb_commit+0x34/0xdc)
[    9.698478] [<c00b716c>] (rb_commit) from [<c00b7968>] (ring_buffer_unlock_commit+0x30/0x18c)
[    9.706987] [<c00b7968>] (ring_buffer_unlock_commit) from [<c00beca4>] (__buffer_unlock_commit+0x20/0x28)
[    9.716534] [<c00beca4>] (__buffer_unlock_commit) from [<c00ca094>] (__trace_graph_entry+0x94/0xa8)
[    9.725557] [<c00ca094>] (__trace_graph_entry) from [<c00ca260>] (trace_graph_entry+0x1b8/0x224)
[    9.734323] [<c00ca260>] (trace_graph_entry) from [<c0020824>] (prepare_ftrace_return+0x70/0xac)
[    9.743089] [<c0020824>] (prepare_ftrace_return) from [<c0020404>] (ftrace_graph_caller+0x18/0x20)
[    9.752020] Code: e24cb004 e5903048 e3c1ceff e3cc200f (e5933018) 
[    9.758104] ---[ end trace 5781781938261a54 ]---
[    9.762689] Kernel panic - not syncing: Fatal exception
[   10.933397] SMP: failed to stop secondary CPUs
[   10.937805] ---[ end Kernel panic - not syncing: Fatal exception
[   10.943779] CPU1: stopping
[   10.946471] CPU: 1 PID: 15 Comm: kworker/1:0 Tainted: G      D         4.0.0-xilinx-00043-gc701690-dirty #515
[   10.956358] Hardware name: Xilinx Zynq Platform
[   10.960889] Workqueue:  0xe3a00001 (\x04 “å)
[   10.964896] [<c0021d64>] (unwind_backtrace) from [<c001c1f4>] (show_stack+0x20/0x24)
[   10.972620] [<c001c1f4>] (show_stack) from [<c0592250>] (dump_stack+0x80/0xd0)
[   10.979822] [<c0592250>] (dump_stack) from [<c001efa4>] (ipi_cpu_stop+0x4c/0x80)
[   10.987197] [<c001efa4>] (ipi_cpu_stop) from [<c001f728>] (handle_IPI+0x74/0xbc)
[   10.994573] [<c001f728>] (handle_IPI) from [<c0008720>] (gic_handle_irq+0x68/0x70)
[   11.002123] [<c0008720>] (gic_handle_irq) from [<c001cd84>] (__irq_svc+0x44/0x7c)
[   11.009569] Exception stack(0xdf519e70 to 0xdf519eb8)
[   11.014608] 9e60:                                     00000034 df518000 00000000 c08d9524
[   11.022770] 9e80: 00000000 00000000 c00317c8 00000000 c00b6674 00000000 00000008 df519edc
[   11.030929] 9ea0: df519eb8 df519eb8 c058f64c c058f650 60000113 ffffffff
[   11.037541] [<c001cd84>] (__irq_svc) from [<c058f650>] (panic+0x188/0x20c)
[   11.044399] [<c058f650>] (panic) from [<c001c578>] (die+0x380/0x3e0)
[   11.050739] [<c001c578>] (die) from [<c0027c20>] (__do_kernel_fault+0x74/0x94)
[   11.057941] [<c0027c20>] (__do_kernel_fault) from [<c0027f3c>] (do_page_fault+0x2fc/0x31c)
[   11.066185] [<c0027f3c>] (do_page_fault) from [<c0028004>] (do_translation_fault+0x2c/0xc0)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ftrace function_graph causes system crash
@ 2016-09-20 13:10 ` Bean Huo (beanhuo)
  0 siblings, 0 replies; 14+ messages in thread
From: Bean Huo (beanhuo) @ 2016-09-20 13:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi, all 
I just use ftrace to do some latency study, found that function_graph can not
Work, as long as enable it, will cause kernel panic. I searched this online.
Found that there are also some cause the same as mine. I am a newer of ftrace. 
I want to know who know what root cause? Here is some partial log:


echo function_graph > current_tracer 
[    9.583813] Unable to handle kernel paging request at virtual address b0200083
[    9.590997] pgd = c0004000
[    9.593683] [b0200083] *pgd=00000000
[    9.597253] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[    9.602542] Modules linked in:
[    9.605586] CPU: 1 PID: 15 Comm: kworker/1:0 Not tainted 4.0.0-xilinx-00043-gc701690-dirty #515
[    9.614256] Hardware name: Xilinx Zynq Platform
[    9.618793] Workqueue:  0xe3a00001 (\x04 ??)
[    9.622765] task: df517500 ti: df518000 task.ti: df518000
[    9.628162] PC is at rb_update_write_stamp+0x18/0xa0
[    9.633100] LR is at rb_commit+0x34/0xdc
[    9.637005] pc : [<c00b6674>]    lr : [<c00b716c>]    psr: 60000093
[    9.637005] sp : df51a0f0  ip : df40b00c  fp : df51a104
[    9.648456] r10: 00000000  r9 : 00000000  r8 : 00bbff80
[    9.653666] r7 : c0edcf18  r6 : df51a190  r5 : df404100  r4 : df484680
[    9.660175] r3 : b020006b  r2 : df40b000  r1 : df40b2bc  r0 : df484680
[    9.666687] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[    9.674065] Control: 18c5387d  Table: 1ee7004a  DAC: 00000015
[    9.679792] Process kworker/1:0 (pid: 15, stack limit = 0xdf518210)
[    9.686041] Stack: (0xdf51a0f0 to 0xdf51a000)
[    9.690411] [<c00b6674>] (rb_update_write_stamp) from [<c00b716c>] (rb_commit+0x34/0xdc)
[    9.698478] [<c00b716c>] (rb_commit) from [<c00b7968>] (ring_buffer_unlock_commit+0x30/0x18c)
[    9.706987] [<c00b7968>] (ring_buffer_unlock_commit) from [<c00beca4>] (__buffer_unlock_commit+0x20/0x28)
[    9.716534] [<c00beca4>] (__buffer_unlock_commit) from [<c00ca094>] (__trace_graph_entry+0x94/0xa8)
[    9.725557] [<c00ca094>] (__trace_graph_entry) from [<c00ca260>] (trace_graph_entry+0x1b8/0x224)
[    9.734323] [<c00ca260>] (trace_graph_entry) from [<c0020824>] (prepare_ftrace_return+0x70/0xac)
[    9.743089] [<c0020824>] (prepare_ftrace_return) from [<c0020404>] (ftrace_graph_caller+0x18/0x20)
[    9.752020] Code: e24cb004 e5903048 e3c1ceff e3cc200f (e5933018) 
[    9.758104] ---[ end trace 5781781938261a54 ]---
[    9.762689] Kernel panic - not syncing: Fatal exception
[   10.933397] SMP: failed to stop secondary CPUs
[   10.937805] ---[ end Kernel panic - not syncing: Fatal exception
[   10.943779] CPU1: stopping
[   10.946471] CPU: 1 PID: 15 Comm: kworker/1:0 Tainted: G      D         4.0.0-xilinx-00043-gc701690-dirty #515
[   10.956358] Hardware name: Xilinx Zynq Platform
[   10.960889] Workqueue:  0xe3a00001 (\x04 ??)
[   10.964896] [<c0021d64>] (unwind_backtrace) from [<c001c1f4>] (show_stack+0x20/0x24)
[   10.972620] [<c001c1f4>] (show_stack) from [<c0592250>] (dump_stack+0x80/0xd0)
[   10.979822] [<c0592250>] (dump_stack) from [<c001efa4>] (ipi_cpu_stop+0x4c/0x80)
[   10.987197] [<c001efa4>] (ipi_cpu_stop) from [<c001f728>] (handle_IPI+0x74/0xbc)
[   10.994573] [<c001f728>] (handle_IPI) from [<c0008720>] (gic_handle_irq+0x68/0x70)
[   11.002123] [<c0008720>] (gic_handle_irq) from [<c001cd84>] (__irq_svc+0x44/0x7c)
[   11.009569] Exception stack(0xdf519e70 to 0xdf519eb8)
[   11.014608] 9e60:                                     00000034 df518000 00000000 c08d9524
[   11.022770] 9e80: 00000000 00000000 c00317c8 00000000 c00b6674 00000000 00000008 df519edc
[   11.030929] 9ea0: df519eb8 df519eb8 c058f64c c058f650 60000113 ffffffff
[   11.037541] [<c001cd84>] (__irq_svc) from [<c058f650>] (panic+0x188/0x20c)
[   11.044399] [<c058f650>] (panic) from [<c001c578>] (die+0x380/0x3e0)
[   11.050739] [<c001c578>] (die) from [<c0027c20>] (__do_kernel_fault+0x74/0x94)
[   11.057941] [<c0027c20>] (__do_kernel_fault) from [<c0027f3c>] (do_page_fault+0x2fc/0x31c)
[   11.066185] [<c0027f3c>] (do_page_fault) from [<c0028004>] (do_translation_fault+0x2c/0xc0)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ftrace function_graph causes system crash
  2016-09-20 13:10 ` Bean Huo (beanhuo)
@ 2016-09-20 14:07   ` Steven Rostedt
  -1 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2016-09-20 14:07 UTC (permalink / raw)
  To: Bean Huo (beanhuo)
  Cc: mingo, will.deacon, catalin.marinas, rfi, linux-kernel,
	linux-arm-kernel, Zoltan Szubbocsev (zszubbocsev)

[-- Attachment #1: Type: text/plain, Size: 2382 bytes --]

On Tue, 20 Sep 2016 13:10:39 +0000
"Bean Huo (beanhuo)" <beanhuo@micron.com> wrote:

> Hi, all 
> I just use ftrace to do some latency study, found that function_graph can not
> Work, as long as enable it, will cause kernel panic. I searched this online.
> Found that there are also some cause the same as mine. I am a newer of ftrace. 
> I want to know who know what root cause? Here is some partial log:
> 
> 

Can you do a function bisect to find what function this is.

This script is used to help find functions that are being traced by
function tracer or function graph tracing that causes the machine to
reboot, hang, or crash. Here's the steps to take.

First, determine if function graph is working with a single function:

# cd /sys/kernel/debug/tracing
# echo schedule > set_ftrace_filter
# echo function_graph > current_tracer

If this works, then we know that something is being traced that
shouldn't be.

# echo nop > current_tracer

# cat available_filter_functions > ~/full-file
# ftrace-bisect ~/full-file ~/test-file ~/non-test-file
# cat ~/test-file > set_ftrace_filter

*** Note *** this will take several minutes. Setting multiple functions
is an O(n^2) operation, and we are dealing with thousands of functions.
So go have  coffee, talk with your coworkers, read facebook. And
eventually, this operation will end.

# echo function_graph > current_tracer

If it crashes, we know that ~/test-file has a bad function.

   Reboot back to test kernel.

   # cd /sys/kernel/debug/tracing
   # mv ~/test-file ~/full-file

If it didn't crash.

   # echo nop > current_tracer
   # mv ~/non-test-file ~/full-file

Get rid of the other test file from previous run (or save them off
somewhere.
# rm -f ~/test-file ~/non-test-file

And start again:

# ftrace-bisect ~/full-file ~/test-file ~/non-test-file

The good thing is, because this cuts the number of functions in
~/test-file by half, the cat of it into set_ftrace_filter takes half as
long each iteration, so don't talk so much at the water cooler the
second time.

Eventually, if you did this correctly, you will get down to the problem
function, and all we need to do is to notrace it.

The way to figure out if the problem function is bad, just do:

# echo <problem-function> > set_ftrace_notrace
# echo > set_ftrace_filter
# echo function_graph > current_tracer

And if it doesn't crash, we are done.

-- Steve

[-- Attachment #2: ftrace-bisect --]
[-- Type: application/octet-stream, Size: 1487 bytes --]

#!/bin/bash
#
# Here's how to use this:
#
# Start off by doing:
#
#   cat /sys/kernel/debug/tracing/available_filter_functions > full-file
#   ftrace-bisect full-file test-file non-test-file
#   cat test-file > /sys/kernel/debug/tracing/set_ftrace_filter
#
#    [ this may take several minutes to complete, be patient ]
#
#   echo function > /sys/kernel/debug/tracing/current_tracer
#
# If it crashes/locks-up whatever, then reboot and do:
#
#   mv test-file new-file
#
# Else, if it did not crash, then do:
#
#   mv non-test-file  new-file
#
# Then you can continue with:
#
#   ftrace-bisect new-file test-file non-test-file
#   cat test-file > /sys/kernel/debug/tracing/set_ftrace_filter
#     [ again, this can take time to run ]
#   echo function > /sys/kernel/debug/tracing/current_tracer
#
# And repeat.
#


if [ $# -ne 3 ]; then
  echo 'usage: ftrace-bisect full-file test-file  non-test-file'
  exit
fi

full=$1
test=$2
nontest=$3

x=`cat $full | wc -l`
if [ $x -eq 1 ]; then
	echo "There's only one function left, must be the bad one"
	cat $full
	exit 0
fi

let x=$x/2
let y=$x+1

if [ ! -f $full ]; then
	echo "$full does not exist"
	exit 1
fi

if [ -f $test ]; then
	echo -n "$test exists, delete it? [y/N]"
	read a
	if [ "$a" != "y" -a "$a" != "Y" ]; then
		exit 1
	fi
fi

if [ -f $nontest ]; then
	echo -n "$nontest exists, delete it? [y/N]"
	read a
	if [ "$a" != "y" -a "$a" != "Y" ]; then
		exit 1
	fi
fi

sed -ne "1,${x}p" $full > $test
sed -ne "$y,\$p" $full > $nontest

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ftrace function_graph causes system crash
@ 2016-09-20 14:07   ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2016-09-20 14:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 20 Sep 2016 13:10:39 +0000
"Bean Huo (beanhuo)" <beanhuo@micron.com> wrote:

> Hi, all 
> I just use ftrace to do some latency study, found that function_graph can not
> Work, as long as enable it, will cause kernel panic. I searched this online.
> Found that there are also some cause the same as mine. I am a newer of ftrace. 
> I want to know who know what root cause? Here is some partial log:
> 
> 

Can you do a function bisect to find what function this is.

This script is used to help find functions that are being traced by
function tracer or function graph tracing that causes the machine to
reboot, hang, or crash. Here's the steps to take.

First, determine if function graph is working with a single function:

# cd /sys/kernel/debug/tracing
# echo schedule > set_ftrace_filter
# echo function_graph > current_tracer

If this works, then we know that something is being traced that
shouldn't be.

# echo nop > current_tracer

# cat available_filter_functions > ~/full-file
# ftrace-bisect ~/full-file ~/test-file ~/non-test-file
# cat ~/test-file > set_ftrace_filter

*** Note *** this will take several minutes. Setting multiple functions
is an O(n^2) operation, and we are dealing with thousands of functions.
So go have  coffee, talk with your coworkers, read facebook. And
eventually, this operation will end.

# echo function_graph > current_tracer

If it crashes, we know that ~/test-file has a bad function.

   Reboot back to test kernel.

   # cd /sys/kernel/debug/tracing
   # mv ~/test-file ~/full-file

If it didn't crash.

   # echo nop > current_tracer
   # mv ~/non-test-file ~/full-file

Get rid of the other test file from previous run (or save them off
somewhere.
# rm -f ~/test-file ~/non-test-file

And start again:

# ftrace-bisect ~/full-file ~/test-file ~/non-test-file

The good thing is, because this cuts the number of functions in
~/test-file by half, the cat of it into set_ftrace_filter takes half as
long each iteration, so don't talk so much at the water cooler the
second time.

Eventually, if you did this correctly, you will get down to the problem
function, and all we need to do is to notrace it.

The way to figure out if the problem function is bad, just do:

# echo <problem-function> > set_ftrace_notrace
# echo > set_ftrace_filter
# echo function_graph > current_tracer

And if it doesn't crash, we are done.

-- Steve
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ftrace-bisect
Type: application/octet-stream
Size: 1487 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20160920/de1e152c/attachment.obj>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: ftrace function_graph causes system crash
  2016-09-20 14:07   ` Steven Rostedt
@ 2016-09-21  7:50     ` Bean Huo (beanhuo)
  -1 siblings, 0 replies; 14+ messages in thread
From: Bean Huo (beanhuo) @ 2016-09-21  7:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Zoltan Szubbocsev (zszubbocsev),
	catalin.marinas, will.deacon, rfi, linux-kernel, mingo,
	linux-arm-kernel

> From: linux-arm-kernel [mailto:linux-arm-kernel-bounces@lists.infradead.org]
> On Behalf Of Steven Rostedt
> Sent: Dienstag, 20. September 2016 16:07
> To: Bean Huo (beanhuo) <beanhuo@micron.com>
> Cc: Zoltan Szubbocsev (zszubbocsev) <zszubbocsev@micron.com>;
> catalin.marinas@arm.com; will.deacon@arm.com; rfi@lists.rocketboards.org;
> linux-kernel@vger.kernel.org; mingo@redhat.com; linux-arm-
> kernel@lists.infradead.org
> Subject: Re: ftrace function_graph causes system crash
> 
> On Tue, 20 Sep 2016 13:10:39 +0000
> "Bean Huo (beanhuo)" <beanhuo@micron.com> wrote:
> 
> > Hi, all
> > I just use ftrace to do some latency study, found that function_graph
> > can not Work, as long as enable it, will cause kernel panic. I searched this
> online.
> > Found that there are also some cause the same as mine. I am a newer of
> ftrace.
> > I want to know who know what root cause? Here is some partial log:
> >
> >
> 
> Can you do a function bisect to find what function this is.
> 
> This script is used to help find functions that are being traced by function tracer
> or function graph tracing that causes the machine to reboot, hang, or crash.
> Here's the steps to take.
> 
> First, determine if function graph is working with a single function:
> 
> # cd /sys/kernel/debug/tracing
> # echo schedule > set_ftrace_filter
> # echo function_graph > current_tracer
> 
> If this works, then we know that something is being traced that shouldn't be.
> 
> # echo nop > current_tracer
> 
> # cat available_filter_functions > ~/full-file # ftrace-bisect ~/full-file ~/test-file
> ~/non-test-file # cat ~/test-file > set_ftrace_filter
> 
> *** Note *** this will take several minutes. Setting multiple functions is an
> O(n^2) operation, and we are dealing with thousands of functions.
> So go have  coffee, talk with your coworkers, read facebook. And eventually,
> this operation will end.
> 
> # echo function_graph > current_tracer
> 
> If it crashes, we know that ~/test-file has a bad function.
> 
>    Reboot back to test kernel.
> 
>    # cd /sys/kernel/debug/tracing
>    # mv ~/test-file ~/full-file
> 
> If it didn't crash.
> 
>    # echo nop > current_tracer
>    # mv ~/non-test-file ~/full-file
> 
> Get rid of the other test file from previous run (or save them off somewhere.
> # rm -f ~/test-file ~/non-test-file
> 
> And start again:
> 
> # ftrace-bisect ~/full-file ~/test-file ~/non-test-file
> 
> The good thing is, because this cuts the number of functions in ~/test-file by half,
> the cat of it into set_ftrace_filter takes half as long each iteration, so don't talk
> so much at the water cooler the second time.
> 
> Eventually, if you did this correctly, you will get down to the problem function,
> and all we need to do is to notrace it.
> 
> The way to figure out if the problem function is bad, just do:
> 
> # echo <problem-function> > set_ftrace_notrace # echo > set_ftrace_filter #
> echo function_graph > current_tracer
> 
> And if it doesn't crash, we are done.
> 
> -- Steve


Hi, Steve
Thanks very much! This is a very useful trace tool, I now know the problem function,
It is gt_counter_read, if not trace this function, ftrace function_graph work well.
Do you know now how to deeply debug and trace which line is wrong through Ftrace?

--Bean

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ftrace function_graph causes system crash
@ 2016-09-21  7:50     ` Bean Huo (beanhuo)
  0 siblings, 0 replies; 14+ messages in thread
From: Bean Huo (beanhuo) @ 2016-09-21  7:50 UTC (permalink / raw)
  To: linux-arm-kernel

> From: linux-arm-kernel [mailto:linux-arm-kernel-bounces at lists.infradead.org]
> On Behalf Of Steven Rostedt
> Sent: Dienstag, 20. September 2016 16:07
> To: Bean Huo (beanhuo) <beanhuo@micron.com>
> Cc: Zoltan Szubbocsev (zszubbocsev) <zszubbocsev@micron.com>;
> catalin.marinas at arm.com; will.deacon at arm.com; rfi at lists.rocketboards.org;
> linux-kernel at vger.kernel.org; mingo at redhat.com; linux-arm-
> kernel at lists.infradead.org
> Subject: Re: ftrace function_graph causes system crash
> 
> On Tue, 20 Sep 2016 13:10:39 +0000
> "Bean Huo (beanhuo)" <beanhuo@micron.com> wrote:
> 
> > Hi, all
> > I just use ftrace to do some latency study, found that function_graph
> > can not Work, as long as enable it, will cause kernel panic. I searched this
> online.
> > Found that there are also some cause the same as mine. I am a newer of
> ftrace.
> > I want to know who know what root cause? Here is some partial log:
> >
> >
> 
> Can you do a function bisect to find what function this is.
> 
> This script is used to help find functions that are being traced by function tracer
> or function graph tracing that causes the machine to reboot, hang, or crash.
> Here's the steps to take.
> 
> First, determine if function graph is working with a single function:
> 
> # cd /sys/kernel/debug/tracing
> # echo schedule > set_ftrace_filter
> # echo function_graph > current_tracer
> 
> If this works, then we know that something is being traced that shouldn't be.
> 
> # echo nop > current_tracer
> 
> # cat available_filter_functions > ~/full-file # ftrace-bisect ~/full-file ~/test-file
> ~/non-test-file # cat ~/test-file > set_ftrace_filter
> 
> *** Note *** this will take several minutes. Setting multiple functions is an
> O(n^2) operation, and we are dealing with thousands of functions.
> So go have  coffee, talk with your coworkers, read facebook. And eventually,
> this operation will end.
> 
> # echo function_graph > current_tracer
> 
> If it crashes, we know that ~/test-file has a bad function.
> 
>    Reboot back to test kernel.
> 
>    # cd /sys/kernel/debug/tracing
>    # mv ~/test-file ~/full-file
> 
> If it didn't crash.
> 
>    # echo nop > current_tracer
>    # mv ~/non-test-file ~/full-file
> 
> Get rid of the other test file from previous run (or save them off somewhere.
> # rm -f ~/test-file ~/non-test-file
> 
> And start again:
> 
> # ftrace-bisect ~/full-file ~/test-file ~/non-test-file
> 
> The good thing is, because this cuts the number of functions in ~/test-file by half,
> the cat of it into set_ftrace_filter takes half as long each iteration, so don't talk
> so much at the water cooler the second time.
> 
> Eventually, if you did this correctly, you will get down to the problem function,
> and all we need to do is to notrace it.
> 
> The way to figure out if the problem function is bad, just do:
> 
> # echo <problem-function> > set_ftrace_notrace # echo > set_ftrace_filter #
> echo function_graph > current_tracer
> 
> And if it doesn't crash, we are done.
> 
> -- Steve


Hi, Steve
Thanks very much! This is a very useful trace tool, I now know the problem function,
It is gt_counter_read, if not trace this function, ftrace function_graph work well.
Do you know now how to deeply debug and trace which line is wrong through Ftrace?

--Bean

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ftrace function_graph causes system crash
  2016-09-21  7:50     ` Bean Huo (beanhuo)
@ 2016-09-21  9:13       ` Jisheng Zhang
  -1 siblings, 0 replies; 14+ messages in thread
From: Jisheng Zhang @ 2016-09-21  9:13 UTC (permalink / raw)
  To: Bean Huo (beanhuo)
  Cc: Steven Rostedt, Zoltan Szubbocsev (zszubbocsev),
	catalin.marinas, will.deacon, rfi, linux-kernel, mingo,
	linux-arm-kernel

Hi Bean,

On Wed, 21 Sep 2016 07:50:58 +0000 "Bean Huo (beanhuo)"  wrote:

> > From: linux-arm-kernel [mailto:linux-arm-kernel-bounces@lists.infradead.org]
> > On Behalf Of Steven Rostedt
> > Sent: Dienstag, 20. September 2016 16:07
> > To: Bean Huo (beanhuo) <beanhuo@micron.com>
> > Cc: Zoltan Szubbocsev (zszubbocsev) <zszubbocsev@micron.com>;
> > catalin.marinas@arm.com; will.deacon@arm.com; rfi@lists.rocketboards.org;
> > linux-kernel@vger.kernel.org; mingo@redhat.com; linux-arm-
> > kernel@lists.infradead.org
> > Subject: Re: ftrace function_graph causes system crash
> > 
> > On Tue, 20 Sep 2016 13:10:39 +0000
> > "Bean Huo (beanhuo)" <beanhuo@micron.com> wrote:
> >   
> > > Hi, all
> > > I just use ftrace to do some latency study, found that function_graph
> > > can not Work, as long as enable it, will cause kernel panic. I searched this  
> > online.  
> > > Found that there are also some cause the same as mine. I am a newer of  
> > ftrace.  
> > > I want to know who know what root cause? Here is some partial log:
> > >
> > >  
> > 
> > Can you do a function bisect to find what function this is.
> > 
> > This script is used to help find functions that are being traced by function tracer
> > or function graph tracing that causes the machine to reboot, hang, or crash.
> > Here's the steps to take.
> > 
> > First, determine if function graph is working with a single function:
> > 
> > # cd /sys/kernel/debug/tracing
> > # echo schedule > set_ftrace_filter
> > # echo function_graph > current_tracer
> > 
> > If this works, then we know that something is being traced that shouldn't be.
> > 
> > # echo nop > current_tracer
> > 
> > # cat available_filter_functions > ~/full-file # ftrace-bisect ~/full-file ~/test-file
> > ~/non-test-file # cat ~/test-file > set_ftrace_filter
> > 
> > *** Note *** this will take several minutes. Setting multiple functions is an
> > O(n^2) operation, and we are dealing with thousands of functions.
> > So go have  coffee, talk with your coworkers, read facebook. And eventually,
> > this operation will end.
> > 
> > # echo function_graph > current_tracer
> > 
> > If it crashes, we know that ~/test-file has a bad function.
> > 
> >    Reboot back to test kernel.
> > 
> >    # cd /sys/kernel/debug/tracing
> >    # mv ~/test-file ~/full-file
> > 
> > If it didn't crash.
> > 
> >    # echo nop > current_tracer
> >    # mv ~/non-test-file ~/full-file
> > 
> > Get rid of the other test file from previous run (or save them off somewhere.
> > # rm -f ~/test-file ~/non-test-file
> > 
> > And start again:
> > 
> > # ftrace-bisect ~/full-file ~/test-file ~/non-test-file
> > 
> > The good thing is, because this cuts the number of functions in ~/test-file by half,
> > the cat of it into set_ftrace_filter takes half as long each iteration, so don't talk
> > so much at the water cooler the second time.
> > 
> > Eventually, if you did this correctly, you will get down to the problem function,
> > and all we need to do is to notrace it.
> > 
> > The way to figure out if the problem function is bad, just do:
> > 
> > # echo <problem-function> > set_ftrace_notrace # echo > set_ftrace_filter #
> > echo function_graph > current_tracer
> > 
> > And if it doesn't crash, we are done.
> > 
> > -- Steve  
> 
> 
> Hi, Steve
> Thanks very much! This is a very useful trace tool, I now know the problem function,
> It is gt_counter_read, if not trace this function, ftrace function_graph work well.

I'm not sure whether the commit d6df3576e6b4 ("clocksource/drivers/arm_global_timer
: Prevent ftrace recursion") can fix this issue.

this commit is merged since v4.3, I noticed your kernel version is v4.0

Thanks,
Jisheng

> Do you know now how to deeply debug and trace which line is wrong through Ftrace?
> 
> --Bean
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ftrace function_graph causes system crash
@ 2016-09-21  9:13       ` Jisheng Zhang
  0 siblings, 0 replies; 14+ messages in thread
From: Jisheng Zhang @ 2016-09-21  9:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Bean,

On Wed, 21 Sep 2016 07:50:58 +0000 "Bean Huo (beanhuo)"  wrote:

> > From: linux-arm-kernel [mailto:linux-arm-kernel-bounces at lists.infradead.org]
> > On Behalf Of Steven Rostedt
> > Sent: Dienstag, 20. September 2016 16:07
> > To: Bean Huo (beanhuo) <beanhuo@micron.com>
> > Cc: Zoltan Szubbocsev (zszubbocsev) <zszubbocsev@micron.com>;
> > catalin.marinas at arm.com; will.deacon at arm.com; rfi at lists.rocketboards.org;
> > linux-kernel at vger.kernel.org; mingo at redhat.com; linux-arm-
> > kernel at lists.infradead.org
> > Subject: Re: ftrace function_graph causes system crash
> > 
> > On Tue, 20 Sep 2016 13:10:39 +0000
> > "Bean Huo (beanhuo)" <beanhuo@micron.com> wrote:
> >   
> > > Hi, all
> > > I just use ftrace to do some latency study, found that function_graph
> > > can not Work, as long as enable it, will cause kernel panic. I searched this  
> > online.  
> > > Found that there are also some cause the same as mine. I am a newer of  
> > ftrace.  
> > > I want to know who know what root cause? Here is some partial log:
> > >
> > >  
> > 
> > Can you do a function bisect to find what function this is.
> > 
> > This script is used to help find functions that are being traced by function tracer
> > or function graph tracing that causes the machine to reboot, hang, or crash.
> > Here's the steps to take.
> > 
> > First, determine if function graph is working with a single function:
> > 
> > # cd /sys/kernel/debug/tracing
> > # echo schedule > set_ftrace_filter
> > # echo function_graph > current_tracer
> > 
> > If this works, then we know that something is being traced that shouldn't be.
> > 
> > # echo nop > current_tracer
> > 
> > # cat available_filter_functions > ~/full-file # ftrace-bisect ~/full-file ~/test-file
> > ~/non-test-file # cat ~/test-file > set_ftrace_filter
> > 
> > *** Note *** this will take several minutes. Setting multiple functions is an
> > O(n^2) operation, and we are dealing with thousands of functions.
> > So go have  coffee, talk with your coworkers, read facebook. And eventually,
> > this operation will end.
> > 
> > # echo function_graph > current_tracer
> > 
> > If it crashes, we know that ~/test-file has a bad function.
> > 
> >    Reboot back to test kernel.
> > 
> >    # cd /sys/kernel/debug/tracing
> >    # mv ~/test-file ~/full-file
> > 
> > If it didn't crash.
> > 
> >    # echo nop > current_tracer
> >    # mv ~/non-test-file ~/full-file
> > 
> > Get rid of the other test file from previous run (or save them off somewhere.
> > # rm -f ~/test-file ~/non-test-file
> > 
> > And start again:
> > 
> > # ftrace-bisect ~/full-file ~/test-file ~/non-test-file
> > 
> > The good thing is, because this cuts the number of functions in ~/test-file by half,
> > the cat of it into set_ftrace_filter takes half as long each iteration, so don't talk
> > so much at the water cooler the second time.
> > 
> > Eventually, if you did this correctly, you will get down to the problem function,
> > and all we need to do is to notrace it.
> > 
> > The way to figure out if the problem function is bad, just do:
> > 
> > # echo <problem-function> > set_ftrace_notrace # echo > set_ftrace_filter #
> > echo function_graph > current_tracer
> > 
> > And if it doesn't crash, we are done.
> > 
> > -- Steve  
> 
> 
> Hi, Steve
> Thanks very much! This is a very useful trace tool, I now know the problem function,
> It is gt_counter_read, if not trace this function, ftrace function_graph work well.

I'm not sure whether the commit d6df3576e6b4 ("clocksource/drivers/arm_global_timer
: Prevent ftrace recursion") can fix this issue.

this commit is merged since v4.3, I noticed your kernel version is v4.0

Thanks,
Jisheng

> Do you know now how to deeply debug and trace which line is wrong through Ftrace?
> 
> --Bean
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ftrace function_graph causes system crash
  2016-09-21  7:50     ` Bean Huo (beanhuo)
@ 2016-09-21 13:45       ` Steven Rostedt
  -1 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2016-09-21 13:45 UTC (permalink / raw)
  To: Bean Huo (beanhuo)
  Cc: Zoltan Szubbocsev (zszubbocsev),
	catalin.marinas, will.deacon, rfi, linux-kernel, mingo,
	linux-arm-kernel

On Wed, 21 Sep 2016 07:50:58 +0000
"Bean Huo (beanhuo)" <beanhuo@micron.com> wrote:
 
> 
> Hi, Steve
> Thanks very much! This is a very useful trace tool, I now know the problem function,
> It is gt_counter_read, if not trace this function, ftrace function_graph work well.
> Do you know now how to deeply debug and trace which line is wrong through Ftrace?

Hmm, maybe I should add this into the scripts directory. Yeah, I should
do that.

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ftrace function_graph causes system crash
@ 2016-09-21 13:45       ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2016-09-21 13:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 21 Sep 2016 07:50:58 +0000
"Bean Huo (beanhuo)" <beanhuo@micron.com> wrote:
 
> 
> Hi, Steve
> Thanks very much! This is a very useful trace tool, I now know the problem function,
> It is gt_counter_read, if not trace this function, ftrace function_graph work well.
> Do you know now how to deeply debug and trace which line is wrong through Ftrace?

Hmm, maybe I should add this into the scripts directory. Yeah, I should
do that.

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ftrace function_graph causes system crash
  2016-09-21  9:13       ` Jisheng Zhang
@ 2016-09-21 18:17         ` Steven Rostedt
  -1 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2016-09-21 18:17 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Bean Huo (beanhuo), Zoltan Szubbocsev (zszubbocsev),
	catalin.marinas, will.deacon, rfi, linux-kernel, mingo,
	linux-arm-kernel

On Wed, 21 Sep 2016 17:13:07 +0800
Jisheng Zhang <jszhang@marvell.com> wrote:

> I'm not sure whether the commit d6df3576e6b4 ("clocksource/drivers/arm_global_timer
> : Prevent ftrace recursion") can fix this issue.
> 
> this commit is merged since v4.3, I noticed your kernel version is v4.0

BTW, yes, that would be the fix.

-- Steve

> 
> Thanks,
> Jisheng
> 
> > Do you know now how to deeply debug and trace which line is wrong through Ftrace?
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ftrace function_graph causes system crash
@ 2016-09-21 18:17         ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2016-09-21 18:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 21 Sep 2016 17:13:07 +0800
Jisheng Zhang <jszhang@marvell.com> wrote:

> I'm not sure whether the commit d6df3576e6b4 ("clocksource/drivers/arm_global_timer
> : Prevent ftrace recursion") can fix this issue.
> 
> this commit is merged since v4.3, I noticed your kernel version is v4.0

BTW, yes, that would be the fix.

-- Steve

> 
> Thanks,
> Jisheng
> 
> > Do you know now how to deeply debug and trace which line is wrong through Ftrace?
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: ftrace function_graph causes system crash
  2016-09-21 18:17         ` Steven Rostedt
@ 2016-09-22  8:36           ` Bean Huo (beanhuo)
  -1 siblings, 0 replies; 14+ messages in thread
From: Bean Huo (beanhuo) @ 2016-09-22  8:36 UTC (permalink / raw)
  To: Steven Rostedt, Jisheng Zhang
  Cc: Zoltan Szubbocsev (zszubbocsev),
	catalin.marinas, will.deacon, rfi, linux-kernel, mingo,
	linux-arm-kernel

> -----Original Message-----
> From: Steven Rostedt [mailto:rostedt@goodmis.org]
> Sent: Mittwoch, 21. September 2016 20:17
> To: Jisheng Zhang <jszhang@marvell.com>
> Cc: Bean Huo (beanhuo) <beanhuo@micron.com>; Zoltan Szubbocsev
> (zszubbocsev) <zszubbocsev@micron.com>; catalin.marinas@arm.com;
> will.deacon@arm.com; rfi@lists.rocketboards.org; linux-
> kernel@vger.kernel.org; mingo@redhat.com; linux-arm-
> kernel@lists.infradead.org
> Subject: Re: ftrace function_graph causes system crash
> 
> On Wed, 21 Sep 2016 17:13:07 +0800
> Jisheng Zhang <jszhang@marvell.com> wrote:
> 
> > I'm not sure whether the commit d6df3576e6b4
> ("clocksource/drivers/arm_global_timer
> > : Prevent ftrace recursion") can fix this issue.
> >
> > this commit is merged since v4.3, I noticed your kernel version is v4.0
> 
> BTW, yes, that would be the fix.
> 
> -- Steve
> 
> >
> > Thanks,
> > Jisheng
> >
> > > Do you know now how to deeply debug and trace which line is wrong
> through Ftrace?
> > >

Hi, Steven and Jisheng
Thanks to both warm-hearted guys. I merged  d6df3576e6b4 patch into my kernel 4.0.
Then it is true, no cash appears again.

I have one more question that current ftrace can trace DMA latency, include mapping and unmapping?
Means I want to know when one BIO request be completed. Just like blktrace.
But blktrace can not tell me the function calling sequence.

--Bean 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ftrace function_graph causes system crash
@ 2016-09-22  8:36           ` Bean Huo (beanhuo)
  0 siblings, 0 replies; 14+ messages in thread
From: Bean Huo (beanhuo) @ 2016-09-22  8:36 UTC (permalink / raw)
  To: linux-arm-kernel

> -----Original Message-----
> From: Steven Rostedt [mailto:rostedt at goodmis.org]
> Sent: Mittwoch, 21. September 2016 20:17
> To: Jisheng Zhang <jszhang@marvell.com>
> Cc: Bean Huo (beanhuo) <beanhuo@micron.com>; Zoltan Szubbocsev
> (zszubbocsev) <zszubbocsev@micron.com>; catalin.marinas at arm.com;
> will.deacon at arm.com; rfi at lists.rocketboards.org; linux-
> kernel at vger.kernel.org; mingo at redhat.com; linux-arm-
> kernel at lists.infradead.org
> Subject: Re: ftrace function_graph causes system crash
> 
> On Wed, 21 Sep 2016 17:13:07 +0800
> Jisheng Zhang <jszhang@marvell.com> wrote:
> 
> > I'm not sure whether the commit d6df3576e6b4
> ("clocksource/drivers/arm_global_timer
> > : Prevent ftrace recursion") can fix this issue.
> >
> > this commit is merged since v4.3, I noticed your kernel version is v4.0
> 
> BTW, yes, that would be the fix.
> 
> -- Steve
> 
> >
> > Thanks,
> > Jisheng
> >
> > > Do you know now how to deeply debug and trace which line is wrong
> through Ftrace?
> > >

Hi, Steven and Jisheng
Thanks to both warm-hearted guys. I merged  d6df3576e6b4 patch into my kernel 4.0.
Then it is true, no cash appears again.

I have one more question that current ftrace can trace DMA latency, include mapping and unmapping?
Means I want to know when one BIO request be completed. Just like blktrace.
But blktrace can not tell me the function calling sequence.

--Bean 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-09-22  8:37 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-20 13:10 ftrace function_graph causes system crash Bean Huo (beanhuo)
2016-09-20 13:10 ` Bean Huo (beanhuo)
2016-09-20 14:07 ` Steven Rostedt
2016-09-20 14:07   ` Steven Rostedt
2016-09-21  7:50   ` Bean Huo (beanhuo)
2016-09-21  7:50     ` Bean Huo (beanhuo)
2016-09-21  9:13     ` Jisheng Zhang
2016-09-21  9:13       ` Jisheng Zhang
2016-09-21 18:17       ` Steven Rostedt
2016-09-21 18:17         ` Steven Rostedt
2016-09-22  8:36         ` Bean Huo (beanhuo)
2016-09-22  8:36           ` Bean Huo (beanhuo)
2016-09-21 13:45     ` Steven Rostedt
2016-09-21 13:45       ` Steven Rostedt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.