All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace
@ 2017-08-28 12:53 ` Pratyush Anand
  0 siblings, 0 replies; 6+ messages in thread
From: Pratyush Anand @ 2017-08-28 12:53 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: Pratyush Anand, linux-kernel

Testcase:
cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in
PANIC: "Unable to handle kernel paging request at virtual address
ffff801bcbde7000"

vmcore analysis:
1)
crash> bt
PID: 1561   TASK: ffff8003cb7e4000  CPU: 0   COMMAND: "ps"
 #0 [ffff8003c4ff77b0] crash_kexec at ffff00000816b9b8
 #1 [ffff8003c4ff77e0] die at ffff000008088b34
 #2 [ffff8003c4ff7820] __do_kernel_fault at ffff00000809b830
 #3 [ffff8003c4ff7850] do_bad_area at ffff000008098b90
 #4 [ffff8003c4ff7880] do_translation_fault at ffff0000087c6cdc
 #5 [ffff8003c4ff78b0] do_mem_abort at ffff000008081334
 #6 [ffff8003c4ff7ab0] el1_ia at ffff000008082cc0
     PC: ffff00000808811c  [unwind_frame+300]
     LR: ffff0000080858a8  [get_wchan+212]
     SP: ffff8003c4ff7ab0  PSTATE: 60000145
    X29: ffff8003c4ff7ab0  X28: 0000000000000001  X27: 0000000000000000
    X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
    X23: ffff8003c1c20000  X22: ffff000008c73000  X21: ffff8003c1c1c000
    X20: 000000000000000f  X19: ffff8003c1bc7000  X18: 0000000000000010
    X17: 0000000000000000  X16: 0000000000000001  X15: ffffffffffffffed
    X14: 0000000000000010  X13: ffffffffffffffff  X12: 0000000000000004
    X11: 0000000000000000  X10: 0000000002dd14c0   X9: 1999999999999999
     X8: 000000000000003f   X7: 00008003f71b0000   X6: 0000000000000018
     X5: 0000000000000000   X4: ffff8003c1c1fd20   X3: ffff8003c1c1fd10
     X2: 00000017ffe80000   X1: ffff8003c4ff7af8   X0: ffff8003cbf67000
 #7 [ffff8003c4ff7b20] do_task_stat at ffff000008304f0c
 #8 [ffff8003c4ff7c60] proc_tgid_stat at ffff000008305b48
 #9 [ffff8003c4ff7ca0] proc_single_show at ffff0000082fdd10
 #10 [ffff8003c4ff7ce0] seq_read at ffff0000082b27bc
 #11 [ffff8003c4ff7d70] __vfs_read at ffff000008289e54
 #12 [ffff8003c4ff7e30] vfs_read at ffff00000828b14c
 #13 [ffff8003c4ff7e70] sys_read at ffff00000828c2d0
 #14 [ffff8003c4ff7ed0] __sys_trace at ffff00000808349c
     PC: 00000006  LR: 00000000  SP: ffffffffffffffed  PSTATE: 0000003f
    X12: 00000010 X11: ffffffffffffffff X10: 00000004  X9: ffff7febe8d0
     X8: 00000000  X7: 1999999999999999  X6: 0000003f  X5: 0000000c
     X4: ffff7fce9c78  X3: 0000000c  X2: 00000000  X1: 00000000
     X0: 00000400

(2) Instruction at ffff00000808811c caused IA/DA.

crash> dis -l ffff000008088108 6
/usr/src/debug/xxxxxxxxxxxx/xxxxxxxxxx/arch/arm64/kernel/stacktrace.c:
84
0xffff000008088108 <unwind_frame+280>:  ldr     w2, [x1,#24]
0xffff00000808810c <unwind_frame+284>:  sub     w6, w2, #0x1
0xffff000008088110 <unwind_frame+288>:  str     w6, [x1,#24]
0xffff000008088114 <unwind_frame+292>:  mov     w6, #0x18 // #24
0xffff000008088118 <unwind_frame+296>:  umull   x2, w2, w6
0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]

Corresponding c statement is
frame->pc = tsk->ret_stack[frame->graph--].ret;

(3) So, it caused data abort while executing
0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]

x0 + x2 = ffff8003cbf67000 + 00000017ffe80000 = ffff801bcbde7000
Access of ffff801bcbde7000 resulted in "Unable to handle kernel paging
request"

from above data: frame->graph = task->curr_ret_stack which  should be,
x2 / 0x18 = FFFF0000, which is -FTRACE_NOTRACE_DEPTH.

OK, so problem is here:
do_task_stat() calls get_wchan(). Here p->curr_ret_stack  is
-FTRACE_NOTRACE_DEPTH in the failed case. It means, when do_task_stat()
has been called for task A (ps in this case) on CPUm,task A was in mid
of execution on CPUn, and was in the mid of mcount() execution where
curr_ret_stack had been decremented in ftrace_push_return_trace() for
hitting schedule() function, but it was yet to be incremented in
ftrace_return_to_handler().

Similar problem we can have with calling of walk_stackframe() from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() to not to manipulate frame->pc for
function graph tracer if the function has been marked in
set_graph_notrace.

This patch fixes: 20380bb390a4 (arm64: ftrace: fix a stack tracer's
output under function graph tracer)

Signed-off-by: Pratyush Anand <panand@redhat.com>
---
 arch/arm64/kernel/stacktrace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 09d37d66b630..e79035d673b3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -74,7 +74,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	if (tsk->ret_stack &&
-			(frame->pc == (unsigned long)return_to_handler)) {
+			(frame->pc == (unsigned long)return_to_handler) &&
+			(frame->graph > -1)) {
 		/*
 		 * This is a case where function graph tracer has
 		 * modified a return address (LR) in a stack frame
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace
@ 2017-08-28 12:53 ` Pratyush Anand
  0 siblings, 0 replies; 6+ messages in thread
From: Pratyush Anand @ 2017-08-28 12:53 UTC (permalink / raw)
  To: linux-arm-kernel

Testcase:
cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in
PANIC: "Unable to handle kernel paging request at virtual address
ffff801bcbde7000"

vmcore analysis:
1)
crash> bt
PID: 1561   TASK: ffff8003cb7e4000  CPU: 0   COMMAND: "ps"
 #0 [ffff8003c4ff77b0] crash_kexec at ffff00000816b9b8
 #1 [ffff8003c4ff77e0] die at ffff000008088b34
 #2 [ffff8003c4ff7820] __do_kernel_fault at ffff00000809b830
 #3 [ffff8003c4ff7850] do_bad_area at ffff000008098b90
 #4 [ffff8003c4ff7880] do_translation_fault at ffff0000087c6cdc
 #5 [ffff8003c4ff78b0] do_mem_abort at ffff000008081334
 #6 [ffff8003c4ff7ab0] el1_ia at ffff000008082cc0
     PC: ffff00000808811c  [unwind_frame+300]
     LR: ffff0000080858a8  [get_wchan+212]
     SP: ffff8003c4ff7ab0  PSTATE: 60000145
    X29: ffff8003c4ff7ab0  X28: 0000000000000001  X27: 0000000000000000
    X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
    X23: ffff8003c1c20000  X22: ffff000008c73000  X21: ffff8003c1c1c000
    X20: 000000000000000f  X19: ffff8003c1bc7000  X18: 0000000000000010
    X17: 0000000000000000  X16: 0000000000000001  X15: ffffffffffffffed
    X14: 0000000000000010  X13: ffffffffffffffff  X12: 0000000000000004
    X11: 0000000000000000  X10: 0000000002dd14c0   X9: 1999999999999999
     X8: 000000000000003f   X7: 00008003f71b0000   X6: 0000000000000018
     X5: 0000000000000000   X4: ffff8003c1c1fd20   X3: ffff8003c1c1fd10
     X2: 00000017ffe80000   X1: ffff8003c4ff7af8   X0: ffff8003cbf67000
 #7 [ffff8003c4ff7b20] do_task_stat at ffff000008304f0c
 #8 [ffff8003c4ff7c60] proc_tgid_stat at ffff000008305b48
 #9 [ffff8003c4ff7ca0] proc_single_show at ffff0000082fdd10
 #10 [ffff8003c4ff7ce0] seq_read at ffff0000082b27bc
 #11 [ffff8003c4ff7d70] __vfs_read at ffff000008289e54
 #12 [ffff8003c4ff7e30] vfs_read at ffff00000828b14c
 #13 [ffff8003c4ff7e70] sys_read at ffff00000828c2d0
 #14 [ffff8003c4ff7ed0] __sys_trace at ffff00000808349c
     PC: 00000006  LR: 00000000  SP: ffffffffffffffed  PSTATE: 0000003f
    X12: 00000010 X11: ffffffffffffffff X10: 00000004  X9: ffff7febe8d0
     X8: 00000000  X7: 1999999999999999  X6: 0000003f  X5: 0000000c
     X4: ffff7fce9c78  X3: 0000000c  X2: 00000000  X1: 00000000
     X0: 00000400

(2) Instruction at ffff00000808811c caused IA/DA.

crash> dis -l ffff000008088108 6
/usr/src/debug/xxxxxxxxxxxx/xxxxxxxxxx/arch/arm64/kernel/stacktrace.c:
84
0xffff000008088108 <unwind_frame+280>:  ldr     w2, [x1,#24]
0xffff00000808810c <unwind_frame+284>:  sub     w6, w2, #0x1
0xffff000008088110 <unwind_frame+288>:  str     w6, [x1,#24]
0xffff000008088114 <unwind_frame+292>:  mov     w6, #0x18 // #24
0xffff000008088118 <unwind_frame+296>:  umull   x2, w2, w6
0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]

Corresponding c statement is
frame->pc = tsk->ret_stack[frame->graph--].ret;

(3) So, it caused data abort while executing
0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]

x0 + x2 = ffff8003cbf67000 + 00000017ffe80000 = ffff801bcbde7000
Access of ffff801bcbde7000 resulted in "Unable to handle kernel paging
request"

from above data: frame->graph = task->curr_ret_stack which  should be,
x2 / 0x18 = FFFF0000, which is -FTRACE_NOTRACE_DEPTH.

OK, so problem is here:
do_task_stat() calls get_wchan(). Here p->curr_ret_stack  is
-FTRACE_NOTRACE_DEPTH in the failed case. It means, when do_task_stat()
has been called for task A (ps in this case) on CPUm,task A was in mid
of execution on CPUn, and was in the mid of mcount() execution where
curr_ret_stack had been decremented in ftrace_push_return_trace() for
hitting schedule() function, but it was yet to be incremented in
ftrace_return_to_handler().

Similar problem we can have with calling of walk_stackframe() from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() to not to manipulate frame->pc for
function graph tracer if the function has been marked in
set_graph_notrace.

This patch fixes: 20380bb390a4 (arm64: ftrace: fix a stack tracer's
output under function graph tracer)

Signed-off-by: Pratyush Anand <panand@redhat.com>
---
 arch/arm64/kernel/stacktrace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 09d37d66b630..e79035d673b3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -74,7 +74,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	if (tsk->ret_stack &&
-			(frame->pc == (unsigned long)return_to_handler)) {
+			(frame->pc == (unsigned long)return_to_handler) &&
+			(frame->graph > -1)) {
 		/*
 		 * This is a case where function graph tracer has
 		 * modified a return address (LR) in a stack frame
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace
  2017-08-28 12:53 ` Pratyush Anand
@ 2017-08-29 17:03   ` James Morse
  -1 siblings, 0 replies; 6+ messages in thread
From: James Morse @ 2017-08-29 17:03 UTC (permalink / raw)
  To: Pratyush Anand; +Cc: linux-arm-kernel, linux-kernel

Hi Pratyush,

(Nit: get_maintainer.pl will give you the list of people to CC, you're reliant
on the maintainers being eagle-eyed to spot this!)

On 28/08/17 13:53, Pratyush Anand wrote:
> Testcase:
> cd /sys/kernel/debug/tracing/
> echo schedule > set_graph_notrace
> echo 1 > options/display-graph
> echo wakeup > current_tracer
> ps -ef | grep -i agent
> 
> Above commands result in
> PANIC: "Unable to handle kernel paging request at virtual address
> ffff801bcbde7000"

This didn't panic for me, it just killed the running task. (I guess you have
panic-on-oops set)


> vmcore analysis:
> 1)
> crash> bt
> PID: 1561   TASK: ffff8003cb7e4000  CPU: 0   COMMAND: "ps"
>  #0 [ffff8003c4ff77b0] crash_kexec at ffff00000816b9b8
>  #1 [ffff8003c4ff77e0] die at ffff000008088b34
>  #2 [ffff8003c4ff7820] __do_kernel_fault at ffff00000809b830
>  #3 [ffff8003c4ff7850] do_bad_area at ffff000008098b90
>  #4 [ffff8003c4ff7880] do_translation_fault at ffff0000087c6cdc
>  #5 [ffff8003c4ff78b0] do_mem_abort at ffff000008081334
>  #6 [ffff8003c4ff7ab0] el1_ia at ffff000008082cc0
>      PC: ffff00000808811c  [unwind_frame+300]
>      LR: ffff0000080858a8  [get_wchan+212]
>      SP: ffff8003c4ff7ab0  PSTATE: 60000145
>     X29: ffff8003c4ff7ab0  X28: 0000000000000001  X27: 0000000000000000
>     X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>     X23: ffff8003c1c20000  X22: ffff000008c73000  X21: ffff8003c1c1c000
>     X20: 000000000000000f  X19: ffff8003c1bc7000  X18: 0000000000000010
>     X17: 0000000000000000  X16: 0000000000000001  X15: ffffffffffffffed
>     X14: 0000000000000010  X13: ffffffffffffffff  X12: 0000000000000004
>     X11: 0000000000000000  X10: 0000000002dd14c0   X9: 1999999999999999
>      X8: 000000000000003f   X7: 00008003f71b0000   X6: 0000000000000018
>      X5: 0000000000000000   X4: ffff8003c1c1fd20   X3: ffff8003c1c1fd10
>      X2: 00000017ffe80000   X1: ffff8003c4ff7af8   X0: ffff8003cbf67000
>  #7 [ffff8003c4ff7b20] do_task_stat at ffff000008304f0c
>  #8 [ffff8003c4ff7c60] proc_tgid_stat at ffff000008305b48
>  #9 [ffff8003c4ff7ca0] proc_single_show at ffff0000082fdd10
>  #10 [ffff8003c4ff7ce0] seq_read at ffff0000082b27bc
>  #11 [ffff8003c4ff7d70] __vfs_read at ffff000008289e54
>  #12 [ffff8003c4ff7e30] vfs_read at ffff00000828b14c
>  #13 [ffff8003c4ff7e70] sys_read at ffff00000828c2d0
>  #14 [ffff8003c4ff7ed0] __sys_trace at ffff00000808349c
>      PC: 00000006  LR: 00000000  SP: ffffffffffffffed  PSTATE: 0000003f
>     X12: 00000010 X11: ffffffffffffffff X10: 00000004  X9: ffff7febe8d0
>      X8: 00000000  X7: 1999999999999999  X6: 0000003f  X5: 0000000c
>      X4: ffff7fce9c78  X3: 0000000c  X2: 00000000  X1: 00000000
>      X0: 00000400
> 
> (2) Instruction at ffff00000808811c caused IA/DA.
> 
> crash> dis -l ffff000008088108 6
> /usr/src/debug/xxxxxxxxxxxx/xxxxxxxxxx/arch/arm64/kernel/stacktrace.c:
> 84
> 0xffff000008088108 <unwind_frame+280>:  ldr     w2, [x1,#24]
> 0xffff00000808810c <unwind_frame+284>:  sub     w6, w2, #0x1
> 0xffff000008088110 <unwind_frame+288>:  str     w6, [x1,#24]
> 0xffff000008088114 <unwind_frame+292>:  mov     w6, #0x18 // #24
> 0xffff000008088118 <unwind_frame+296>:  umull   x2, w2, w6
> 0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]
> 
> Corresponding c statement is
> frame->pc = tsk->ret_stack[frame->graph--].ret;
> 
> (3) So, it caused data abort while executing
> 0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]
> 
> x0 + x2 = ffff8003cbf67000 + 00000017ffe80000 = ffff801bcbde7000
> Access of ffff801bcbde7000 resulted in "Unable to handle kernel paging
> request"
> 
> from above data: frame->graph = task->curr_ret_stack which  should be,
> x2 / 0x18 = FFFF0000, which is -FTRACE_NOTRACE_DEPTH.

(0x18 is the size of struct ftrace_ret_stack for your config?)


> OK, so problem is here:
> do_task_stat() calls get_wchan(). Here p->curr_ret_stack  is
> -FTRACE_NOTRACE_DEPTH in the failed case.

> It means, when do_task_stat()
> has been called for task A (ps in this case) on CPUm,task A was in mid
> of execution on CPUn,

get_wchan() on a running processes can't work: the stack may be modified while
we walk it.
>From arch/arm64/kernel/process.c::get_wchan():
> 	if (!p || p == current || p->state == TASK_RUNNING)
>		return 0;

As far as I can see, if task A is running on CPU-N then CPU-Ms attempt to
get_wchan() it will return 0.


> and was in the mid of mcount() execution where
> curr_ret_stack had been decremented in ftrace_push_return_trace() for
> hitting schedule() function, but it was yet to be incremented in
> ftrace_return_to_handler().

So if the function-graph-tracer has hooked the return values on a stack and hit
a filtered function, (schedule() in your example), we can't unwind it as
ftrace_push_return_trace() has biased the index with a 'huge negative' offset to
flag it as 'this should be filtered out'.

The arm64 stack walker isn't aware of this and interprets it as an unsigned
index. Nasty!


> Similar problem we can have with calling of walk_stackframe() from
> save_stack_trace_tsk() or dump_backtrace().
> 
> This patch fixes unwind_frame() to not to manipulate frame->pc for

Nit: strictly this is is 'restore frame->pc from ftrace's ret_stack', but I
think we still need to do this restore...


> function graph tracer if the function has been marked in
> set_graph_notrace.

Nit: ftrace describes these as 'filtered out', set_graph_notrace is the debugfs
interface.


> This patch fixes: 20380bb390a4 (arm64: ftrace: fix a stack tracer's
> output under function graph tracer)

> Signed-off-by: Pratyush Anand <panand@redhat.com>


> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 09d37d66b630..e79035d673b3 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -74,7 +74,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct
stackframe *frame)
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  	if (tsk->ret_stack &&
> -			(frame->pc == (unsigned long)return_to_handler)) {
> +			(frame->pc == (unsigned long)return_to_handler) &&

> +			(frame->graph > -1)) {

This should give you a compiler warning: its declared as an unsigned int in
struct stackframe.


... but with this patch /proc/<pid>/wchan now reports:
> cat /proc/1/wchan
> return_to_handler

So it looks like 'filtered out' doesn't mean 'not hooked'. I think a more
complete fix is to test if frame->graph is negative and add FTRACE_NOTRACE_DEPTH
back to it:

>From kernel/trace/ftrace_functions_graph.c::trace_pop_return_trace():
>	index = current->curr_ret_stack;
>
>	/*
>	 * A negative index here means that it's just returned from a
>	 * notrace'd function.  Recover index to get an original
>	 * return address.  See ftrace_push_return_trace().
>	 *
>	 * TODO: Need to check whether the stack gets corrupted.
>	 */
>	if (index < 0)
>		index += FTRACE_NOTRACE_DEPTH;

(looks like this is the only magic offset)


I don't think we need to preserve the vmcore debugging in the kernel log, could
you cut the commit message down to describe the negative curr_ret_stack being
interpreted as a signed value instead of skipped by unwind_frame(), then have
the reproducer and a chunk of the splat.


Thanks for getting to the bottom of this, it looks like this was a mammoth
debugging session!


James

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace
@ 2017-08-29 17:03   ` James Morse
  0 siblings, 0 replies; 6+ messages in thread
From: James Morse @ 2017-08-29 17:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Pratyush,

(Nit: get_maintainer.pl will give you the list of people to CC, you're reliant
on the maintainers being eagle-eyed to spot this!)

On 28/08/17 13:53, Pratyush Anand wrote:
> Testcase:
> cd /sys/kernel/debug/tracing/
> echo schedule > set_graph_notrace
> echo 1 > options/display-graph
> echo wakeup > current_tracer
> ps -ef | grep -i agent
> 
> Above commands result in
> PANIC: "Unable to handle kernel paging request at virtual address
> ffff801bcbde7000"

This didn't panic for me, it just killed the running task. (I guess you have
panic-on-oops set)


> vmcore analysis:
> 1)
> crash> bt
> PID: 1561   TASK: ffff8003cb7e4000  CPU: 0   COMMAND: "ps"
>  #0 [ffff8003c4ff77b0] crash_kexec at ffff00000816b9b8
>  #1 [ffff8003c4ff77e0] die at ffff000008088b34
>  #2 [ffff8003c4ff7820] __do_kernel_fault at ffff00000809b830
>  #3 [ffff8003c4ff7850] do_bad_area at ffff000008098b90
>  #4 [ffff8003c4ff7880] do_translation_fault at ffff0000087c6cdc
>  #5 [ffff8003c4ff78b0] do_mem_abort at ffff000008081334
>  #6 [ffff8003c4ff7ab0] el1_ia at ffff000008082cc0
>      PC: ffff00000808811c  [unwind_frame+300]
>      LR: ffff0000080858a8  [get_wchan+212]
>      SP: ffff8003c4ff7ab0  PSTATE: 60000145
>     X29: ffff8003c4ff7ab0  X28: 0000000000000001  X27: 0000000000000000
>     X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>     X23: ffff8003c1c20000  X22: ffff000008c73000  X21: ffff8003c1c1c000
>     X20: 000000000000000f  X19: ffff8003c1bc7000  X18: 0000000000000010
>     X17: 0000000000000000  X16: 0000000000000001  X15: ffffffffffffffed
>     X14: 0000000000000010  X13: ffffffffffffffff  X12: 0000000000000004
>     X11: 0000000000000000  X10: 0000000002dd14c0   X9: 1999999999999999
>      X8: 000000000000003f   X7: 00008003f71b0000   X6: 0000000000000018
>      X5: 0000000000000000   X4: ffff8003c1c1fd20   X3: ffff8003c1c1fd10
>      X2: 00000017ffe80000   X1: ffff8003c4ff7af8   X0: ffff8003cbf67000
>  #7 [ffff8003c4ff7b20] do_task_stat at ffff000008304f0c
>  #8 [ffff8003c4ff7c60] proc_tgid_stat at ffff000008305b48
>  #9 [ffff8003c4ff7ca0] proc_single_show at ffff0000082fdd10
>  #10 [ffff8003c4ff7ce0] seq_read at ffff0000082b27bc
>  #11 [ffff8003c4ff7d70] __vfs_read at ffff000008289e54
>  #12 [ffff8003c4ff7e30] vfs_read at ffff00000828b14c
>  #13 [ffff8003c4ff7e70] sys_read at ffff00000828c2d0
>  #14 [ffff8003c4ff7ed0] __sys_trace at ffff00000808349c
>      PC: 00000006  LR: 00000000  SP: ffffffffffffffed  PSTATE: 0000003f
>     X12: 00000010 X11: ffffffffffffffff X10: 00000004  X9: ffff7febe8d0
>      X8: 00000000  X7: 1999999999999999  X6: 0000003f  X5: 0000000c
>      X4: ffff7fce9c78  X3: 0000000c  X2: 00000000  X1: 00000000
>      X0: 00000400
> 
> (2) Instruction at ffff00000808811c caused IA/DA.
> 
> crash> dis -l ffff000008088108 6
> /usr/src/debug/xxxxxxxxxxxx/xxxxxxxxxx/arch/arm64/kernel/stacktrace.c:
> 84
> 0xffff000008088108 <unwind_frame+280>:  ldr     w2, [x1,#24]
> 0xffff00000808810c <unwind_frame+284>:  sub     w6, w2, #0x1
> 0xffff000008088110 <unwind_frame+288>:  str     w6, [x1,#24]
> 0xffff000008088114 <unwind_frame+292>:  mov     w6, #0x18 // #24
> 0xffff000008088118 <unwind_frame+296>:  umull   x2, w2, w6
> 0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]
> 
> Corresponding c statement is
> frame->pc = tsk->ret_stack[frame->graph--].ret;
> 
> (3) So, it caused data abort while executing
> 0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]
> 
> x0 + x2 = ffff8003cbf67000 + 00000017ffe80000 = ffff801bcbde7000
> Access of ffff801bcbde7000 resulted in "Unable to handle kernel paging
> request"
> 
> from above data: frame->graph = task->curr_ret_stack which  should be,
> x2 / 0x18 = FFFF0000, which is -FTRACE_NOTRACE_DEPTH.

(0x18 is the size of struct ftrace_ret_stack for your config?)


> OK, so problem is here:
> do_task_stat() calls get_wchan(). Here p->curr_ret_stack  is
> -FTRACE_NOTRACE_DEPTH in the failed case.

> It means, when do_task_stat()
> has been called for task A (ps in this case) on CPUm,task A was in mid
> of execution on CPUn,

get_wchan() on a running processes can't work: the stack may be modified while
we walk it.
>From arch/arm64/kernel/process.c::get_wchan():
> 	if (!p || p == current || p->state == TASK_RUNNING)
>		return 0;

As far as I can see, if task A is running on CPU-N then CPU-Ms attempt to
get_wchan() it will return 0.


> and was in the mid of mcount() execution where
> curr_ret_stack had been decremented in ftrace_push_return_trace() for
> hitting schedule() function, but it was yet to be incremented in
> ftrace_return_to_handler().

So if the function-graph-tracer has hooked the return values on a stack and hit
a filtered function, (schedule() in your example), we can't unwind it as
ftrace_push_return_trace() has biased the index with a 'huge negative' offset to
flag it as 'this should be filtered out'.

The arm64 stack walker isn't aware of this and interprets it as an unsigned
index. Nasty!


> Similar problem we can have with calling of walk_stackframe() from
> save_stack_trace_tsk() or dump_backtrace().
> 
> This patch fixes unwind_frame() to not to manipulate frame->pc for

Nit: strictly this is is 'restore frame->pc from ftrace's ret_stack', but I
think we still need to do this restore...


> function graph tracer if the function has been marked in
> set_graph_notrace.

Nit: ftrace describes these as 'filtered out', set_graph_notrace is the debugfs
interface.


> This patch fixes: 20380bb390a4 (arm64: ftrace: fix a stack tracer's
> output under function graph tracer)

> Signed-off-by: Pratyush Anand <panand@redhat.com>


> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 09d37d66b630..e79035d673b3 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -74,7 +74,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct
stackframe *frame)
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  	if (tsk->ret_stack &&
> -			(frame->pc == (unsigned long)return_to_handler)) {
> +			(frame->pc == (unsigned long)return_to_handler) &&

> +			(frame->graph > -1)) {

This should give you a compiler warning: its declared as an unsigned int in
struct stackframe.


... but with this patch /proc/<pid>/wchan now reports:
> cat /proc/1/wchan
> return_to_handler

So it looks like 'filtered out' doesn't mean 'not hooked'. I think a more
complete fix is to test if frame->graph is negative and add FTRACE_NOTRACE_DEPTH
back to it:

>From kernel/trace/ftrace_functions_graph.c::trace_pop_return_trace():
>	index = current->curr_ret_stack;
>
>	/*
>	 * A negative index here means that it's just returned from a
>	 * notrace'd function.  Recover index to get an original
>	 * return address.  See ftrace_push_return_trace().
>	 *
>	 * TODO: Need to check whether the stack gets corrupted.
>	 */
>	if (index < 0)
>		index += FTRACE_NOTRACE_DEPTH;

(looks like this is the only magic offset)


I don't think we need to preserve the vmcore debugging in the kernel log, could
you cut the commit message down to describe the negative curr_ret_stack being
interpreted as a signed value instead of skipped by unwind_frame(), then have
the reproducer and a chunk of the splat.


Thanks for getting to the bottom of this, it looks like this was a mammoth
debugging session!


James

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace
  2017-08-29 17:03   ` James Morse
@ 2017-08-30 12:59     ` Pratyush Anand
  -1 siblings, 0 replies; 6+ messages in thread
From: Pratyush Anand @ 2017-08-30 12:59 UTC (permalink / raw)
  To: James Morse; +Cc: linux-arm-kernel, linux-kernel, Catalin Marinas, Will Deacon



On Tuesday 29 August 2017 10:33 PM, James Morse wrote:
> Hi Pratyush,
> 
> (Nit: get_maintainer.pl will give you the list of people to CC, you're reliant
> on the maintainers being eagle-eyed to spot this!)

I noticed it after sending. I do use it, but there was a bug in my cccmd 
script. I have fixed it. I hope, it won't miss from next time.

> 
> On 28/08/17 13:53, Pratyush Anand wrote:
>> Testcase:
>> cd /sys/kernel/debug/tracing/
>> echo schedule > set_graph_notrace
>> echo 1 > options/display-graph
>> echo wakeup > current_tracer
>> ps -ef | grep -i agent
>>
>> Above commands result in
>> PANIC: "Unable to handle kernel paging request at virtual address
>> ffff801bcbde7000"
> 
> This didn't panic for me, it just killed the running task. (I guess you have
> panic-on-oops set)
> 

yes..

> 
>> vmcore analysis:
>> 1)
>> crash> bt
>> PID: 1561   TASK: ffff8003cb7e4000  CPU: 0   COMMAND: "ps"
>>   #0 [ffff8003c4ff77b0] crash_kexec at ffff00000816b9b8
>>   #1 [ffff8003c4ff77e0] die at ffff000008088b34
>>   #2 [ffff8003c4ff7820] __do_kernel_fault at ffff00000809b830
>>   #3 [ffff8003c4ff7850] do_bad_area at ffff000008098b90
>>   #4 [ffff8003c4ff7880] do_translation_fault at ffff0000087c6cdc
>>   #5 [ffff8003c4ff78b0] do_mem_abort at ffff000008081334
>>   #6 [ffff8003c4ff7ab0] el1_ia at ffff000008082cc0
>>       PC: ffff00000808811c  [unwind_frame+300]
>>       LR: ffff0000080858a8  [get_wchan+212]
>>       SP: ffff8003c4ff7ab0  PSTATE: 60000145
>>      X29: ffff8003c4ff7ab0  X28: 0000000000000001  X27: 0000000000000000
>>      X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>>      X23: ffff8003c1c20000  X22: ffff000008c73000  X21: ffff8003c1c1c000
>>      X20: 000000000000000f  X19: ffff8003c1bc7000  X18: 0000000000000010
>>      X17: 0000000000000000  X16: 0000000000000001  X15: ffffffffffffffed
>>      X14: 0000000000000010  X13: ffffffffffffffff  X12: 0000000000000004
>>      X11: 0000000000000000  X10: 0000000002dd14c0   X9: 1999999999999999
>>       X8: 000000000000003f   X7: 00008003f71b0000   X6: 0000000000000018
>>       X5: 0000000000000000   X4: ffff8003c1c1fd20   X3: ffff8003c1c1fd10
>>       X2: 00000017ffe80000   X1: ffff8003c4ff7af8   X0: ffff8003cbf67000
>>   #7 [ffff8003c4ff7b20] do_task_stat at ffff000008304f0c
>>   #8 [ffff8003c4ff7c60] proc_tgid_stat at ffff000008305b48
>>   #9 [ffff8003c4ff7ca0] proc_single_show at ffff0000082fdd10
>>   #10 [ffff8003c4ff7ce0] seq_read at ffff0000082b27bc
>>   #11 [ffff8003c4ff7d70] __vfs_read at ffff000008289e54
>>   #12 [ffff8003c4ff7e30] vfs_read at ffff00000828b14c
>>   #13 [ffff8003c4ff7e70] sys_read at ffff00000828c2d0
>>   #14 [ffff8003c4ff7ed0] __sys_trace at ffff00000808349c
>>       PC: 00000006  LR: 00000000  SP: ffffffffffffffed  PSTATE: 0000003f
>>      X12: 00000010 X11: ffffffffffffffff X10: 00000004  X9: ffff7febe8d0
>>       X8: 00000000  X7: 1999999999999999  X6: 0000003f  X5: 0000000c
>>       X4: ffff7fce9c78  X3: 0000000c  X2: 00000000  X1: 00000000
>>       X0: 00000400
>>
>> (2) Instruction at ffff00000808811c caused IA/DA.
>>
>> crash> dis -l ffff000008088108 6
>> /usr/src/debug/xxxxxxxxxxxx/xxxxxxxxxx/arch/arm64/kernel/stacktrace.c:
>> 84
>> 0xffff000008088108 <unwind_frame+280>:  ldr     w2, [x1,#24]
>> 0xffff00000808810c <unwind_frame+284>:  sub     w6, w2, #0x1
>> 0xffff000008088110 <unwind_frame+288>:  str     w6, [x1,#24]
>> 0xffff000008088114 <unwind_frame+292>:  mov     w6, #0x18 // #24
>> 0xffff000008088118 <unwind_frame+296>:  umull   x2, w2, w6
>> 0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]
>>
>> Corresponding c statement is
>> frame->pc = tsk->ret_stack[frame->graph--].ret;
>>
>> (3) So, it caused data abort while executing
>> 0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]
>>
>> x0 + x2 = ffff8003cbf67000 + 00000017ffe80000 = ffff801bcbde7000
>> Access of ffff801bcbde7000 resulted in "Unable to handle kernel paging
>> request"
>>
>> from above data: frame->graph = task->curr_ret_stack which  should be,
>> x2 / 0x18 = FFFF0000, which is -FTRACE_NOTRACE_DEPTH.
> 
> (0x18 is the size of struct ftrace_ret_stack for your config?)

yes.

> 
> 
>> OK, so problem is here:
>> do_task_stat() calls get_wchan(). Here p->curr_ret_stack  is
>> -FTRACE_NOTRACE_DEPTH in the failed case.
> 
>> It means, when do_task_stat()
>> has been called for task A (ps in this case) on CPUm,task A was in mid
>> of execution on CPUn,
> 
> get_wchan() on a running processes can't work: the stack may be modified while
> we walk it.
>  From arch/arm64/kernel/process.c::get_wchan():
>> 	if (!p || p == current || p->state == TASK_RUNNING)
>> 		return 0;
> 
> As far as I can see, if task A is running on CPU-N then CPU-Ms attempt to
> get_wchan() it will return 0.

You seems right.

> 
> 
>> and was in the mid of mcount() execution where
>> curr_ret_stack had been decremented in ftrace_push_return_trace() for
>> hitting schedule() function, but it was yet to be incremented in
>> ftrace_return_to_handler().
> 
> So if the function-graph-tracer has hooked the return values on a stack and hit
> a filtered function, (schedule() in your example), we can't unwind it as
> ftrace_push_return_trace() has biased the index with a 'huge negative' offset to
> flag it as 'this should be filtered out'.
> 
> The arm64 stack walker isn't aware of this and interprets it as an unsigned
> index. Nasty!
> 

Hummm...frame.graph always takes tsk->curr_ret_stack which is int. It has been 
assigned as -1 in arch/arm64/kernel/time.c. frame.graph should be defined as int.


> 
>> Similar problem we can have with calling of walk_stackframe() from
>> save_stack_trace_tsk() or dump_backtrace().
>>
>> This patch fixes unwind_frame() to not to manipulate frame->pc for
> 
> Nit: strictly this is is 'restore frame->pc from ftrace's ret_stack', but I
> think we still need to do this restore...
> 
> 
>> function graph tracer if the function has been marked in
>> set_graph_notrace.
> 
> Nit: ftrace describes these as 'filtered out', set_graph_notrace is the debugfs
> interface.
> 
> 
>> This patch fixes: 20380bb390a4 (arm64: ftrace: fix a stack tracer's
>> output under function graph tracer)
> 
>> Signed-off-by: Pratyush Anand <panand@redhat.com>
> 
> 
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index 09d37d66b630..e79035d673b3 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -74,7 +74,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct
> stackframe *frame)
>>
>>   #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>   	if (tsk->ret_stack &&
>> -			(frame->pc == (unsigned long)return_to_handler)) {
>> +			(frame->pc == (unsigned long)return_to_handler) &&
> 
>> +			(frame->graph > -1)) {
> 
> This should give you a compiler warning: its declared as an unsigned int in
> struct stackframe.

I did not get this warning :(
# gcc --version
gcc (GCC) 4.8.5 20150623

> 
> 
> ... but with this patch /proc/<pid>/wchan now reports:
>> cat /proc/1/wchan
>> return_to_handler
> 
> So it looks like 'filtered out' doesn't mean 'not hooked'. I think a more
> complete fix is to test if frame->graph is negative and add FTRACE_NOTRACE_DEPTH
> back to it:
> 

Yep, it should allow to read correct /proc/.../wchan.


>  From kernel/trace/ftrace_functions_graph.c::trace_pop_return_trace():
>> 	index = current->curr_ret_stack;
>>
>> 	/*
>> 	 * A negative index here means that it's just returned from a
>> 	 * notrace'd function.  Recover index to get an original
>> 	 * return address.  See ftrace_push_return_trace().
>> 	 *
>> 	 * TODO: Need to check whether the stack gets corrupted.
>> 	 */
>> 	if (index < 0)
>> 		index += FTRACE_NOTRACE_DEPTH;
> 
> (looks like this is the only magic offset)
> 
> 
> I don't think we need to preserve the vmcore debugging in the kernel log, could
> you cut the commit message down to describe the negative curr_ret_stack being
> interpreted as a signed value instead of skipped by unwind_frame(), then have
> the reproducer and a chunk of the splat.
> 
> 
> Thanks for getting to the bottom of this, it looks like this was a mammoth
> debugging session!
> 

Certainly few hours :-).

Thanks for your review.

-- 
Regards
Pratyush

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace
@ 2017-08-30 12:59     ` Pratyush Anand
  0 siblings, 0 replies; 6+ messages in thread
From: Pratyush Anand @ 2017-08-30 12:59 UTC (permalink / raw)
  To: linux-arm-kernel



On Tuesday 29 August 2017 10:33 PM, James Morse wrote:
> Hi Pratyush,
> 
> (Nit: get_maintainer.pl will give you the list of people to CC, you're reliant
> on the maintainers being eagle-eyed to spot this!)

I noticed it after sending. I do use it, but there was a bug in my cccmd 
script. I have fixed it. I hope, it won't miss from next time.

> 
> On 28/08/17 13:53, Pratyush Anand wrote:
>> Testcase:
>> cd /sys/kernel/debug/tracing/
>> echo schedule > set_graph_notrace
>> echo 1 > options/display-graph
>> echo wakeup > current_tracer
>> ps -ef | grep -i agent
>>
>> Above commands result in
>> PANIC: "Unable to handle kernel paging request at virtual address
>> ffff801bcbde7000"
> 
> This didn't panic for me, it just killed the running task. (I guess you have
> panic-on-oops set)
> 

yes..

> 
>> vmcore analysis:
>> 1)
>> crash> bt
>> PID: 1561   TASK: ffff8003cb7e4000  CPU: 0   COMMAND: "ps"
>>   #0 [ffff8003c4ff77b0] crash_kexec at ffff00000816b9b8
>>   #1 [ffff8003c4ff77e0] die at ffff000008088b34
>>   #2 [ffff8003c4ff7820] __do_kernel_fault at ffff00000809b830
>>   #3 [ffff8003c4ff7850] do_bad_area at ffff000008098b90
>>   #4 [ffff8003c4ff7880] do_translation_fault at ffff0000087c6cdc
>>   #5 [ffff8003c4ff78b0] do_mem_abort at ffff000008081334
>>   #6 [ffff8003c4ff7ab0] el1_ia at ffff000008082cc0
>>       PC: ffff00000808811c  [unwind_frame+300]
>>       LR: ffff0000080858a8  [get_wchan+212]
>>       SP: ffff8003c4ff7ab0  PSTATE: 60000145
>>      X29: ffff8003c4ff7ab0  X28: 0000000000000001  X27: 0000000000000000
>>      X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>>      X23: ffff8003c1c20000  X22: ffff000008c73000  X21: ffff8003c1c1c000
>>      X20: 000000000000000f  X19: ffff8003c1bc7000  X18: 0000000000000010
>>      X17: 0000000000000000  X16: 0000000000000001  X15: ffffffffffffffed
>>      X14: 0000000000000010  X13: ffffffffffffffff  X12: 0000000000000004
>>      X11: 0000000000000000  X10: 0000000002dd14c0   X9: 1999999999999999
>>       X8: 000000000000003f   X7: 00008003f71b0000   X6: 0000000000000018
>>       X5: 0000000000000000   X4: ffff8003c1c1fd20   X3: ffff8003c1c1fd10
>>       X2: 00000017ffe80000   X1: ffff8003c4ff7af8   X0: ffff8003cbf67000
>>   #7 [ffff8003c4ff7b20] do_task_stat at ffff000008304f0c
>>   #8 [ffff8003c4ff7c60] proc_tgid_stat at ffff000008305b48
>>   #9 [ffff8003c4ff7ca0] proc_single_show at ffff0000082fdd10
>>   #10 [ffff8003c4ff7ce0] seq_read at ffff0000082b27bc
>>   #11 [ffff8003c4ff7d70] __vfs_read at ffff000008289e54
>>   #12 [ffff8003c4ff7e30] vfs_read at ffff00000828b14c
>>   #13 [ffff8003c4ff7e70] sys_read at ffff00000828c2d0
>>   #14 [ffff8003c4ff7ed0] __sys_trace at ffff00000808349c
>>       PC: 00000006  LR: 00000000  SP: ffffffffffffffed  PSTATE: 0000003f
>>      X12: 00000010 X11: ffffffffffffffff X10: 00000004  X9: ffff7febe8d0
>>       X8: 00000000  X7: 1999999999999999  X6: 0000003f  X5: 0000000c
>>       X4: ffff7fce9c78  X3: 0000000c  X2: 00000000  X1: 00000000
>>       X0: 00000400
>>
>> (2) Instruction at ffff00000808811c caused IA/DA.
>>
>> crash> dis -l ffff000008088108 6
>> /usr/src/debug/xxxxxxxxxxxx/xxxxxxxxxx/arch/arm64/kernel/stacktrace.c:
>> 84
>> 0xffff000008088108 <unwind_frame+280>:  ldr     w2, [x1,#24]
>> 0xffff00000808810c <unwind_frame+284>:  sub     w6, w2, #0x1
>> 0xffff000008088110 <unwind_frame+288>:  str     w6, [x1,#24]
>> 0xffff000008088114 <unwind_frame+292>:  mov     w6, #0x18 // #24
>> 0xffff000008088118 <unwind_frame+296>:  umull   x2, w2, w6
>> 0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]
>>
>> Corresponding c statement is
>> frame->pc = tsk->ret_stack[frame->graph--].ret;
>>
>> (3) So, it caused data abort while executing
>> 0xffff00000808811c <unwind_frame+300>:  ldr     x0, [x0,x2]
>>
>> x0 + x2 = ffff8003cbf67000 + 00000017ffe80000 = ffff801bcbde7000
>> Access of ffff801bcbde7000 resulted in "Unable to handle kernel paging
>> request"
>>
>> from above data: frame->graph = task->curr_ret_stack which  should be,
>> x2 / 0x18 = FFFF0000, which is -FTRACE_NOTRACE_DEPTH.
> 
> (0x18 is the size of struct ftrace_ret_stack for your config?)

yes.

> 
> 
>> OK, so problem is here:
>> do_task_stat() calls get_wchan(). Here p->curr_ret_stack  is
>> -FTRACE_NOTRACE_DEPTH in the failed case.
> 
>> It means, when do_task_stat()
>> has been called for task A (ps in this case) on CPUm,task A was in mid
>> of execution on CPUn,
> 
> get_wchan() on a running processes can't work: the stack may be modified while
> we walk it.
>  From arch/arm64/kernel/process.c::get_wchan():
>> 	if (!p || p == current || p->state == TASK_RUNNING)
>> 		return 0;
> 
> As far as I can see, if task A is running on CPU-N then CPU-Ms attempt to
> get_wchan() it will return 0.

You seems right.

> 
> 
>> and was in the mid of mcount() execution where
>> curr_ret_stack had been decremented in ftrace_push_return_trace() for
>> hitting schedule() function, but it was yet to be incremented in
>> ftrace_return_to_handler().
> 
> So if the function-graph-tracer has hooked the return values on a stack and hit
> a filtered function, (schedule() in your example), we can't unwind it as
> ftrace_push_return_trace() has biased the index with a 'huge negative' offset to
> flag it as 'this should be filtered out'.
> 
> The arm64 stack walker isn't aware of this and interprets it as an unsigned
> index. Nasty!
> 

Hummm...frame.graph always takes tsk->curr_ret_stack which is int. It has been 
assigned as -1 in arch/arm64/kernel/time.c. frame.graph should be defined as int.


> 
>> Similar problem we can have with calling of walk_stackframe() from
>> save_stack_trace_tsk() or dump_backtrace().
>>
>> This patch fixes unwind_frame() to not to manipulate frame->pc for
> 
> Nit: strictly this is is 'restore frame->pc from ftrace's ret_stack', but I
> think we still need to do this restore...
> 
> 
>> function graph tracer if the function has been marked in
>> set_graph_notrace.
> 
> Nit: ftrace describes these as 'filtered out', set_graph_notrace is the debugfs
> interface.
> 
> 
>> This patch fixes: 20380bb390a4 (arm64: ftrace: fix a stack tracer's
>> output under function graph tracer)
> 
>> Signed-off-by: Pratyush Anand <panand@redhat.com>
> 
> 
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index 09d37d66b630..e79035d673b3 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -74,7 +74,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct
> stackframe *frame)
>>
>>   #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>   	if (tsk->ret_stack &&
>> -			(frame->pc == (unsigned long)return_to_handler)) {
>> +			(frame->pc == (unsigned long)return_to_handler) &&
> 
>> +			(frame->graph > -1)) {
> 
> This should give you a compiler warning: its declared as an unsigned int in
> struct stackframe.

I did not get this warning :(
# gcc --version
gcc (GCC) 4.8.5 20150623

> 
> 
> ... but with this patch /proc/<pid>/wchan now reports:
>> cat /proc/1/wchan
>> return_to_handler
> 
> So it looks like 'filtered out' doesn't mean 'not hooked'. I think a more
> complete fix is to test if frame->graph is negative and add FTRACE_NOTRACE_DEPTH
> back to it:
> 

Yep, it should allow to read correct /proc/.../wchan.


>  From kernel/trace/ftrace_functions_graph.c::trace_pop_return_trace():
>> 	index = current->curr_ret_stack;
>>
>> 	/*
>> 	 * A negative index here means that it's just returned from a
>> 	 * notrace'd function.  Recover index to get an original
>> 	 * return address.  See ftrace_push_return_trace().
>> 	 *
>> 	 * TODO: Need to check whether the stack gets corrupted.
>> 	 */
>> 	if (index < 0)
>> 		index += FTRACE_NOTRACE_DEPTH;
> 
> (looks like this is the only magic offset)
> 
> 
> I don't think we need to preserve the vmcore debugging in the kernel log, could
> you cut the commit message down to describe the negative curr_ret_stack being
> interpreted as a signed value instead of skipped by unwind_frame(), then have
> the reproducer and a chunk of the splat.
> 
> 
> Thanks for getting to the bottom of this, it looks like this was a mammoth
> debugging session!
> 

Certainly few hours :-).

Thanks for your review.

-- 
Regards
Pratyush

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-08-30 13:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-28 12:53 [PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace Pratyush Anand
2017-08-28 12:53 ` Pratyush Anand
2017-08-29 17:03 ` James Morse
2017-08-29 17:03   ` James Morse
2017-08-30 12:59   ` Pratyush Anand
2017-08-30 12:59     ` Pratyush Anand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.