* [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-02 9:41 Jiping Ma 2019-08-02 15:22 ` Steven Rostedt 0 siblings, 1 reply; 28+ messages in thread From: Jiping Ma @ 2019-08-02 9:41 UTC (permalink / raw) To: rostedt, mingo, catalin.marinas, will.deacon, joel Cc: linux-kernel, linux-arm-kernel, jiping.ma2 There is not PC in ARM64 stack, LR is used to for walk_stackframe in ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. PC is used to calculate the stack size in trace_stack.c, so the function name and its stack size appear to be off-by-one. ARM64 stack layout: LR FP ...... LR FP ...... Wrong info: Depth Size Location (16 entries) ----- ---- -------- 0) 5400 16 __update_load_avg_se.isra.2+0x28/0x220 1) 5384 96 put_prev_entity+0x250/0x338 2) 5288 80 pick_next_task_fair+0x4c4/0x508 3) 5208 72 __schedule+0x100/0x600 4) 5136 184 preempt_schedule_common+0x28/0x48 5) 4952 32 preempt_schedule+0x28/0x30 6) 4920 16 vprintk_emit+0x170/0x1f8 7) 4904 128 vprintk_default+0x48/0x58 8) 4776 64 vprintk_func+0xf8/0x1c8 9) 4712 112 printk+0x70/0x90 10) 4600 176 occupy_stack_init+0x64/0xc0 [kernel_stack] 11) 4424 3376 do_one_initcall+0x68/0x248 12) 1048 144 do_init_module+0x60/0x1f0 13) 904 48 load_module+0x1d50/0x2340 14) 856 352 sys_finit_module+0xd0/0xe8 15) 504 504 el0_svc_naked+0x30/0x34 Correct info: Depth Size Location (18 entries) ----- ---- -------- 0) 5464 48 cgroup_rstat_updated+0x20/0x100 1) 5416 32 cgroup_base_stat_cputime_account_end.isra.0+0x30/0x60 2) 5384 32 __cgroup_account_cputime+0x3c/0x48 3) 5352 64 update_curr+0xc4/0x1d0 4) 5288 72 pick_next_task_fair+0x444/0x508 5) 5216 184 __schedule+0x100/0x600 6) 5032 32 preempt_schedule_common+0x28/0x48 7) 5000 16 preempt_schedule+0x28/0x30 8) 4984 128 vprintk_emit+0x170/0x1f8 9) 4856 64 vprintk_default+0x48/0x58 10) 4792 112 vprintk_func+0xf8/0x1c8 11) 4680 176 printk+0x70/0x90 12) 4504 80 func_test+0x7c/0xb8 [kernel_stack] 13) 4424 3376 occupy_stack_init+0x7c/0xc0 [kernel_stack] 14) 1048 144 do_one_initcall+0x68/0x248 15) 904 48 do_init_module+0x60/0x1f0 16) 856 352 load_module+0x1d50/0x2340 17) 504 504 sys_finit_module+0xd0/0xe8 Signed-off-by: Jiping Ma <jiping.ma2@windriver.com> --- kernel/trace/trace_stack.c | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index 5d16f73898db..ed80b95abf06 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -40,16 +40,28 @@ static void print_max_stack(void) pr_emerg(" Depth Size Location (%d entries)\n" " ----- ---- --------\n", +#ifdef CONFIG_ARM64 + stack_trace_nr_entries - 1); +#else stack_trace_nr_entries); - +#endif +#ifdef CONFIG_ARM64 + for (i = 1; i < stack_trace_nr_entries; i++) { +#else for (i = 0; i < stack_trace_nr_entries; i++) { +#endif if (i + 1 == stack_trace_nr_entries) size = stack_trace_index[i]; else size = stack_trace_index[i] - stack_trace_index[i+1]; +#ifdef CONFIG_ARM64 + pr_emerg("%3ld) %8d %5d %pS\n", i-1, stack_trace_index[i], + size, (void *)stack_dump_trace[i-1]); +#else pr_emerg("%3ld) %8d %5d %pS\n", i, stack_trace_index[i], size, (void *)stack_dump_trace[i]); +#endif } } @@ -324,8 +336,11 @@ static int t_show(struct seq_file *m, void *v) seq_printf(m, " Depth Size Location" " (%d entries)\n" " ----- ---- --------\n", +#ifdef CONFIG_ARM64 + stack_trace_nr_entries - 1); +#else stack_trace_nr_entries); - +#endif if (!stack_tracer_enabled && !stack_trace_max_size) print_disabled(m); @@ -334,6 +349,10 @@ static int t_show(struct seq_file *m, void *v) i = *(long *)v; +#ifdef CONFIG_ARM64 + if (i == 0) + return 0; +#endif if (i >= stack_trace_nr_entries) return 0; @@ -342,9 +361,14 @@ static int t_show(struct seq_file *m, void *v) else size = stack_trace_index[i] - stack_trace_index[i+1]; +#ifdef CONFIG_ARM64 + seq_printf(m, "%3ld) %8d %5d ", i-1, stack_trace_index[i], size); + trace_lookup_stack(m, i-1); +#else seq_printf(m, "%3ld) %8d %5d ", i, stack_trace_index[i], size); trace_lookup_stack(m, i); +#endif return 0; } -- 2.18.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-02 9:41 [PATCH v3] tracing: Function stack size and its name mismatch in arm64 Jiping Ma @ 2019-08-02 15:22 ` Steven Rostedt 0 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-02 15:22 UTC (permalink / raw) To: Jiping Ma Cc: mingo, catalin.marinas, will.deacon, joel, linux-kernel, linux-arm-kernel On Fri, 2 Aug 2019 17:41:03 +0800 Jiping Ma <jiping.ma2@windriver.com> wrote: First of all, this patch will not be accepted. I will not allow arch specific defines in generic code like this. That said, there are better solutions. > There is not PC in ARM64 stack, LR is used to for walk_stackframe in > ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. > PC is used to calculate the stack size in trace_stack.c, so the > function name and its stack size appear to be off-by-one. > ARM64 stack layout: > LR > FP > ...... > LR > FP > ...... I think you are not explaining the issue correctly. From looking at the document, I think what you want to say is that the LR is saved *after* the data for the function. Is that correct? If so, then yes, it would cause the stack tracing algorithm to be incorrect. Most archs do this: On entry to a function: save return address reserve local variables and such for current function I think you are saying that arm64 does this backwards. reserve local variables and such for current function save return address (LR) For normal archs, we usually have something like this: sys_call_entry: [save regs and crap on stack ] bl sys_finit_module sys_call_entry+x: [ here's the address put into the lr regsister ] sys_finit_module: save lr on stack [syscall_entry+x] save more stuff on stack [..] bl load_module sys_finit_module+y: load_module: save lr on stack [sys_finit_module+y] save more stuff on stack [..] bl do_init_module load_module+z: do_init_module: save lr on stack [load_module+z] [ let's say we do the stack trace here! ] On the stack we have: data for kernel entry sys_call_entry+x [ lr saved in sys_finit_module ] data for sys_finit_module sys_finit_module+y [ lr saved in load_module ] data for load_module load_module+z [ lr saved in do_init_module When we call stack_trace_save() we get: sys_call_entry+x sys_finit_module+y load_module+z We then find those locations on the stack. Let's add some numbers here: 0: top of stack data for kernel entry 504: sys_call_entry+x data for sys_finit_module 856: sys_finit_module+y data for sys_load_module 904: load_module+z 1048: BOTTOM OF STACK HERE To do the work, we start from the bottom of the stack, looking for what was returned from the stack_trace_save start = 1048 *p == load_module+z stack size = 1048 - 904 Then we say load_module stack size is 144. What you are describing doesn't make any sense. Yes, we all add LR to the stack, not the PC! But if the issue is that the LR is saved *after* the function data, then that *will* make a difference. We then have: sys_call_entry: [save regs and crap on stack ] bl sys_finit_module sys_call_entry+x: [ here's the address put into the lr regsister ] sys_finit_module: save sys_finit_module stuff on stack save lr on stack [syscall_entry+x] <-- reversed order from before! [..] bl load_module sys_finit_module+y: load_module: save load_module stuff on stack save lr on stack [sys_finit_module+y] [..] bl do_init_module load_module+z: do_init_module: [ let's say we do the stack trace here! ] On the stack we have: data for sys_call_entry data for sys_finit_module sys_call_entry+x [ lr saved in sys_finit_module ] data for load_module sys_finit_module+y [ lr saved in load_module ] data for do_init_module load_module+z [ lr saved in do_init_module This also brings up the question of when the mcount is called? We get the same data from stack_trace_save(): sys_call_entry+x sys_finit_module+y load_module+z But this time, the stack is different! 0: top of stack data from kernel entry data for sys_call_entry data for sys_finit_module 504: sys_call_entry+x data for sys_finit_module 856: sys_finit_module+y data for load_module 904: load_module+z data for do_init_module? 1048: BOTTOM OF STACK HERE So yes, the same algorithm is will come up with the wrong result, because... start = 1048 *p == load_module+z stack size = 1048 - 904 We just calculated the size of data for do_init_module, and not for load_module! Can someone confirm that this is the real issue? -- Steve > > Wrong info: > Depth Size Location (16 entries) > ----- ---- -------- > 0) 5400 16 __update_load_avg_se.isra.2+0x28/0x220 > 1) 5384 96 put_prev_entity+0x250/0x338 > 2) 5288 80 pick_next_task_fair+0x4c4/0x508 > 3) 5208 72 __schedule+0x100/0x600 > 4) 5136 184 preempt_schedule_common+0x28/0x48 > 5) 4952 32 preempt_schedule+0x28/0x30 > 6) 4920 16 vprintk_emit+0x170/0x1f8 > 7) 4904 128 vprintk_default+0x48/0x58 > 8) 4776 64 vprintk_func+0xf8/0x1c8 > 9) 4712 112 printk+0x70/0x90 > 10) 4600 176 occupy_stack_init+0x64/0xc0 [kernel_stack] > 11) 4424 3376 do_one_initcall+0x68/0x248 > 12) 1048 144 do_init_module+0x60/0x1f0 > 13) 904 48 load_module+0x1d50/0x2340 > 14) 856 352 sys_finit_module+0xd0/0xe8 > 15) 504 504 el0_svc_naked+0x30/0x34 > > Correct info: > Depth Size Location (18 entries) > ----- ---- -------- > 0) 5464 48 cgroup_rstat_updated+0x20/0x100 > 1) 5416 32 cgroup_base_stat_cputime_account_end.isra.0+0x30/0x60 > 2) 5384 32 __cgroup_account_cputime+0x3c/0x48 > 3) 5352 64 update_curr+0xc4/0x1d0 > 4) 5288 72 pick_next_task_fair+0x444/0x508 > 5) 5216 184 __schedule+0x100/0x600 > 6) 5032 32 preempt_schedule_common+0x28/0x48 > 7) 5000 16 preempt_schedule+0x28/0x30 > 8) 4984 128 vprintk_emit+0x170/0x1f8 > 9) 4856 64 vprintk_default+0x48/0x58 > 10) 4792 112 vprintk_func+0xf8/0x1c8 > 11) 4680 176 printk+0x70/0x90 > 12) 4504 80 func_test+0x7c/0xb8 [kernel_stack] > 13) 4424 3376 occupy_stack_init+0x7c/0xc0 [kernel_stack] > 14) 1048 144 do_one_initcall+0x68/0x248 > 15) 904 48 do_init_module+0x60/0x1f0 > 16) 856 352 load_module+0x1d50/0x2340 > 17) 504 504 sys_finit_module+0xd0/0xe8 > > Signed-off-by: Jiping Ma <jiping.ma2@windriver.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-02 15:22 ` Steven Rostedt 0 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-02 15:22 UTC (permalink / raw) To: Jiping Ma Cc: catalin.marinas, will.deacon, linux-kernel, mingo, joel, linux-arm-kernel On Fri, 2 Aug 2019 17:41:03 +0800 Jiping Ma <jiping.ma2@windriver.com> wrote: First of all, this patch will not be accepted. I will not allow arch specific defines in generic code like this. That said, there are better solutions. > There is not PC in ARM64 stack, LR is used to for walk_stackframe in > ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. > PC is used to calculate the stack size in trace_stack.c, so the > function name and its stack size appear to be off-by-one. > ARM64 stack layout: > LR > FP > ...... > LR > FP > ...... I think you are not explaining the issue correctly. From looking at the document, I think what you want to say is that the LR is saved *after* the data for the function. Is that correct? If so, then yes, it would cause the stack tracing algorithm to be incorrect. Most archs do this: On entry to a function: save return address reserve local variables and such for current function I think you are saying that arm64 does this backwards. reserve local variables and such for current function save return address (LR) For normal archs, we usually have something like this: sys_call_entry: [save regs and crap on stack ] bl sys_finit_module sys_call_entry+x: [ here's the address put into the lr regsister ] sys_finit_module: save lr on stack [syscall_entry+x] save more stuff on stack [..] bl load_module sys_finit_module+y: load_module: save lr on stack [sys_finit_module+y] save more stuff on stack [..] bl do_init_module load_module+z: do_init_module: save lr on stack [load_module+z] [ let's say we do the stack trace here! ] On the stack we have: data for kernel entry sys_call_entry+x [ lr saved in sys_finit_module ] data for sys_finit_module sys_finit_module+y [ lr saved in load_module ] data for load_module load_module+z [ lr saved in do_init_module When we call stack_trace_save() we get: sys_call_entry+x sys_finit_module+y load_module+z We then find those locations on the stack. Let's add some numbers here: 0: top of stack data for kernel entry 504: sys_call_entry+x data for sys_finit_module 856: sys_finit_module+y data for sys_load_module 904: load_module+z 1048: BOTTOM OF STACK HERE To do the work, we start from the bottom of the stack, looking for what was returned from the stack_trace_save start = 1048 *p == load_module+z stack size = 1048 - 904 Then we say load_module stack size is 144. What you are describing doesn't make any sense. Yes, we all add LR to the stack, not the PC! But if the issue is that the LR is saved *after* the function data, then that *will* make a difference. We then have: sys_call_entry: [save regs and crap on stack ] bl sys_finit_module sys_call_entry+x: [ here's the address put into the lr regsister ] sys_finit_module: save sys_finit_module stuff on stack save lr on stack [syscall_entry+x] <-- reversed order from before! [..] bl load_module sys_finit_module+y: load_module: save load_module stuff on stack save lr on stack [sys_finit_module+y] [..] bl do_init_module load_module+z: do_init_module: [ let's say we do the stack trace here! ] On the stack we have: data for sys_call_entry data for sys_finit_module sys_call_entry+x [ lr saved in sys_finit_module ] data for load_module sys_finit_module+y [ lr saved in load_module ] data for do_init_module load_module+z [ lr saved in do_init_module This also brings up the question of when the mcount is called? We get the same data from stack_trace_save(): sys_call_entry+x sys_finit_module+y load_module+z But this time, the stack is different! 0: top of stack data from kernel entry data for sys_call_entry data for sys_finit_module 504: sys_call_entry+x data for sys_finit_module 856: sys_finit_module+y data for load_module 904: load_module+z data for do_init_module? 1048: BOTTOM OF STACK HERE So yes, the same algorithm is will come up with the wrong result, because... start = 1048 *p == load_module+z stack size = 1048 - 904 We just calculated the size of data for do_init_module, and not for load_module! Can someone confirm that this is the real issue? -- Steve > > Wrong info: > Depth Size Location (16 entries) > ----- ---- -------- > 0) 5400 16 __update_load_avg_se.isra.2+0x28/0x220 > 1) 5384 96 put_prev_entity+0x250/0x338 > 2) 5288 80 pick_next_task_fair+0x4c4/0x508 > 3) 5208 72 __schedule+0x100/0x600 > 4) 5136 184 preempt_schedule_common+0x28/0x48 > 5) 4952 32 preempt_schedule+0x28/0x30 > 6) 4920 16 vprintk_emit+0x170/0x1f8 > 7) 4904 128 vprintk_default+0x48/0x58 > 8) 4776 64 vprintk_func+0xf8/0x1c8 > 9) 4712 112 printk+0x70/0x90 > 10) 4600 176 occupy_stack_init+0x64/0xc0 [kernel_stack] > 11) 4424 3376 do_one_initcall+0x68/0x248 > 12) 1048 144 do_init_module+0x60/0x1f0 > 13) 904 48 load_module+0x1d50/0x2340 > 14) 856 352 sys_finit_module+0xd0/0xe8 > 15) 504 504 el0_svc_naked+0x30/0x34 > > Correct info: > Depth Size Location (18 entries) > ----- ---- -------- > 0) 5464 48 cgroup_rstat_updated+0x20/0x100 > 1) 5416 32 cgroup_base_stat_cputime_account_end.isra.0+0x30/0x60 > 2) 5384 32 __cgroup_account_cputime+0x3c/0x48 > 3) 5352 64 update_curr+0xc4/0x1d0 > 4) 5288 72 pick_next_task_fair+0x444/0x508 > 5) 5216 184 __schedule+0x100/0x600 > 6) 5032 32 preempt_schedule_common+0x28/0x48 > 7) 5000 16 preempt_schedule+0x28/0x30 > 8) 4984 128 vprintk_emit+0x170/0x1f8 > 9) 4856 64 vprintk_default+0x48/0x58 > 10) 4792 112 vprintk_func+0xf8/0x1c8 > 11) 4680 176 printk+0x70/0x90 > 12) 4504 80 func_test+0x7c/0xb8 [kernel_stack] > 13) 4424 3376 occupy_stack_init+0x7c/0xc0 [kernel_stack] > 14) 1048 144 do_one_initcall+0x68/0x248 > 15) 904 48 do_init_module+0x60/0x1f0 > 16) 856 352 load_module+0x1d50/0x2340 > 17) 504 504 sys_finit_module+0xd0/0xe8 > > Signed-off-by: Jiping Ma <jiping.ma2@windriver.com> _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-02 15:22 ` Steven Rostedt @ 2019-08-02 16:09 ` Steven Rostedt -1 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-02 16:09 UTC (permalink / raw) To: Jiping Ma Cc: mingo, catalin.marinas, will.deacon, joel, linux-kernel, linux-arm-kernel [-- Attachment #1: Type: text/plain, Size: 1556 bytes --] On Fri, 2 Aug 2019 11:22:59 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > I think you are not explaining the issue correctly. From looking at the > document, I think what you want to say is that the LR is saved *after* > the data for the function. Is that correct? If so, then yes, it would > cause the stack tracing algorithm to be incorrect. > [..] > Can someone confirm that this is the real issue? Does this patch fix your issue? -- Steve diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 5ab5200b2bdc..13a4832cfb00 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -13,6 +13,7 @@ #define HAVE_FUNCTION_GRAPH_FP_TEST #define MCOUNT_ADDR ((unsigned long)_mcount) #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 #ifndef __ASSEMBLY__ #include <linux/compat.h> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index 5d16f73898db..050c6bd9beac 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) i++; } +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS + /* + * Most archs store the return address before storing the + * function's local variables. But some archs do this backwards. + */ + if (x > 1) { + memmove(&stack_trace_index[0], &stack_trace_index[1], + sizeof(stack_trace_index[0]) * (x - 1)); + x--; + } +#endif + stack_trace_nr_entries = x; if (task_stack_end_corrupted(current)) { [-- Attachment #2: f.patch --] [-- Type: text/x-patch, Size: 1100 bytes --] diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 5ab5200b2bdc..13a4832cfb00 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -13,6 +13,7 @@ #define HAVE_FUNCTION_GRAPH_FP_TEST #define MCOUNT_ADDR ((unsigned long)_mcount) #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 #ifndef __ASSEMBLY__ #include <linux/compat.h> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index 5d16f73898db..050c6bd9beac 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) i++; } +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS + /* + * Most archs store the return address before storing the + * function's local variables. But some archs do this backwards. + */ + if (x > 1) { + memmove(&stack_trace_index[0], &stack_trace_index[1], + sizeof(stack_trace_index[0]) * (x - 1)); + x--; + } +#endif + stack_trace_nr_entries = x; if (task_stack_end_corrupted(current)) { ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-02 16:09 ` Steven Rostedt 0 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-02 16:09 UTC (permalink / raw) To: Jiping Ma Cc: catalin.marinas, will.deacon, linux-kernel, mingo, joel, linux-arm-kernel [-- Attachment #1: Type: text/plain, Size: 1556 bytes --] On Fri, 2 Aug 2019 11:22:59 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > I think you are not explaining the issue correctly. From looking at the > document, I think what you want to say is that the LR is saved *after* > the data for the function. Is that correct? If so, then yes, it would > cause the stack tracing algorithm to be incorrect. > [..] > Can someone confirm that this is the real issue? Does this patch fix your issue? -- Steve diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 5ab5200b2bdc..13a4832cfb00 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -13,6 +13,7 @@ #define HAVE_FUNCTION_GRAPH_FP_TEST #define MCOUNT_ADDR ((unsigned long)_mcount) #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 #ifndef __ASSEMBLY__ #include <linux/compat.h> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index 5d16f73898db..050c6bd9beac 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) i++; } +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS + /* + * Most archs store the return address before storing the + * function's local variables. But some archs do this backwards. + */ + if (x > 1) { + memmove(&stack_trace_index[0], &stack_trace_index[1], + sizeof(stack_trace_index[0]) * (x - 1)); + x--; + } +#endif + stack_trace_nr_entries = x; if (task_stack_end_corrupted(current)) { [-- Attachment #2: f.patch --] [-- Type: text/x-patch, Size: 1100 bytes --] diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 5ab5200b2bdc..13a4832cfb00 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -13,6 +13,7 @@ #define HAVE_FUNCTION_GRAPH_FP_TEST #define MCOUNT_ADDR ((unsigned long)_mcount) #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 #ifndef __ASSEMBLY__ #include <linux/compat.h> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index 5d16f73898db..050c6bd9beac 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) i++; } +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS + /* + * Most archs store the return address before storing the + * function's local variables. But some archs do this backwards. + */ + if (x > 1) { + memmove(&stack_trace_index[0], &stack_trace_index[1], + sizeof(stack_trace_index[0]) * (x - 1)); + x--; + } +#endif + stack_trace_nr_entries = x; if (task_stack_end_corrupted(current)) { [-- Attachment #3: Type: text/plain, Size: 176 bytes --] _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-02 16:09 ` Steven Rostedt @ 2019-08-02 16:11 ` Steven Rostedt -1 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-02 16:11 UTC (permalink / raw) To: Jiping Ma Cc: mingo, catalin.marinas, will.deacon, joel, linux-kernel, linux-arm-kernel On Fri, 2 Aug 2019 12:09:20 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > On Fri, 2 Aug 2019 11:22:59 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > I think you are not explaining the issue correctly. From looking at the > > document, I think what you want to say is that the LR is saved *after* > > the data for the function. Is that correct? If so, then yes, it would > > cause the stack tracing algorithm to be incorrect. > > > > [..] > > > Can someone confirm that this is the real issue? > > Does this patch fix your issue? > Bah, I hit "attach" instead of "insert" (I wondered why it didn't insert). Here's the patch without the attachment. -- Steve diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 5ab5200b2bdc..13a4832cfb00 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -13,6 +13,7 @@ #define HAVE_FUNCTION_GRAPH_FP_TEST #define MCOUNT_ADDR ((unsigned long)_mcount) #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 #ifndef __ASSEMBLY__ #include <linux/compat.h> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index 5d16f73898db..050c6bd9beac 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) i++; } +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS + /* + * Most archs store the return address before storing the + * function's local variables. But some archs do this backwards. + */ + if (x > 1) { + memmove(&stack_trace_index[0], &stack_trace_index[1], + sizeof(stack_trace_index[0]) * (x - 1)); + x--; + } +#endif + stack_trace_nr_entries = x; if (task_stack_end_corrupted(current)) { ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-02 16:11 ` Steven Rostedt 0 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-02 16:11 UTC (permalink / raw) To: Jiping Ma Cc: catalin.marinas, will.deacon, linux-kernel, mingo, joel, linux-arm-kernel On Fri, 2 Aug 2019 12:09:20 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > On Fri, 2 Aug 2019 11:22:59 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > I think you are not explaining the issue correctly. From looking at the > > document, I think what you want to say is that the LR is saved *after* > > the data for the function. Is that correct? If so, then yes, it would > > cause the stack tracing algorithm to be incorrect. > > > > [..] > > > Can someone confirm that this is the real issue? > > Does this patch fix your issue? > Bah, I hit "attach" instead of "insert" (I wondered why it didn't insert). Here's the patch without the attachment. -- Steve diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 5ab5200b2bdc..13a4832cfb00 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -13,6 +13,7 @@ #define HAVE_FUNCTION_GRAPH_FP_TEST #define MCOUNT_ADDR ((unsigned long)_mcount) #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 #ifndef __ASSEMBLY__ #include <linux/compat.h> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index 5d16f73898db..050c6bd9beac 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) i++; } +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS + /* + * Most archs store the return address before storing the + * function's local variables. But some archs do this backwards. + */ + if (x > 1) { + memmove(&stack_trace_index[0], &stack_trace_index[1], + sizeof(stack_trace_index[0]) * (x - 1)); + x--; + } +#endif + stack_trace_nr_entries = x; if (task_stack_end_corrupted(current)) { _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-02 16:11 ` Steven Rostedt @ 2019-08-06 15:48 ` Joel Fernandes -1 siblings, 0 replies; 28+ messages in thread From: Joel Fernandes @ 2019-08-06 15:48 UTC (permalink / raw) To: Steven Rostedt Cc: Jiping Ma, mingo, catalin.marinas, will.deacon, linux-kernel, linux-arm-kernel On Fri, Aug 02, 2019 at 12:11:24PM -0400, Steven Rostedt wrote: > On Fri, 2 Aug 2019 12:09:20 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Fri, 2 Aug 2019 11:22:59 -0400 > > Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > I think you are not explaining the issue correctly. From looking at the > > > document, I think what you want to say is that the LR is saved *after* > > > the data for the function. Is that correct? If so, then yes, it would > > > cause the stack tracing algorithm to be incorrect. > > > > > > > [..] > > > > > Can someone confirm that this is the real issue? > > > > Does this patch fix your issue? > > > > Bah, I hit "attach" instead of "insert" (I wondered why it didn't > insert). Here's the patch without the attachment. > > -- Steve > > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h > index 5ab5200b2bdc..13a4832cfb00 100644 > --- a/arch/arm64/include/asm/ftrace.h > +++ b/arch/arm64/include/asm/ftrace.h > @@ -13,6 +13,7 @@ > #define HAVE_FUNCTION_GRAPH_FP_TEST > #define MCOUNT_ADDR ((unsigned long)_mcount) > #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE > +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 > > #ifndef __ASSEMBLY__ > #include <linux/compat.h> > diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c > index 5d16f73898db..050c6bd9beac 100644 > --- a/kernel/trace/trace_stack.c > +++ b/kernel/trace/trace_stack.c > @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) > i++; > } > > +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS > + /* > + * Most archs store the return address before storing the > + * function's local variables. But some archs do this backwards. > + */ > + if (x > 1) { > + memmove(&stack_trace_index[0], &stack_trace_index[1], > + sizeof(stack_trace_index[0]) * (x - 1)); > + x--; > + } > +#endif > + > stack_trace_nr_entries = x; > > if (task_stack_end_corrupted(current)) { I am not fully understanding the fix :(. If the positions of the data and FP/LR are swapped, then there should be a loop of some sort where the FP/LR are copied repeatedly to undo the mess we are discussing. But in this patch I see only one copy happening. May be I just don't understand this code well enough. Are there any more clues for helping understand the fix? Also, this stack trace loop (original code) is a bit hairy :) It appears there is a call to stack_trace_save() followed by another loop that goes through the returned entries from there and tries to generate a set of indexes. Isn't the real issue that the entries returned by stack_trace_save() are a out of whack? I am curious also if other users of stack_trace_save() will suffer from the same issue. thanks, - Joel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-06 15:48 ` Joel Fernandes 0 siblings, 0 replies; 28+ messages in thread From: Joel Fernandes @ 2019-08-06 15:48 UTC (permalink / raw) To: Steven Rostedt Cc: Jiping Ma, catalin.marinas, will.deacon, linux-kernel, mingo, linux-arm-kernel On Fri, Aug 02, 2019 at 12:11:24PM -0400, Steven Rostedt wrote: > On Fri, 2 Aug 2019 12:09:20 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Fri, 2 Aug 2019 11:22:59 -0400 > > Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > I think you are not explaining the issue correctly. From looking at the > > > document, I think what you want to say is that the LR is saved *after* > > > the data for the function. Is that correct? If so, then yes, it would > > > cause the stack tracing algorithm to be incorrect. > > > > > > > [..] > > > > > Can someone confirm that this is the real issue? > > > > Does this patch fix your issue? > > > > Bah, I hit "attach" instead of "insert" (I wondered why it didn't > insert). Here's the patch without the attachment. > > -- Steve > > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h > index 5ab5200b2bdc..13a4832cfb00 100644 > --- a/arch/arm64/include/asm/ftrace.h > +++ b/arch/arm64/include/asm/ftrace.h > @@ -13,6 +13,7 @@ > #define HAVE_FUNCTION_GRAPH_FP_TEST > #define MCOUNT_ADDR ((unsigned long)_mcount) > #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE > +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 > > #ifndef __ASSEMBLY__ > #include <linux/compat.h> > diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c > index 5d16f73898db..050c6bd9beac 100644 > --- a/kernel/trace/trace_stack.c > +++ b/kernel/trace/trace_stack.c > @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) > i++; > } > > +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS > + /* > + * Most archs store the return address before storing the > + * function's local variables. But some archs do this backwards. > + */ > + if (x > 1) { > + memmove(&stack_trace_index[0], &stack_trace_index[1], > + sizeof(stack_trace_index[0]) * (x - 1)); > + x--; > + } > +#endif > + > stack_trace_nr_entries = x; > > if (task_stack_end_corrupted(current)) { I am not fully understanding the fix :(. If the positions of the data and FP/LR are swapped, then there should be a loop of some sort where the FP/LR are copied repeatedly to undo the mess we are discussing. But in this patch I see only one copy happening. May be I just don't understand this code well enough. Are there any more clues for helping understand the fix? Also, this stack trace loop (original code) is a bit hairy :) It appears there is a call to stack_trace_save() followed by another loop that goes through the returned entries from there and tries to generate a set of indexes. Isn't the real issue that the entries returned by stack_trace_save() are a out of whack? I am curious also if other users of stack_trace_save() will suffer from the same issue. thanks, - Joel _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-06 15:48 ` Joel Fernandes @ 2019-08-06 16:34 ` Steven Rostedt -1 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-06 16:34 UTC (permalink / raw) To: Joel Fernandes Cc: Jiping Ma, mingo, catalin.marinas, will.deacon, linux-kernel, linux-arm-kernel On Tue, 6 Aug 2019 11:48:11 -0400 Joel Fernandes <joel@joelfernandes.org> wrote: > > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h > > index 5ab5200b2bdc..13a4832cfb00 100644 > > --- a/arch/arm64/include/asm/ftrace.h > > +++ b/arch/arm64/include/asm/ftrace.h > > @@ -13,6 +13,7 @@ > > #define HAVE_FUNCTION_GRAPH_FP_TEST > > #define MCOUNT_ADDR ((unsigned long)_mcount) > > #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE > > +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 > > > > #ifndef __ASSEMBLY__ > > #include <linux/compat.h> > > diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c > > index 5d16f73898db..050c6bd9beac 100644 > > --- a/kernel/trace/trace_stack.c > > +++ b/kernel/trace/trace_stack.c > > @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) > > i++; > > } > > > > +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS > > + /* > > + * Most archs store the return address before storing the > > + * function's local variables. But some archs do this backwards. > > + */ > > + if (x > 1) { > > + memmove(&stack_trace_index[0], &stack_trace_index[1], > > + sizeof(stack_trace_index[0]) * (x - 1)); > > + x--; > > + } > > +#endif > > + > > stack_trace_nr_entries = x; > > > > if (task_stack_end_corrupted(current)) { > > > I am not fully understanding the fix :(. If the positions of the data and > FP/LR are swapped, then there should be a loop of some sort where the FP/LR > are copied repeatedly to undo the mess we are discussing. But in this patch > I see only one copy happening. May be I just don't understand this code well > enough. Are there any more clues for helping understand the fix? Here's the best way to explain this. The code is using the stack trace to figure out which function is the stack hog. Or perhaps a serious of stack hogs. On x86, a call stores the return address as it calls the next function. Then that function allocates its stack frame for its local variables and saving of registers. on x86: [ top of stack ] 0: sys call entry frame 10: return addr to entry code 11: start of sys_foo frame 20: return addr to sys_foo 21: start of kernel_func_bar frame 30: return addr to kernel_func_bar 31: [ do trace stack here ] Then we do a save_stack_trace which returns the addresses of the functions it finds. Which would be (from the bottom of the stack to the top) return addr to kernel_func_bar return addr to sys_foo return addr to entry code What we do here is try to figure out how much stack each of theses functions have. So we loop through the stack looking for the addresses returned by the save_stack trace, and see where on the stack this is. This gives us: return addr to kernel_func_bar [ 30 ] return addr to sys_foo [ 20 ] return addr to entry frame [ 10 ] From this, we can conclude (on x86) that the size of the stack used for kernel_func_bar is 30 - 20 = 10. Because on the stack we have: 20: return addr to sys_foo 21: start of kernel_func_bar frame <<-- kernel_func_bar stack frame 30: return addr to kernel_func_bar Now, what Jiping reported, is that on arm64, it saves the link register (the return address) when it is needed to, which is after the stack frame for the current function has been saved. That means we have something that looks like this: on arm64: [ top of stack ] 0: sys call entry frame 10: start of sys_foo_frame 19: return addr to entry code << lr saved before calling kern_func_bar 20: start of kernel_func_bar frame 29: return addr to sys_foo_frame << lr saved before calling next function 30: [ do trace stack here ] Now, I have a question. To call the mcount code (ftrace and the stack tracing), you need to save the return address of kern_func_bar somewhere, otherwise the call to mcount will overwrite the lr. But let's say it does and then forgets it, so we have: 30: return addr of kernel_func_bar frame 31: [ do trace stack here ] Now save_stack_trace gives us the same result: return addr to kernel_func bar return addr to sys_foo return addr to entry frame But we get a different result when finding them in the location of the stack. return addr to kernel_func_bar [ 30 ] return addr to sys foo [ 29 ] return addr to entry frame [ 19 ] The simple subtractions will be off: kernel_func_bar stack size = 30 - 29 = 1 Or even, sys_foo 29 - 19 = 10, but if we look at the stack: 10: start of sys_foo_frame 19: return addr to entry_code 20: start of kernel_func_bar frame 29: return addr to sys_foo We are measuring the kernel_func_bar frame for sys_foo! We are off by one here. stack_trace_index[] is an array of the offsets mapping to the function return addresses found. If we shift it by one, then we then sync the functions found with their frames: stack_trace_index[0] = 30 stack_trace_index[1] = 29 stack_trace_index[2] = 19 memmove((&stack_trace_index[0], &stack_trace_index[1], sizeof(stack_trace_index[0]) * (x - 1)); Makes that: stack_trace_index[0] = 29 stack_trace_index[1] = 19 And we do x-- to lose the last frame. With the stack_dump_trace being: stack_dump_trace[0] = return addr kernel_func_bar stack_dump_trace[1] = return addr sys_foo we then match which frame size belongs to which function better. > > Also, this stack trace loop (original code) is a bit hairy :) It appears > there is a call to stack_trace_save() followed by another loop that goes > through the returned entries from there and tries to generate a set of > indexes. Isn't the real issue that the entries returned by stack_trace_save() > are a out of whack? I am curious also if other users of stack_trace_save() > will suffer from the same issue. No, the order is fine. The issue is that we are using the location of the return address in the stack to find out what function has the biggest stack usage, and our assumption for arm64 is incorrect in that location. -- Steve ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-06 16:34 ` Steven Rostedt 0 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-06 16:34 UTC (permalink / raw) To: Joel Fernandes Cc: Jiping Ma, catalin.marinas, will.deacon, linux-kernel, mingo, linux-arm-kernel On Tue, 6 Aug 2019 11:48:11 -0400 Joel Fernandes <joel@joelfernandes.org> wrote: > > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h > > index 5ab5200b2bdc..13a4832cfb00 100644 > > --- a/arch/arm64/include/asm/ftrace.h > > +++ b/arch/arm64/include/asm/ftrace.h > > @@ -13,6 +13,7 @@ > > #define HAVE_FUNCTION_GRAPH_FP_TEST > > #define MCOUNT_ADDR ((unsigned long)_mcount) > > #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE > > +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 > > > > #ifndef __ASSEMBLY__ > > #include <linux/compat.h> > > diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c > > index 5d16f73898db..050c6bd9beac 100644 > > --- a/kernel/trace/trace_stack.c > > +++ b/kernel/trace/trace_stack.c > > @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) > > i++; > > } > > > > +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS > > + /* > > + * Most archs store the return address before storing the > > + * function's local variables. But some archs do this backwards. > > + */ > > + if (x > 1) { > > + memmove(&stack_trace_index[0], &stack_trace_index[1], > > + sizeof(stack_trace_index[0]) * (x - 1)); > > + x--; > > + } > > +#endif > > + > > stack_trace_nr_entries = x; > > > > if (task_stack_end_corrupted(current)) { > > > I am not fully understanding the fix :(. If the positions of the data and > FP/LR are swapped, then there should be a loop of some sort where the FP/LR > are copied repeatedly to undo the mess we are discussing. But in this patch > I see only one copy happening. May be I just don't understand this code well > enough. Are there any more clues for helping understand the fix? Here's the best way to explain this. The code is using the stack trace to figure out which function is the stack hog. Or perhaps a serious of stack hogs. On x86, a call stores the return address as it calls the next function. Then that function allocates its stack frame for its local variables and saving of registers. on x86: [ top of stack ] 0: sys call entry frame 10: return addr to entry code 11: start of sys_foo frame 20: return addr to sys_foo 21: start of kernel_func_bar frame 30: return addr to kernel_func_bar 31: [ do trace stack here ] Then we do a save_stack_trace which returns the addresses of the functions it finds. Which would be (from the bottom of the stack to the top) return addr to kernel_func_bar return addr to sys_foo return addr to entry code What we do here is try to figure out how much stack each of theses functions have. So we loop through the stack looking for the addresses returned by the save_stack trace, and see where on the stack this is. This gives us: return addr to kernel_func_bar [ 30 ] return addr to sys_foo [ 20 ] return addr to entry frame [ 10 ] From this, we can conclude (on x86) that the size of the stack used for kernel_func_bar is 30 - 20 = 10. Because on the stack we have: 20: return addr to sys_foo 21: start of kernel_func_bar frame <<-- kernel_func_bar stack frame 30: return addr to kernel_func_bar Now, what Jiping reported, is that on arm64, it saves the link register (the return address) when it is needed to, which is after the stack frame for the current function has been saved. That means we have something that looks like this: on arm64: [ top of stack ] 0: sys call entry frame 10: start of sys_foo_frame 19: return addr to entry code << lr saved before calling kern_func_bar 20: start of kernel_func_bar frame 29: return addr to sys_foo_frame << lr saved before calling next function 30: [ do trace stack here ] Now, I have a question. To call the mcount code (ftrace and the stack tracing), you need to save the return address of kern_func_bar somewhere, otherwise the call to mcount will overwrite the lr. But let's say it does and then forgets it, so we have: 30: return addr of kernel_func_bar frame 31: [ do trace stack here ] Now save_stack_trace gives us the same result: return addr to kernel_func bar return addr to sys_foo return addr to entry frame But we get a different result when finding them in the location of the stack. return addr to kernel_func_bar [ 30 ] return addr to sys foo [ 29 ] return addr to entry frame [ 19 ] The simple subtractions will be off: kernel_func_bar stack size = 30 - 29 = 1 Or even, sys_foo 29 - 19 = 10, but if we look at the stack: 10: start of sys_foo_frame 19: return addr to entry_code 20: start of kernel_func_bar frame 29: return addr to sys_foo We are measuring the kernel_func_bar frame for sys_foo! We are off by one here. stack_trace_index[] is an array of the offsets mapping to the function return addresses found. If we shift it by one, then we then sync the functions found with their frames: stack_trace_index[0] = 30 stack_trace_index[1] = 29 stack_trace_index[2] = 19 memmove((&stack_trace_index[0], &stack_trace_index[1], sizeof(stack_trace_index[0]) * (x - 1)); Makes that: stack_trace_index[0] = 29 stack_trace_index[1] = 19 And we do x-- to lose the last frame. With the stack_dump_trace being: stack_dump_trace[0] = return addr kernel_func_bar stack_dump_trace[1] = return addr sys_foo we then match which frame size belongs to which function better. > > Also, this stack trace loop (original code) is a bit hairy :) It appears > there is a call to stack_trace_save() followed by another loop that goes > through the returned entries from there and tries to generate a set of > indexes. Isn't the real issue that the entries returned by stack_trace_save() > are a out of whack? I am curious also if other users of stack_trace_save() > will suffer from the same issue. No, the order is fine. The issue is that we are using the location of the return address in the stack to find out what function has the biggest stack usage, and our assumption for arm64 is incorrect in that location. -- Steve _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-06 16:34 ` Steven Rostedt @ 2019-08-06 17:03 ` Steven Rostedt -1 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-06 17:03 UTC (permalink / raw) To: Joel Fernandes Cc: Jiping Ma, mingo, catalin.marinas, will.deacon, linux-kernel, linux-arm-kernel On Tue, 6 Aug 2019 12:34:55 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > Here's the best way to explain this. The code is using the stack trace > to figure out which function is the stack hog. Or perhaps a serious of Why so serious? .... s/serious/series/ -- Steve > stack hogs. On x86, a call stores the return address as it calls the > next function. Then that function allocates its stack frame for its > local variables and saving of registers. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-06 17:03 ` Steven Rostedt 0 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-06 17:03 UTC (permalink / raw) To: Joel Fernandes Cc: Jiping Ma, catalin.marinas, will.deacon, linux-kernel, mingo, linux-arm-kernel On Tue, 6 Aug 2019 12:34:55 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > Here's the best way to explain this. The code is using the stack trace > to figure out which function is the stack hog. Or perhaps a serious of Why so serious? .... s/serious/series/ -- Steve > stack hogs. On x86, a call stores the return address as it calls the > next function. Then that function allocates its stack frame for its > local variables and saving of registers. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-06 16:34 ` Steven Rostedt @ 2019-08-06 17:25 ` Joel Fernandes -1 siblings, 0 replies; 28+ messages in thread From: Joel Fernandes @ 2019-08-06 17:25 UTC (permalink / raw) To: Steven Rostedt Cc: Jiping Ma, mingo, catalin.marinas, will.deacon, linux-kernel, linux-arm-kernel On Tue, Aug 06, 2019 at 12:34:55PM -0400, Steven Rostedt wrote: > On Tue, 6 Aug 2019 11:48:11 -0400 > Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h > > > index 5ab5200b2bdc..13a4832cfb00 100644 > > > --- a/arch/arm64/include/asm/ftrace.h > > > +++ b/arch/arm64/include/asm/ftrace.h > > > @@ -13,6 +13,7 @@ > > > #define HAVE_FUNCTION_GRAPH_FP_TEST > > > #define MCOUNT_ADDR ((unsigned long)_mcount) > > > #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE > > > +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 > > > > > > #ifndef __ASSEMBLY__ > > > #include <linux/compat.h> > > > diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c > > > index 5d16f73898db..050c6bd9beac 100644 > > > --- a/kernel/trace/trace_stack.c > > > +++ b/kernel/trace/trace_stack.c > > > @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) > > > i++; > > > } > > > > > > +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS > > > + /* > > > + * Most archs store the return address before storing the > > > + * function's local variables. But some archs do this backwards. > > > + */ > > > + if (x > 1) { > > > + memmove(&stack_trace_index[0], &stack_trace_index[1], > > > + sizeof(stack_trace_index[0]) * (x - 1)); > > > + x--; > > > + } > > > +#endif > > > + > > > stack_trace_nr_entries = x; > > > > > > if (task_stack_end_corrupted(current)) { > > > > > > I am not fully understanding the fix :(. If the positions of the data and > > FP/LR are swapped, then there should be a loop of some sort where the FP/LR > > are copied repeatedly to undo the mess we are discussing. But in this patch > > I see only one copy happening. May be I just don't understand this code well > > enough. Are there any more clues for helping understand the fix? > > Here's the best way to explain this. The code is using the stack trace > to figure out which function is the stack hog. Or perhaps a serious of > stack hogs. On x86, a call stores the return address as it calls the > next function. Then that function allocates its stack frame for its > local variables and saving of registers. This makes perfect sense, (probably also makes sense to push this whole explanation into either the changelog or the kernel documentation) Thanks a lot, Steve! - Joel > on x86: > > [ top of stack ] > 0: sys call entry frame > 10: return addr to entry code > 11: start of sys_foo frame > 20: return addr to sys_foo > 21: start of kernel_func_bar frame > 30: return addr to kernel_func_bar > 31: [ do trace stack here ] > > > Then we do a save_stack_trace which returns the addresses of the > functions it finds. Which would be (from the bottom of the stack to the > top) > > return addr to kernel_func_bar > return addr to sys_foo > return addr to entry code > > What we do here is try to figure out how much stack each of theses > functions have. So we loop through the stack looking for the addresses > returned by the save_stack trace, and see where on the stack this is. > This gives us: > > return addr to kernel_func_bar [ 30 ] > return addr to sys_foo [ 20 ] > return addr to entry frame [ 10 ] > > From this, we can conclude (on x86) that the size of the stack used for > kernel_func_bar is 30 - 20 = 10. Because on the stack we have: > > 20: return addr to sys_foo > 21: start of kernel_func_bar frame <<-- kernel_func_bar stack frame > 30: return addr to kernel_func_bar > > > Now, what Jiping reported, is that on arm64, it saves the link register > (the return address) when it is needed to, which is after the stack > frame for the current function has been saved. That means we have > something that looks like this: > > on arm64: > > [ top of stack ] > 0: sys call entry frame > 10: start of sys_foo_frame > 19: return addr to entry code << lr saved before calling kern_func_bar > 20: start of kernel_func_bar frame > 29: return addr to sys_foo_frame << lr saved before calling next function > 30: [ do trace stack here ] > > Now, I have a question. To call the mcount code (ftrace and the stack > tracing), you need to save the return address of kern_func_bar > somewhere, otherwise the call to mcount will overwrite the lr. But > let's say it does and then forgets it, so we have: > > 30: return addr of kernel_func_bar frame > 31: [ do trace stack here ] > > Now save_stack_trace gives us the same result: > > return addr to kernel_func bar > return addr to sys_foo > return addr to entry frame > > But we get a different result when finding them in the location of the > stack. > > return addr to kernel_func_bar [ 30 ] > return addr to sys foo [ 29 ] > return addr to entry frame [ 19 ] > > The simple subtractions will be off: > > kernel_func_bar stack size = 30 - 29 = 1 > Or even, sys_foo 29 - 19 = 10, but if we look at the stack: > > 10: start of sys_foo_frame > 19: return addr to entry_code > 20: start of kernel_func_bar frame > 29: return addr to sys_foo > > We are measuring the kernel_func_bar frame for sys_foo! > > We are off by one here. > > stack_trace_index[] is an array of the offsets mapping to the function > return addresses found. If we shift it by one, then we then sync the > functions found with their frames: > > stack_trace_index[0] = 30 > stack_trace_index[1] = 29 > stack_trace_index[2] = 19 > > memmove((&stack_trace_index[0], &stack_trace_index[1], > sizeof(stack_trace_index[0]) * (x - 1)); > > Makes that: > > stack_trace_index[0] = 29 > stack_trace_index[1] = 19 > > And we do x-- to lose the last frame. > > With the stack_dump_trace being: > > stack_dump_trace[0] = return addr kernel_func_bar > stack_dump_trace[1] = return addr sys_foo > > we then match which frame size belongs to which function better. > > > > > > Also, this stack trace loop (original code) is a bit hairy :) It appears > > there is a call to stack_trace_save() followed by another loop that goes > > through the returned entries from there and tries to generate a set of > > indexes. Isn't the real issue that the entries returned by stack_trace_save() > > are a out of whack? I am curious also if other users of stack_trace_save() > > will suffer from the same issue. > > No, the order is fine. The issue is that we are using the location of > the return address in the stack to find out what function has the > biggest stack usage, and our assumption for arm64 is incorrect in that > location. > > -- Steve ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-06 17:25 ` Joel Fernandes 0 siblings, 0 replies; 28+ messages in thread From: Joel Fernandes @ 2019-08-06 17:25 UTC (permalink / raw) To: Steven Rostedt Cc: Jiping Ma, catalin.marinas, will.deacon, linux-kernel, mingo, linux-arm-kernel On Tue, Aug 06, 2019 at 12:34:55PM -0400, Steven Rostedt wrote: > On Tue, 6 Aug 2019 11:48:11 -0400 > Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h > > > index 5ab5200b2bdc..13a4832cfb00 100644 > > > --- a/arch/arm64/include/asm/ftrace.h > > > +++ b/arch/arm64/include/asm/ftrace.h > > > @@ -13,6 +13,7 @@ > > > #define HAVE_FUNCTION_GRAPH_FP_TEST > > > #define MCOUNT_ADDR ((unsigned long)_mcount) > > > #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE > > > +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 > > > > > > #ifndef __ASSEMBLY__ > > > #include <linux/compat.h> > > > diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c > > > index 5d16f73898db..050c6bd9beac 100644 > > > --- a/kernel/trace/trace_stack.c > > > +++ b/kernel/trace/trace_stack.c > > > @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) > > > i++; > > > } > > > > > > +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS > > > + /* > > > + * Most archs store the return address before storing the > > > + * function's local variables. But some archs do this backwards. > > > + */ > > > + if (x > 1) { > > > + memmove(&stack_trace_index[0], &stack_trace_index[1], > > > + sizeof(stack_trace_index[0]) * (x - 1)); > > > + x--; > > > + } > > > +#endif > > > + > > > stack_trace_nr_entries = x; > > > > > > if (task_stack_end_corrupted(current)) { > > > > > > I am not fully understanding the fix :(. If the positions of the data and > > FP/LR are swapped, then there should be a loop of some sort where the FP/LR > > are copied repeatedly to undo the mess we are discussing. But in this patch > > I see only one copy happening. May be I just don't understand this code well > > enough. Are there any more clues for helping understand the fix? > > Here's the best way to explain this. The code is using the stack trace > to figure out which function is the stack hog. Or perhaps a serious of > stack hogs. On x86, a call stores the return address as it calls the > next function. Then that function allocates its stack frame for its > local variables and saving of registers. This makes perfect sense, (probably also makes sense to push this whole explanation into either the changelog or the kernel documentation) Thanks a lot, Steve! - Joel > on x86: > > [ top of stack ] > 0: sys call entry frame > 10: return addr to entry code > 11: start of sys_foo frame > 20: return addr to sys_foo > 21: start of kernel_func_bar frame > 30: return addr to kernel_func_bar > 31: [ do trace stack here ] > > > Then we do a save_stack_trace which returns the addresses of the > functions it finds. Which would be (from the bottom of the stack to the > top) > > return addr to kernel_func_bar > return addr to sys_foo > return addr to entry code > > What we do here is try to figure out how much stack each of theses > functions have. So we loop through the stack looking for the addresses > returned by the save_stack trace, and see where on the stack this is. > This gives us: > > return addr to kernel_func_bar [ 30 ] > return addr to sys_foo [ 20 ] > return addr to entry frame [ 10 ] > > From this, we can conclude (on x86) that the size of the stack used for > kernel_func_bar is 30 - 20 = 10. Because on the stack we have: > > 20: return addr to sys_foo > 21: start of kernel_func_bar frame <<-- kernel_func_bar stack frame > 30: return addr to kernel_func_bar > > > Now, what Jiping reported, is that on arm64, it saves the link register > (the return address) when it is needed to, which is after the stack > frame for the current function has been saved. That means we have > something that looks like this: > > on arm64: > > [ top of stack ] > 0: sys call entry frame > 10: start of sys_foo_frame > 19: return addr to entry code << lr saved before calling kern_func_bar > 20: start of kernel_func_bar frame > 29: return addr to sys_foo_frame << lr saved before calling next function > 30: [ do trace stack here ] > > Now, I have a question. To call the mcount code (ftrace and the stack > tracing), you need to save the return address of kern_func_bar > somewhere, otherwise the call to mcount will overwrite the lr. But > let's say it does and then forgets it, so we have: > > 30: return addr of kernel_func_bar frame > 31: [ do trace stack here ] > > Now save_stack_trace gives us the same result: > > return addr to kernel_func bar > return addr to sys_foo > return addr to entry frame > > But we get a different result when finding them in the location of the > stack. > > return addr to kernel_func_bar [ 30 ] > return addr to sys foo [ 29 ] > return addr to entry frame [ 19 ] > > The simple subtractions will be off: > > kernel_func_bar stack size = 30 - 29 = 1 > Or even, sys_foo 29 - 19 = 10, but if we look at the stack: > > 10: start of sys_foo_frame > 19: return addr to entry_code > 20: start of kernel_func_bar frame > 29: return addr to sys_foo > > We are measuring the kernel_func_bar frame for sys_foo! > > We are off by one here. > > stack_trace_index[] is an array of the offsets mapping to the function > return addresses found. If we shift it by one, then we then sync the > functions found with their frames: > > stack_trace_index[0] = 30 > stack_trace_index[1] = 29 > stack_trace_index[2] = 19 > > memmove((&stack_trace_index[0], &stack_trace_index[1], > sizeof(stack_trace_index[0]) * (x - 1)); > > Makes that: > > stack_trace_index[0] = 29 > stack_trace_index[1] = 19 > > And we do x-- to lose the last frame. > > With the stack_dump_trace being: > > stack_dump_trace[0] = return addr kernel_func_bar > stack_dump_trace[1] = return addr sys_foo > > we then match which frame size belongs to which function better. > > > > > > Also, this stack trace loop (original code) is a bit hairy :) It appears > > there is a call to stack_trace_save() followed by another loop that goes > > through the returned entries from there and tries to generate a set of > > indexes. Isn't the real issue that the entries returned by stack_trace_save() > > are a out of whack? I am curious also if other users of stack_trace_save() > > will suffer from the same issue. > > No, the order is fine. The issue is that we are using the location of > the return address in the stack to find out what function has the > biggest stack usage, and our assumption for arm64 is incorrect in that > location. > > -- Steve _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-02 16:09 ` Steven Rostedt (?) (?) @ 2019-08-06 6:01 ` Jiping Ma -1 siblings, 0 replies; 28+ messages in thread From: Jiping Ma @ 2019-08-06 6:01 UTC (permalink / raw) To: Steven Rostedt Cc: catalin.marinas, will.deacon, linux-kernel, mingo, joel, linux-arm-kernel On 2019年08月03日 00:09, Steven Rostedt wrote: > On Fri, 2 Aug 2019 11:22:59 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > >> I think you are not explaining the issue correctly. From looking at the >> document, I think what you want to say is that the LR is saved *after* >> the data for the function. Is that correct? If so, then yes, it would >> cause the stack tracing algorithm to be incorrect. >> > [..] > >> Can someone confirm that this is the real issue? > Does this patch fix your issue? Yes, it does. -- Jiping > > -- Steve > > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h > index 5ab5200b2bdc..13a4832cfb00 100644 > --- a/arch/arm64/include/asm/ftrace.h > +++ b/arch/arm64/include/asm/ftrace.h > @@ -13,6 +13,7 @@ > #define HAVE_FUNCTION_GRAPH_FP_TEST > #define MCOUNT_ADDR ((unsigned long)_mcount) > #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE > +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 > > #ifndef __ASSEMBLY__ > #include <linux/compat.h> > diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c > index 5d16f73898db..050c6bd9beac 100644 > --- a/kernel/trace/trace_stack.c > +++ b/kernel/trace/trace_stack.c > @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) > i++; > } > > +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS > + /* > + * Most archs store the return address before storing the > + * function's local variables. But some archs do this backwards. > + */ > + if (x > 1) { > + memmove(&stack_trace_index[0], &stack_trace_index[1], > + sizeof(stack_trace_index[0]) * (x - 1)); > + x--; > + } > +#endif > + > stack_trace_nr_entries = x; > > if (task_stack_end_corrupted(current)) { > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-02 15:22 ` Steven Rostedt @ 2019-08-03 8:26 ` Joel Fernandes -1 siblings, 0 replies; 28+ messages in thread From: Joel Fernandes @ 2019-08-03 8:26 UTC (permalink / raw) To: Steven Rostedt Cc: Jiping Ma, mingo, catalin.marinas, will.deacon, linux-kernel, linux-arm-kernel, kernel-team On Fri, Aug 02, 2019 at 11:22:59AM -0400, Steven Rostedt wrote: [snip] > > There is not PC in ARM64 stack, LR is used to for walk_stackframe in > > ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. > > PC is used to calculate the stack size in trace_stack.c, so the > > function name and its stack size appear to be off-by-one. > > ARM64 stack layout: > > LR > > FP > > ...... > > LR > > FP > > ...... > > I think you are not explaining the issue correctly. From looking at the > document, I think what you want to say is that the LR is saved *after* > the data for the function. Is that correct? If so, then yes, it would > cause the stack tracing algorithm to be incorrect. > > Most archs do this: > > On entry to a function: > > save return address > reserve local variables and such for current function > > I think you are saying that arm64 does this backwards. > > reserve local variables and such for current function > save return address (LR) Actually for arm64 it is like what you said about 'Most archs'. It saves FP and LR first onto the current stack frame, then assigns the top of the stack to FP (marking the new frame). Then executes branch-link, and then allocates space to variables on stack in the callee. Disassembly of prog: int foo(int x, int y) { int a[32]; return (x + y + a[31]); } int bar(void) { foo(1, 2); } confirms it: 000000000000073c <bar>: 73c: a9bf7bfd stp x29, x30, [sp, #-16]! <-- save FP and LR 740: 910003fd mov x29, sp <-- create new FP 744: 52800041 mov w1, #0x2 <-- pass arguments 748: 52800020 mov w0, #0x1 74c: 97fffff4 bl 71c <foo> <-- branch (sets LR) 750: d503201f nop 754: a8c17bfd ldp x29, x30, [sp], #16 <-- restore FP, LR 758: d65f03c0 ret 000000000000071c <foo>: 71c: d10243ff sub sp, sp, #0x90 <-- space for local var 720: b9000fe0 str w0, [sp, #12] 724: b9000be1 str w1, [sp, #8] 728: b9400fe1 ldr w1, [sp, #12] 72c: b9400be0 ldr w0, [sp, #8] 730: 0b000021 add w1, w1, w0 734: b9408fe0 ldr w0, [sp, #140] 738: 0b000020 add w0, w1, w0 73c: 910243ff add sp, sp, #0x90 <-- restore sp before ret 740: d65f03c0 ret thanks, - Joel > For normal archs, we usually have something like this: > > sys_call_entry: > [save regs and crap on stack ] > bl sys_finit_module > sys_call_entry+x: > [ here's the address put into the lr regsister ] > > sys_finit_module: > save lr on stack [syscall_entry+x] > save more stuff on stack > [..] > bl load_module > sys_finit_module+y: > > load_module: > save lr on stack [sys_finit_module+y] > save more stuff on stack > [..] > bl do_init_module > load_module+z: > > do_init_module: > save lr on stack [load_module+z] > [ let's say we do the stack trace here! ] > > On the stack we have: > > data for kernel entry > sys_call_entry+x [ lr saved in sys_finit_module ] > data for sys_finit_module > sys_finit_module+y [ lr saved in load_module ] > data for load_module > load_module+z [ lr saved in do_init_module > > When we call stack_trace_save() we get: > > sys_call_entry+x > sys_finit_module+y > load_module+z > > We then find those locations on the stack. Let's add some numbers here: > > 0: top of stack > data for kernel entry > 504: sys_call_entry+x > data for sys_finit_module > 856: sys_finit_module+y > data for sys_load_module > 904: load_module+z > 1048: BOTTOM OF STACK HERE > > To do the work, we start from the bottom of the stack, looking for what > was returned from the stack_trace_save > > start = 1048 > *p == load_module+z > stack size = 1048 - 904 > > Then we say load_module stack size is 144. > > What you are describing doesn't make any sense. Yes, we all add LR to > the stack, not the PC! But if the issue is that the LR is saved *after* > the function data, then that *will* make a difference. We then have: > > > sys_call_entry: > [save regs and crap on stack ] > bl sys_finit_module > sys_call_entry+x: > [ here's the address put into the lr regsister ] > > sys_finit_module: > save sys_finit_module stuff on stack > save lr on stack [syscall_entry+x] <-- reversed order from before! > [..] > bl load_module > sys_finit_module+y: > > load_module: > save load_module stuff on stack > save lr on stack [sys_finit_module+y] > [..] > bl do_init_module > load_module+z: > > do_init_module: > [ let's say we do the stack trace here! ] > > On the stack we have: > > data for sys_call_entry > data for sys_finit_module > sys_call_entry+x [ lr saved in sys_finit_module ] > data for load_module > sys_finit_module+y [ lr saved in load_module ] > data for do_init_module > load_module+z [ lr saved in do_init_module > > This also brings up the question of when the mcount is called? > > We get the same data from stack_trace_save(): > > sys_call_entry+x > sys_finit_module+y > load_module+z > > But this time, the stack is different! > > 0: top of stack > data from kernel entry > data for sys_call_entry > data for sys_finit_module > 504: sys_call_entry+x > data for sys_finit_module > 856: sys_finit_module+y > data for load_module > 904: load_module+z > data for do_init_module? > 1048: BOTTOM OF STACK HERE > > So yes, the same algorithm is will come up with the wrong result, > because... > > start = 1048 > *p == load_module+z > stack size = 1048 - 904 > > We just calculated the size of data for do_init_module, and not for > load_module! > > Can someone confirm that this is the real issue? > > -- Steve > > > > > > Wrong info: > > Depth Size Location (16 entries) > > ----- ---- -------- > > 0) 5400 16 __update_load_avg_se.isra.2+0x28/0x220 > > 1) 5384 96 put_prev_entity+0x250/0x338 > > 2) 5288 80 pick_next_task_fair+0x4c4/0x508 > > 3) 5208 72 __schedule+0x100/0x600 > > 4) 5136 184 preempt_schedule_common+0x28/0x48 > > 5) 4952 32 preempt_schedule+0x28/0x30 > > 6) 4920 16 vprintk_emit+0x170/0x1f8 > > 7) 4904 128 vprintk_default+0x48/0x58 > > 8) 4776 64 vprintk_func+0xf8/0x1c8 > > 9) 4712 112 printk+0x70/0x90 > > 10) 4600 176 occupy_stack_init+0x64/0xc0 [kernel_stack] > > 11) 4424 3376 do_one_initcall+0x68/0x248 > > 12) 1048 144 do_init_module+0x60/0x1f0 > > 13) 904 48 load_module+0x1d50/0x2340 > > 14) 856 352 sys_finit_module+0xd0/0xe8 > > 15) 504 504 el0_svc_naked+0x30/0x34 > > > > Correct info: > > Depth Size Location (18 entries) > > ----- ---- -------- > > 0) 5464 48 cgroup_rstat_updated+0x20/0x100 > > 1) 5416 32 cgroup_base_stat_cputime_account_end.isra.0+0x30/0x60 > > 2) 5384 32 __cgroup_account_cputime+0x3c/0x48 > > 3) 5352 64 update_curr+0xc4/0x1d0 > > 4) 5288 72 pick_next_task_fair+0x444/0x508 > > 5) 5216 184 __schedule+0x100/0x600 > > 6) 5032 32 preempt_schedule_common+0x28/0x48 > > 7) 5000 16 preempt_schedule+0x28/0x30 > > 8) 4984 128 vprintk_emit+0x170/0x1f8 > > 9) 4856 64 vprintk_default+0x48/0x58 > > 10) 4792 112 vprintk_func+0xf8/0x1c8 > > 11) 4680 176 printk+0x70/0x90 > > 12) 4504 80 func_test+0x7c/0xb8 [kernel_stack] > > 13) 4424 3376 occupy_stack_init+0x7c/0xc0 [kernel_stack] > > 14) 1048 144 do_one_initcall+0x68/0x248 > > 15) 904 48 do_init_module+0x60/0x1f0 > > 16) 856 352 load_module+0x1d50/0x2340 > > 17) 504 504 sys_finit_module+0xd0/0xe8 > > > > Signed-off-by: Jiping Ma <jiping.ma2@windriver.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-03 8:26 ` Joel Fernandes 0 siblings, 0 replies; 28+ messages in thread From: Joel Fernandes @ 2019-08-03 8:26 UTC (permalink / raw) To: Steven Rostedt Cc: Jiping Ma, catalin.marinas, will.deacon, linux-kernel, mingo, kernel-team, linux-arm-kernel On Fri, Aug 02, 2019 at 11:22:59AM -0400, Steven Rostedt wrote: [snip] > > There is not PC in ARM64 stack, LR is used to for walk_stackframe in > > ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. > > PC is used to calculate the stack size in trace_stack.c, so the > > function name and its stack size appear to be off-by-one. > > ARM64 stack layout: > > LR > > FP > > ...... > > LR > > FP > > ...... > > I think you are not explaining the issue correctly. From looking at the > document, I think what you want to say is that the LR is saved *after* > the data for the function. Is that correct? If so, then yes, it would > cause the stack tracing algorithm to be incorrect. > > Most archs do this: > > On entry to a function: > > save return address > reserve local variables and such for current function > > I think you are saying that arm64 does this backwards. > > reserve local variables and such for current function > save return address (LR) Actually for arm64 it is like what you said about 'Most archs'. It saves FP and LR first onto the current stack frame, then assigns the top of the stack to FP (marking the new frame). Then executes branch-link, and then allocates space to variables on stack in the callee. Disassembly of prog: int foo(int x, int y) { int a[32]; return (x + y + a[31]); } int bar(void) { foo(1, 2); } confirms it: 000000000000073c <bar>: 73c: a9bf7bfd stp x29, x30, [sp, #-16]! <-- save FP and LR 740: 910003fd mov x29, sp <-- create new FP 744: 52800041 mov w1, #0x2 <-- pass arguments 748: 52800020 mov w0, #0x1 74c: 97fffff4 bl 71c <foo> <-- branch (sets LR) 750: d503201f nop 754: a8c17bfd ldp x29, x30, [sp], #16 <-- restore FP, LR 758: d65f03c0 ret 000000000000071c <foo>: 71c: d10243ff sub sp, sp, #0x90 <-- space for local var 720: b9000fe0 str w0, [sp, #12] 724: b9000be1 str w1, [sp, #8] 728: b9400fe1 ldr w1, [sp, #12] 72c: b9400be0 ldr w0, [sp, #8] 730: 0b000021 add w1, w1, w0 734: b9408fe0 ldr w0, [sp, #140] 738: 0b000020 add w0, w1, w0 73c: 910243ff add sp, sp, #0x90 <-- restore sp before ret 740: d65f03c0 ret thanks, - Joel > For normal archs, we usually have something like this: > > sys_call_entry: > [save regs and crap on stack ] > bl sys_finit_module > sys_call_entry+x: > [ here's the address put into the lr regsister ] > > sys_finit_module: > save lr on stack [syscall_entry+x] > save more stuff on stack > [..] > bl load_module > sys_finit_module+y: > > load_module: > save lr on stack [sys_finit_module+y] > save more stuff on stack > [..] > bl do_init_module > load_module+z: > > do_init_module: > save lr on stack [load_module+z] > [ let's say we do the stack trace here! ] > > On the stack we have: > > data for kernel entry > sys_call_entry+x [ lr saved in sys_finit_module ] > data for sys_finit_module > sys_finit_module+y [ lr saved in load_module ] > data for load_module > load_module+z [ lr saved in do_init_module > > When we call stack_trace_save() we get: > > sys_call_entry+x > sys_finit_module+y > load_module+z > > We then find those locations on the stack. Let's add some numbers here: > > 0: top of stack > data for kernel entry > 504: sys_call_entry+x > data for sys_finit_module > 856: sys_finit_module+y > data for sys_load_module > 904: load_module+z > 1048: BOTTOM OF STACK HERE > > To do the work, we start from the bottom of the stack, looking for what > was returned from the stack_trace_save > > start = 1048 > *p == load_module+z > stack size = 1048 - 904 > > Then we say load_module stack size is 144. > > What you are describing doesn't make any sense. Yes, we all add LR to > the stack, not the PC! But if the issue is that the LR is saved *after* > the function data, then that *will* make a difference. We then have: > > > sys_call_entry: > [save regs and crap on stack ] > bl sys_finit_module > sys_call_entry+x: > [ here's the address put into the lr regsister ] > > sys_finit_module: > save sys_finit_module stuff on stack > save lr on stack [syscall_entry+x] <-- reversed order from before! > [..] > bl load_module > sys_finit_module+y: > > load_module: > save load_module stuff on stack > save lr on stack [sys_finit_module+y] > [..] > bl do_init_module > load_module+z: > > do_init_module: > [ let's say we do the stack trace here! ] > > On the stack we have: > > data for sys_call_entry > data for sys_finit_module > sys_call_entry+x [ lr saved in sys_finit_module ] > data for load_module > sys_finit_module+y [ lr saved in load_module ] > data for do_init_module > load_module+z [ lr saved in do_init_module > > This also brings up the question of when the mcount is called? > > We get the same data from stack_trace_save(): > > sys_call_entry+x > sys_finit_module+y > load_module+z > > But this time, the stack is different! > > 0: top of stack > data from kernel entry > data for sys_call_entry > data for sys_finit_module > 504: sys_call_entry+x > data for sys_finit_module > 856: sys_finit_module+y > data for load_module > 904: load_module+z > data for do_init_module? > 1048: BOTTOM OF STACK HERE > > So yes, the same algorithm is will come up with the wrong result, > because... > > start = 1048 > *p == load_module+z > stack size = 1048 - 904 > > We just calculated the size of data for do_init_module, and not for > load_module! > > Can someone confirm that this is the real issue? > > -- Steve > > > > > > Wrong info: > > Depth Size Location (16 entries) > > ----- ---- -------- > > 0) 5400 16 __update_load_avg_se.isra.2+0x28/0x220 > > 1) 5384 96 put_prev_entity+0x250/0x338 > > 2) 5288 80 pick_next_task_fair+0x4c4/0x508 > > 3) 5208 72 __schedule+0x100/0x600 > > 4) 5136 184 preempt_schedule_common+0x28/0x48 > > 5) 4952 32 preempt_schedule+0x28/0x30 > > 6) 4920 16 vprintk_emit+0x170/0x1f8 > > 7) 4904 128 vprintk_default+0x48/0x58 > > 8) 4776 64 vprintk_func+0xf8/0x1c8 > > 9) 4712 112 printk+0x70/0x90 > > 10) 4600 176 occupy_stack_init+0x64/0xc0 [kernel_stack] > > 11) 4424 3376 do_one_initcall+0x68/0x248 > > 12) 1048 144 do_init_module+0x60/0x1f0 > > 13) 904 48 load_module+0x1d50/0x2340 > > 14) 856 352 sys_finit_module+0xd0/0xe8 > > 15) 504 504 el0_svc_naked+0x30/0x34 > > > > Correct info: > > Depth Size Location (18 entries) > > ----- ---- -------- > > 0) 5464 48 cgroup_rstat_updated+0x20/0x100 > > 1) 5416 32 cgroup_base_stat_cputime_account_end.isra.0+0x30/0x60 > > 2) 5384 32 __cgroup_account_cputime+0x3c/0x48 > > 3) 5352 64 update_curr+0xc4/0x1d0 > > 4) 5288 72 pick_next_task_fair+0x444/0x508 > > 5) 5216 184 __schedule+0x100/0x600 > > 6) 5032 32 preempt_schedule_common+0x28/0x48 > > 7) 5000 16 preempt_schedule+0x28/0x30 > > 8) 4984 128 vprintk_emit+0x170/0x1f8 > > 9) 4856 64 vprintk_default+0x48/0x58 > > 10) 4792 112 vprintk_func+0xf8/0x1c8 > > 11) 4680 176 printk+0x70/0x90 > > 12) 4504 80 func_test+0x7c/0xb8 [kernel_stack] > > 13) 4424 3376 occupy_stack_init+0x7c/0xc0 [kernel_stack] > > 14) 1048 144 do_one_initcall+0x68/0x248 > > 15) 904 48 do_init_module+0x60/0x1f0 > > 16) 856 352 load_module+0x1d50/0x2340 > > 17) 504 504 sys_finit_module+0xd0/0xe8 > > > > Signed-off-by: Jiping Ma <jiping.ma2@windriver.com> _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-03 8:26 ` Joel Fernandes @ 2019-08-03 8:32 ` Joel Fernandes -1 siblings, 0 replies; 28+ messages in thread From: Joel Fernandes @ 2019-08-03 8:32 UTC (permalink / raw) To: Steven Rostedt Cc: Jiping Ma, Ingo Molnar, Catalin Marinas, Will Deacon, LKML, moderated list:ARM/STM32 ARCHITECTURE, kernel-team On Sat, Aug 3, 2019 at 4:26 AM Joel Fernandes <joel@joelfernandes.org> wrote: > > On Fri, Aug 02, 2019 at 11:22:59AM -0400, Steven Rostedt wrote: > [snip] > > > There is not PC in ARM64 stack, LR is used to for walk_stackframe in > > > ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. > > > PC is used to calculate the stack size in trace_stack.c, so the > > > function name and its stack size appear to be off-by-one. > > > ARM64 stack layout: > > > LR > > > FP > > > ...... > > > LR > > > FP > > > ...... > > > > I think you are not explaining the issue correctly. From looking at the > > document, I think what you want to say is that the LR is saved *after* > > the data for the function. Is that correct? If so, then yes, it would > > cause the stack tracing algorithm to be incorrect. > > > > Most archs do this: > > > > On entry to a function: > > > > save return address > > reserve local variables and such for current function > > > > I think you are saying that arm64 does this backwards. > > > > reserve local variables and such for current function > > save return address (LR) > > Actually for arm64 it is like what you said about 'Most archs'. It saves FP > and LR first onto the current stack frame, then assigns the top of the stack > to FP (marking the new frame). Then executes branch-link, and then allocates > space to variables on stack in the callee. Just to add to that, when the branch-link (BL) instruction is executed, the Link Register (LR) will contain the return address. This why the existing LR needs to be saved first before the call. The existing LR will contain the return address of the caller's original caller. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-03 8:32 ` Joel Fernandes 0 siblings, 0 replies; 28+ messages in thread From: Joel Fernandes @ 2019-08-03 8:32 UTC (permalink / raw) To: Steven Rostedt Cc: Jiping Ma, Catalin Marinas, Will Deacon, LKML, Ingo Molnar, kernel-team, moderated list:ARM/STM32 ARCHITECTURE On Sat, Aug 3, 2019 at 4:26 AM Joel Fernandes <joel@joelfernandes.org> wrote: > > On Fri, Aug 02, 2019 at 11:22:59AM -0400, Steven Rostedt wrote: > [snip] > > > There is not PC in ARM64 stack, LR is used to for walk_stackframe in > > > ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. > > > PC is used to calculate the stack size in trace_stack.c, so the > > > function name and its stack size appear to be off-by-one. > > > ARM64 stack layout: > > > LR > > > FP > > > ...... > > > LR > > > FP > > > ...... > > > > I think you are not explaining the issue correctly. From looking at the > > document, I think what you want to say is that the LR is saved *after* > > the data for the function. Is that correct? If so, then yes, it would > > cause the stack tracing algorithm to be incorrect. > > > > Most archs do this: > > > > On entry to a function: > > > > save return address > > reserve local variables and such for current function > > > > I think you are saying that arm64 does this backwards. > > > > reserve local variables and such for current function > > save return address (LR) > > Actually for arm64 it is like what you said about 'Most archs'. It saves FP > and LR first onto the current stack frame, then assigns the top of the stack > to FP (marking the new frame). Then executes branch-link, and then allocates > space to variables on stack in the callee. Just to add to that, when the branch-link (BL) instruction is executed, the Link Register (LR) will contain the return address. This why the existing LR needs to be saved first before the call. The existing LR will contain the return address of the caller's original caller. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-03 8:26 ` Joel Fernandes @ 2019-08-05 11:25 ` Will Deacon -1 siblings, 0 replies; 28+ messages in thread From: Will Deacon @ 2019-08-05 11:25 UTC (permalink / raw) To: Joel Fernandes Cc: Steven Rostedt, Jiping Ma, catalin.marinas, will.deacon, linux-kernel, mingo, kernel-team, linux-arm-kernel, takahiro.akashi [+Akashi, since he may remember more of the gory details here] On Sat, Aug 03, 2019 at 04:26:42AM -0400, Joel Fernandes wrote: > On Fri, Aug 02, 2019 at 11:22:59AM -0400, Steven Rostedt wrote: > [snip] > > > There is not PC in ARM64 stack, LR is used to for walk_stackframe in > > > ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. > > > PC is used to calculate the stack size in trace_stack.c, so the > > > function name and its stack size appear to be off-by-one. > > > ARM64 stack layout: > > > LR > > > FP > > > ...... > > > LR > > > FP > > > ...... > > > > I think you are not explaining the issue correctly. From looking at the > > document, I think what you want to say is that the LR is saved *after* > > the data for the function. Is that correct? If so, then yes, it would > > cause the stack tracing algorithm to be incorrect. > > > > Most archs do this: > > > > On entry to a function: > > > > save return address > > reserve local variables and such for current function > > > > I think you are saying that arm64 does this backwards. > > > > reserve local variables and such for current function > > save return address (LR) > > Actually for arm64 it is like what you said about 'Most archs'. It saves FP > and LR first onto the current stack frame, then assigns the top of the stack > to FP (marking the new frame). Then executes branch-link, and then allocates > space to variables on stack in the callee. > > Disassembly of prog: > > int foo(int x, int y) { > int a[32]; > return (x + y + a[31]); > } > > int bar(void) { > foo(1, 2); > } > > confirms it: > > 000000000000073c <bar>: > 73c: a9bf7bfd stp x29, x30, [sp, #-16]! <-- save FP and LR > 740: 910003fd mov x29, sp <-- create new FP > 744: 52800041 mov w1, #0x2 <-- pass arguments > 748: 52800020 mov w0, #0x1 > 74c: 97fffff4 bl 71c <foo> <-- branch (sets LR) > 750: d503201f nop > 754: a8c17bfd ldp x29, x30, [sp], #16 <-- restore FP, LR > 758: d65f03c0 ret > > 000000000000071c <foo>: > 71c: d10243ff sub sp, sp, #0x90 <-- space for local var > 720: b9000fe0 str w0, [sp, #12] > 724: b9000be1 str w1, [sp, #8] > 728: b9400fe1 ldr w1, [sp, #12] > 72c: b9400be0 ldr w0, [sp, #8] > 730: 0b000021 add w1, w1, w0 > 734: b9408fe0 ldr w0, [sp, #140] > 738: 0b000020 add w0, w1, w0 > 73c: 910243ff add sp, sp, #0x90 <-- restore sp before ret > 740: d65f03c0 ret I think we need to untangle things a bit here. The arm64 PCS makes no guarantee about the position of the frame record with respect to stack allocations, so relying on this is fragile at best. This is partly why the ftrace-with-regs work currently relies on -fpatchable-function-entry, since that allows the very beginning of the function to be intercepted which I don't think is necessarily the case with -pg/_mcount. For the snippet above, foo is a leaf function and does not have a frame record placed on the stack. If we instead look at something like: __attribute__((noinline)) int baz(int x) { return x; } __attribute__((noinline)) int bar(int x) { int a[32]; return baz(x * a[0]); } int foo(int x) { return bar(x); } then the first instruction of bar() allocates stack space and pushes the frame record at the same time: stp x29, x30, [sp, -144]! This can be read as "subtract 144 bytes (32*4 + 16) from the stack pointer, write the frame record there and then update the stack pointer to point at the bottom of the newly allocated stack", which means that the array 'a[32]' sits directly /above/ the frame record on the stack. However, this is just what my GCC happened to do today. When we looked at this back in 2015, there were other cases we ended up having to identify with heuristics based on what had been observed under various compilers: http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393721.html This was deemed to be far too fragile, so we didn't merge it in the end. If this is to work reliably, then we need support from the tools. This was raised when we first merged support for ftrace, but I'm not sure it went anywhere: https://gcc.gnu.org/ml/gcc/2016-01/msg00035.html So, I completely agree with Steve that we shouldn't be littering the core code with #ifdef CONFIG_ARM64, but we probably do need something in the arch backend if we're going to do this properly, and that in turn is likely to need a very different algorithm from the one currently in use. Will ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-05 11:25 ` Will Deacon 0 siblings, 0 replies; 28+ messages in thread From: Will Deacon @ 2019-08-05 11:25 UTC (permalink / raw) To: Joel Fernandes Cc: Jiping Ma, catalin.marinas, will.deacon, linux-kernel, Steven Rostedt, takahiro.akashi, mingo, kernel-team, linux-arm-kernel [+Akashi, since he may remember more of the gory details here] On Sat, Aug 03, 2019 at 04:26:42AM -0400, Joel Fernandes wrote: > On Fri, Aug 02, 2019 at 11:22:59AM -0400, Steven Rostedt wrote: > [snip] > > > There is not PC in ARM64 stack, LR is used to for walk_stackframe in > > > ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. > > > PC is used to calculate the stack size in trace_stack.c, so the > > > function name and its stack size appear to be off-by-one. > > > ARM64 stack layout: > > > LR > > > FP > > > ...... > > > LR > > > FP > > > ...... > > > > I think you are not explaining the issue correctly. From looking at the > > document, I think what you want to say is that the LR is saved *after* > > the data for the function. Is that correct? If so, then yes, it would > > cause the stack tracing algorithm to be incorrect. > > > > Most archs do this: > > > > On entry to a function: > > > > save return address > > reserve local variables and such for current function > > > > I think you are saying that arm64 does this backwards. > > > > reserve local variables and such for current function > > save return address (LR) > > Actually for arm64 it is like what you said about 'Most archs'. It saves FP > and LR first onto the current stack frame, then assigns the top of the stack > to FP (marking the new frame). Then executes branch-link, and then allocates > space to variables on stack in the callee. > > Disassembly of prog: > > int foo(int x, int y) { > int a[32]; > return (x + y + a[31]); > } > > int bar(void) { > foo(1, 2); > } > > confirms it: > > 000000000000073c <bar>: > 73c: a9bf7bfd stp x29, x30, [sp, #-16]! <-- save FP and LR > 740: 910003fd mov x29, sp <-- create new FP > 744: 52800041 mov w1, #0x2 <-- pass arguments > 748: 52800020 mov w0, #0x1 > 74c: 97fffff4 bl 71c <foo> <-- branch (sets LR) > 750: d503201f nop > 754: a8c17bfd ldp x29, x30, [sp], #16 <-- restore FP, LR > 758: d65f03c0 ret > > 000000000000071c <foo>: > 71c: d10243ff sub sp, sp, #0x90 <-- space for local var > 720: b9000fe0 str w0, [sp, #12] > 724: b9000be1 str w1, [sp, #8] > 728: b9400fe1 ldr w1, [sp, #12] > 72c: b9400be0 ldr w0, [sp, #8] > 730: 0b000021 add w1, w1, w0 > 734: b9408fe0 ldr w0, [sp, #140] > 738: 0b000020 add w0, w1, w0 > 73c: 910243ff add sp, sp, #0x90 <-- restore sp before ret > 740: d65f03c0 ret I think we need to untangle things a bit here. The arm64 PCS makes no guarantee about the position of the frame record with respect to stack allocations, so relying on this is fragile at best. This is partly why the ftrace-with-regs work currently relies on -fpatchable-function-entry, since that allows the very beginning of the function to be intercepted which I don't think is necessarily the case with -pg/_mcount. For the snippet above, foo is a leaf function and does not have a frame record placed on the stack. If we instead look at something like: __attribute__((noinline)) int baz(int x) { return x; } __attribute__((noinline)) int bar(int x) { int a[32]; return baz(x * a[0]); } int foo(int x) { return bar(x); } then the first instruction of bar() allocates stack space and pushes the frame record at the same time: stp x29, x30, [sp, -144]! This can be read as "subtract 144 bytes (32*4 + 16) from the stack pointer, write the frame record there and then update the stack pointer to point at the bottom of the newly allocated stack", which means that the array 'a[32]' sits directly /above/ the frame record on the stack. However, this is just what my GCC happened to do today. When we looked at this back in 2015, there were other cases we ended up having to identify with heuristics based on what had been observed under various compilers: http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393721.html This was deemed to be far too fragile, so we didn't merge it in the end. If this is to work reliably, then we need support from the tools. This was raised when we first merged support for ftrace, but I'm not sure it went anywhere: https://gcc.gnu.org/ml/gcc/2016-01/msg00035.html So, I completely agree with Steve that we shouldn't be littering the core code with #ifdef CONFIG_ARM64, but we probably do need something in the arch backend if we're going to do this properly, and that in turn is likely to need a very different algorithm from the one currently in use. Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-05 11:25 ` Will Deacon @ 2019-08-05 13:59 ` Steven Rostedt -1 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-05 13:59 UTC (permalink / raw) To: Will Deacon Cc: Joel Fernandes, Jiping Ma, catalin.marinas, will.deacon, linux-kernel, mingo, kernel-team, linux-arm-kernel, takahiro.akashi On Mon, 5 Aug 2019 12:25:25 +0100 Will Deacon <will@kernel.org> wrote: > So, I completely agree with Steve that we shouldn't be littering the core code > with #ifdef CONFIG_ARM64, but we probably do need something in the arch backend > if we're going to do this properly, and that in turn is likely to need a very > different algorithm from the one currently in use. And I'm perfectly fine if someone wants to tear up the current implementation, and make one that has callbacks to arch code (with defaults for those that don't need it). But it has to be clean. We worked hard to get rid of most "#ifdef" being scattered in the kernel, and I'm not bringing it back in my code. -- Steve ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-05 13:59 ` Steven Rostedt 0 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-05 13:59 UTC (permalink / raw) To: Will Deacon Cc: Jiping Ma, catalin.marinas, will.deacon, linux-kernel, takahiro.akashi, mingo, Joel Fernandes, kernel-team, linux-arm-kernel On Mon, 5 Aug 2019 12:25:25 +0100 Will Deacon <will@kernel.org> wrote: > So, I completely agree with Steve that we shouldn't be littering the core code > with #ifdef CONFIG_ARM64, but we probably do need something in the arch backend > if we're going to do this properly, and that in turn is likely to need a very > different algorithm from the one currently in use. And I'm perfectly fine if someone wants to tear up the current implementation, and make one that has callbacks to arch code (with defaults for those that don't need it). But it has to be clean. We worked hard to get rid of most "#ifdef" being scattered in the kernel, and I'm not bringing it back in my code. -- Steve _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-05 11:25 ` Will Deacon @ 2019-08-06 13:00 ` Steven Rostedt -1 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-06 13:00 UTC (permalink / raw) To: Will Deacon Cc: Joel Fernandes, Jiping Ma, catalin.marinas, will.deacon, linux-kernel, mingo, kernel-team, linux-arm-kernel, takahiro.akashi On Mon, 5 Aug 2019 12:25:25 +0100 Will Deacon <will@kernel.org> wrote: > This can be read as "subtract 144 bytes (32*4 + 16) from the stack pointer, > write the frame record there and then update the stack pointer to point at the > bottom of the newly allocated stack", which means that the array 'a[32]' sits > directly /above/ the frame record on the stack. However, this is just what my > GCC happened to do today. When we looked at this back in 2015, there were other > cases we ended up having to identify with heuristics based on what had been > observed under various compilers: > > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393721.html That's a bit more involved than what I came up with. > > This was deemed to be far too fragile, so we didn't merge it in the end. > > If this is to work reliably, then we need support from the tools. This was > raised when we first merged support for ftrace, but I'm not sure it went > anywhere: > > https://gcc.gnu.org/ml/gcc/2016-01/msg00035.html > > So, I completely agree with Steve that we shouldn't be littering the core code > with #ifdef CONFIG_ARM64, but we probably do need something in the arch backend > if we're going to do this properly, and that in turn is likely to need a very > different algorithm from the one currently in use. So basically it seems that on arm64, gcc only saves the lr when needed (leaf functions don't need it). And I can even see that if you have several paths that don't call other functions, why save the lr? It does seem that doing the slight change makes the current code more accurate without need for any heuristics. Here's my patch again, slightly tweaked and Jiping said it solved the issue for him. Are you OK with this change? -- Steve diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 5ab5200b2bdc..13a4832cfb00 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -13,6 +13,7 @@ #define HAVE_FUNCTION_GRAPH_FP_TEST #define MCOUNT_ADDR ((unsigned long)_mcount) #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 #ifndef __ASSEMBLY__ #include <linux/compat.h> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index 5d16f73898db..40e4a88eea8f 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -158,6 +158,20 @@ static void check_stack(unsigned long ip, unsigned long *stack) i++; } +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS + /* + * Some archs will store the link register before calling + * nested functions. This means the saved return address + * comes after the local storage, and we need to shift + * for that. + */ + if (x > 1) { + memmove(&stack_trace_index[0], &stack_trace_index[1], + sizeof(stack_trace_index[0]) * (x - 1)); + x--; + } +#endif + stack_trace_nr_entries = x; if (task_stack_end_corrupted(current)) { ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-06 13:00 ` Steven Rostedt 0 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2019-08-06 13:00 UTC (permalink / raw) To: Will Deacon Cc: Jiping Ma, catalin.marinas, will.deacon, linux-kernel, takahiro.akashi, mingo, Joel Fernandes, kernel-team, linux-arm-kernel On Mon, 5 Aug 2019 12:25:25 +0100 Will Deacon <will@kernel.org> wrote: > This can be read as "subtract 144 bytes (32*4 + 16) from the stack pointer, > write the frame record there and then update the stack pointer to point at the > bottom of the newly allocated stack", which means that the array 'a[32]' sits > directly /above/ the frame record on the stack. However, this is just what my > GCC happened to do today. When we looked at this back in 2015, there were other > cases we ended up having to identify with heuristics based on what had been > observed under various compilers: > > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393721.html That's a bit more involved than what I came up with. > > This was deemed to be far too fragile, so we didn't merge it in the end. > > If this is to work reliably, then we need support from the tools. This was > raised when we first merged support for ftrace, but I'm not sure it went > anywhere: > > https://gcc.gnu.org/ml/gcc/2016-01/msg00035.html > > So, I completely agree with Steve that we shouldn't be littering the core code > with #ifdef CONFIG_ARM64, but we probably do need something in the arch backend > if we're going to do this properly, and that in turn is likely to need a very > different algorithm from the one currently in use. So basically it seems that on arm64, gcc only saves the lr when needed (leaf functions don't need it). And I can even see that if you have several paths that don't call other functions, why save the lr? It does seem that doing the slight change makes the current code more accurate without need for any heuristics. Here's my patch again, slightly tweaked and Jiping said it solved the issue for him. Are you OK with this change? -- Steve diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 5ab5200b2bdc..13a4832cfb00 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -13,6 +13,7 @@ #define HAVE_FUNCTION_GRAPH_FP_TEST #define MCOUNT_ADDR ((unsigned long)_mcount) #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 #ifndef __ASSEMBLY__ #include <linux/compat.h> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index 5d16f73898db..40e4a88eea8f 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -158,6 +158,20 @@ static void check_stack(unsigned long ip, unsigned long *stack) i++; } +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS + /* + * Some archs will store the link register before calling + * nested functions. This means the saved return address + * comes after the local storage, and we need to shift + * for that. + */ + if (x > 1) { + memmove(&stack_trace_index[0], &stack_trace_index[1], + sizeof(stack_trace_index[0]) * (x - 1)); + x--; + } +#endif + stack_trace_nr_entries = x; if (task_stack_end_corrupted(current)) { _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 2019-08-05 11:25 ` Will Deacon @ 2019-08-06 14:47 ` Joel Fernandes -1 siblings, 0 replies; 28+ messages in thread From: Joel Fernandes @ 2019-08-06 14:47 UTC (permalink / raw) To: Will Deacon Cc: Steven Rostedt, Jiping Ma, catalin.marinas, will.deacon, linux-kernel, mingo, kernel-team, linux-arm-kernel, takahiro.akashi On Mon, Aug 05, 2019 at 12:25:25PM +0100, Will Deacon wrote: > [+Akashi, since he may remember more of the gory details here] > > On Sat, Aug 03, 2019 at 04:26:42AM -0400, Joel Fernandes wrote: > > On Fri, Aug 02, 2019 at 11:22:59AM -0400, Steven Rostedt wrote: > > [snip] > > > > There is not PC in ARM64 stack, LR is used to for walk_stackframe in > > > > ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. > > > > PC is used to calculate the stack size in trace_stack.c, so the > > > > function name and its stack size appear to be off-by-one. > > > > ARM64 stack layout: > > > > LR > > > > FP > > > > ...... > > > > LR > > > > FP > > > > ...... > > > > > > I think you are not explaining the issue correctly. From looking at the > > > document, I think what you want to say is that the LR is saved *after* > > > the data for the function. Is that correct? If so, then yes, it would > > > cause the stack tracing algorithm to be incorrect. > > > > > > Most archs do this: > > > > > > On entry to a function: > > > > > > save return address > > > reserve local variables and such for current function > > > > > > I think you are saying that arm64 does this backwards. > > > > > > reserve local variables and such for current function > > > save return address (LR) > > > > Actually for arm64 it is like what you said about 'Most archs'. It saves FP > > and LR first onto the current stack frame, then assigns the top of the stack > > to FP (marking the new frame). Then executes branch-link, and then allocates > > space to variables on stack in the callee. > > > > Disassembly of prog: > > > > int foo(int x, int y) { > > int a[32]; > > return (x + y + a[31]); > > } > > > > int bar(void) { > > foo(1, 2); > > } > > > > confirms it: > > > > 000000000000073c <bar>: > > 73c: a9bf7bfd stp x29, x30, [sp, #-16]! <-- save FP and LR > > 740: 910003fd mov x29, sp <-- create new FP > > 744: 52800041 mov w1, #0x2 <-- pass arguments > > 748: 52800020 mov w0, #0x1 > > 74c: 97fffff4 bl 71c <foo> <-- branch (sets LR) > > 750: d503201f nop > > 754: a8c17bfd ldp x29, x30, [sp], #16 <-- restore FP, LR > > 758: d65f03c0 ret > > > > 000000000000071c <foo>: > > 71c: d10243ff sub sp, sp, #0x90 <-- space for local var > > 720: b9000fe0 str w0, [sp, #12] > > 724: b9000be1 str w1, [sp, #8] > > 728: b9400fe1 ldr w1, [sp, #12] > > 72c: b9400be0 ldr w0, [sp, #8] > > 730: 0b000021 add w1, w1, w0 > > 734: b9408fe0 ldr w0, [sp, #140] > > 738: 0b000020 add w0, w1, w0 > > 73c: 910243ff add sp, sp, #0x90 <-- restore sp before ret > > 740: d65f03c0 ret > > I think we need to untangle things a bit here. > > The arm64 PCS makes no guarantee about the position of the frame record with > respect to stack allocations, so relying on this is fragile at best. This is > partly why the ftrace-with-regs work currently relies on > -fpatchable-function-entry, since that allows the very beginning of the > function to be intercepted which I don't think is necessarily the case with > -pg/_mcount. > > For the snippet above, foo is a leaf function and does not have a frame record > placed on the stack. If we instead look at something like: > > __attribute__((noinline)) int baz(int x) > { > return x; > } > > __attribute__((noinline)) int bar(int x) > { > int a[32]; > return baz(x * a[0]); > } > > int foo(int x) > { > return bar(x); > } > > then the first instruction of bar() allocates stack space and pushes > the frame record at the same time: > > stp x29, x30, [sp, -144]! > > This can be read as "subtract 144 bytes (32*4 + 16) from the stack pointer, > write the frame record there and then update the stack pointer to point at the > bottom of the newly allocated stack", which means that the array 'a[32]' sits > directly /above/ the frame record on the stack. However, this is just what my > GCC happened to do today. When we looked at this back in 2015, there were other > cases we ended up having to identify with heuristics based on what had been > observed under various compilers: Right point taken, I was looking at bar()'s stack frame instead of foo()'s. foo() did not have to save any frame record, so probably my example was confusing. My bad. So indeed, it seems an odd ball case. > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393721.html > > This was deemed to be far too fragile, so we didn't merge it in the end. > > If this is to work reliably, then we need support from the tools. This was > raised when we first merged support for ftrace, but I'm not sure it went > anywhere: > > https://gcc.gnu.org/ml/gcc/2016-01/msg00035.html > So, I completely agree with Steve that we shouldn't be littering the core code > with #ifdef CONFIG_ARM64, but we probably do need something in the arch backend > if we're going to do this properly, and that in turn is likely to need a very > different algorithm from the one currently in use. Yeah, I am in line with that thought as well. I am reviewing the latest patch from Steve now to understand it better. thanks, - Joel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 @ 2019-08-06 14:47 ` Joel Fernandes 0 siblings, 0 replies; 28+ messages in thread From: Joel Fernandes @ 2019-08-06 14:47 UTC (permalink / raw) To: Will Deacon Cc: Jiping Ma, catalin.marinas, will.deacon, linux-kernel, Steven Rostedt, takahiro.akashi, mingo, kernel-team, linux-arm-kernel On Mon, Aug 05, 2019 at 12:25:25PM +0100, Will Deacon wrote: > [+Akashi, since he may remember more of the gory details here] > > On Sat, Aug 03, 2019 at 04:26:42AM -0400, Joel Fernandes wrote: > > On Fri, Aug 02, 2019 at 11:22:59AM -0400, Steven Rostedt wrote: > > [snip] > > > > There is not PC in ARM64 stack, LR is used to for walk_stackframe in > > > > ARM64. Tere is no the issue in ARM32 because there is PC in ARM32 stack. > > > > PC is used to calculate the stack size in trace_stack.c, so the > > > > function name and its stack size appear to be off-by-one. > > > > ARM64 stack layout: > > > > LR > > > > FP > > > > ...... > > > > LR > > > > FP > > > > ...... > > > > > > I think you are not explaining the issue correctly. From looking at the > > > document, I think what you want to say is that the LR is saved *after* > > > the data for the function. Is that correct? If so, then yes, it would > > > cause the stack tracing algorithm to be incorrect. > > > > > > Most archs do this: > > > > > > On entry to a function: > > > > > > save return address > > > reserve local variables and such for current function > > > > > > I think you are saying that arm64 does this backwards. > > > > > > reserve local variables and such for current function > > > save return address (LR) > > > > Actually for arm64 it is like what you said about 'Most archs'. It saves FP > > and LR first onto the current stack frame, then assigns the top of the stack > > to FP (marking the new frame). Then executes branch-link, and then allocates > > space to variables on stack in the callee. > > > > Disassembly of prog: > > > > int foo(int x, int y) { > > int a[32]; > > return (x + y + a[31]); > > } > > > > int bar(void) { > > foo(1, 2); > > } > > > > confirms it: > > > > 000000000000073c <bar>: > > 73c: a9bf7bfd stp x29, x30, [sp, #-16]! <-- save FP and LR > > 740: 910003fd mov x29, sp <-- create new FP > > 744: 52800041 mov w1, #0x2 <-- pass arguments > > 748: 52800020 mov w0, #0x1 > > 74c: 97fffff4 bl 71c <foo> <-- branch (sets LR) > > 750: d503201f nop > > 754: a8c17bfd ldp x29, x30, [sp], #16 <-- restore FP, LR > > 758: d65f03c0 ret > > > > 000000000000071c <foo>: > > 71c: d10243ff sub sp, sp, #0x90 <-- space for local var > > 720: b9000fe0 str w0, [sp, #12] > > 724: b9000be1 str w1, [sp, #8] > > 728: b9400fe1 ldr w1, [sp, #12] > > 72c: b9400be0 ldr w0, [sp, #8] > > 730: 0b000021 add w1, w1, w0 > > 734: b9408fe0 ldr w0, [sp, #140] > > 738: 0b000020 add w0, w1, w0 > > 73c: 910243ff add sp, sp, #0x90 <-- restore sp before ret > > 740: d65f03c0 ret > > I think we need to untangle things a bit here. > > The arm64 PCS makes no guarantee about the position of the frame record with > respect to stack allocations, so relying on this is fragile at best. This is > partly why the ftrace-with-regs work currently relies on > -fpatchable-function-entry, since that allows the very beginning of the > function to be intercepted which I don't think is necessarily the case with > -pg/_mcount. > > For the snippet above, foo is a leaf function and does not have a frame record > placed on the stack. If we instead look at something like: > > __attribute__((noinline)) int baz(int x) > { > return x; > } > > __attribute__((noinline)) int bar(int x) > { > int a[32]; > return baz(x * a[0]); > } > > int foo(int x) > { > return bar(x); > } > > then the first instruction of bar() allocates stack space and pushes > the frame record at the same time: > > stp x29, x30, [sp, -144]! > > This can be read as "subtract 144 bytes (32*4 + 16) from the stack pointer, > write the frame record there and then update the stack pointer to point at the > bottom of the newly allocated stack", which means that the array 'a[32]' sits > directly /above/ the frame record on the stack. However, this is just what my > GCC happened to do today. When we looked at this back in 2015, there were other > cases we ended up having to identify with heuristics based on what had been > observed under various compilers: Right point taken, I was looking at bar()'s stack frame instead of foo()'s. foo() did not have to save any frame record, so probably my example was confusing. My bad. So indeed, it seems an odd ball case. > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393721.html > > This was deemed to be far too fragile, so we didn't merge it in the end. > > If this is to work reliably, then we need support from the tools. This was > raised when we first merged support for ftrace, but I'm not sure it went > anywhere: > > https://gcc.gnu.org/ml/gcc/2016-01/msg00035.html > So, I completely agree with Steve that we shouldn't be littering the core code > with #ifdef CONFIG_ARM64, but we probably do need something in the arch backend > if we're going to do this properly, and that in turn is likely to need a very > different algorithm from the one currently in use. Yeah, I am in line with that thought as well. I am reviewing the latest patch from Steve now to understand it better. thanks, - Joel _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2019-08-06 17:25 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-08-02 9:41 [PATCH v3] tracing: Function stack size and its name mismatch in arm64 Jiping Ma 2019-08-02 15:22 ` Steven Rostedt 2019-08-02 15:22 ` Steven Rostedt 2019-08-02 16:09 ` Steven Rostedt 2019-08-02 16:09 ` Steven Rostedt 2019-08-02 16:11 ` Steven Rostedt 2019-08-02 16:11 ` Steven Rostedt 2019-08-06 15:48 ` Joel Fernandes 2019-08-06 15:48 ` Joel Fernandes 2019-08-06 16:34 ` Steven Rostedt 2019-08-06 16:34 ` Steven Rostedt 2019-08-06 17:03 ` Steven Rostedt 2019-08-06 17:03 ` Steven Rostedt 2019-08-06 17:25 ` Joel Fernandes 2019-08-06 17:25 ` Joel Fernandes 2019-08-06 6:01 ` Jiping Ma 2019-08-03 8:26 ` Joel Fernandes 2019-08-03 8:26 ` Joel Fernandes 2019-08-03 8:32 ` Joel Fernandes 2019-08-03 8:32 ` Joel Fernandes 2019-08-05 11:25 ` Will Deacon 2019-08-05 11:25 ` Will Deacon 2019-08-05 13:59 ` Steven Rostedt 2019-08-05 13:59 ` Steven Rostedt 2019-08-06 13:00 ` Steven Rostedt 2019-08-06 13:00 ` Steven Rostedt 2019-08-06 14:47 ` Joel Fernandes 2019-08-06 14:47 ` Joel Fernandes
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.