* [RFC PATCH v2 0/1] arm64: Implement stack trace termination record
From: madvenka @ 2021-04-02 3:24 UTC
To: mark.rutland, broonie, jpoimboe, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.

All tasks have a pt_regs structure right after the task stack in the
stack page. The pt_regs structure contains a stackframe field. Make
this stackframe field the final frame in the task stack so all stack
traces end at a fixed stack offset.

For kernel tasks, this is simple to understand. For user tasks, there
is some extra detail. User tasks get created via fork() et al. Once
they return from fork, they enter the kernel only on an EL0 exception.
In arm64, system calls are also EL0 exceptions.

The EL0 exception handler uses the task pt_regs mentioned above to
save register state and call different exception functions. All stack
traces from EL0 exception code must end at the pt_regs. So, make
pt_regs->stackframe the final frame in the EL0 exception stack.

To summarize, task_pt_regs(task)->stackframe will always be the final
frame in a stack trace.

Sample stack traces
===================

The final frame for the idle tasks is different from v1. The rest of
the stack traces are the same.

Primary CPU's idle task (changed from v1)
=======================

[ 0.022365] arch_stack_walk+0x0/0xd0
[ 0.022376] callfd_stack+0x30/0x60
[ 0.022387] rest_init+0xd8/0xf8
[ 0.022397] arch_call_rest_init+0x18/0x24
[ 0.022411] start_kernel+0x5b8/0x5f4
[ 0.022424] __primary_switched+0xa8/0xac

Secondary CPU's idle task (changed from v1)
=========================

[ 0.022484] arch_stack_walk+0x0/0xd0
[ 0.022494] callfd_stack+0x30/0x60
[ 0.022502] secondary_start_kernel+0x188/0x1e0
[ 0.022513] __secondary_switched+0x80/0x84
---
Changelog:

v1
	- Set up task_pt_regs(current)->stackframe as the final frame
	  when a new task is initialized in copy_thread().

	- Create pt_regs for the idle tasks and set up pt_regs->stackframe
	  as the final frame for the idle tasks.

	- Set up task_pt_regs(current)->stackframe as the final frame in
	  the EL0 exception handler so the EL0 exception stack trace ends
	  there.

	- Terminate the stack trace successfully in unwind_frame() when
	  the FP reaches task_pt_regs(current)->stackframe.

	- The stack traces (above) in the kernel will terminate at the
	  correct place. Debuggers may show an extra record 0x0 at the end
	  for pt_regs->stackframe. That said, I did not see that extra frame
	  when I did stack traces using gdb.

v2
	- Changed some wordings as suggested by Mark Rutland.

	- Removed the synthetic return PC for idle tasks. Changed the
	  branches to start_kernel() and secondary_start_kernel() to calls
	  so that they will have a proper return PC.

Madhavan T. Venkataraman (1):
  arm64: Implement stack trace termination record

 arch/arm64/kernel/entry.S      |  8 +++++---
 arch/arm64/kernel/head.S       | 29 +++++++++++++++++++++++------
 arch/arm64/kernel/process.c    |  5 +++++
 arch/arm64/kernel/stacktrace.c | 10 +++++-----
 4 files changed, 38 insertions(+), 14 deletions(-)


base-commit: 0d02ec6b3136c73c09e7859f0d0e4e2c4c07b49b
-- 
2.25.1
* [RFC PATCH v2 1/1] arm64: Implement stack trace termination record
From: madvenka @ 2021-04-02 3:24 UTC
To: mark.rutland, broonie, jpoimboe, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.

Kernel Tasks
============

All tasks except the idle task have a pt_regs structure right after the
task stack. This is called the task pt_regs. The pt_regs structure has a
special stackframe field. Make this stackframe field the final frame in the
task stack. This needs to be done in copy_thread() which initializes a new
task's pt_regs and initial CPU context.

For the idle task, there is no task pt_regs. For our purpose, we need one.
So, create a pt_regs just like other kernel tasks and make
pt_regs->stackframe the final frame in the idle task stack. This needs to be
done at two places:

	- On the primary CPU, the boot task runs. It calls start_kernel()
	  and eventually becomes the idle task for the primary CPU. Just
	  before start_kernel() is called, set up the final frame.

	- On each secondary CPU, a startup task runs that calls
	  secondary_start_kernel() and eventually becomes the idle task
	  on the secondary CPU. Just before secondary_start_kernel() is
	  called, set up the final frame.

User Tasks
==========

User tasks are initially set up like kernel tasks when they are created.
Then, they return to userland after fork via ret_from_fork(). After that,
they enter the kernel only on an EL0 exception. (In arm64, system calls are
also EL0 exceptions). The EL0 exception handler stores state in the task
pt_regs and calls different functions based on the type of exception. The
stack trace for an EL0 exception must end at the task pt_regs. So, make
task pt_regs->stackframe the final frame in the EL0 exception stack.

In summary, task pt_regs->stackframe is where a successful stack trace ends.

Stack trace termination
=======================

In the unwinder, terminate the stack trace successfully when
task_pt_regs(task)->stackframe is reached. For stack traces in the kernel,
this will correctly terminate the stack trace at the right place.

However, debuggers terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/kernel/entry.S      |  8 +++++---
 arch/arm64/kernel/head.S       | 29 +++++++++++++++++++++++------
 arch/arm64/kernel/process.c    |  5 +++++
 arch/arm64/kernel/stacktrace.c | 10 +++++-----
 4 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a31a0a713c85..e2dc2e998934 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -261,16 +261,18 @@ alternative_else_nop_endif
 	stp	lr, x21, [sp, #S_LR]
 
 	/*
-	 * For exceptions from EL0, terminate the callchain here.
+	 * For exceptions from EL0, terminate the callchain here at
+	 * task_pt_regs(current)->stackframe.
+	 *
 	 * For exceptions from EL1, create a synthetic frame record so the
 	 * interrupted code shows up in the backtrace.
 	 */
 	.if \el == 0
-	mov	x29, xzr
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
 	.else
 	stp	x29, x22, [sp, #S_STACKFRAME]
-	add	x29, sp, #S_STACKFRAME
 	.endif
+	add	x29, sp, #S_STACKFRAME
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 alternative_if_not ARM64_HAS_PAN
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 840bda1869e9..743c019a42c7 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -393,6 +393,23 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	ret	x28
 SYM_FUNC_END(__create_page_tables)
 
+	/*
+	 * The boot task becomes the idle task for the primary CPU. The
+	 * CPU startup task on each secondary CPU becomes the idle task
+	 * for the secondary CPU.
+	 *
+	 * The idle task does not require pt_regs. But create a dummy
+	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
+	 * set up to be the final frame on the idle task stack just like
+	 * all the other kernel tasks. This helps the unwinder to
+	 * terminate the stack trace at a well-known stack offset.
+	 */
+	.macro setup_final_frame
+	sub	sp, sp, #PT_REGS_SIZE
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
+	add	x29, sp, #S_STACKFRAME
+	.endm
+
 	/*
 	 * The following fragment of code is executed with the MMU enabled.
 	 *
@@ -447,9 +464,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 	bl	switch_to_vhe			// Prefer VHE if possible
 	add	sp, sp, #16
-	mov	x29, #0
-	mov	x30, #0
-	b	start_kernel
+	setup_final_frame
+	bl	start_kernel
+	nop
 SYM_FUNC_END(__primary_switched)
 
 	.pushsection ".rodata", "a"
@@ -606,14 +623,14 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
 	cbz	x2, __secondary_too_slow
 	msr	sp_el0, x2
 	scs_load x2, x3
-	mov	x29, #0
-	mov	x30, #0
+	setup_final_frame
 
 #ifdef CONFIG_ARM64_PTR_AUTH
 	ptrauth_keys_init_cpu x2, x3, x4, x5
 #endif
 
-	b	secondary_start_kernel
+	bl	secondary_start_kernel
+	nop
 SYM_FUNC_END(__secondary_switched)
 
 SYM_FUNC_START_LOCAL(__secondary_too_slow)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 325c83b1a24d..906baa232a89 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -437,6 +437,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	}
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
+	/*
+	 * For the benefit of the unwinder, set up childregs->stackframe
+	 * as the final frame for the new task.
+	 */
+	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
 
 	ptrace_hw_copy_thread(p);
 
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ad20981dfda4..72f5af8c69dc 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -44,16 +44,16 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	unsigned long fp = frame->fp;
 	struct stack_info info;
 
-	/* Terminal record; nothing to unwind */
-	if (!fp)
+	if (!tsk)
+		tsk = current;
+
+	/* Final frame; nothing to unwind */
+	if (fp == (unsigned long) task_pt_regs(tsk)->stackframe)
 		return -ENOENT;
 
 	if (fp & 0xf)
 		return -EINVAL;
 
-	if (!tsk)
-		tsk = current;
-
 	if (!on_accessible_stack(tsk, fp, &info))
 		return -EINVAL;
 
-- 
2.25.1
* Re: [RFC PATCH v2 1/1] arm64: Implement stack trace termination record
From: Josh Poimboeuf @ 2021-04-03 15:59 UTC
To: madvenka
Cc: mark.rutland, broonie, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel

On Thu, Apr 01, 2021 at 10:24:04PM -0500, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> @@ -447,9 +464,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
>  #endif
>  	bl	switch_to_vhe			// Prefer VHE if possible
>  	add	sp, sp, #16
> -	mov	x29, #0
> -	mov	x30, #0
> -	b	start_kernel
> +	setup_final_frame
> +	bl	start_kernel
> +	nop
>  SYM_FUNC_END(__primary_switched)
>  
>  	.pushsection ".rodata", "a"
> @@ -606,14 +623,14 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
>  	cbz	x2, __secondary_too_slow
>  	msr	sp_el0, x2
>  	scs_load x2, x3
> -	mov	x29, #0
> -	mov	x30, #0
> +	setup_final_frame
>  
>  #ifdef CONFIG_ARM64_PTR_AUTH
>  	ptrauth_keys_init_cpu x2, x3, x4, x5
>  #endif
>  
> -	b	secondary_start_kernel
> +	bl	secondary_start_kernel
> +	nop
>  SYM_FUNC_END(__secondary_switched)

I'm somewhat arm-ignorant, so take the following comments with a grain
of salt.


I don't think changing these to 'bl' is necessary, unless you wanted
__primary_switched() and __secondary_switched() to show up in the
stacktrace for some reason? If so, that seems like a separate patch.

Also, why are nops added after the calls? My guess would be because,
since these are basically tail calls to "noreturn" functions, the stack
dump code would otherwise show the wrong function, i.e. whatever
function happens to be after the 'bl'.

We had the same issue for x86. It can be fixed by using '%pB' instead
of '%pS' when printing the address in dump_backtrace_entry(). See
sprint_backtrace() for more details.

BTW I think the same issue exists for GCC-generated code. The following
shows several such cases:

	objdump -dr vmlinux |awk '/bl / {bl=1;l=$0;next} bl == 1 && /^$/ {print l; print} // {bl=0}'


However, looking at how arm64 unwinds through exceptions in kernel
space, using '%pB' might have side effects when the exception LR
(elr_el1) points to the beginning of a function. Then '%pB' would show
the end of the previous function, instead of the function which was
interrupted.

So you may need to rethink how to unwind through in-kernel exceptions.

Basically, when printing a stack return address, you want to use '%pB'
for a call return address and '%pS' for an interrupted address.

On x86, with the frame pointer unwinder, we encode the frame pointer by
setting a bit in %rbp which tells the unwinder that it's a special
pt_regs frame. Then instead of treating it like a normal call frame,
the stack dump code prints the registers, and the return address
(regs->ip) gets printed with '%pS'.

>  SYM_FUNC_START_LOCAL(__secondary_too_slow)
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 325c83b1a24d..906baa232a89 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -437,6 +437,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  	}
>  	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
>  	p->thread.cpu_context.sp = (unsigned long)childregs;
> +	/*
> +	 * For the benefit of the unwinder, set up childregs->stackframe
> +	 * as the final frame for the new task.
> +	 */
> +	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
>  
>  	ptrace_hw_copy_thread(p);
>  
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index ad20981dfda4..72f5af8c69dc 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -44,16 +44,16 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  	unsigned long fp = frame->fp;
>  	struct stack_info info;
>  
> -	/* Terminal record; nothing to unwind */
> -	if (!fp)
> +	if (!tsk)
> +		tsk = current;
> +
> +	/* Final frame; nothing to unwind */
> +	if (fp == (unsigned long) task_pt_regs(tsk)->stackframe)
>  		return -ENOENT;

As far as I can tell, the regs stackframe value is initialized to zero
during syscall entry, so isn't this basically just 'if (fp == 0)'?

Shouldn't it instead be comparing with the _address_ of the stackframe
field to make sure it reached the end?

-- 
Josh
* Re: [RFC PATCH v2 1/1] arm64: Implement stack trace termination record
From: Madhavan T. Venkataraman @ 2021-04-04 3:46 UTC
To: Josh Poimboeuf
Cc: mark.rutland, broonie, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel



On 4/3/21 10:59 AM, Josh Poimboeuf wrote:
> On Thu, Apr 01, 2021 at 10:24:04PM -0500, madvenka@linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>> @@ -447,9 +464,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
>>  #endif
>>  	bl	switch_to_vhe			// Prefer VHE if possible
>>  	add	sp, sp, #16
>> -	mov	x29, #0
>> -	mov	x30, #0
>> -	b	start_kernel
>> +	setup_final_frame
>> +	bl	start_kernel
>> +	nop
>>  SYM_FUNC_END(__primary_switched)
>>  
>>  	.pushsection ".rodata", "a"
>> @@ -606,14 +623,14 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
>>  	cbz	x2, __secondary_too_slow
>>  	msr	sp_el0, x2
>>  	scs_load x2, x3
>> -	mov	x29, #0
>> -	mov	x30, #0
>> +	setup_final_frame
>>  
>>  #ifdef CONFIG_ARM64_PTR_AUTH
>>  	ptrauth_keys_init_cpu x2, x3, x4, x5
>>  #endif
>>  
>> -	b	secondary_start_kernel
>> +	bl	secondary_start_kernel
>> +	nop
>>  SYM_FUNC_END(__secondary_switched)
> 
> I'm somewhat arm-ignorant, so take the following comments with a grain
> of salt.
> 
> 
> I don't think changing these to 'bl' is necessary, unless you wanted
> __primary_switched() and __secondary_switched() to show up in the
> stacktrace for some reason? If so, that seems like a separate patch.
> 

The problem is with __secondary_switched. If you trace the code back to where
a secondary CPU is started, I don't see any calls anywhere. There are only
branches if I am not mistaken. So, the return address register never gets
set up with a proper address. The stack trace shows some hexadecimal value
instead of a symbol name.

On ARM64, the call instruction is actually a branch instruction IIUC. The
only extra thing it does is to load the link register (return address
register) with the return address. That is all.

Instead of the link register pointing to some arbitrary code in startup
that did not call start_kernel() or secondary_start_kernel(), I wanted to
set it up as shown above.

> 
> Also, why are nops added after the calls? My guess would be because,
> since these are basically tail calls to "noreturn" functions, the stack
> dump code would otherwise show the wrong function, i.e. whatever
> function happens to be after the 'bl'.
> 

That is correct. The stack trace shows something arbitrary.

> We had the same issue for x86. It can be fixed by using '%pB' instead
> of '%pS' when printing the address in dump_backtrace_entry(). See
> sprint_backtrace() for more details.
> 
> BTW I think the same issue exists for GCC-generated code. The following
> shows several such cases:
> 
> 	objdump -dr vmlinux |awk '/bl / {bl=1;l=$0;next} bl == 1 && /^$/ {print l; print} // {bl=0}'
> 
> 
> However, looking at how arm64 unwinds through exceptions in kernel
> space, using '%pB' might have side effects when the exception LR
> (elr_el1) points to the beginning of a function. Then '%pB' would show
> the end of the previous function, instead of the function which was
> interrupted.
> 
> So you may need to rethink how to unwind through in-kernel exceptions.
> 
> Basically, when printing a stack return address, you want to use '%pB'
> for a call return address and '%pS' for an interrupted address.
> 
> On x86, with the frame pointer unwinder, we encode the frame pointer by
> setting a bit in %rbp which tells the unwinder that it's a special
> pt_regs frame. Then instead of treating it like a normal call frame,
> the stack dump code prints the registers, and the return address
> (regs->ip) gets printed with '%pS'.
> 

Yes. But there are objections to that kind of encoding.

Having the nop above does not do any harm. It just adds 4 bytes to the
function text. I would rather keep this simple right now because this is
only for getting a sensible stack trace for idle tasks.

Is there any other problem that you can see?

>>  SYM_FUNC_START_LOCAL(__secondary_too_slow)
>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> index 325c83b1a24d..906baa232a89 100644
>> --- a/arch/arm64/kernel/process.c
>> +++ b/arch/arm64/kernel/process.c
>> @@ -437,6 +437,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>>  	}
>>  	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
>>  	p->thread.cpu_context.sp = (unsigned long)childregs;
>> +	/*
>> +	 * For the benefit of the unwinder, set up childregs->stackframe
>> +	 * as the final frame for the new task.
>> +	 */
>> +	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
>>  
>>  	ptrace_hw_copy_thread(p);
>>  
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index ad20981dfda4..72f5af8c69dc 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -44,16 +44,16 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>>  	unsigned long fp = frame->fp;
>>  	struct stack_info info;
>>  
>> -	/* Terminal record; nothing to unwind */
>> -	if (!fp)
>> +	if (!tsk)
>> +		tsk = current;
>> +
>> +	/* Final frame; nothing to unwind */
>> +	if (fp == (unsigned long) task_pt_regs(tsk)->stackframe)
>>  		return -ENOENT;
> 
> As far as I can tell, the regs stackframe value is initialized to zero
> during syscall entry, so isn't this basically just 'if (fp == 0)'?
> 
> Shouldn't it instead be comparing with the _address_ of the stackframe
> field to make sure it reached the end?
> 

pt_regs->stackframe is an array of two u64 elements: one for FP and one
for PC. So, I am comparing the address and not the value of FP.

	u64 stackframe[2];

Madhavan
* Re: [RFC PATCH v2 1/1] arm64: Implement stack trace termination record
From: Madhavan T. Venkataraman @ 2021-04-04 4:40 UTC
To: Josh Poimboeuf
Cc: mark.rutland, broonie, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel



On 4/3/21 10:46 PM, Madhavan T. Venkataraman wrote:
>> I'm somewhat arm-ignorant, so take the following comments with a grain
>> of salt.
>>
>>
>> I don't think changing these to 'bl' is necessary, unless you wanted
>> __primary_switched() and __secondary_switched() to show up in the
>> stacktrace for some reason? If so, that seems like a separate patch.
>>
> The problem is with __secondary_switched. If you trace the code back to where
> a secondary CPU is started, I don't see any calls anywhere. There are only
> branches if I am not mistaken. So, the return address register never gets
> set up with a proper address. The stack trace shows some hexadecimal value
> instead of a symbol name.
> 

Actually, I take that back. There are calls in that code path. But I did only
see some hexadecimal value instead of a proper address in the stack trace.
Sorry about that confusion.

My reason to convert the branches to calls is this - the value of the return
address register at that point is the return PC of the previous branch and
link instruction wherever that happens to be. I think that is a little
arbitrary.

Instead, if I call start_kernel() and secondary_start_kernel(), the return
address gets set up to the next instruction which, IMHO, is better.

But I am open to other suggestions.

Madhavan
* Re: [RFC PATCH v2 1/1] arm64: Implement stack trace termination record
From: Madhavan T. Venkataraman @ 2021-04-04 16:29 UTC
To: Josh Poimboeuf
Cc: mark.rutland, broonie, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel



On 4/3/21 11:40 PM, Madhavan T. Venkataraman wrote:
> 
> 
> On 4/3/21 10:46 PM, Madhavan T. Venkataraman wrote:
>>> I'm somewhat arm-ignorant, so take the following comments with a grain
>>> of salt.
>>>
>>>
>>> I don't think changing these to 'bl' is necessary, unless you wanted
>>> __primary_switched() and __secondary_switched() to show up in the
>>> stacktrace for some reason? If so, that seems like a separate patch.
>>>
>> The problem is with __secondary_switched. If you trace the code back to where
>> a secondary CPU is started, I don't see any calls anywhere. There are only
>> branches if I am not mistaken. So, the return address register never gets
>> set up with a proper address. The stack trace shows some hexadecimal value
>> instead of a symbol name.
>>
> 
> Actually, I take that back. There are calls in that code path. But I did only
> see some hexadecimal value instead of a proper address in the stack trace.
> Sorry about that confusion.
> 

Again, I apologize. I had this confused with something else in my notes.

So, the stack trace looks like this without my changes to convert the
branch to secondary_start_kernel() to a call:

	...
	[ 0.022492] secondary_start_kernel+0x188/0x1e0
	[ 0.022503] 0xf8689e1cc

It looks like the code calls __enable_mmu before reaching the place where
it branches to secondary_start_kernel().

	bl	__enable_mmu

The return address register should be set to the next instruction address.
I am guessing that the return address is 0xf8689e1cc because of the idmap
stuff.

Madhavan
* Re: [RFC PATCH v2 1/1] arm64: Implement stack trace termination record
From: Madhavan T. Venkataraman @ 2021-04-14 12:09 UTC
To: mark.rutland, broonie, jpoimboe, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel

Hi Mark Rutland, Mark Brown,

Could you take a look at this version for proper stack termination and let
me know what you think?

Thanks!

Madhavan
* Re: [RFC PATCH v2 1/1] arm64: Implement stack trace termination record
From: Mark Brown @ 2021-04-16 16:17 UTC
To: madvenka
Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel

On Thu, Apr 01, 2021 at 10:24:04PM -0500, madvenka@linux.microsoft.com wrote:

> Reliable stacktracing requires that we identify when a stacktrace is
> terminated early. We can do this by ensuring all tasks have a final
> frame record at a known location on their task stack, and checking
> that this is the final frame record in the chain.

Reviewed-by: Mark Brown <broonie@kernel.org>
* Re: [RFC PATCH v2 1/1] arm64: Implement stack trace termination record
From: Madhavan T. Venkataraman @ 2021-04-16 17:31 UTC
To: Mark Brown
Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel

Thanks!

Madhavan

On 4/16/21 11:17 AM, Mark Brown wrote:
> On Thu, Apr 01, 2021 at 10:24:04PM -0500, madvenka@linux.microsoft.com wrote:
>
>> Reliable stacktracing requires that we identify when a stacktrace is
>> terminated early. We can do this by ensuring all tasks have a final
>> frame record at a known location on their task stack, and checking
>> that this is the final frame record in the chain.
>
> Reviewed-by: Mark Brown <broonie@kernel.org>
>
* Re: [RFC PATCH v2 0/1] arm64: Implement stack trace termination record
From: Madhavan T. Venkataraman @ 2021-04-19 18:16 UTC
To: mark.rutland, broonie, jpoimboe, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel
Cc: pasha.tatashin

CCing Pavel Tatashin <pasha.tatashin@soleen.com> on request.

Pasha,

This is v2. v1 is here:

https://lore.kernel.org/linux-arm-kernel/20210324184607.120948-1-madvenka@linux.microsoft.com/

Thanks!

Madhavan
* Re: [RFC PATCH v2 0/1] arm64: Implement stack trace termination record
From: Madhavan T. Venkataraman @ 2021-04-19 18:18 UTC
To: mark.rutland, broonie, jpoimboe, jthierry, catalin.marinas, will, linux-arm-kernel, live-patching, linux-kernel
Cc: pasha.tatashin

Sorry. Forgot to include link to v2. Here it is:

https://lore.kernel.org/linux-arm-kernel/20210402032404.47239-1-madvenka@linux.microsoft.com/

Thanks!

Madhavan
Thread overview: 11+ messages
[not found] <659f3d5cc025896ba4c49aea431aa8b1abc2b741>
2021-04-02 3:24 ` [RFC PATCH v2 0/1] arm64: Implement stack trace termination record madvenka
2021-04-02 3:24 ` [RFC PATCH v2 1/1] " madvenka
2021-04-03 15:59 ` Josh Poimboeuf
2021-04-04 3:46 ` Madhavan T. Venkataraman
2021-04-04 4:40 ` Madhavan T. Venkataraman
2021-04-04 16:29 ` Madhavan T. Venkataraman
2021-04-14 12:09 ` Madhavan T. Venkataraman
2021-04-16 16:17 ` Mark Brown
2021-04-16 17:31 ` Madhavan T. Venkataraman
2021-04-19 18:16 ` [RFC PATCH v2 0/1] " Madhavan T. Venkataraman
2021-04-19 18:18 ` Madhavan T. Venkataraman