* [PATCH v3 0/1] arm64: Implement stack trace termination record
From: madvenka @ 2021-04-20 18:44 UTC
To: broonie, jpoimboe, mark.rutland, jthierry, catalin.marinas, will,
    jmorris, pasha.tatashin, linux-arm-kernel, live-patching,
    linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking that
this is the final frame record in the chain.

All tasks have a pt_regs structure right after the task stack in the
stack page. The pt_regs structure contains a stackframe field. Make
this stackframe field the final frame in the task stack so all stack
traces end at a fixed stack offset.

For kernel tasks, this is simple to understand. For user tasks, there
is some extra detail. User tasks get created via fork() et al. Once
they return from fork, they enter the kernel only on an EL0 exception.
In arm64, system calls are also EL0 exceptions. The EL0 exception
handler uses the task pt_regs mentioned above to save register state
and call different exception functions. All stack traces from EL0
exception code must end at the pt_regs. So, make pt_regs->stackframe
the final frame in the EL0 exception stack.

To summarize, task_pt_regs(task)->stackframe will always be the final
frame in a stack trace.

Sample stack traces
===================

Showing just the last couple of frames in each stack trace to show how
the stack trace ends.

Primary CPU idle task
=====================

...
[ 0.077109] rest_init+0x108/0x144
[ 0.077188] arch_call_rest_init+0x18/0x24
[ 0.077220] start_kernel+0x3ac/0x3e4
[ 0.077293] __primary_switched+0xac/0xb0

Secondary CPU idle task
=======================

...
[ 0.077264] secondary_start_kernel+0x228/0x388
[ 0.077326] __secondary_switched+0x80/0x84

Sample kernel thread
====================

...
[ 24.543250] kernel_init+0xa4/0x164
[ 24.561850] ret_from_fork+0x10/0x18

Write system call (EL0 exception)
=================

(using a test driver called callfd)

[ 1160.628723] callfd_stack+0x3c/0x70
[ 1160.628768] callfd_op+0x35c/0x3a8
[ 1160.628791] callfd_write+0x5c/0xc8
[ 1160.628813] vfs_write+0x104/0x3b8
[ 1160.628837] ksys_write+0xd0/0x188
[ 1160.628859] __arm64_sys_write+0x4c/0x60
[ 1160.628883] el0_svc_common.constprop.0+0xa8/0x240
[ 1160.628904] do_el0_svc+0x40/0xa8
[ 1160.628921] el0_svc+0x2c/0x78
[ 1160.628942] el0_sync_handler+0xb0/0xb8
[ 1160.628962] el0_sync+0x17c/0x180

NULL pointer dereference exception (EL1 exception)
==================================

[ 1160.637984] callfd_stack+0x3c/0x70
[ 1160.638015] die_kernel_fault+0x80/0x108
[ 1160.638042] do_page_fault+0x520/0x600
[ 1160.638075] do_translation_fault+0xa8/0xdc
[ 1160.638102] do_mem_abort+0x68/0x100
[ 1160.638120] el1_abort+0x40/0x60
[ 1160.638138] el1_sync_handler+0xac/0xc8
[ 1160.638157] el1_sync+0x74/0x100
[ 1160.638174] 0x0   <=== NULL pointer dereference
[ 1160.638189] callfd_write+0x5c/0xc8
[ 1160.638211] vfs_write+0x104/0x3b8
[ 1160.638234] ksys_write+0xd0/0x188
[ 1160.638278] __arm64_sys_write+0x4c/0x60
[ 1160.638325] el0_svc_common.constprop.0+0xa8/0x240
[ 1160.638358] do_el0_svc+0x40/0xa8
[ 1160.638379] el0_svc+0x2c/0x78
[ 1160.638409] el0_sync_handler+0xb0/0xb8
[ 1160.638452] el0_sync+0x17c/0x180

Timer interrupt (EL1 exception)
===============

Secondary CPU idle task interrupted by the timer interrupt:

[ 1160.702949] callfd_callback:
[ 1160.703006] callfd_stack+0x3c/0x70
[ 1160.703060] callfd_callback+0x30/0x40
[ 1160.703087] call_timer_fn+0x48/0x220
[ 1160.703113] run_timer_softirq+0x7cc/0xc70
[ 1160.703144] __do_softirq+0x1ec/0x608
[ 1160.703166] irq_exit+0x138/0x180
[ 1160.703193] __handle_domain_irq+0x8c/0xf0
[ 1160.703218] gic_handle_irq+0xec/0x410
[ 1160.703253] el1_irq+0xc0/0x180
[ 1160.703278] arch_local_irq_enable+0xc/0x28
[ 1160.703329] default_idle_call+0x54/0x1d8
[ 1160.703355] do_idle+0x2d8/0x350
[ 1160.703388] cpu_startup_entry+0x2c/0x98
[ 1160.703412] secondary_start_kernel+0x238/0x388
[ 1160.703446] __secondary_switched+0x80/0x84
---
Changelog:

v3:
	- Added Reviewed-by: Mark Brown <broonie@kernel.org>.
	- Fixed an extra space after a cast reported by
	  checkpatch --strict.
	- Synced with mainline tip.

v2:
	- Changed some wordings as suggested by Mark Rutland.
	- Removed the synthetic return PC for idle tasks. Changed the
	  branches to start_kernel() and secondary_start_kernel() to
	  calls so that they will have a proper return PC.

v1:
	- Set up task_pt_regs(current)->stackframe as the final frame
	  when a new task is initialized in copy_thread().
	- Create pt_regs for the idle tasks and set up
	  pt_regs->stackframe as the final frame for the idle tasks.
	- Set up task_pt_regs(current)->stackframe as the final frame in
	  the EL0 exception handler so the EL0 exception stack trace
	  ends there.
	- Terminate the stack trace successfully in unwind_frame() when
	  the FP reaches task_pt_regs(current)->stackframe.
	- The stack traces (above) in the kernel will terminate at the
	  correct place. Debuggers may show an extra record 0x0 at the
	  end for pt_regs->stackframe. That said, I did not see that
	  extra frame when I did stack traces using gdb.

Testing:
	- Functional validation on a ThunderX system.

Previous versions and discussion
================================

v2: https://lore.kernel.org/linux-arm-kernel/20210402032404.47239-1-madvenka@linux.microsoft.com/
v1: https://lore.kernel.org/linux-arm-kernel/20210324184607.120948-1-madvenka@linux.microsoft.com/

Madhavan T. Venkataraman (1):
  arm64: Implement stack trace termination record

 arch/arm64/kernel/entry.S      |  8 +++++---
 arch/arm64/kernel/head.S       | 29 +++++++++++++++++++++++------
 arch/arm64/kernel/process.c    |  5 +++++
 arch/arm64/kernel/stacktrace.c | 10 +++++-----
 4 files changed, 38 insertions(+), 14 deletions(-)


base-commit: bf05bf16c76bb44ab5156223e1e58e26dfe30a88
-- 
2.25.1
* [PATCH v3 1/1] arm64: Implement stack trace termination record
From: madvenka @ 2021-04-20 18:44 UTC
To: broonie, jpoimboe, mark.rutland, jthierry, catalin.marinas, will,
    jmorris, pasha.tatashin, linux-arm-kernel, live-patching,
    linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking that
this is the final frame record in the chain.

Kernel Tasks
============

All tasks except the idle task have a pt_regs structure right after the
task stack. This is called the task pt_regs. The pt_regs structure has
a special stackframe field. Make this stackframe field the final frame
in the task stack. This needs to be done in copy_thread(), which
initializes a new task's pt_regs and initial CPU context.

For the idle task, there is no task pt_regs. For our purpose, we need
one. So, create a pt_regs just like other kernel tasks and make
pt_regs->stackframe the final frame in the idle task stack. This needs
to be done at two places:

	- On the primary CPU, the boot task runs. It calls
	  start_kernel() and eventually becomes the idle task for the
	  primary CPU. Just before start_kernel() is called, set up the
	  final frame.

	- On each secondary CPU, a startup task runs that calls
	  secondary_start_kernel() and eventually becomes the idle task
	  on the secondary CPU. Just before secondary_start_kernel() is
	  called, set up the final frame.

User Tasks
==========

User tasks are initially set up like kernel tasks when they are
created. Then, they return to userland after fork via ret_from_fork().
After that, they enter the kernel only on an EL0 exception. (In arm64,
system calls are also EL0 exceptions.)

The EL0 exception handler stores state in the task pt_regs and calls
different functions based on the type of exception. The stack trace for
an EL0 exception must end at the task pt_regs. So, make task
pt_regs->stackframe the final frame in the EL0 exception stack.

In summary, task pt_regs->stackframe is where a successful stack trace
ends.

Stack trace termination
=======================

In the unwinder, terminate the stack trace successfully when
task_pt_regs(task)->stackframe is reached. For stack traces in the
kernel, this will correctly terminate the stack trace at the right
place.

However, debuggers may terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel. That said, gdb
does not show the extra record, probably because it uses DWARF and not
frame pointers for stack traces.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/kernel/entry.S      |  8 +++++---
 arch/arm64/kernel/head.S       | 29 +++++++++++++++++++++++------
 arch/arm64/kernel/process.c    |  5 +++++
 arch/arm64/kernel/stacktrace.c | 10 +++++-----
 4 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 6acfc5e6b5e0..e677b9a2b8f8 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -263,16 +263,18 @@ alternative_else_nop_endif
 	stp	lr, x21, [sp, #S_LR]
 
 	/*
-	 * For exceptions from EL0, terminate the callchain here.
+	 * For exceptions from EL0, terminate the callchain here at
+	 * task_pt_regs(current)->stackframe.
+	 *
 	 * For exceptions from EL1, create a synthetic frame record so the
 	 * interrupted code shows up in the backtrace.
 	 */
 	.if	\el == 0
-	mov	x29, xzr
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
 	.else
 	stp	x29, x22, [sp, #S_STACKFRAME]
-	add	x29, sp, #S_STACKFRAME
 	.endif
+	add	x29, sp, #S_STACKFRAME
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 alternative_if_not ARM64_HAS_PAN
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 840bda1869e9..743c019a42c7 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -393,6 +393,23 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	ret	x28
 SYM_FUNC_END(__create_page_tables)
 
+	/*
+	 * The boot task becomes the idle task for the primary CPU. The
+	 * CPU startup task on each secondary CPU becomes the idle task
+	 * for the secondary CPU.
+	 *
+	 * The idle task does not require pt_regs. But create a dummy
+	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
+	 * set up to be the final frame on the idle task stack just like
+	 * all the other kernel tasks. This helps the unwinder to
+	 * terminate the stack trace at a well-known stack offset.
+	 */
+	.macro setup_final_frame
+	sub	sp, sp, #PT_REGS_SIZE
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
+	add	x29, sp, #S_STACKFRAME
+	.endm
+
 /*
  * The following fragment of code is executed with the MMU enabled.
  *
@@ -447,9 +464,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 	bl	switch_to_vhe			// Prefer VHE if possible
 	add	sp, sp, #16
-	mov	x29, #0
-	mov	x30, #0
-	b	start_kernel
+	setup_final_frame
+	bl	start_kernel
+	nop
 SYM_FUNC_END(__primary_switched)
 
 	.pushsection ".rodata", "a"
@@ -606,14 +623,14 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
 	cbz	x2, __secondary_too_slow
 	msr	sp_el0, x2
 	scs_load x2, x3
-	mov	x29, #0
-	mov	x30, #0
+	setup_final_frame
 
 #ifdef CONFIG_ARM64_PTR_AUTH
 	ptrauth_keys_init_cpu	x2, x3, x4, x5
 #endif
 
-	b	secondary_start_kernel
+	bl	secondary_start_kernel
+	nop
 SYM_FUNC_END(__secondary_switched)
 
 SYM_FUNC_START_LOCAL(__secondary_too_slow)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6e60aa3b5ea9..999711c55274 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -439,6 +439,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	}
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
+	/*
+	 * For the benefit of the unwinder, set up childregs->stackframe
+	 * as the final frame for the new task.
+	 */
+	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
 
 	ptrace_hw_copy_thread(p);
 
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d55bdfb7789c..774d9af2f0b6 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -44,16 +44,16 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	unsigned long fp = frame->fp;
 	struct stack_info info;
 
-	/* Terminal record; nothing to unwind */
-	if (!fp)
+	if (!tsk)
+		tsk = current;
+
+	/* Final frame; nothing to unwind */
+	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
 		return -ENOENT;
 
 	if (fp & 0xf)
 		return -EINVAL;
 
-	if (!tsk)
-		tsk = current;
-
 	if (!on_accessible_stack(tsk, fp, &info))
 		return -EINVAL;
-- 
2.25.1
* Re: [PATCH v3 1/1] arm64: Implement stack trace termination record
From: Mark Rutland @ 2021-05-04 15:13 UTC
To: madvenka, broonie
Cc: jpoimboe, jthierry, catalin.marinas, will, jmorris, pasha.tatashin,
    linux-arm-kernel, live-patching, linux-kernel

Hi Madhavan,

Sorry for the long radio silence on this. I finally had some time to
take a look, and this is pretty good!

We had to make some changes in the arm64 unwinder last week, as Leo Yan
reported an existing regression. I've rebased your patch atop that with
some additional fixups -- I've noted those below, with a copy of the
altered patch at the end of the mail. If you're happy with the result,
I'll ask Will and Catalin to queue that come -rc1.

I've also pushed that to my arm64/stacktrace-termination branch:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace-termination
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/stacktrace-termination

Mark (Brown), I've kept your Reviewed-by, since I don't think any of
those changes were against the spirit of that review. If you want that
dropped, please shout!

The conflicting commit (in the arm64 for-next/core branch) is:

  8533d5bfad41e74b ("arm64: stacktrace: restore terminal records")

If you want to check Leo's test case, be aware you'll also need commit:

  5fd65aeb22b72682 ("tracing: Fix stack trace event size")

... which made it into v5.12-rc6 (while the arm64 for-next/core branch
is based on v5.12-rc3).

On Tue, Apr 20, 2021 at 01:44:47PM -0500, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 6acfc5e6b5e0..e677b9a2b8f8 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -263,16 +263,18 @@ alternative_else_nop_endif
>  	stp	lr, x21, [sp, #S_LR]
>  
>  	/*
> -	 * For exceptions from EL0, terminate the callchain here.
> +	 * For exceptions from EL0, terminate the callchain here at
> +	 * task_pt_regs(current)->stackframe.
> +	 *
>  	 * For exceptions from EL1, create a synthetic frame record so the
>  	 * interrupted code shows up in the backtrace.
>  	 */
>  	.if	\el == 0
> -	mov	x29, xzr
> +	stp	xzr, xzr, [sp, #S_STACKFRAME]
>  	.else
>  	stp	x29, x22, [sp, #S_STACKFRAME]
> -	add	x29, sp, #S_STACKFRAME
>  	.endif
> +	add	x29, sp, #S_STACKFRAME

In 8533d5bfad41e74b we made the same logic change, so now we only need
to update the comment.

>  #ifdef CONFIG_ARM64_SW_TTBR0_PAN
>  alternative_if_not ARM64_HAS_PAN
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 840bda1869e9..743c019a42c7 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -393,6 +393,23 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
>  	ret	x28
>  SYM_FUNC_END(__create_page_tables)
>  
> +	/*
> +	 * The boot task becomes the idle task for the primary CPU. The
> +	 * CPU startup task on each secondary CPU becomes the idle task
> +	 * for the secondary CPU.
> +	 *
> +	 * The idle task does not require pt_regs. But create a dummy
> +	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
> +	 * set up to be the final frame on the idle task stack just like
> +	 * all the other kernel tasks. This helps the unwinder to
> +	 * terminate the stack trace at a well-known stack offset.
> +	 */

I'd prefer to simplify this to:

	/*
	 * Create a final frame record at task_pt_regs(current)->stackframe, so
	 * that the unwinder can identify the final frame record of any task by
	 * its location in the task stack. We reserve the entire pt_regs space
	 * for consistency with user tasks and other kernel threads.
	 */

... saying `current` rather than `idle_task` makes it clear that this is
altering the current task's state, and so long as we note that we do
this when creating each task, we don't need to call out the idle tasks
specifically.

> +	.macro setup_final_frame
> +	sub	sp, sp, #PT_REGS_SIZE
> +	stp	xzr, xzr, [sp, #S_STACKFRAME]
> +	add	x29, sp, #S_STACKFRAME
> +	.endm

It's probably worth noting in the commit message that a stacktrace will
now include __primary_switched and __secondary_switched. I think that's
fine, but we should be explict that this is expected as it is a small
behavioural change.

> +
>  /*
>   * The following fragment of code is executed with the MMU enabled.
>   *
> @@ -447,9 +464,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
>  #endif
>  	bl	switch_to_vhe		// Prefer VHE if possible
>  	add	sp, sp, #16
> -	mov	x29, #0
> -	mov	x30, #0
> -	b	start_kernel
> +	setup_final_frame
> +	bl	start_kernel
> +	nop

I'd prefer to use ASM_BUG() here. That'll match what we do in entry.S,
and also catch any unexpected return.

>  SYM_FUNC_END(__primary_switched)
>  
>  	.pushsection ".rodata", "a"
> @@ -606,14 +623,14 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
>  	cbz	x2, __secondary_too_slow
>  	msr	sp_el0, x2
>  	scs_load x2, x3
> -	mov	x29, #0
> -	mov	x30, #0
> +	setup_final_frame
>  
>  #ifdef CONFIG_ARM64_PTR_AUTH
>  	ptrauth_keys_init_cpu	x2, x3, x4, x5
>  #endif
>  
> -	b	secondary_start_kernel
> +	bl	secondary_start_kernel
> +	nop

Likewise, I'd prefer ASM_BUG() here.

>  SYM_FUNC_END(__secondary_switched)
>  
>  SYM_FUNC_START_LOCAL(__secondary_too_slow)
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 6e60aa3b5ea9..999711c55274 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -439,6 +439,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  	}
>  	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
>  	p->thread.cpu_context.sp = (unsigned long)childregs;
> +	/*
> +	 * For the benefit of the unwinder, set up childregs->stackframe
> +	 * as the final frame for the new task.
> +	 */
> +	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
>  
>  	ptrace_hw_copy_thread(p);
>  
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index d55bdfb7789c..774d9af2f0b6 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -44,16 +44,16 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  	unsigned long fp = frame->fp;
>  	struct stack_info info;
>  
> -	/* Terminal record; nothing to unwind */
> -	if (!fp)
> +	if (!tsk)
> +		tsk = current;
> +
> +	/* Final frame; nothing to unwind */
> +	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
>  		return -ENOENT;
>  
>  	if (fp & 0xf)
>  		return -EINVAL;
>  
> -	if (!tsk)
> -		tsk = current;
> -
>  	if (!on_accessible_stack(tsk, fp, &info))
>  		return -EINVAL;

This looks good. Commit 8533d5bfad41e74b made us check the unwound
frame's FP, but when we identify the frame by location rather than
contents we'll need to check the FP prior to unwinding as you have here.

---->8----
From eb795c46ad5d3c49c5401d5716c9674e52b22591 Mon Sep 17 00:00:00 2001
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
Date: Tue, 20 Apr 2021 13:44:47 -0500
Subject: [PATCH] arm64: Implement stack trace termination record

Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.

We'd like to use task_pt_regs(task)->stackframe as the final frame
record, as this is already set up upon exception entry from EL0. For
kernel tasks we need to consistently reserve the pt_regs and point x29
at this, which we can do with small changes to __primary_switched,
__secondary_switched, and copy_process().

Since the final frame record must be at a specific location, we must
create the final frame record in __primary_switched and
__secondary_switched rather than leaving this to start_kernel and
secondary_start_kernel. Thus, __primary_switched and
__secondary_switched will now show up in stacktraces for the idle
tasks.

Since the final frame record is now identified by its location rather
than by its contents, we identify it at the start of unwind_frame(),
before we read any values from it.

External debuggers may terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel. That said, gdb
does not show the extra record probably because it uses DWARF and not
frame pointers for stack traces.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
[Mark: rebase, use ASM_BUG(), update comments, update commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/kernel/entry.S      |  2 +-
 arch/arm64/kernel/head.S       | 25 +++++++++++++++++++------
 arch/arm64/kernel/process.c    |  5 +++++
 arch/arm64/kernel/stacktrace.c | 16 +++++++---------
 4 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 806a39635482..a5bb8c9c8ace 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -283,7 +283,7 @@ alternative_else_nop_endif
 	stp	lr, x21, [sp, #S_LR]
 
 	/*
-	 * For exceptions from EL0, create a terminal frame record.
+	 * For exceptions from EL0, create a final frame record.
 	 * For exceptions from EL1, create a synthetic frame record so the
 	 * interrupted code shows up in the backtrace.
 	 */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 96873dfa67fd..cc2d45d54838 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -16,6 +16,7 @@
 #include <asm/asm_pointer_auth.h>
 #include <asm/assembler.h>
 #include <asm/boot.h>
+#include <asm/bug.h>
 #include <asm/ptrace.h>
 #include <asm/asm-offsets.h>
 #include <asm/cache.h>
@@ -393,6 +394,18 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	ret	x28
 SYM_FUNC_END(__create_page_tables)
 
+	/*
+	 * Create a final frame record at task_pt_regs(current)->stackframe, so
+	 * that the unwinder can identify the final frame record of any task by
+	 * its location in the task stack. We reserve the entire pt_regs space
+	 * for consistency with user tasks and kthreads.
+	 */
+	.macro setup_final_frame
+	sub	sp, sp, #PT_REGS_SIZE
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
+	add	x29, sp, #S_STACKFRAME
+	.endm
+
 /*
  * The following fragment of code is executed with the MMU enabled.
  *
@@ -447,9 +460,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 	bl	switch_to_vhe		// Prefer VHE if possible
 	add	sp, sp, #16
-	mov	x29, #0
-	mov	x30, #0
-	b	start_kernel
+	setup_final_frame
+	bl	start_kernel
+	ASM_BUG()
 SYM_FUNC_END(__primary_switched)
 
 	.pushsection ".rodata", "a"
@@ -639,14 +652,14 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
 	cbz	x2, __secondary_too_slow
 	msr	sp_el0, x2
 	scs_load x2, x3
-	mov	x29, #0
-	mov	x30, #0
+	setup_final_frame
 
 #ifdef CONFIG_ARM64_PTR_AUTH
 	ptrauth_keys_init_cpu	x2, x3, x4, x5
 #endif
 
-	b	secondary_start_kernel
+	bl	secondary_start_kernel
+	ASM_BUG()
 SYM_FUNC_END(__secondary_switched)
 
 SYM_FUNC_START_LOCAL(__secondary_too_slow)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 0da0fd4ed1d0..31d98fe2e303 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -433,6 +433,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	}
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
+	/*
+	 * For the benefit of the unwinder, set up childregs->stackframe
+	 * as the final frame for the new task.
+	 */
+	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
 
 	ptrace_hw_copy_thread(p);
 
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index c22706aa32a1..1e4d44751492 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -68,12 +68,16 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	unsigned long fp = frame->fp;
 	struct stack_info info;
 
-	if (fp & 0xf)
-		return -EINVAL;
-
 	if (!tsk)
 		tsk = current;
 
+	/* Final frame; nothing to unwind */
+	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
+		return -ENOENT;
+
+	if (fp & 0xf)
+		return -EINVAL;
+
 	if (!on_accessible_stack(tsk, fp, &info))
 		return -EINVAL;
 
@@ -128,12 +132,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
 
-	/*
-	 * This is a terminal record, so we have finished unwinding.
-	 */
-	if (!frame->fp && !frame->pc)
-		return -ENOENT;
-
 	return 0;
 }
 NOKPROBE_SYMBOL(unwind_frame);
-- 
2.11.0
* Re: [PATCH v3 1/1] arm64: Implement stack trace termination record 2021-05-04 15:13 ` Mark Rutland @ 2021-05-04 18:37 ` Madhavan T. Venkataraman 0 siblings, 0 replies; 4+ messages in thread From: Madhavan T. Venkataraman @ 2021-05-04 18:37 UTC (permalink / raw) To: Mark Rutland, broonie Cc: jpoimboe, jthierry, catalin.marinas, will, jmorris, pasha.tatashin, linux-arm-kernel, live-patching, linux-kernel On 5/4/21 10:13 AM, Mark Rutland wrote: > Hi Madhavan, > > Sorry for the long radiosilence on this. I finally had some time to take > a look, and this is pretty good! > Thanks a lot! Appreciate it! > We had to make some changes in the arm64 unwinder last week, as Leo Yan > reported an existing regression. I've rebased your patch atop that with > some additional fixups -- I've noted those below, with a copy of the > altered patch at the end of the mail. If you're happy with the result, > I'll ask Will and Catalin to queue that come -rc1. > I am totally happy with the result. Please ask Will and Catalin to queue it. Thanks! Madhavan > I've also pushed that to my arm64/stacktrace-termination branch: > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace-termination > git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/stacktrace-termination > > Mark (Brown), I've kept your Reviewed-by, since I don't think any of > those changes were against the spirit of that review. If you want that > dropped, please shout! > > The conflicting commit (in the arm64 for-next/core branch) is: > > 8533d5bfad41e74b ("arm64: stacktrace: restore terminal records") > > If you want to check Leo's test case, be aware you'll also need commit: > > 5fd65aeb22b72682i ("tracing: Fix stack trace event size") > > ... which made it into v5.12-rc6 (while the arm64 for-next/core branch > is based on v5.12-rc3). > > On Tue, Apr 20, 2021 at 01:44:47PM -0500, madvenka@linux.microsoft.com wrote: >> From: "Madhavan T. 
Venkataraman" <madvenka@linux.microsoft.com> > >> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S >> index 6acfc5e6b5e0..e677b9a2b8f8 100644 >> --- a/arch/arm64/kernel/entry.S >> +++ b/arch/arm64/kernel/entry.S >> @@ -263,16 +263,18 @@ alternative_else_nop_endif >> stp lr, x21, [sp, #S_LR] >> >> /* >> - * For exceptions from EL0, terminate the callchain here. >> + * For exceptions from EL0, terminate the callchain here at >> + * task_pt_regs(current)->stackframe. >> + * >> * For exceptions from EL1, create a synthetic frame record so the >> * interrupted code shows up in the backtrace. >> */ >> .if \el == 0 >> - mov x29, xzr >> + stp xzr, xzr, [sp, #S_STACKFRAME] >> .else >> stp x29, x22, [sp, #S_STACKFRAME] >> - add x29, sp, #S_STACKFRAME >> .endif >> + add x29, sp, #S_STACKFRAME > > In 8533d5bfad41e74b we made the same logic change, so now we only need > to update the comment. > >> #ifdef CONFIG_ARM64_SW_TTBR0_PAN >> alternative_if_not ARM64_HAS_PAN >> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S >> index 840bda1869e9..743c019a42c7 100644 >> --- a/arch/arm64/kernel/head.S >> +++ b/arch/arm64/kernel/head.S >> @@ -393,6 +393,23 @@ SYM_FUNC_START_LOCAL(__create_page_tables) >> ret x28 >> SYM_FUNC_END(__create_page_tables) >> >> + /* >> + * The boot task becomes the idle task for the primary CPU. The >> + * CPU startup task on each secondary CPU becomes the idle task >> + * for the secondary CPU. >> + * >> + * The idle task does not require pt_regs. But create a dummy >> + * pt_regs so that task_pt_regs(idle_task)->stackframe can be >> + * set up to be the final frame on the idle task stack just like >> + * all the other kernel tasks. This helps the unwinder to >> + * terminate the stack trace at a well-known stack offset. 
>> + */ > > I'd prefer to simplify this to: > > /* > * Create a final frame record at task_pt_regs(current)->stackframe, so > * that the unwinder can identify the final frame record of any task by > * its location in the task stack. We reserve the entire pt_regs space > * for consistency with user tasks and other kernel threads. > */ > > ... saying `current` rather than `idle_task` makes it clear that this is > altering the current task's state, and so long as we note that we do > this when creating each task, we don't need to call out the idle tasks > specifically. > >> + .macro setup_final_frame >> + sub sp, sp, #PT_REGS_SIZE >> + stp xzr, xzr, [sp, #S_STACKFRAME] >> + add x29, sp, #S_STACKFRAME >> + .endm > > It's probably worth noting in the commit message that a stacktrace will > now include __primary_switched and __secondary_switched. I think that's > fine, but we should be explict that this is expected as it is a small > behavioural change. > >> + >> /* >> * The following fragment of code is executed with the MMU enabled. >> * >> @@ -447,9 +464,9 @@ SYM_FUNC_START_LOCAL(__primary_switched) >> #endif >> bl switch_to_vhe // Prefer VHE if possible >> add sp, sp, #16 >> - mov x29, #0 >> - mov x30, #0 >> - b start_kernel >> + setup_final_frame >> + bl start_kernel >> + nop > > I'd prefr to use ASM_BUG() here. That'll match what we do in entry.S, > and also catch any unexpected return. > >> SYM_FUNC_END(__primary_switched) >> >> .pushsection ".rodata", "a" >> @@ -606,14 +623,14 @@ SYM_FUNC_START_LOCAL(__secondary_switched) >> cbz x2, __secondary_too_slow >> msr sp_el0, x2 >> scs_load x2, x3 >> - mov x29, #0 >> - mov x30, #0 >> + setup_final_frame >> >> #ifdef CONFIG_ARM64_PTR_AUTH >> ptrauth_keys_init_cpu x2, x3, x4, x5 >> #endif >> >> - b secondary_start_kernel >> + bl secondary_start_kernel >> + nop > > Likewise, I'd prefer ASM_BUG() here. 
>
>>  SYM_FUNC_END(__secondary_switched)
>>
>>  SYM_FUNC_START_LOCAL(__secondary_too_slow)
>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> index 6e60aa3b5ea9..999711c55274 100644
>> --- a/arch/arm64/kernel/process.c
>> +++ b/arch/arm64/kernel/process.c
>> @@ -439,6 +439,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>>  	}
>>  	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
>>  	p->thread.cpu_context.sp = (unsigned long)childregs;
>> +	/*
>> +	 * For the benefit of the unwinder, set up childregs->stackframe
>> +	 * as the final frame for the new task.
>> +	 */
>> +	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
>>
>>  	ptrace_hw_copy_thread(p);
>>
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index d55bdfb7789c..774d9af2f0b6 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -44,16 +44,16 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>>  	unsigned long fp = frame->fp;
>>  	struct stack_info info;
>>
>> -	/* Terminal record; nothing to unwind */
>> -	if (!fp)
>> +	if (!tsk)
>> +		tsk = current;
>> +
>> +	/* Final frame; nothing to unwind */
>> +	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
>>  		return -ENOENT;
>>
>>  	if (fp & 0xf)
>>  		return -EINVAL;
>>
>> -	if (!tsk)
>> -		tsk = current;
>> -
>>  	if (!on_accessible_stack(tsk, fp, &info))
>>  		return -EINVAL;
>
> This looks good. Commit 8533d5bfad41e74b made us check the unwound
> frame's FP, but when we identify the frame by location rather than
> contents we'll need to check the FP prior to unwinding as you have here.
>
> ---->8----
> From eb795c46ad5d3c49c5401d5716c9674e52b22591 Mon Sep 17 00:00:00 2001
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> Date: Tue, 20 Apr 2021 13:44:47 -0500
> Subject: [PATCH] arm64: Implement stack trace termination record
>
> Reliable stacktracing requires that we identify when a stacktrace is
> terminated early. We can do this by ensuring all tasks have a final
> frame record at a known location on their task stack, and checking
> that this is the final frame record in the chain.
>
> We'd like to use task_pt_regs(task)->stackframe as the final frame
> record, as this is already setup upon exception entry from EL0. For
> kernel tasks we need to consistently reserve the pt_regs and point x29
> at this, which we can do with small changes to __primary_switched,
> __secondary_switched, and copy_process().
>
> Since the final frame record must be at a specific location, we must
> create the final frame record in __primary_switched and
> __secondary_switched rather than leaving this to start_kernel and
> secondary_start_kernel. Thus, __primary_switched and
> __secondary_switched will now show up in stacktraces for the idle tasks.
>
> Since the final frame record is now identified by its location rather
> than by its contents, we identify it at the start of unwind_frame(),
> before we read any values from it.
>
> External debuggers may terminate the stack trace when FP == 0. In the
> pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
> debugger may print an extra record 0x0 at the end. While this is not
> pretty, this does not do any harm. This is a small price to pay for
> having reliable stack trace termination in the kernel. That said, gdb
> does not show the extra record probably because it uses DWARF and not
> frame pointers for stack traces.
>
> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> Reviewed-by: Mark Brown <broonie@kernel.org>
> [Mark: rebase, use ASM_BUG(), update comments, update commit message]
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> ---
>  arch/arm64/kernel/entry.S      |  2 +-
>  arch/arm64/kernel/head.S       | 25 +++++++++++++++++++------
>  arch/arm64/kernel/process.c    |  5 +++++
>  arch/arm64/kernel/stacktrace.c | 16 +++++++---------
>  4 files changed, 32 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 806a39635482..a5bb8c9c8ace 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -283,7 +283,7 @@ alternative_else_nop_endif
>  	stp	lr, x21, [sp, #S_LR]
>
>  	/*
> -	 * For exceptions from EL0, create a terminal frame record.
> +	 * For exceptions from EL0, create a final frame record.
>  	 * For exceptions from EL1, create a synthetic frame record so the
>  	 * interrupted code shows up in the backtrace.
>  	 */
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 96873dfa67fd..cc2d45d54838 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -16,6 +16,7 @@
>  #include <asm/asm_pointer_auth.h>
>  #include <asm/assembler.h>
>  #include <asm/boot.h>
> +#include <asm/bug.h>
>  #include <asm/ptrace.h>
>  #include <asm/asm-offsets.h>
>  #include <asm/cache.h>
> @@ -393,6 +394,18 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
>  	ret	x28
>  SYM_FUNC_END(__create_page_tables)
>
> +	/*
> +	 * Create a final frame record at task_pt_regs(current)->stackframe, so
> +	 * that the unwinder can identify the final frame record of any task by
> +	 * its location in the task stack. We reserve the entire pt_regs space
> +	 * for consistency with user tasks and kthreads.
> +	 */
> +	.macro setup_final_frame
> +	sub	sp, sp, #PT_REGS_SIZE
> +	stp	xzr, xzr, [sp, #S_STACKFRAME]
> +	add	x29, sp, #S_STACKFRAME
> +	.endm
> +
>  /*
>   * The following fragment of code is executed with the MMU enabled.
>   *
> @@ -447,9 +460,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
>  #endif
>  	bl	switch_to_vhe		// Prefer VHE if possible
>  	add	sp, sp, #16
> -	mov	x29, #0
> -	mov	x30, #0
> -	b	start_kernel
> +	setup_final_frame
> +	bl	start_kernel
> +	ASM_BUG()
>  SYM_FUNC_END(__primary_switched)
>
>  .pushsection ".rodata", "a"
> @@ -639,14 +652,14 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
>  	cbz	x2, __secondary_too_slow
>  	msr	sp_el0, x2
>  	scs_load x2, x3
> -	mov	x29, #0
> -	mov	x30, #0
> +	setup_final_frame
>
>  #ifdef CONFIG_ARM64_PTR_AUTH
>  	ptrauth_keys_init_cpu x2, x3, x4, x5
>  #endif
>
> -	b	secondary_start_kernel
> +	bl	secondary_start_kernel
> +	ASM_BUG()
>  SYM_FUNC_END(__secondary_switched)
>
>  SYM_FUNC_START_LOCAL(__secondary_too_slow)
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 0da0fd4ed1d0..31d98fe2e303 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -433,6 +433,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  	}
>  	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
>  	p->thread.cpu_context.sp = (unsigned long)childregs;
> +	/*
> +	 * For the benefit of the unwinder, set up childregs->stackframe
> +	 * as the final frame for the new task.
> +	 */
> +	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
>
>  	ptrace_hw_copy_thread(p);
>
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index c22706aa32a1..1e4d44751492 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -68,12 +68,16 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  	unsigned long fp = frame->fp;
>  	struct stack_info info;
>
> -	if (fp & 0xf)
> -		return -EINVAL;
> -
>  	if (!tsk)
>  		tsk = current;
>
> +	/* Final frame; nothing to unwind */
> +	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> +		return -ENOENT;
> +
> +	if (fp & 0xf)
> +		return -EINVAL;
> +
>  	if (!on_accessible_stack(tsk, fp, &info))
>  		return -EINVAL;
>
> @@ -128,12 +132,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>
>  	frame->pc = ptrauth_strip_insn_pac(frame->pc);
>
> -	/*
> -	 * This is a terminal record, so we have finished unwinding.
> -	 */
> -	if (!frame->fp && !frame->pc)
> -		return -ENOENT;
> -
>  	return 0;
> }
> NOKPROBE_SYMBOL(unwind_frame);
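The key behavioural change in the stacktrace.c hunk above is that termination is now detected by the frame's *location* (before reading anything from it) rather than by its contents, and the alignment check happens before any dereference. The following userspace C sketch models that check ordering; the names and types here are illustrative, not the kernel's actual API:

```c
#include <errno.h>
#include <stdint.h>

/* Simplified model of an arm64 frame record: saved x29 and x30. */
struct frame_record {
	struct frame_record *fp;	/* saved frame pointer (x29) */
	uint64_t pc;			/* saved return address (x30) */
};

/*
 * Model of one unwind step. The final-frame test compares addresses
 * only, so it is safe even before we know the record is readable;
 * the alignment test then guards the dereference that follows.
 */
static int unwind_step(uintptr_t final_frame, uintptr_t *fp, uint64_t *pc)
{
	if (*fp == final_frame)
		return -ENOENT;		/* final frame; nothing to unwind */

	if (*fp & 0xf)
		return -EINVAL;		/* frame records are 16-byte aligned */

	struct frame_record *rec = (struct frame_record *)*fp;
	*fp = (uintptr_t)rec->fp;
	*pc = rec->pc;
	return 0;
}
```

Walking a chain with this function ends cleanly with -ENOENT exactly when the frame pointer reaches the task's known final-frame address, which is the property the patch relies on for reliable termination.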
Thread overview: 4+ messages (newest: 2021-05-04 18:38 UTC)

[not found] <80cac661608c8d3623328b37b9b1696f63f45968>
2021-04-20 18:44 ` [PATCH v3 0/1] arm64: Implement stack trace termination record madvenka
2021-04-20 18:44   ` [PATCH v3 1/1] " madvenka
2021-05-04 15:13     ` Mark Rutland
2021-05-04 18:37       ` Madhavan T. Venkataraman