live-patching.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v1 0/1] arm64: Implement stack trace termination record
       [not found] <b6144b5b1dc66bf775fe66374bba31af7e5f1d54>
@ 2021-03-24 18:46 ` madvenka
  2021-03-24 18:46   ` [RFC PATCH v1 1/1] " madvenka
  0 siblings, 1 reply; 4+ messages in thread
From: madvenka @ 2021-03-24 18:46 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

The unwinder needs to be able to reliably tell when it has reached the end
of a stack trace. One way to do this is to have the last stack frame at a
fixed offset from the base of the task stack. When the unwinder reaches
that offset, it knows it is done.

All tasks have a pt_regs structure right after the task stack in the stack
page. The pt_regs structure contains a stackframe field. Make this stackframe
field the last frame in the task stack so all stack traces end at a fixed
stack offset.

For kernel tasks, this is simple to understand. For user tasks, there is
some extra detail. User tasks get created via fork() et al. Once they return
from fork, they enter the kernel only on an EL0 exception. In arm64,
system calls are also EL0 exceptions.

The EL0 exception handler uses the task pt_regs mentioned above to save
register state and call different exception functions. All stack traces
from EL0 exception code must end at the pt_regs. So, make pt_regs->stackframe
the last frame in the EL0 exception stack.

To summarize, task_pt_regs(task)->stackframe will always be the stack
termination record.

Sample stack traces
===================

These stack traces were taken using a test driver called callfd from
certain locations.

Primary CPU's idle task
=======================

[    0.022932]   arch_stack_walk+0x0/0xd0
[    0.022944]   callfd_stack+0x30/0x60
[    0.022955]   rest_init+0xd8/0xf8
[    0.022968]   arch_call_rest_init+0x18/0x24
[    0.022984]   start_kernel+0x5b8/0x5f4
[    0.022993]   ret_from_fork+0x0/0x18

Secondary CPU's idle task
=========================

[    0.023043]   arch_stack_walk+0x0/0xd0
[    0.023052]   callfd_stack+0x30/0x60
[    0.023061]   secondary_start_kernel+0x188/0x1e0
[    0.023071]   ret_from_fork+0x0/0x18

Sample kernel thread
====================

[   12.000679]   arch_stack_walk+0x0/0xd0
[   12.007616]   callfd_stack+0x30/0x60
[   12.014347]   kernel_init+0x84/0x148
[   12.021026]   ret_from_fork+0x10/0x18

kernel_clone() system call
==========================

Showing EL0 exception:

[  364.152827]   arch_stack_walk+0x0/0xd0
[  364.152833]   callfd_stack+0x30/0x60
[  364.152839]   kernel_clone+0x57c/0x590
[  364.152846]   __do_sys_clone+0x58/0x88
[  364.152851]   __arm64_sys_clone+0x28/0x38
[  364.152856]   el0_svc_common.constprop.0+0x70/0x1a8
[  364.152863]   do_el0_svc+0x2c/0x98
[  364.152868]   el0_svc+0x2c/0x70
[  364.152873]   el0_sync_handler+0xb0/0xb8
[  364.152879]   el0_sync+0x178/0x180

Timer interrupt
===============

Showing EL1 exception (Interrupt happened on a secondary CPU):

[  364.195456]   arch_stack_walk+0x0/0xd0
[  364.195467]   callfd_stack+0x30/0x60
[  364.195475]   callfd_callback+0x2c/0x38
[  364.195482]   call_timer_fn+0x38/0x180
[  364.195489]   run_timer_softirq+0x43c/0x6b8
[  364.195495]   __do_softirq+0x138/0x37c
[  364.195501]   irq_exit+0xc0/0xe8
[  364.195512]   __handle_domain_irq+0x70/0xc8
[  364.195521]   gic_handle_irq+0xd4/0x2f4
[  364.195527]   el1_irq+0xc0/0x180
[  364.195533]   arch_cpu_idle+0x18/0x40
[  364.195540]   default_idle_call+0x44/0x170
[  364.195548]   do_idle+0x224/0x278
[  364.195567]   cpu_startup_entry+0x2c/0x98
[  364.195573]   secondary_start_kernel+0x198/0x1e0
[  364.195581]   ret_from_fork+0x0/0x18
---
Changelog:

v1
	- Set up task_pt_regs(current)->stackframe as the last frame
	  when a new task is initialized in copy_thread().

	- Create pt_regs for the idle tasks and set up pt_regs->stackframe
	  as the last frame for the idle tasks.

	- Set up task_pt_regs(current)->stackframe as the last frame in
	  the EL0 exception handler so the EL0 exception stack trace ends
	  there.

	- Terminate the stack trace successfully in unwind_frame() when
	  the FP reaches task_pt_regs(current)->stackframe.

	- The stack traces (above) in the kernel will terminate at the
	  correct place. Debuggers may show an extra record 0x0 at the end
	  for pt_regs->stackframe. That said, I did not see that extra frame
	  when I did stack traces using gdb.

Madhavan T. Venkataraman (1):
  arm64: Implement stack trace termination record

 arch/arm64/kernel/entry.S      |  8 +++++---
 arch/arm64/kernel/head.S       | 28 ++++++++++++++++++++++++----
 arch/arm64/kernel/process.c    |  5 +++++
 arch/arm64/kernel/stacktrace.c |  8 ++++----
 4 files changed, 38 insertions(+), 11 deletions(-)


base-commit: 0d02ec6b3136c73c09e7859f0d0e4e2c4c07b49b
-- 
2.25.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RFC PATCH v1 1/1] arm64: Implement stack trace termination record
  2021-03-24 18:46 ` [RFC PATCH v1 0/1] arm64: Implement stack trace termination record madvenka
@ 2021-03-24 18:46   ` madvenka
  2021-03-29 11:27     ` Mark Rutland
  0 siblings, 1 reply; 4+ messages in thread
From: madvenka @ 2021-03-24 18:46 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

The unwinder needs to be able to reliably tell when it has reached the end
of a stack trace. One way to do this is to have the last stack frame at a
fixed offset from the base of the task stack. When the unwinder reaches
that offset, it knows it is done.

Kernel Tasks
============

All tasks except the idle task have a pt_regs structure right after the
task stack. This is called the task pt_regs. The pt_regs structure has a
special stackframe field. Make this stackframe field the last frame in the
task stack. This needs to be done in copy_thread() which initializes a new
task's pt_regs and initial CPU context.

For the idle task, there is no task pt_regs. For our purpose, we need one.
So, create a pt_regs just like other kernel tasks and make
pt_regs->stackframe the last frame in the idle task stack. This needs to be
done at two places:

	- On the primary CPU, the boot task runs. It calls start_kernel()
	  and eventually becomes the idle task for the primary CPU. Just
	  before start_kernel() is called, set up the last frame.

	- On each secondary CPU, a startup task runs that calls
	  secondary_startup_kernel() and eventually becomes the idle task
	  on the secondary CPU. Just before secondary_start_kernel() is
	  called, set up the last frame.

User Tasks
==========

User tasks are initially set up like kernel tasks when they are created.
Then, they return to userland after fork via ret_from_fork(). After that,
they enter the kernel only on an EL0 exception. (In arm64, system calls are
also EL0 exceptions). The EL0 exception handler stores state in the task
pt_regs and calls different functions based on the type of exception. The
stack trace for an EL0 exception must end at the task pt_regs. So, make
task pt_regs->stackframe as the last frame in the EL0 exception stack.

In summary, task pt_regs->stackframe is where a successful stack trace ends.

Stack trace termination
=======================

In the unwinder, terminate the stack trace successfully when
task_pt_regs(task)->stackframe is reached. For stack traces in the kernel,
this will correctly terminate the stack trace at the right place.

However, debuggers terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/kernel/entry.S      |  8 +++++---
 arch/arm64/kernel/head.S       | 28 ++++++++++++++++++++++++----
 arch/arm64/kernel/process.c    |  5 +++++
 arch/arm64/kernel/stacktrace.c |  8 ++++----
 4 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a31a0a713c85..e2dc2e998934 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -261,16 +261,18 @@ alternative_else_nop_endif
 	stp	lr, x21, [sp, #S_LR]
 
 	/*
-	 * For exceptions from EL0, terminate the callchain here.
+	 * For exceptions from EL0, terminate the callchain here at
+	 * task_pt_regs(current)->stackframe.
+	 *
 	 * For exceptions from EL1, create a synthetic frame record so the
 	 * interrupted code shows up in the backtrace.
 	 */
 	.if \el == 0
-	mov	x29, xzr
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
 	.else
 	stp	x29, x22, [sp, #S_STACKFRAME]
-	add	x29, sp, #S_STACKFRAME
 	.endif
+	add	x29, sp, #S_STACKFRAME
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 alternative_if_not ARM64_HAS_PAN
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 840bda1869e9..b8003fb9cfa5 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -393,6 +393,28 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	ret	x28
 SYM_FUNC_END(__create_page_tables)
 
+	/*
+	 * The boot task becomes the idle task for the primary CPU. The
+	 * CPU startup task on each secondary CPU becomes the idle task
+	 * for the secondary CPU.
+	 *
+	 * The idle task does not require pt_regs. But create a dummy
+	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
+	 * set up to be the last frame on the idle task stack just like
+	 * all the other kernel tasks. This helps the unwinder to
+	 * terminate the stack trace at a well-known stack offset.
+	 *
+	 * Also, set up the last return PC to be ret_from_fork() just
+	 * like all the other kernel tasks so that the stack trace of
+	 * all kernel tasks ends with the same function.
+	 */
+	.macro setup_last_frame
+	sub	sp, sp, #PT_REGS_SIZE
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
+	add	x29, sp, #S_STACKFRAME
+	ldr	x30, =ret_from_fork
+	.endm
+
 /*
  * The following fragment of code is executed with the MMU enabled.
  *
@@ -447,8 +469,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 	bl	switch_to_vhe			// Prefer VHE if possible
 	add	sp, sp, #16
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 	b	start_kernel
 SYM_FUNC_END(__primary_switched)
 
@@ -606,8 +627,7 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
 	cbz	x2, __secondary_too_slow
 	msr	sp_el0, x2
 	scs_load x2, x3
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 
 #ifdef CONFIG_ARM64_PTR_AUTH
 	ptrauth_keys_init_cpu x2, x3, x4, x5
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 325c83b1a24d..7ffa689e8b60 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -437,6 +437,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	}
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
+	/*
+	 * For the benefit of the unwinder, set up childregs->stackframe
+	 * as the last frame for the new task.
+	 */
+	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
 
 	ptrace_hw_copy_thread(p);
 
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ad20981dfda4..a35b760a1892 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -44,16 +44,16 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	unsigned long fp = frame->fp;
 	struct stack_info info;
 
+	if (!tsk)
+		tsk = current;
+
 	/* Terminal record; nothing to unwind */
-	if (!fp)
+	if (fp == (unsigned long) task_pt_regs(tsk)->stackframe)
 		return -ENOENT;
 
 	if (fp & 0xf)
 		return -EINVAL;
 
-	if (!tsk)
-		tsk = current;
-
 	if (!on_accessible_stack(tsk, fp, &info))
 		return -EINVAL;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH v1 1/1] arm64: Implement stack trace termination record
  2021-03-24 18:46   ` [RFC PATCH v1 1/1] " madvenka
@ 2021-03-29 11:27     ` Mark Rutland
  2021-03-29 16:46       ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rutland @ 2021-03-29 11:27 UTC (permalink / raw)
  To: madvenka
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

Hi Madhavan,

Overall this looks pretty good; I have a few comments below.

On Wed, Mar 24, 2021 at 01:46:07PM -0500, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> The unwinder needs to be able to reliably tell when it has reached the end
> of a stack trace. One way to do this is to have the last stack frame at a
> fixed offset from the base of the task stack. When the unwinder reaches
> that offset, it knows it is done.

To make the relationship with reliable stacktrace clearer, how about:

| Reliable stacktracing requires that we identify when a stacktrace is
| terminated early. We can do this by ensuring all tasks have a final
| frame record at a known location on their task stack, and checking
| that this is the final frame record in the chain.

Currently we use inconsistent terminology to refer to the final frame
record, and it would be good if we could be consistent. The existing
code uses "terminal record" (which I appreciate isn't clear), and this
series largely uses "last frame". It'd be nice to make that consistent.

For clarity could we please use "final" rather than "last"? That avoids
the ambiguity of "last" also meaning "previous".

e.g. below this'd mean having `setup_final_frame`.

> 
> Kernel Tasks
> ============
> 
> All tasks except the idle task have a pt_regs structure right after the
> task stack. This is called the task pt_regs. The pt_regs structure has a
> special stackframe field. Make this stackframe field the last frame in the
> task stack. This needs to be done in copy_thread() which initializes a new
> task's pt_regs and initial CPU context.
> 
> For the idle task, there is no task pt_regs. For our purpose, we need one.
> So, create a pt_regs just like other kernel tasks and make
> pt_regs->stackframe the last frame in the idle task stack. This needs to be
> done at two places:
> 
> 	- On the primary CPU, the boot task runs. It calls start_kernel()
> 	  and eventually becomes the idle task for the primary CPU. Just
> 	  before start_kernel() is called, set up the last frame.
> 
> 	- On each secondary CPU, a startup task runs that calls
> 	  secondary_startup_kernel() and eventually becomes the idle task
> 	  on the secondary CPU. Just before secondary_start_kernel() is
> 	  called, set up the last frame.
> 
> User Tasks
> ==========
> 
> User tasks are initially set up like kernel tasks when they are created.
> Then, they return to userland after fork via ret_from_fork(). After that,
> they enter the kernel only on an EL0 exception. (In arm64, system calls are
> also EL0 exceptions). The EL0 exception handler stores state in the task
> pt_regs and calls different functions based on the type of exception. The
> stack trace for an EL0 exception must end at the task pt_regs. So, make
> task pt_regs->stackframe as the last frame in the EL0 exception stack.
> 
> In summary, task pt_regs->stackframe is where a successful stack trace ends.
> 
> Stack trace termination
> =======================
> 
> In the unwinder, terminate the stack trace successfully when
> task_pt_regs(task)->stackframe is reached. For stack traces in the kernel,
> this will correctly terminate the stack trace at the right place.
> 
> However, debuggers terminate the stack trace when FP == 0. In the
> pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
> debugger may print an extra record 0x0 at the end. While this is not
> pretty, this does not do any harm. This is a small price to pay for
> having reliable stack trace termination in the kernel.
> 
> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> ---
>  arch/arm64/kernel/entry.S      |  8 +++++---
>  arch/arm64/kernel/head.S       | 28 ++++++++++++++++++++++++----
>  arch/arm64/kernel/process.c    |  5 +++++
>  arch/arm64/kernel/stacktrace.c |  8 ++++----
>  4 files changed, 38 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index a31a0a713c85..e2dc2e998934 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -261,16 +261,18 @@ alternative_else_nop_endif
>  	stp	lr, x21, [sp, #S_LR]
>  
>  	/*
> -	 * For exceptions from EL0, terminate the callchain here.
> +	 * For exceptions from EL0, terminate the callchain here at
> +	 * task_pt_regs(current)->stackframe.
> +	 *
>  	 * For exceptions from EL1, create a synthetic frame record so the
>  	 * interrupted code shows up in the backtrace.
>  	 */
>  	.if \el == 0
> -	mov	x29, xzr
> +	stp	xzr, xzr, [sp, #S_STACKFRAME]
>  	.else
>  	stp	x29, x22, [sp, #S_STACKFRAME]
> -	add	x29, sp, #S_STACKFRAME
>  	.endif
> +	add	x29, sp, #S_STACKFRAME
>  
>  #ifdef CONFIG_ARM64_SW_TTBR0_PAN
>  alternative_if_not ARM64_HAS_PAN
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 840bda1869e9..b8003fb9cfa5 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -393,6 +393,28 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
>  	ret	x28
>  SYM_FUNC_END(__create_page_tables)
>  
> +	/*
> +	 * The boot task becomes the idle task for the primary CPU. The
> +	 * CPU startup task on each secondary CPU becomes the idle task
> +	 * for the secondary CPU.
> +	 *
> +	 * The idle task does not require pt_regs. But create a dummy
> +	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
> +	 * set up to be the last frame on the idle task stack just like
> +	 * all the other kernel tasks. This helps the unwinder to
> +	 * terminate the stack trace at a well-known stack offset.
> +	 *
> +	 * Also, set up the last return PC to be ret_from_fork() just
> +	 * like all the other kernel tasks so that the stack trace of
> +	 * all kernel tasks ends with the same function.
> +	 */
> +	.macro setup_last_frame
> +	sub	sp, sp, #PT_REGS_SIZE
> +	stp	xzr, xzr, [sp, #S_STACKFRAME]
> +	add	x29, sp, #S_STACKFRAME
> +	ldr	x30, =ret_from_fork
> +	.endm

Why do you need to put `ret_from_fork` into the chain here?

I'm not keen on adding synthetic entries to the trace; is there a
problem if we don't override x30 here?

Thanks,
Mark.

> +
>  /*
>   * The following fragment of code is executed with the MMU enabled.
>   *
> @@ -447,8 +469,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
>  #endif
>  	bl	switch_to_vhe			// Prefer VHE if possible
>  	add	sp, sp, #16
> -	mov	x29, #0
> -	mov	x30, #0
> +	setup_last_frame
>  	b	start_kernel
>  SYM_FUNC_END(__primary_switched)
>  
> @@ -606,8 +627,7 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
>  	cbz	x2, __secondary_too_slow
>  	msr	sp_el0, x2
>  	scs_load x2, x3
> -	mov	x29, #0
> -	mov	x30, #0
> +	setup_last_frame
>  
>  #ifdef CONFIG_ARM64_PTR_AUTH
>  	ptrauth_keys_init_cpu x2, x3, x4, x5
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 325c83b1a24d..7ffa689e8b60 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -437,6 +437,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  	}
>  	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
>  	p->thread.cpu_context.sp = (unsigned long)childregs;
> +	/*
> +	 * For the benefit of the unwinder, set up childregs->stackframe
> +	 * as the last frame for the new task.
> +	 */
> +	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
>  
>  	ptrace_hw_copy_thread(p);
>  
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index ad20981dfda4..a35b760a1892 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -44,16 +44,16 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  	unsigned long fp = frame->fp;
>  	struct stack_info info;
>  
> +	if (!tsk)
> +		tsk = current;
> +
>  	/* Terminal record; nothing to unwind */
> -	if (!fp)
> +	if (fp == (unsigned long) task_pt_regs(tsk)->stackframe)
>  		return -ENOENT;
>  
>  	if (fp & 0xf)
>  		return -EINVAL;
>  
> -	if (!tsk)
> -		tsk = current;
> -
>  	if (!on_accessible_stack(tsk, fp, &info))
>  		return -EINVAL;
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH v1 1/1] arm64: Implement stack trace termination record
  2021-03-29 11:27     ` Mark Rutland
@ 2021-03-29 16:46       ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 4+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-29 16:46 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/29/21 6:27 AM, Mark Rutland wrote:
> Hi Madhavan,
> 
> Overall this looks pretty good; I have a few comments below.
> 
> On Wed, Mar 24, 2021 at 01:46:07PM -0500, madvenka@linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>
>> The unwinder needs to be able to reliably tell when it has reached the end
>> of a stack trace. One way to do this is to have the last stack frame at a
>> fixed offset from the base of the task stack. When the unwinder reaches
>> that offset, it knows it is done.
> 
> To make the relationship with reliable stacktrace clearer, how about:
> 
> | Reliable stacktracing requires that we identify when a stacktrace is
> | terminated early. We can do this by ensuring all tasks have a final
> | frame record at a known location on their task stack, and checking
> | that this is the final frame record in the chain.
> 
> Currently we use inconsistent terminology to refer to the final frame
> record, and it would be good if we could be consistent. The existing
> code uses "terminal record" (which I appreciate isn't clear), and this
> series largely uses "last frame". It'd be nice to make that consistent.
> 
> For clarity could we please use "final" rather than "last"? That avoids
> the ambiguity of "last" also meaning "previous".
> 
> e.g. below this'd mean having `setup_final_frame`.

OK. I will make the above changes.

> 
>>
>> Kernel Tasks
>> ============
>>
>> All tasks except the idle task have a pt_regs structure right after the
>> task stack. This is called the task pt_regs. The pt_regs structure has a
>> special stackframe field. Make this stackframe field the last frame in the
>> task stack. This needs to be done in copy_thread() which initializes a new
>> task's pt_regs and initial CPU context.
>>
>> For the idle task, there is no task pt_regs. For our purpose, we need one.
>> So, create a pt_regs just like other kernel tasks and make
>> pt_regs->stackframe the last frame in the idle task stack. This needs to be
>> done at two places:
>>
>> 	- On the primary CPU, the boot task runs. It calls start_kernel()
>> 	  and eventually becomes the idle task for the primary CPU. Just
>> 	  before start_kernel() is called, set up the last frame.
>>
>> 	- On each secondary CPU, a startup task runs that calls
>> 	  secondary_startup_kernel() and eventually becomes the idle task
>> 	  on the secondary CPU. Just before secondary_start_kernel() is
>> 	  called, set up the last frame.
>>
>> User Tasks
>> ==========
>>
>> User tasks are initially set up like kernel tasks when they are created.
>> Then, they return to userland after fork via ret_from_fork(). After that,
>> they enter the kernel only on an EL0 exception. (In arm64, system calls are
>> also EL0 exceptions). The EL0 exception handler stores state in the task
>> pt_regs and calls different functions based on the type of exception. The
>> stack trace for an EL0 exception must end at the task pt_regs. So, make
>> task pt_regs->stackframe as the last frame in the EL0 exception stack.
>>
>> In summary, task pt_regs->stackframe is where a successful stack trace ends.
>>
>> Stack trace termination
>> =======================
>>
>> In the unwinder, terminate the stack trace successfully when
>> task_pt_regs(task)->stackframe is reached. For stack traces in the kernel,
>> this will correctly terminate the stack trace at the right place.
>>
>> However, debuggers terminate the stack trace when FP == 0. In the
>> pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
>> debugger may print an extra record 0x0 at the end. While this is not
>> pretty, this does not do any harm. This is a small price to pay for
>> having reliable stack trace termination in the kernel.
>>
>> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>> ---
>>  arch/arm64/kernel/entry.S      |  8 +++++---
>>  arch/arm64/kernel/head.S       | 28 ++++++++++++++++++++++++----
>>  arch/arm64/kernel/process.c    |  5 +++++
>>  arch/arm64/kernel/stacktrace.c |  8 ++++----
>>  4 files changed, 38 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> index a31a0a713c85..e2dc2e998934 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -261,16 +261,18 @@ alternative_else_nop_endif
>>  	stp	lr, x21, [sp, #S_LR]
>>  
>>  	/*
>> -	 * For exceptions from EL0, terminate the callchain here.
>> +	 * For exceptions from EL0, terminate the callchain here at
>> +	 * task_pt_regs(current)->stackframe.
>> +	 *
>>  	 * For exceptions from EL1, create a synthetic frame record so the
>>  	 * interrupted code shows up in the backtrace.
>>  	 */
>>  	.if \el == 0
>> -	mov	x29, xzr
>> +	stp	xzr, xzr, [sp, #S_STACKFRAME]
>>  	.else
>>  	stp	x29, x22, [sp, #S_STACKFRAME]
>> -	add	x29, sp, #S_STACKFRAME
>>  	.endif
>> +	add	x29, sp, #S_STACKFRAME
>>  
>>  #ifdef CONFIG_ARM64_SW_TTBR0_PAN
>>  alternative_if_not ARM64_HAS_PAN
>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> index 840bda1869e9..b8003fb9cfa5 100644
>> --- a/arch/arm64/kernel/head.S
>> +++ b/arch/arm64/kernel/head.S
>> @@ -393,6 +393,28 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
>>  	ret	x28
>>  SYM_FUNC_END(__create_page_tables)
>>  
>> +	/*
>> +	 * The boot task becomes the idle task for the primary CPU. The
>> +	 * CPU startup task on each secondary CPU becomes the idle task
>> +	 * for the secondary CPU.
>> +	 *
>> +	 * The idle task does not require pt_regs. But create a dummy
>> +	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
>> +	 * set up to be the last frame on the idle task stack just like
>> +	 * all the other kernel tasks. This helps the unwinder to
>> +	 * terminate the stack trace at a well-known stack offset.
>> +	 *
>> +	 * Also, set up the last return PC to be ret_from_fork() just
>> +	 * like all the other kernel tasks so that the stack trace of
>> +	 * all kernel tasks ends with the same function.
>> +	 */
>> +	.macro setup_last_frame
>> +	sub	sp, sp, #PT_REGS_SIZE
>> +	stp	xzr, xzr, [sp, #S_STACKFRAME]
>> +	add	x29, sp, #S_STACKFRAME
>> +	ldr	x30, =ret_from_fork
>> +	.endm
> 
> Why do you need to put `ret_from_fork` into the chain here?
> 
> I'm not keen on adding synthetic entries to the trace; is there a
> problem if we don't override x30 here?
> 

When someone looks at different stack traces, it might be helpful to know
that all kernel thread stack traces end in ret_from_fork().

That said, I have no problem with x30 reflecting the actual caller.

Thanks,

Madhavan

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-03-29 16:46 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <b6144b5b1dc66bf775fe66374bba31af7e5f1d54>
2021-03-24 18:46 ` [RFC PATCH v1 0/1] arm64: Implement stack trace termination record madvenka
2021-03-24 18:46   ` [RFC PATCH v1 1/1] " madvenka
2021-03-29 11:27     ` Mark Rutland
2021-03-29 16:46       ` Madhavan T. Venkataraman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).