linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v2 0/8] arm64: Implement reliable stack trace
       [not found] <5997dfe8d261a3a543667b83c902883c1e4bd270>
@ 2021-03-15 16:57 ` madvenka
  2021-03-15 16:57   ` [RFC PATCH v2 1/8] arm64: Implement stack trace termination record madvenka
                     ` (8 more replies)
  0 siblings, 9 replies; 55+ messages in thread
From: madvenka @ 2021-03-15 16:57 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

I have made an attempt to implement reliable stack trace for arm64 so
it can be used for livepatch. Below is the list of changes. I have
documented my understanding of the issues and solutions below as well
as in the patch descriptions and the code. Please let me know if my
understanding is incorrect or incomplete anywhere.

Stack termination record
========================

The unwinder needs to be able to reliably tell when it has reached the end
of a stack trace. One way to do this is to have the last stack frame at a
fixed offset from the base of the task stack. When the unwinder reaches
that offset, it knows it is done.

All tasks have a pt_regs structure right after the task stack in the stack
page. The pt_regs structure contains a stackframe field. Make this stackframe
field the last frame in the task stack so all stack traces end at a fixed
stack offset.

For kernel tasks, this is simple to understand. For user tasks, there is
some extra detail. User tasks get created via fork() et al. Once they return
from fork, they enter the kernel only on an EL0 exception. In arm64,
system calls are also EL0 exceptions.

The EL0 exception handler uses the task pt_regs mentioned above to save
register state and call different exception functions. All stack traces
from EL0 exception code must end at the pt_regs. So, make pt_regs->stackframe
the last frame in the EL0 exception stack.

To summarize, task_pt_regs(task)->stackframe will always be the stack
termination record.
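To illustrate, here is a rough user-space sketch of that termination test.
The struct names and layout below are invented for the example and do not
match the kernel's; they only model "a record at a fixed offset past the
task stack":

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-ins, not kernel structures. The terminal record
 * sits at a fixed offset past the task stack proper.
 */
struct stackframe_record {
	uint64_t fp;	/* saved frame pointer; zero in the terminal record */
	uint64_t pc;	/* saved return PC; zero in the terminal record */
};

struct task_stack {
	unsigned char stack[1024];		/* task stack proper */
	struct stackframe_record stackframe;	/* stands in for task_pt_regs(task)->stackframe */
};

static struct task_stack demo_task;

/*
 * The unwinder is done when the frame pointer it is following equals
 * the address of the per-task termination record.
 */
static int frame_is_terminal(const struct task_stack *t, uint64_t fp)
{
	return fp == (uint64_t)(uintptr_t)&t->stackframe;
}
```

Because the record is at a fixed offset, the same comparison works for
every task without any per-trace bookkeeping.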

Implement frame types
=====================

Apart from the task pt_regs, pt_regs is also created on the stack for two
other cases:

EL1 exceptions:
	When the kernel encounters an exception (more on this below),
	it is called an EL1 exception. A pt_regs is created on the
	stack at that point to save register state. In addition,
	pt_regs->stackframe is set up for the interrupted kernel function
	so that the function shows up in the EL1 exception stack trace.

Ftrace:
	When CONFIG_DYNAMIC_FTRACE_WITH_REGS is on, the ftrace infrastructure
	is called at the beginning of a traced function. Ftrace creates a
	pt_regs on the stack at that point to save register state. In
	addition, it sets up pt_regs->stackframe for the traced function so
	that the traced function shows up in the stack trace taken from
	anywhere in the ftrace code after that point. When the ftrace code
	returns to the traced function, the pt_regs is removed from the stack.

To summarize, pt_regs->stackframe is used as a marker frame in stack traces.
To enable the unwinder to detect these frames, tag each pt_regs->stackframe
with a type. To record the type, use the unused2 field in struct pt_regs and
rename it to frame_type. The types are:

TASK_FRAME
	Terminating frame for a normal stack trace.
EL0_FRAME
	Terminating frame for an EL0 exception.
EL1_FRAME
	EL1 exception frame.
FTRACE_FRAME
	FTRACE frame.

These frame types will be used by the unwinder later to validate frames.
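A minimal sketch of how the tag gates termination. The patterns are the
ones defined later in this series; the helper name is mine:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Frame type patterns from the series. Patterns are used instead of
 * (0, 1, 2, 3, ...) so that stray stack contents are less likely to
 * match them by accident.
 */
#define TASK_FRAME	0xDEADBEE0u	/* Task stack termination frame */
#define EL0_FRAME	0xDEADBEE1u	/* EL0 exception frame */
#define EL1_FRAME	0xDEADBEE2u	/* EL1 exception frame */
#define FTRACE_FRAME	0xDEADBEE3u	/* FTrace frame */

/*
 * Only TASK_FRAME and EL0_FRAME are valid terminating frames; the
 * other types mark intermediate frames inside a stack trace.
 */
static int frame_type_terminates(uint32_t frame_type)
{
	return frame_type == TASK_FRAME || frame_type == EL0_FRAME;
}
```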

Proper termination of the stack trace
=====================================

In the unwinder, check the following for properly terminating the stack
trace:

	- Check every frame to see if it is task_pt_regs(task)->stackframe.
	  If it is, terminate the stack trace successfully.

	- For additional validation, make sure that the frame_type is either
	  TASK_FRAME or EL0_FRAME.

Detect EL1 frame
================

The kernel runs at Exception Level 1. If an exception happens while
executing in the kernel, it is an EL1 exception. This includes interrupts
which are asynchronous exceptions in arm64.

EL1 exceptions can happen on any instruction including instructions in
the frame pointer prolog or epilog. Depending on where exactly they happen,
they could render the stack trace unreliable.

If an EL1 exception frame is found on the stack, mark the stack trace as
unreliable.

Now, the EL1 exception frame is not at any well-known offset on the stack.
It can be anywhere on the stack. In order to properly detect an EL1
exception frame, some checks must be done. See the patch description and
the code for more detail.
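For a feel of the shape of those checks, here is a hedged user-space
sketch. The pt_regs stand-in and helper names are invented, and the
layout does not match the kernel's; the idea is to back up from the
candidate frame to the enclosing pt_regs and cross-check the saved
FP/PC against the current frame:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EL1_FRAME 0xDEADBEE2u	/* pattern from the series */

/* Pared-down stand-in for pt_regs; field names follow the series,
 * the layout does not match the kernel's. */
struct fake_pt_regs {
	uint64_t regs[31];	/* regs[29] holds the interrupted FP */
	uint64_t pc;		/* interrupted PC */
	uint32_t frame_type;
	struct { uint64_t fp, pc; } stackframe;	/* chained marker frame */
};

/*
 * Back up from the candidate frame pointer to the enclosing pt_regs
 * (via offsetof) and require that the saved FP/PC match the current
 * frame before treating it as an EL1 marker frame. Range checks
 * against the stack bounds are omitted here for brevity.
 */
static int is_el1_frame(uint64_t fp, uint64_t cur_fp, uint64_t cur_pc)
{
	struct fake_pt_regs *regs = (struct fake_pt_regs *)
		(uintptr_t)(fp - offsetof(struct fake_pt_regs, stackframe));

	return regs->frame_type == EL1_FRAME &&
	       regs->pc == cur_pc && regs->regs[29] == cur_fp;
}

/* Test helper: build a fake EL1 pt_regs and return its marker fp. */
static struct fake_pt_regs el1_demo;
static uint64_t make_el1_regs(uint64_t fp29, uint64_t pc)
{
	el1_demo.frame_type = EL1_FRAME;
	el1_demo.regs[29] = fp29;
	el1_demo.pc = pc;
	return (uint64_t)(uintptr_t)&el1_demo.stackframe;
}
```

The real checks additionally verify that the whole pt_regs lies within
the stack bounds before dereferencing it.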

There are two special cases to be aware of:

	- At the end of an interrupt, the code checks if the current task
	  must be preempted for any reason. If so, it calls the preemption
	  code which takes the task off the CPU. A stack trace taken on
	  the task after the preemption will show the EL1 frame and will be
	  considered unreliable. Preemption can happen practically at any
	  point in code including the frame pointer prolog and epilog.

	- Breakpoints encountered in kernel code are also EL1 exceptions.
	  The probing infrastructure uses breakpoints for executing
	  probe code. While in the probe code, the stack trace will show
	  an EL1 frame and will be considered unreliable. There is one
	  special case, viz., kretprobes, which are discussed below.

Detect ftrace frame
===================

When it is called for a traced function, the ftrace infrastructure creates
two frames:

	- One for the traced function

	- One for the caller of the traced function

That gives a reliable stack trace while executing in the ftrace infrastructure
code. When ftrace returns to the traced function, the frames are popped and
everything is back to normal.

However, in cases like live patch, execution is redirected to a different
function when ftrace returns. A stack trace taken while still in the ftrace
infrastructure code will not show the target function. The target function
is the real function that we want to track.

If an ftrace frame is detected on the stack, mark the stack trace as
unreliable.
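The bookkeeping for this is sticky: once any marker frame of this type is
seen during a walk, the trace stays unreliable. A sketch (the names are
mine; the real code lives in the unwinder):

```c
#include <assert.h>
#include <stdint.h>

#define FTRACE_FRAME 0xDEADBEE3u	/* pattern from the series */

struct unwind_state {
	int reliable;
};

/*
 * Once a marker frame tagged FTRACE_FRAME is seen, the whole trace
 * is demoted; reliability never flips back during the same walk.
 */
static void note_frame_type(struct unwind_state *state, uint32_t frame_type)
{
	if (frame_type == FTRACE_FRAME)
		state->reliable = 0;
}

/* Walk a sequence of frame type tags and report final reliability. */
static int walk_reliable(const uint32_t *types, int n)
{
	struct unwind_state state = { .reliable = 1 };
	int i;

	for (i = 0; i < n; i++)
		note_frame_type(&state, types[i]);
	return state.reliable;
}
```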

NOTE: For Function Graph Tracing where the return address of a function is
modified, the unwinder already has code to address that. It retrieves the
original address during unwinding.

Return address check
====================

Check the return PC of every stack frame to make sure that it is a valid
kernel text address (and not some generated code, for example).

Check for kretprobe
===================

For functions with a kretprobe set up, probe code executes on entry
to the function and replaces the return address in the stack frame with a
kretprobe trampoline. Whenever the function returns, control is
transferred to the trampoline. The trampoline eventually returns to the
original return address.

A stack trace taken while executing in the function (or in functions that
get called from the function) will not show the original return address.
Similarly, a stack trace taken while executing in the trampoline itself
(and functions that get called from the trampoline) will not show the
original return address. This means that the caller of the probed function
will not show.

So, if the trampoline is detected in the stack trace, mark the stack trace
as unreliable.

FYI, each task contains a task->kretprobe_instances list that can
theoretically be consulted to find the original return address. But I am
not entirely sure how to safely traverse that list for stack traces of
tasks other than the current one.

I have taken the easy way out and marked the stack trace as unreliable.
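A sketch of the trampoline check; the function below is a local stub
standing in for the kernel's kretprobe trampoline symbol:

```c
#include <assert.h>
#include <stdint.h>

/* Dummy stand-in for the kernel's kretprobe trampoline symbol. */
static void trampoline_stub(void)
{
}

/*
 * A return PC that points at the trampoline means the probed
 * function's real caller has been hidden from the unwinder, so the
 * trace cannot be trusted.
 */
static int pc_is_kretprobe_trampoline(uint64_t pc)
{
	return pc == (uint64_t)(uintptr_t)trampoline_stub;
}
```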

Optprobes
=========

Optprobes may be implemented in the future for arm64. If they are, the
same detection approach as for kretprobes should work.

Frame checks
============

I have a number of checks to make sure that the unwinder detects each frame
type correctly, so I have not added return address checks at these frames.
The return address checks that could be added are:

TASK_FRAME
	Check for ret_from_fork().

EL0_FRAME
	Check for one of these:
		el0_sync
		el0_sync_compat
		el0_irq
		el0_irq_compat
		el0_error
		el0_error_compat

EL1_FRAME
	Check for one of these:
		el1_sync
		el1_irq
		el1_error

These functions are currently local. They would need to be made global so
that the unwinder can reference them. Also, Mark Rutland indicated that
these might need some reorg.

So, I am currently not doing these address checks at the frames. But if
the reviewers feel that I need to do them, I will add these checks.

Implement arch_stack_walk_reliable()
====================================

Now that the unwinder can mark the stack trace as reliable (or not),
implement arch_stack_walk_reliable() based on that.
---

Changelog:

v1
	- Introduced an implementation of reliable stack trace for arm64.

v2
	- Split the changes into logical individual patches.

	- I have inlined the code that was in a function called
	  update_frame() into unwind_frame() itself.

	- I have added a lot of documentation to record my
	  understanding of the issues and my solutions for them so
	  reviewers can comment on them.

	- In v1, all task stack traces ended in the same global frame.
	  I have changed that to a per-task termination record in the
	  task pt_regs->stackframe. This way, the stack trace always
	  ends at a particular stack offset.

	- The stack termination frame also contains FP == 0 and PC == 0
	  so that debuggers will continue to work when they take stack
	  traces.

	- I have removed the encoding of the frame pointer by setting the
	  LSB as it will mess up debuggers when they do stack traces.

	- I have implemented a frame type field in pt_regs. Each type of
	  frame is tagged with a specific pattern that can be checked by
	  the unwinder to validate the frame.

	- I have added the following reliability checks in the unwinder:

		- Check for proper stack trace termination

		- Check for EL1 exception frames

		- Check for ftrace exception frames

		- Check if the PC in every frame is a proper kernel text
		  address

		- Check for the kretprobed functions

	- Based on the above unwinder enhancements, I have implemented
	  arch_stack_walk_reliable().

Madhavan T. Venkataraman (8):
  arm64: Implement stack trace termination record
  arm64: Implement frame types
  arm64: Terminate the stack trace at TASK_FRAME and EL0_FRAME
  arm64: Detect an EL1 exception frame and mark a stack trace unreliable
  arm64: Detect an FTRACE frame and mark a stack trace unreliable
  arm64: Check the return PC of every stack frame
  arm64: Detect kretprobed functions in stack trace
  arm64: Implement arch_stack_walk_reliable()

 arch/arm64/Kconfig                  |   1 +
 arch/arm64/include/asm/ptrace.h     |  15 ++-
 arch/arm64/include/asm/stacktrace.h |   2 +
 arch/arm64/kernel/asm-offsets.c     |   1 +
 arch/arm64/kernel/entry-ftrace.S    |   2 +
 arch/arm64/kernel/entry.S           |  12 +-
 arch/arm64/kernel/head.S            |  30 ++++-
 arch/arm64/kernel/process.c         |   6 +
 arch/arm64/kernel/stacktrace.c      | 196 +++++++++++++++++++++++++++-
 9 files changed, 250 insertions(+), 15 deletions(-)


base-commit: a38fd8748464831584a19438cbb3082b5a2dab15
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v2 1/8] arm64: Implement stack trace termination record
  2021-03-15 16:57 ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace madvenka
@ 2021-03-15 16:57   ` madvenka
  2021-03-18 15:09     ` Mark Brown
  2021-03-15 16:57   ` [RFC PATCH v2 2/8] arm64: Implement frame types madvenka
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: madvenka @ 2021-03-15 16:57 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

The unwinder needs to be able to reliably tell when it has reached the end
of a stack trace. One way to do this is to have the last stack frame at a
fixed offset from the base of the task stack. When the unwinder reaches
that offset, it knows it is done.

Kernel Tasks
============

All tasks except the idle task have a pt_regs structure right after the
task stack. This is called the task pt_regs. The pt_regs structure has a
special stackframe field. Make this stackframe field the last frame in the
task stack. This needs to be done in copy_thread() which initializes a new
task's pt_regs and initial CPU context.

For the idle task, there is no task pt_regs. For our purpose, we need one.
So, create a pt_regs just like other kernel tasks and make
pt_regs->stackframe the last frame in the idle task stack. This needs to be
done at two places:

	- On the primary CPU, the boot task runs. It calls start_kernel()
	  and eventually becomes the idle task for the primary CPU. Just
	  before start_kernel() is called, set up the last frame.

	- On each secondary CPU, a startup task runs that calls
	  secondary_start_kernel() and eventually becomes the idle task
	  on the secondary CPU. Just before secondary_start_kernel() is
	  called, set up the last frame.

User Tasks
==========

User tasks are initially set up like kernel tasks when they are created.
Then, they return to userland after fork via ret_from_fork(). After that,
they enter the kernel only on an EL0 exception. (In arm64, system calls are
also EL0 exceptions). The EL0 exception handler stores state in the task
pt_regs and calls different functions based on the type of exception. The
stack trace for an EL0 exception must end at the task pt_regs. So, make
task pt_regs->stackframe the last frame in the EL0 exception stack.

In summary, task pt_regs->stackframe is where a successful stack trace ends.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/kernel/entry.S   |  8 +++++---
 arch/arm64/kernel/head.S    | 28 ++++++++++++++++++++++++----
 arch/arm64/kernel/process.c |  5 +++++
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a31a0a713c85..e2dc2e998934 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -261,16 +261,18 @@ alternative_else_nop_endif
 	stp	lr, x21, [sp, #S_LR]
 
 	/*
-	 * For exceptions from EL0, terminate the callchain here.
+	 * For exceptions from EL0, terminate the callchain here at
+	 * task_pt_regs(current)->stackframe.
+	 *
 	 * For exceptions from EL1, create a synthetic frame record so the
 	 * interrupted code shows up in the backtrace.
 	 */
 	.if \el == 0
-	mov	x29, xzr
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
 	.else
 	stp	x29, x22, [sp, #S_STACKFRAME]
-	add	x29, sp, #S_STACKFRAME
 	.endif
+	add	x29, sp, #S_STACKFRAME
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 alternative_if_not ARM64_HAS_PAN
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 66b0e0b66e31..2769b20934d4 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -393,6 +393,28 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	ret	x28
 SYM_FUNC_END(__create_page_tables)
 
+	/*
+	 * The boot task becomes the idle task for the primary CPU. The
+	 * CPU startup task on each secondary CPU becomes the idle task
+	 * for the secondary CPU.
+	 *
+	 * The idle task does not require pt_regs. But create a dummy
+	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
+	 * set up to be the last frame on the idle task stack just like
+	 * all the other kernel tasks. This helps the unwinder to
+	 * terminate the stack trace at a well-known stack offset.
+	 *
+	 * Also, set up the last return PC to be ret_from_fork() just
+	 * like all the other kernel tasks so that the stack trace of
+	 * all kernel tasks ends with the same function.
+	 */
+	.macro setup_last_frame
+	sub	sp, sp, #PT_REGS_SIZE
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
+	add	x29, sp, #S_STACKFRAME
+	ldr	x30, =ret_from_fork
+	.endm
+
 /*
  * The following fragment of code is executed with the MMU enabled.
  *
@@ -447,8 +469,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 	bl	switch_to_vhe			// Prefer VHE if possible
 	add	sp, sp, #16
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 	b	start_kernel
 SYM_FUNC_END(__primary_switched)
 
@@ -606,8 +627,7 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
 	cbz	x2, __secondary_too_slow
 	msr	sp_el0, x2
 	scs_load x2, x3
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 
 #ifdef CONFIG_ARM64_PTR_AUTH
 	ptrauth_keys_init_cpu x2, x3, x4, x5
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 325c83b1a24d..7ffa689e8b60 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -437,6 +437,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	}
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
+	/*
+	 * For the benefit of the unwinder, set up childregs->stackframe
+	 * as the last frame for the new task.
+	 */
+	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
 
 	ptrace_hw_copy_thread(p);
 
-- 
2.25.1



* [RFC PATCH v2 2/8] arm64: Implement frame types
  2021-03-15 16:57 ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace madvenka
  2021-03-15 16:57   ` [RFC PATCH v2 1/8] arm64: Implement stack trace termination record madvenka
@ 2021-03-15 16:57   ` madvenka
  2021-03-18 17:40     ` Mark Brown
  2021-03-23 10:34     ` Mark Rutland
  2021-03-15 16:57   ` [RFC PATCH v2 3/8] arm64: Terminate the stack trace at TASK_FRAME and EL0_FRAME madvenka
                     ` (6 subsequent siblings)
  8 siblings, 2 replies; 55+ messages in thread
From: madvenka @ 2021-03-15 16:57 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Apart from the task pt_regs, pt_regs is also created on the stack for two
other cases:

	- EL1 exception. A pt_regs is created on the stack to save register
	  state. In addition, pt_regs->stackframe is set up for the
	  interrupted kernel function so that the function shows up in the
	  EL1 exception stack trace.

	- When a traced function calls the ftrace infrastructure at the
	  beginning of the function, ftrace creates a pt_regs on the stack
	  at that point to save register state. In addition, it sets up
	  pt_regs->stackframe for the traced function so that the traced
	  function shows up in the stack trace taken from anywhere in the
	  ftrace code after that point. When the ftrace code returns to the
	  traced function, the pt_regs is removed from the stack.

To summarize, pt_regs->stackframe is used (or will be used) as a marker
frame in stack traces. To enable the unwinder to detect these frames, tag
each pt_regs->stackframe with a type. To record the type, use the unused2
field in struct pt_regs and rename it to frame_type. The types are:

TASK_FRAME
	Terminating frame for a normal stack trace.
EL0_FRAME
	Terminating frame for an EL0 exception.
EL1_FRAME
	EL1 exception frame.
FTRACE_FRAME
	FTRACE frame.

These frame types will be used by the unwinder later to validate frames.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/ptrace.h | 15 +++++++++++++--
 arch/arm64/kernel/asm-offsets.c |  1 +
 arch/arm64/kernel/entry.S       |  4 ++++
 arch/arm64/kernel/head.S        |  2 ++
 arch/arm64/kernel/process.c     |  1 +
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index e58bca832dff..a75211ce009a 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -117,6 +117,17 @@
  */
 #define NO_SYSCALL (-1)
 
+/*
+ * pt_regs->stackframe is a marker frame that is used in different
+ * situations. These are the different types of frames. Use patterns
+ * for the frame types instead of (0, 1, 2, 3, ..) so that it is less
+ * likely to find them on the stack.
+ */
+#define TASK_FRAME	0xDEADBEE0	/* Task stack termination frame */
+#define EL0_FRAME	0xDEADBEE1	/* EL0 exception frame */
+#define EL1_FRAME	0xDEADBEE2	/* EL1 exception frame */
+#define FTRACE_FRAME	0xDEADBEE3	/* FTrace frame */
+
 #ifndef __ASSEMBLY__
 #include <linux/bug.h>
 #include <linux/types.h>
@@ -187,11 +198,11 @@ struct pt_regs {
 	};
 	u64 orig_x0;
 #ifdef __AARCH64EB__
-	u32 unused2;
+	u32 frame_type;
 	s32 syscallno;
 #else
 	s32 syscallno;
-	u32 unused2;
+	u32 frame_type;
 #endif
 	u64 sdei_ttbr1;
 	/* Only valid when ARM64_HAS_IRQ_PRIO_MASKING is enabled. */
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index a36e2fc330d4..43f97dbc7dfc 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -75,6 +75,7 @@ int main(void)
   DEFINE(S_SDEI_TTBR1,		offsetof(struct pt_regs, sdei_ttbr1));
   DEFINE(S_PMR_SAVE,		offsetof(struct pt_regs, pmr_save));
   DEFINE(S_STACKFRAME,		offsetof(struct pt_regs, stackframe));
+  DEFINE(S_FRAME_TYPE,		offsetof(struct pt_regs, frame_type));
   DEFINE(PT_REGS_SIZE,		sizeof(struct pt_regs));
   BLANK();
 #ifdef CONFIG_COMPAT
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index e2dc2e998934..ecc3507d9cdd 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -269,8 +269,12 @@ alternative_else_nop_endif
 	 */
 	.if \el == 0
 	stp	xzr, xzr, [sp, #S_STACKFRAME]
+	ldr	w17, =EL0_FRAME
+	str	w17, [sp, #S_FRAME_TYPE]
 	.else
 	stp	x29, x22, [sp, #S_STACKFRAME]
+	ldr	w17, =EL1_FRAME
+	str	w17, [sp, #S_FRAME_TYPE]
 	.endif
 	add	x29, sp, #S_STACKFRAME
 
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 2769b20934d4..d2ee78f8f97f 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -410,6 +410,8 @@ SYM_FUNC_END(__create_page_tables)
 	 */
 	.macro setup_last_frame
 	sub	sp, sp, #PT_REGS_SIZE
+	ldr	w17, =TASK_FRAME
+	str	w17, [sp, #S_FRAME_TYPE]
 	stp	xzr, xzr, [sp, #S_STACKFRAME]
 	add	x29, sp, #S_STACKFRAME
 	ldr	x30, =ret_from_fork
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 7ffa689e8b60..5c152fd60503 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -442,6 +442,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	 * as the last frame for the new task.
 	 */
 	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
+	childregs->frame_type = TASK_FRAME;
 
 	ptrace_hw_copy_thread(p);
 
-- 
2.25.1



* [RFC PATCH v2 3/8] arm64: Terminate the stack trace at TASK_FRAME and EL0_FRAME
  2021-03-15 16:57 ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace madvenka
  2021-03-15 16:57   ` [RFC PATCH v2 1/8] arm64: Implement stack trace termination record madvenka
  2021-03-15 16:57   ` [RFC PATCH v2 2/8] arm64: Implement frame types madvenka
@ 2021-03-15 16:57   ` madvenka
  2021-03-18 18:26     ` Mark Brown
  2021-03-15 16:57   ` [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable madvenka
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: madvenka @ 2021-03-15 16:57 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Implement the following checks in the unwinder to detect the terminating
frame reliably:

	- The frame must end in task_pt_regs(task)->stackframe.

	- The frame type must be either TASK_FRAME or EL0_FRAME.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/kernel/stacktrace.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ad20981dfda4..504cd161339d 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -43,16 +43,22 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 {
 	unsigned long fp = frame->fp;
 	struct stack_info info;
+	struct pt_regs *regs;
 
-	/* Terminal record; nothing to unwind */
-	if (!fp)
-		return -ENOENT;
+	if (!tsk)
+		tsk = current;
+	regs = task_pt_regs(tsk);
 
-	if (fp & 0xf)
+	/* Terminal record, nothing to unwind */
+	if (fp == (unsigned long) regs->stackframe) {
+		if (regs->frame_type == TASK_FRAME ||
+		    regs->frame_type == EL0_FRAME)
+			return -ENOENT;
 		return -EINVAL;
+	}
 
-	if (!tsk)
-		tsk = current;
+	if (!fp || fp & 0xf)
+		return -EINVAL;
 
 	if (!on_accessible_stack(tsk, fp, &info))
 		return -EINVAL;
-- 
2.25.1



* [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable
  2021-03-15 16:57 ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace madvenka
                     ` (2 preceding siblings ...)
  2021-03-15 16:57   ` [RFC PATCH v2 3/8] arm64: Terminate the stack trace at TASK_FRAME and EL0_FRAME madvenka
@ 2021-03-15 16:57   ` madvenka
  2021-03-23 10:42     ` Mark Rutland
  2021-03-15 16:57   ` [RFC PATCH v2 5/8] arm64: Detect an FTRACE " madvenka
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: madvenka @ 2021-03-15 16:57 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

EL1 exceptions can happen on any instruction including instructions in
the frame pointer prolog or epilog. Depending on where exactly they happen,
they could render the stack trace unreliable.

If an EL1 exception frame is found on the stack, mark the stack trace as
unreliable.

Now, the EL1 exception frame is not at any well-known offset on the stack.
It can be anywhere on the stack. In order to properly detect an EL1
exception frame the following checks must be done:

	- The frame type must be EL1_FRAME.

	- When the register state is saved in the EL1 pt_regs, the frame
	  pointer x29 is saved in pt_regs->regs[29] and the return PC
	  is saved in pt_regs->pc. These must match with the current
	  frame.

Interrupts encountered in kernel code are also EL1 exceptions. At the end
of an interrupt, the interrupt handler checks if the current task must be
preempted for any reason. If so, it calls the preemption code which takes
the task off the CPU. A stack trace taken on the task after the preemption
will show the EL1 frame and will be considered unreliable. This is correct
behavior as preemption can happen practically at any point in code
including the frame pointer prolog and epilog.

Breakpoints encountered in kernel code are also EL1 exceptions. The probing
infrastructure uses breakpoints for executing probe code. While in the probe
code, the stack trace will show an EL1 frame and will be considered
unreliable. This is also correct behavior.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h |  2 +
 arch/arm64/kernel/stacktrace.c      | 57 +++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index eb29b1fe8255..684f65808394 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -59,6 +59,7 @@ struct stackframe {
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	int graph;
 #endif
+	bool reliable;
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
@@ -169,6 +170,7 @@ static inline void start_backtrace(struct stackframe *frame,
 	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
+	frame->reliable = true;
 }
 
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 504cd161339d..6ae103326f7b 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,58 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+static void check_if_reliable(unsigned long fp, struct stackframe *frame,
+			      struct stack_info *info)
+{
+	struct pt_regs *regs;
+	unsigned long regs_start, regs_end;
+
+	/*
+	 * If the stack trace has already been marked unreliable, just
+	 * return.
+	 */
+	if (!frame->reliable)
+		return;
+
+	/*
+	 * Assume that this is an intermediate marker frame inside a pt_regs
+	 * structure created on the stack and get the pt_regs pointer. Other
+	 * checks will be done below to make sure that this is a marker
+	 * frame.
+	 */
+	regs_start = fp - offsetof(struct pt_regs, stackframe);
+	if (regs_start < info->low)
+		return;
+	regs_end = regs_start + sizeof(*regs);
+	if (regs_end > info->high)
+		return;
+	regs = (struct pt_regs *) regs_start;
+
+	/*
+	 * When an EL1 exception happens, a pt_regs structure is created
+	 * on the stack and the register state is recorded. Part of the
+	 * state is the FP and PC at the time of the exception.
+	 *
+	 * In addition, the FP and PC are also stored in pt_regs->stackframe
+	 * and pt_regs->stackframe is chained with other frames on the stack.
+	 * This is so that the interrupted function shows up in the stack
+	 * trace.
+	 *
+	 * The exception could have happened during the frame pointer
+	 * prolog or epilog. This could result in a missing frame in
+	 * the stack trace so that the caller of the interrupted
+	 * function does not show up in the stack trace.
+	 *
+	 * So, mark the stack trace as unreliable if an EL1 frame is
+	 * detected.
+	 */
+	if (regs->frame_type == EL1_FRAME && regs->pc == frame->pc &&
+	    regs->regs[29] == frame->fp) {
+		frame->reliable = false;
+		return;
+	}
+}
+
 /*
  * AArch64 PCS assigns the frame pointer to x29.
  *
@@ -114,6 +166,11 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
 
+	/*
+	 * Check for features that render the stack trace unreliable.
+	 */
+	check_if_reliable(fp, frame, &info);
+
 	return 0;
 }
 NOKPROBE_SYMBOL(unwind_frame);
-- 
2.25.1



* [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-15 16:57 ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace madvenka
                     ` (3 preceding siblings ...)
  2021-03-15 16:57   ` [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable madvenka
@ 2021-03-15 16:57   ` madvenka
  2021-03-23 10:51     ` Mark Rutland
  2021-03-15 16:57   ` [RFC PATCH v2 6/8] arm64: Check the return PC of every stack frame madvenka
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: madvenka @ 2021-03-15 16:57 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

When CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled and tracing is activated
for a function, the ftrace infrastructure is called for the function at
the very beginning. Ftrace creates two frames:

	- One for the traced function

	- One for the caller of the traced function

That gives a reliable stack trace while executing in the ftrace
infrastructure code. When ftrace returns to the traced function, the frames
are popped and everything is back to normal.

However, in cases like livepatch, execution is redirected to a different
function when ftrace returns. A stack trace taken while still in the
ftrace infrastructure code will not show the target function, which is
the real function whose execution we want to track.

So, if an FTRACE frame is detected on the stack, just mark the stack trace
as unreliable.
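The detection can be modeled in plain userspace C as a rough sketch. All
struct and field names below are stand-ins for the kernel's pt_regs and
stackframe (reduced to the fields the check reads), not the real
definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel structures; only the fields the check reads. */
enum frame_type { TASK_FRAME = 1, EL0_FRAME, EL1_FRAME, FTRACE_FRAME };

struct fake_pt_regs {
	enum frame_type frame_type;
	uint64_t regs29;	/* saved x29 of the traced function's caller */
};

struct fake_frame {
	uint64_t fp;
	bool reliable;
};

/*
 * If the current frame is the ftrace-created frame placed right after
 * pt_regs (fp == regs_end) and its chained caller matches the saved x29,
 * ftrace may redirect execution on return (e.g. livepatch), so the
 * trace is marked unreliable.
 */
static void check_ftrace_frame(struct fake_frame *frame,
			       const struct fake_pt_regs *regs,
			       uint64_t regs_end, uint64_t caller_fp)
{
	if (regs->frame_type == FTRACE_FRAME && frame->fp == regs_end &&
	    caller_fp == regs->regs29)
		frame->reliable = false;
}
```

Any other frame type, or a frame that does not sit exactly at the end of
the pt_regs area, leaves the reliable flag untouched.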

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/kernel/entry-ftrace.S |  2 ++
 arch/arm64/kernel/stacktrace.c   | 33 ++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index b3e4f9a088b1..1ec8c5180fc0 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -74,6 +74,8 @@
 	/* Create our frame record within pt_regs. */
 	stp	x29, x30, [sp, #S_STACKFRAME]
 	add	x29, sp, #S_STACKFRAME
+	ldr	w17, =FTRACE_FRAME
+	str	w17, [sp, #S_FRAME_TYPE]
 	.endm
 
 SYM_CODE_START(ftrace_regs_caller)
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 6ae103326f7b..594806a0c225 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -23,6 +23,7 @@ static void check_if_reliable(unsigned long fp, struct stackframe *frame,
 {
 	struct pt_regs *regs;
 	unsigned long regs_start, regs_end;
+	unsigned long caller_fp;
 
 	/*
 	 * If the stack trace has already been marked unreliable, just
@@ -68,6 +69,38 @@ static void check_if_reliable(unsigned long fp, struct stackframe *frame,
 		frame->reliable = false;
 		return;
 	}
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+	/*
+	 * When tracing is active for a function, the ftrace code is called
+	 * from the function even before the frame pointer prolog and
+	 * epilog. ftrace creates a pt_regs structure on the stack to save
+	 * register state.
+	 *
+	 * In addition, ftrace sets up two stack frames and chains them
+	 * with other frames on the stack. One frame is pt_regs->stackframe
+	 * that is for the traced function. The other frame is set up right
+	 * after the pt_regs structure and it is for the caller of the
+	 * traced function. This is done to ensure a proper stack trace.
+	 *
+	 * If the ftrace code returns to the traced function, then all is
+	 * fine. But if it transfers control to a different function (like
+	 * in livepatch), then a stack walk performed while still in the
+	 * ftrace code will not find the target function.
+	 *
+	 * So, mark the stack trace as unreliable if an ftrace frame is
+	 * detected.
+	 */
+	if (regs->frame_type == FTRACE_FRAME && frame->fp == regs_end &&
+	    frame->fp < info->high) {
+		/* Check the traced function's caller's frame. */
+		caller_fp = READ_ONCE_NOCHECK(*(unsigned long *)(frame->fp));
+		if (caller_fp == regs->regs[29]) {
+			frame->reliable = false;
+			return;
+		}
+	}
+#endif
 }
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH v2 6/8] arm64: Check the return PC of every stack frame
  2021-03-15 16:57 ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace madvenka
                     ` (4 preceding siblings ...)
  2021-03-15 16:57   ` [RFC PATCH v2 5/8] arm64: Detect an FTRACE " madvenka
@ 2021-03-15 16:57   ` madvenka
  2021-03-15 16:57   ` [RFC PATCH v2 7/8] arm64: Detect kretprobed functions in stack trace madvenka
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 55+ messages in thread
From: madvenka @ 2021-03-15 16:57 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

If the return PC of a frame encountered in a stack trace is not a valid
kernel text address, the unwinder cannot trust the trace. Mark the stack
trace as unreliable.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/kernel/stacktrace.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 594806a0c225..358aae3906d7 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -101,6 +101,16 @@ static void check_if_reliable(unsigned long fp, struct stackframe *frame,
 		}
 	}
 #endif
+
+	/*
+	 * A NULL or invalid return address probably means there's some
+	 * generated code which __kernel_text_address() doesn't know about.
+	 * Mark the stack trace as not reliable.
+	 */
+	if (!__kernel_text_address(frame->pc)) {
+		frame->reliable = false;
+		return;
+	}
 }
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH v2 7/8] arm64: Detect kretprobed functions in stack trace
  2021-03-15 16:57 ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace madvenka
                     ` (5 preceding siblings ...)
  2021-03-15 16:57   ` [RFC PATCH v2 6/8] arm64: Check the return PC of every stack frame madvenka
@ 2021-03-15 16:57   ` madvenka
  2021-03-15 16:58   ` [RFC PATCH v2 8/8] arm64: Implement arch_stack_walk_reliable() madvenka
  2021-03-15 19:01   ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace Madhavan T. Venkataraman
  8 siblings, 0 replies; 55+ messages in thread
From: madvenka @ 2021-03-15 16:57 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

When a kretprobe is active for a function, the function's return address
in its stack frame is modified to point to the kretprobe trampoline. When
the function returns, the frame is popped and control is transferred
to the trampoline. The trampoline eventually returns to the original return
address.

If a stack walk is done within the function (or any function called from
there), the stack trace will only show the trampoline and not the
original caller. Detect this and mark the stack trace as unreliable.

A stack trace taken while inside the trampoline (or in functions called
by the trampoline) has the same problem. Detect this as well.

This is done by looking up the symbol table entry for the trampoline
and checking if the return PC in a frame falls anywhere in the
trampoline function.
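Once the trampoline's bounds are known, the check is a single range
comparison. Here is a userspace sketch; the bounds are passed in
directly rather than resolved via kallsyms_lookup(), and the struct name
is made up for the illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* [start, end) of the trampoline, as a kallsyms lookup would report. */
struct sym_range {
	uint64_t start;
	uint64_t end;
};

/*
 * A return PC anywhere inside the trampoline means the original return
 * address was replaced, so the unwinder cannot recover the real caller.
 */
static bool pc_in_trampoline(const struct sym_range *r, uint64_t pc)
{
	return pc >= r->start && pc < r->end;
}
```

The half-open range covers both cases in the commit message: a frame
whose return address was patched to the trampoline entry, and a trace
taken while executing anywhere within the trampoline body.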

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/kernel/stacktrace.c | 43 ++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 358aae3906d7..752b77f11c61 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,26 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+#ifdef CONFIG_KRETPROBES
+static bool kretprobe_detected(struct stackframe *frame)
+{
+	static char kretprobe_name[KSYM_NAME_LEN];
+	static unsigned long kretprobe_pc, kretprobe_end_pc;
+	unsigned long pc, offset, size;
+
+	if (!kretprobe_pc) {
+		pc = (unsigned long) kretprobe_trampoline;
+		if (!kallsyms_lookup(pc, &size, &offset, NULL, kretprobe_name))
+			return false;
+
+		kretprobe_pc = pc - offset;
+		kretprobe_end_pc = kretprobe_pc + size;
+	}
+
+	return frame->pc >= kretprobe_pc && frame->pc < kretprobe_end_pc;
+}
+#endif
+
 static void check_if_reliable(unsigned long fp, struct stackframe *frame,
 			      struct stack_info *info)
 {
@@ -111,6 +131,29 @@ static void check_if_reliable(unsigned long fp, struct stackframe *frame,
 		frame->reliable = false;
 		return;
 	}
+
+#ifdef CONFIG_KRETPROBES
+	/*
+	 * The return address of a function that has an active kretprobe
+	 * is modified in the stack frame to point to a trampoline. So,
+	 * the original return address is not available on the stack.
+	 *
+	 * A stack trace taken while executing the function (and its
+	 * descendants) will not show the original caller. So, mark the
+	 * stack trace as unreliable if the trampoline shows up in the
+	 * stack trace. (Obtaining the original return address from
+	 * task->kretprobe_instances seems problematic and not worth the
+	 * effort).
+	 *
+	 * The stack trace taken while inside the trampoline and functions
+	 * called by the trampoline have the same problem as above. This
+	 * is also covered by kretprobe_detected() using a range check.
+	 */
+	if (kretprobe_detected(frame)) {
+		frame->reliable = false;
+		return;
+	}
+#endif
 }
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH v2 8/8] arm64: Implement arch_stack_walk_reliable()
  2021-03-15 16:57 ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace madvenka
                     ` (6 preceding siblings ...)
  2021-03-15 16:57   ` [RFC PATCH v2 7/8] arm64: Detect kretprobed functions in stack trace madvenka
@ 2021-03-15 16:58   ` madvenka
  2021-03-15 19:01   ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace Madhavan T. Venkataraman
  8 siblings, 0 replies; 55+ messages in thread
From: madvenka @ 2021-03-15 16:58 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

unwind_frame() already sets the reliable flag in the stack frame during
a stack walk to indicate whether the stack trace is reliable or not.

Implement arch_stack_walk_reliable() like arch_stack_walk() but abort
the stack walk as soon as the reliable flag is set to false for any
reason.
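The control flow can be sketched in plain C, with a fixed array of
frames standing in for the unwinder; running off the array models
reaching the terminal record (-ENOENT), as in the series. This is a
model of the loop's semantics, not the kernel code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_frame {
	unsigned long pc;
	bool reliable;
};

/*
 * Consume each frame until the terminal record (modeled as the end of
 * the array, yielding -ENOENT) or until a frame is flagged unreliable.
 * Returns 0 on a complete reliable walk, -EINVAL otherwise, matching
 * the patch's convention.
 */
static int walk_reliable(const struct model_frame *frames, size_t n)
{
	size_t i = 0;
	int ret = 0;

	while (!ret) {
		if (!frames[i].reliable)
			return -EINVAL;
		/* consume_entry(cookie, frames[i].pc) would go here */
		ret = (++i == n) ? -ENOENT : 0;
	}
	return ret == -ENOENT ? 0 : -EINVAL;
}
```

Only -ENOENT (the terminal record) maps to success; any other unwind
failure, or an unreliable frame, aborts the walk with -EINVAL.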

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/Kconfig             |  1 +
 arch/arm64/kernel/stacktrace.c | 35 ++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1f212b47a48a..954f60c35b26 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -167,6 +167,7 @@ config ARM64
 		if $(cc-option,-fpatchable-function-entry=2)
 	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
 		if DYNAMIC_FTRACE_WITH_REGS
+	select HAVE_RELIABLE_STACKTRACE
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_FAST_GUP
 	select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 752b77f11c61..5d15c111f3aa 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -361,4 +361,39 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 	walk_stackframe(task, &frame, consume_entry, cookie);
 }
 
+/*
+ * Walk the stack like arch_stack_walk() but stop the walk as soon as
+ * some unreliability is detected in the stack.
+ */
+int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+			      void *cookie, struct task_struct *task)
+{
+	struct stackframe frame;
+	int ret = 0;
+
+	if (task == current) {
+		start_backtrace(&frame,
+				(unsigned long)__builtin_frame_address(0),
+				(unsigned long)arch_stack_walk_reliable);
+	} else {
+		/*
+		 * The task must not be running anywhere for the duration of
+		 * arch_stack_walk_reliable(). The caller must guarantee
+		 * this.
+		 */
+		start_backtrace(&frame, thread_saved_fp(task),
+				thread_saved_pc(task));
+	}
+
+	while (!ret) {
+		if (!frame.reliable)
+			return -EINVAL;
+		if (!consume_entry(cookie, frame.pc))
+			return -EINVAL;
+		ret = unwind_frame(task, &frame);
+	}
+
+	return ret == -ENOENT ? 0 : -EINVAL;
+}
+
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 0/8] arm64: Implement reliable stack trace
  2021-03-15 16:57 ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace madvenka
                     ` (7 preceding siblings ...)
  2021-03-15 16:58   ` [RFC PATCH v2 8/8] arm64: Implement arch_stack_walk_reliable() madvenka
@ 2021-03-15 19:01   ` Madhavan T. Venkataraman
  8 siblings, 0 replies; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-15 19:01 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/15/21 11:57 AM, madvenka@linux.microsoft.com wrote:
> Proper termination of the stack trace
> =====================================
> 
> In the unwinder, check the following for properly terminating the stack
> trace:
> 
> 	- Check every frame to see if it is task_pt_regs(stack)->stackframe.
> 	  If it is, terminate the stack trace successfully.
> 

There is a typo in the above sentence. task_pt_regs(stack)->stackframe
should be task_pt_regs(task)->stackframe.

Sorry about that.

Madhavan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 1/8] arm64: Implement stack trace termination record
  2021-03-15 16:57   ` [RFC PATCH v2 1/8] arm64: Implement stack trace termination record madvenka
@ 2021-03-18 15:09     ` Mark Brown
  2021-03-18 20:26       ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Brown @ 2021-03-18 15:09 UTC (permalink / raw)
  To: madvenka
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1183 bytes --]

On Mon, Mar 15, 2021 at 11:57:53AM -0500, madvenka@linux.microsoft.com wrote:

> In summary, task pt_regs->stackframe is where a successful stack trace ends.

>         .if \el == 0
> -       mov     x29, xzr
> +       stp     xzr, xzr, [sp, #S_STACKFRAME]
>         .else
>         stp     x29, x22, [sp, #S_STACKFRAME]
> -       add     x29, sp, #S_STACKFRAME
>         .endif
> +       add     x29, sp, #S_STACKFRAME

For both user and kernel threads this patch (at least by itself) results
in an additional record being reported in stack traces with a NULL
function pointer since it keeps the existing record where it is and adds
this new fixed record below it.  This is addressed for the kernel later
in the series, by "arm64: Terminate the stack trace at TASK_FRAME and
EL0_FRAME", but will still be visible to other unwinders such as
debuggers.  I'm not sure that this *matters* but it might and should at
least be called out more explicitly.

If we are going to add the extra record there would probably be less
potential for confusion if we pointed it at some sensibly named dummy
function so anything or anyone that does see it on the stack doesn't get
confused by a NULL.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 2/8] arm64: Implement frame types
  2021-03-15 16:57   ` [RFC PATCH v2 2/8] arm64: Implement frame types madvenka
@ 2021-03-18 17:40     ` Mark Brown
  2021-03-18 22:22       ` Madhavan T. Venkataraman
  2021-03-23 10:34     ` Mark Rutland
  1 sibling, 1 reply; 55+ messages in thread
From: Mark Brown @ 2021-03-18 17:40 UTC (permalink / raw)
  To: madvenka
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel


On Mon, Mar 15, 2021 at 11:57:54AM -0500, madvenka@linux.microsoft.com wrote:

> To summarize, pt_regs->stackframe is used (or will be used) as a marker
> frame in stack traces. To enable the unwinder to detect these frames, tag
> each pt_regs->stackframe with a type. To record the type, use the unused2
> field in struct pt_regs and rename it to frame_type. The types are:

Unless I'm misreading what's going on here this is more trying to set a
type for the stack as a whole than for a specific stack frame.  I'm also
finding this a bit confusing as the unwinder already tracks things it
calls frame types and it handles types that aren't covered here like
SDEI.  At the very least there's a naming issue here.

Taking a step back though do we want to be tracking this via pt_regs?
It's reliant on us robustly finding the correct pt_regs and on having
the things that make the stack unreliable explicitly go in and set the
appropriate type.  That seems like it will be error prone, I'd been
expecting to do something more like using sections to filter code for
unreliable features based on the addresses of the functions we find on
the stack or similar.  This could still go wrong of course but there's
fewer moving pieces, and especially fewer moving pieces specific to
reliable stack trace.

I'm wary of tracking data that only ever gets used for the reliable
stack trace path given that it's going to be fairly infrequently used
and hence tested, especially things that only crop up in cases that are
hard to provoke reliably.  If there's a way to detect things that
doesn't use special data that seems safer.

> EL1_FRAME
> 	EL1 exception frame.

We do trap into EL2 as well, the patch will track EL2 frames as EL1
frames.  Even if we can treat them the same the naming ought to be
clear.

> FTRACE_FRAME
>         FTRACE frame.

This is implemented later in the series.  If using this approach I'd
suggest pulling the change in entry-ftrace.S that sets this into this
patch, it's easier than adding a note about this being added later and
should help with any bisect issues.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 3/8] arm64: Terminate the stack trace at TASK_FRAME and EL0_FRAME
  2021-03-15 16:57   ` [RFC PATCH v2 3/8] arm64: Terminate the stack trace at TASK_FRAME and EL0_FRAME madvenka
@ 2021-03-18 18:26     ` Mark Brown
  2021-03-18 20:29       ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Brown @ 2021-03-18 18:26 UTC (permalink / raw)
  To: madvenka
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel


On Mon, Mar 15, 2021 at 11:57:55AM -0500, madvenka@linux.microsoft.com wrote:

> +	/* Terminal record, nothing to unwind */
> +	if (fp == (unsigned long) regs->stackframe) {
> +		if (regs->frame_type == TASK_FRAME ||
> +		    regs->frame_type == EL0_FRAME)
> +			return -ENOENT;
>  		return -EINVAL;
> +	}

This is conflating the reliable stacktrace checks (which your series
will later flag up with frame->reliable) with verifying that we found
the bottom of the stack by looking for this terminal stack frame record.
For the purposes of determining if the unwinder got to the bottom of the
stack we don't care what stack type we're looking at, we just care if it
managed to walk to this defined final record.  

At the minute nothing except reliable stack trace has any intention of
checking the specific return code but it's clearer to be consistent.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 1/8] arm64: Implement stack trace termination record
  2021-03-18 15:09     ` Mark Brown
@ 2021-03-18 20:26       ` Madhavan T. Venkataraman
  2021-03-19 12:30         ` Mark Brown
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-18 20:26 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/18/21 10:09 AM, Mark Brown wrote:
> On Mon, Mar 15, 2021 at 11:57:53AM -0500, madvenka@linux.microsoft.com wrote:
> 
>> In summary, task pt_regs->stackframe is where a successful stack trace ends.
> 
>>         .if \el == 0
>> -       mov     x29, xzr
>> +       stp     xzr, xzr, [sp, #S_STACKFRAME]
>>         .else
>>         stp     x29, x22, [sp, #S_STACKFRAME]
>> -       add     x29, sp, #S_STACKFRAME
>>         .endif
>> +       add     x29, sp, #S_STACKFRAME
> 
> For both user and kernel threads this patch (at least by itself) results
> in an additional record being reported in stack traces with a NULL
> function pointer since it keeps the existing record where it is and adds
> this new fixed record below it.  This is addressed for the kernel later
> in the series, by "arm64: Terminate the stack trace at TASK_FRAME and
> EL0_FRAME", but will still be visible to other unwinders such as
> debuggers.  I'm not sure that this *matters* but it might and should at
> least be called out more explicitly.
> 
> If we are going to add the extra record there would probably be less
> potential for confusion if we pointed it at some sensibly named dummy
> function so anything or anyone that does see it on the stack doesn't get
> confused by a NULL.
> 

I agree. I will think about this some more. If no other solution presents
itself, I will add the dummy function.

Madhavan


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 3/8] arm64: Terminate the stack trace at TASK_FRAME and EL0_FRAME
  2021-03-18 18:26     ` Mark Brown
@ 2021-03-18 20:29       ` Madhavan T. Venkataraman
  2021-03-23 10:36         ` Mark Rutland
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-18 20:29 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/18/21 1:26 PM, Mark Brown wrote:
> On Mon, Mar 15, 2021 at 11:57:55AM -0500, madvenka@linux.microsoft.com wrote:
> 
>> +	/* Terminal record, nothing to unwind */
>> +	if (fp == (unsigned long) regs->stackframe) {
>> +		if (regs->frame_type == TASK_FRAME ||
>> +		    regs->frame_type == EL0_FRAME)
>> +			return -ENOENT;
>>  		return -EINVAL;
>> +	}
> 
> This is conflating the reliable stacktrace checks (which your series
> will later flag up with frame->reliable) with verifying that we found
> the bottom of the stack by looking for this terminal stack frame record.
> For the purposes of determining if the unwinder got to the bottom of the
> stack we don't care what stack type we're looking at, we just care if it
> managed to walk to this defined final record.  
> 
> At the minute nothing except reliable stack trace has any intention of
> checking the specific return code but it's clearer to be consistent.
> 

So, you are saying that the type check is redundant. OK. I will remove it
and just return -ENOENT on reaching the final record.

Madhavan


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 2/8] arm64: Implement frame types
  2021-03-18 17:40     ` Mark Brown
@ 2021-03-18 22:22       ` Madhavan T. Venkataraman
  2021-03-19 13:22         ` Mark Brown
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-18 22:22 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/18/21 12:40 PM, Mark Brown wrote:
> On Mon, Mar 15, 2021 at 11:57:54AM -0500, madvenka@linux.microsoft.com wrote:
> 
>> To summarize, pt_regs->stackframe is used (or will be used) as a marker
>> frame in stack traces. To enable the unwinder to detect these frames, tag
>> each pt_regs->stackframe with a type. To record the type, use the unused2
>> field in struct pt_regs and rename it to frame_type. The types are:
> 
> Unless I'm misreading what's going on here this is more trying to set a
> type for the stack as a whole than for a specific stack frame.  I'm also
> finding this a bit confusing as the unwinder already tracks things it
> calls frame types and it handles types that aren't covered here like
> SDEI.  At the very least there's a naming issue here.
> 

When the unwinder gets to EL1 pt_regs->stackframe, it needs to be sure that
it is indeed a frame inside an EL1 pt_regs structure. It performs the
following checks:

	FP == pt_regs->regs[29]
	PC == pt_regs->pc
	type == EL1_FRAME

to confirm that the frame is EL1 pt_regs->stackframe.

Similarly, for EL0, the type is EL0_FRAME.

Both these frames are on the task stack. So, it is not a stack type.
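The three checks listed above boil down to a single predicate, sketched
here in userspace C with a reduced stand-in for pt_regs (field and enum
names follow the series; this is an illustration, not the kernel
definition):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum frame_type { TASK_FRAME = 1, EL0_FRAME, EL1_FRAME, FTRACE_FRAME };

/* Reduced stand-in for pt_regs: only the fields the check reads. */
struct fake_regs {
	uint64_t fp;	/* pt_regs->regs[29] */
	uint64_t pc;	/* pt_regs->pc */
	enum frame_type frame_type;
};

/*
 * All three conditions must hold before the frame is treated as the
 * EL1 exception frame embedded in pt_regs.
 */
static bool is_el1_exception_frame(const struct fake_regs *regs,
				   uint64_t fp, uint64_t pc)
{
	return fp == regs->fp && pc == regs->pc &&
	       regs->frame_type == EL1_FRAME;
}
```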

> Taking a step back though do we want to be tracking this via pt_regs?
> It's reliant on us robustly finding the correct pt_regs and on having
> the things that make the stack unreliable explicitly go in and set the
> appropriate type.  That seems like it will be error prone, I'd been
> expecting to do something more like using sections to filter code for
> unreliable features based on the addresses of the functions we find on
> the stack or similar.  This could still go wrong of course but there's
> fewer moving pieces, and especially fewer moving pieces specific to
> reliable stack trace.
> 

In that case, I suggest doing both. That is, check the type as well
as specific functions. For instance, in the EL1 pt_regs, in addition
to the above checks, check the PC against el1_sync(), el1_irq() and
el1_error(). I have suggested this in the cover letter.

If this is OK with you, we could do that. We want to make really sure that
nothing goes wrong with detecting the exception frame.

> I'm wary of tracking data that only ever gets used for the reliable
> stack trace path given that it's going to be fairly infrequently used
> and hence tested, especially things that only crop up in cases that are
> hard to provoke reliably.  If there's a way to detect things that
> doesn't use special data that seems safer.
> 

If you dislike the frame type, I could remove it and just do the
following checks:

	FP == pt_regs->regs[29]
	PC == pt_regs->pc
	and the address check against el1_*() functions

and similar changes for EL0 as well.

I still think that the frame type check makes it more robust.

>> EL1_FRAME
>> 	EL1 exception frame.
> 
> We do trap into EL2 as well, the patch will track EL2 frames as EL1
> frames.  Even if we can treat them the same the naming ought to be
> clear.
> 

Are you referring to ARMv8.1 VHE extension where the kernel can run
at EL2? Could you elaborate? I thought that EL2 was basically for
Hypervisors.

Thanks.

>> FTRACE_FRAME
>>         FTRACE frame.
> 
> This is implemented later in the series.  If using this approach I'd
> suggest pulling the change in entry-ftrace.S that sets this into this
> patch, it's easier than adding a note about this being added later and
> should help with any bisect issues.
> 

OK. Good point.

Madhavan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 1/8] arm64: Implement stack trace termination record
  2021-03-18 20:26       ` Madhavan T. Venkataraman
@ 2021-03-19 12:30         ` Mark Brown
  2021-03-19 14:29           ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Brown @ 2021-03-19 12:30 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel


On Thu, Mar 18, 2021 at 03:26:13PM -0500, Madhavan T. Venkataraman wrote:
> On 3/18/21 10:09 AM, Mark Brown wrote:

> > If we are going to add the extra record there would probably be less
> > potential for confusion if we pointed it at some sensibly named dummy
> > function so anything or anyone that does see it on the stack doesn't get
> > confused by a NULL.

> I agree. I will think about this some more. If no other solution presents
> itself, I will add the dummy function.

After discussing this with Mark Rutland offlist he convinced me that so
long as we ensure the kernel doesn't print the NULL record we're
probably OK here; the effort of setting the function pointer up
correctly in all circumstances (especially when we're not in the normal
memory map) is probably not worth it given the limited impact of anyone
seeing the NULL pointer (probably mainly a person working with some
external debugger).  It should be noted in the changelog though, and/or
merged in with the relevant change to the unwinder.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 2/8] arm64: Implement frame types
  2021-03-18 22:22       ` Madhavan T. Venkataraman
@ 2021-03-19 13:22         ` Mark Brown
  2021-03-19 14:40           ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Brown @ 2021-03-19 13:22 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel


On Thu, Mar 18, 2021 at 05:22:49PM -0500, Madhavan T. Venkataraman wrote:
> On 3/18/21 12:40 PM, Mark Brown wrote:

> > Unless I'm misreading what's going on here this is more trying to set a
> > type for the stack as a whole than for a specific stack frame.  I'm also
> > finding this a bit confusing as the unwinder already tracks things it
> > calls frame types and it handles types that aren't covered here like
> > SDEI.  At the very least there's a naming issue here.

> Both these frames are on the task stack. So, it is not a stack type.

OTOH it's also not something that applies to every frame but only to the
base frame from each stack which I think was more where I was coming
from there.  In any case, the issue is also that there's already another
thing that the unwinder calls a frame type so there's at least that
collision which needs to be resolved if nothing else.

> > Taking a step back though do we want to be tracking this via pt_regs?
> > It's reliant on us robustly finding the correct pt_regs and on having
> > the things that make the stack unreliable explicitly go in and set the
> > appropriate type.  That seems like it will be error prone, I'd been
> > expecting to do something more like using sections to filter code for
> > unreliable features based on the addresses of the functions we find on
> > the stack or similar.  This could still go wrong of course but there's
> > fewer moving pieces, and especially fewer moving pieces specific to
> > reliable stack trace.

> In that case, I suggest doing both. That is, check the type as well
> as specific functions. For instance, in the EL1 pt_regs, in addition
> to the above checks, check the PC against el1_sync(), el1_irq() and
> el1_error(). I have suggested this in the cover letter.

> If this is OK with you, we could do that. We want to make really sure that
> nothing goes wrong with detecting the exception frame.

...

> If you dislike the frame type, I could remove it and just do the
> following checks:

> 	FP == pt_regs->regs[29]
> 	PC == pt_regs->pc
> 	and the address check against el1_*() functions

> and similar changes for EL0 as well.

> I still think that the frame type check makes it more robust.

Yeah, we know the entry points so they can serve the same role as
checking an explicitly written value.  It does mean one less operation
on exception entry, though I'm not sure that's a big enough overhead to
actually worry about.  I don't have *super* strong opinions against
adding the explicitly written value other than it being one more thing
we don't otherwise use which we have to get right for reliable stack
trace; there's a greater risk of bitrot if it's not something that we
ever look at outside of the reliable stack trace code.

> >> EL1_FRAME
> >> 	EL1 exception frame.

> > We do trap into EL2 as well, the patch will track EL2 frames as EL1
> > frames.  Even if we can treat them the same the naming ought to be
> > clear.

> Are you referring to ARMv8.1 VHE extension where the kernel can run
> at EL2? Could you elaborate? I thought that EL2 was basically for
> Hypervisors.

KVM is the main case, yes - IIRC otherwise it's mainly error handlers
but I might be missing something.  We do recommend that the kernel is
started at EL2 where possible.

Actually now I look again it's just not adding anything on EL2 entries
at all, they use a separate set of macros which aren't updated - this
will only update things for EL0 and EL1 entries so my comment above
about this tracking EL2 as EL1 isn't accurate.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 1/8] arm64: Implement stack trace termination record
  2021-03-19 12:30         ` Mark Brown
@ 2021-03-19 14:29           ` Madhavan T. Venkataraman
  2021-03-19 18:19             ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-19 14:29 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/19/21 7:30 AM, Mark Brown wrote:
> On Thu, Mar 18, 2021 at 03:26:13PM -0500, Madhavan T. Venkataraman wrote:
>> On 3/18/21 10:09 AM, Mark Brown wrote:
> 
>>> If we are going to add the extra record there would probably be less
>>> potential for confusion if we pointed it at some sensibly named dummy
>>> function so anything or anyone that does see it on the stack doesn't get
>>> confused by a NULL.
> 
>> I agree. I will think about this some more. If no other solution presents
>> itself, I will add the dummy function.
> 
> After discussing this with Mark Rutland offlist he convinced me that so
> long as we ensure the kernel doesn't print the NULL record we're
> probably OK here: the effort of setting the function pointer up
> correctly in all circumstances (especially when we're not in the normal
> memory map) is probably not worth it for the limited impact of seeing
> the NULL pointer (probably mainly affecting a person working with some
> external debugger).  It should be noted in the changelog though, and/or
> merged in with the relevant change to the unwinder.
> 

OK. I will add a comment as well as note it in the changelog.

Thanks to both of you.

Madhavan


* Re: [RFC PATCH v2 2/8] arm64: Implement frame types
  2021-03-19 13:22         ` Mark Brown
@ 2021-03-19 14:40           ` Madhavan T. Venkataraman
  2021-03-19 15:02             ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-19 14:40 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/19/21 8:22 AM, Mark Brown wrote:
> On Thu, Mar 18, 2021 at 05:22:49PM -0500, Madhavan T. Venkataraman wrote:
>> On 3/18/21 12:40 PM, Mark Brown wrote:
> 
>>> Unless I'm misreading what's going on here this is more trying to set a
>>> type for the stack as a whole than for a specific stack frame.  I'm also
>>> finding this a bit confusing as the unwinder already tracks things it
>>> calls frame types and it handles types that aren't covered here like
>>> SDEI.  At the very least there's a naming issue here.
> 
>> Both these frames are on the task stack. So, it is not a stack type.
> 
> OTOH it's also not something that applies to every frame but only to the
> base frame from each stack which I think was more where I was coming
> from there.  In any case, the issue is also that there's already another
> thing that the unwinder calls a frame type so there's at least that
> collision which needs to be resolved if nothing else.
> 

The base frame from each stack as well as intermediate marker frames such
as the EL1 frame and the Ftrace frame.

As for the frame type, I will try to come up with a better name.

>>> Taking a step back though do we want to be tracking this via pt_regs?
>>> It's reliant on us robustly finding the correct pt_regs and on having
>>> the things that make the stack unreliable explicitly go in and set the
>>> appropriate type.  That seems like it will be error prone, I'd been
>>> expecting to do something more like using sections to filter code for
>>> unreliable features based on the addresses of the functions we find on
>>> the stack or similar.  This could still go wrong of course but there's
>>> fewer moving pieces, and especially fewer moving pieces specific to
>>> reliable stack trace.
> 
>> In that case, I suggest doing both. That is, check the type as well
>> as specific functions. For instance, in the EL1 pt_regs, in addition
>> to the above checks, check the PC against el1_sync(), el1_irq() and
>> el1_error(). I have suggested this in the cover letter.
> 
>> If this is OK with you, we could do that. We want to make really sure that
>> nothing goes wrong with detecting the exception frame.
> 
> ...
> 
>> If you dislike the frame type, I could remove it and just do the
>> following checks:
> 
>> 	FP == pt_regs->regs[29]
>> 	PC == pt_regs->pc
>> 	and the address check against el1_*() functions
> 
>> and similar changes for EL0 as well.
> 
>> I still think that the frame type check makes it more robust.
> 
> Yeah, we know the entry points so they can serve the same role as
> checking an explicitly written value.  It does mean one less operation
> on exception entry, though I'm not sure that's a big enough
> overhead to actually worry about.  I don't have *super* strong opinions
> against adding the explicitly written value other than it being one more
> thing we don't otherwise use which we have to get right for reliable
> stack trace, there's a greater risk of bitrot if it's not something that
> we ever look at outside of the reliable stack trace code.
> 

So, I will add the address checks for robustness. I will think some more
about the frame type.
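To make the checks discussed above concrete, here is a rough userspace model of the combined validation. The `pt_regs` layout, the `EL1_FRAME` value, and the entry-point address ranges below are simplified stand-ins for illustration, not the real kernel definitions; in the kernel the ranges would be the actual bounds of el1_sync(), el1_irq() and el1_error().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the arm64 struct pt_regs fields used here. */
struct pt_regs {
	uint64_t regs[31];
	uint64_t pc;
	uint32_t frame_type;		/* EL1_FRAME etc., per patch 2/8 */
};

#define EL1_FRAME	0xDEADBEE2	/* value proposed in patch 2/8 */

/* Hypothetical address ranges standing in for the EL1 entry functions. */
static const struct { uint64_t start, end; } el1_entry[] = {
	{ 0x1000, 0x1200 },		/* "el1_sync"  */
	{ 0x2000, 0x2200 },		/* "el1_irq"   */
	{ 0x3000, 0x3200 },		/* "el1_error" */
};

static bool pc_in_el1_entry(uint64_t pc)
{
	for (size_t i = 0; i < sizeof(el1_entry) / sizeof(el1_entry[0]); i++)
		if (pc >= el1_entry[i].start && pc < el1_entry[i].end)
			return true;
	return false;
}

/*
 * Model of the combined robustness checks: accept a candidate EL1
 * exception frame only if the frame type matches, the saved FP/PC
 * agree with the current frame, and the PC falls within one of the
 * EL1 entry functions.
 */
static bool is_el1_exception_frame(const struct pt_regs *regs,
				   uint64_t fp, uint64_t pc)
{
	return regs->frame_type == EL1_FRAME &&
	       regs->regs[29] == fp &&
	       regs->pc == pc &&
	       pc_in_el1_entry(pc);
}
```

This is only a sketch of the proposed policy; the real unwinder would first have to establish that a valid pt_regs is present at the candidate location.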

>>>> EL1_FRAME
>>>> 	EL1 exception frame.
> 
>>> We do trap into EL2 as well, the patch will track EL2 frames as EL1
>>> frames.  Even if we can treat them the same the naming ought to be
>>> clear.
> 
>> Are you referring to ARMv8.1 VHE extension where the kernel can run
>> at EL2? Could you elaborate? I thought that EL2 was basically for
>> Hypervisors.
> 
> KVM is the main case, yes - IIRC otherwise it's mainly error handlers
> but I might be missing something.  We do recommend that the kernel is
> started at EL2 where possible.
> 
> Actually now I look again it's just not adding anything on EL2 entries
> at all, they use a separate set of macros which aren't updated - this
> will only update things for EL0 and EL1 entries so my comment above
> about this tracking EL2 as EL1 isn't accurate.
> 

OK.

Madhavan


* Re: [RFC PATCH v2 2/8] arm64: Implement frame types
  2021-03-19 14:40           ` Madhavan T. Venkataraman
@ 2021-03-19 15:02             ` Madhavan T. Venkataraman
  2021-03-19 16:20               ` Mark Brown
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-19 15:02 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/19/21 9:40 AM, Madhavan T. Venkataraman wrote:
>> Are you referring to ARMv8.1 VHE extension where the kernel can run
>> at EL2? Could you elaborate? I thought that EL2 was basically for
>> Hypervisors.
> KVM is the main case, yes - IIRC otherwise it's mainly error handlers
> but I might be missing something.  We do recommend that the kernel is
> started at EL2 where possible.
> 
> Actually now I look again it's just not adding anything on EL2 entries
> at all, they use a separate set of macros which aren't updated - this
> will only update things for EL0 and EL1 entries so my comment above
> about this tracking EL2 as EL1 isn't accurate.

So, do I need to do anything here?

Madhavan


* Re: [RFC PATCH v2 2/8] arm64: Implement frame types
  2021-03-19 15:02             ` Madhavan T. Venkataraman
@ 2021-03-19 16:20               ` Mark Brown
  2021-03-19 16:27                 ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Brown @ 2021-03-19 16:20 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel


On Fri, Mar 19, 2021 at 10:02:52AM -0500, Madhavan T. Venkataraman wrote:
> On 3/19/21 9:40 AM, Madhavan T. Venkataraman wrote:

> > Actually now I look again it's just not adding anything on EL2 entries
> > at all, they use a separate set of macros which aren't updated - this
> > will only update things for EL0 and EL1 entries so my comment above
> > about this tracking EL2 as EL1 isn't accurate.

> So, do I need to do anything here?

Probably worth some note somewhere about other stack types existing and
how they end up being handled, in the changelog at least.



* Re: [RFC PATCH v2 2/8] arm64: Implement frame types
  2021-03-19 16:20               ` Mark Brown
@ 2021-03-19 16:27                 ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-19 16:27 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/19/21 11:20 AM, Mark Brown wrote:
> On Fri, Mar 19, 2021 at 10:02:52AM -0500, Madhavan T. Venkataraman wrote:
>> On 3/19/21 9:40 AM, Madhavan T. Venkataraman wrote:
> 
>>> Actually now I look again it's just not adding anything on EL2 entries
>>> at all, they use a separate set of macros which aren't updated - this
>>> will only update things for EL0 and EL1 entries so my comment above
>>> about this tracking EL2 as EL1 isn't accurate.
> 
>> So, do I need to do anything here?
> 
> Probably worth some note somewhere about other stack types existing and
> how they end up being handled, in the changelog at least.
> 
OK.

Madhavan


* Re: [RFC PATCH v2 1/8] arm64: Implement stack trace termination record
  2021-03-19 14:29           ` Madhavan T. Venkataraman
@ 2021-03-19 18:19             ` Madhavan T. Venkataraman
  2021-03-19 22:03               ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-19 18:19 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/19/21 9:29 AM, Madhavan T. Venkataraman wrote:
> 
> 
> On 3/19/21 7:30 AM, Mark Brown wrote:
>> On Thu, Mar 18, 2021 at 03:26:13PM -0500, Madhavan T. Venkataraman wrote:
>>> On 3/18/21 10:09 AM, Mark Brown wrote:
>>
>>>> If we are going to add the extra record there would probably be less
>>>> potential for confusion if we pointed it at some sensibly named dummy
>>>> function so anything or anyone that does see it on the stack doesn't get
>>>> confused by a NULL.
>>
>>> I agree. I will think about this some more. If no other solution presents
>>> itself, I will add the dummy function.
>>
>> After discussing this with Mark Rutland offlist he convinced me that so
>> long as we ensure the kernel doesn't print the NULL record we're
>> probably OK here: the effort of setting the function pointer up
>> correctly in all circumstances (especially when we're not in the normal
>> memory map) is probably not worth it for the limited impact of seeing
>> the NULL pointer (probably mainly affecting a person working with some
>> external debugger).  It should be noted in the changelog though, and/or
>> merged in with the relevant change to the unwinder.
>>
> 
> OK. I will add a comment as well as note it in the changelog.
> 
> Thanks to both of you.
> 
> Madhavan
> 

I thought about this some more. I think I have a simple solution. I will
prepare a patch and send it out. If you and Mark Rutland approve, I will
include it in the next version of this RFC.

Madhavan


* Re: [RFC PATCH v2 1/8] arm64: Implement stack trace termination record
  2021-03-19 18:19             ` Madhavan T. Venkataraman
@ 2021-03-19 22:03               ` Madhavan T. Venkataraman
  2021-03-23 10:24                 ` Mark Rutland
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-19 22:03 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/19/21 1:19 PM, Madhavan T. Venkataraman wrote:
> 
> 
> On 3/19/21 9:29 AM, Madhavan T. Venkataraman wrote:
>>
>>
>> On 3/19/21 7:30 AM, Mark Brown wrote:
>>> On Thu, Mar 18, 2021 at 03:26:13PM -0500, Madhavan T. Venkataraman wrote:
>>>> On 3/18/21 10:09 AM, Mark Brown wrote:
>>>
>>>>> If we are going to add the extra record there would probably be less
>>>>> potential for confusion if we pointed it at some sensibly named dummy
>>>>> function so anything or anyone that does see it on the stack doesn't get
>>>>> confused by a NULL.
>>>
>>>> I agree. I will think about this some more. If no other solution presents
>>>> itself, I will add the dummy function.
>>>
>>> After discussing this with Mark Rutland offlist he convinced me that so
>>> long as we ensure the kernel doesn't print the NULL record we're
>>> probably OK here: the effort of setting the function pointer up
>>> correctly in all circumstances (especially when we're not in the normal
>>> memory map) is probably not worth it for the limited impact of seeing
>>> the NULL pointer (probably mainly affecting a person working with some
>>> external debugger).  It should be noted in the changelog though, and/or
>>> merged in with the relevant change to the unwinder.
>>>
>>
>> OK. I will add a comment as well as note it in the changelog.
>>
>> Thanks to both of you.
>>
>> Madhavan
>>
> 
> I thought about this some more. I think I have a simple solution. I will
> prepare a patch and send it out. If you and Mark Rutland approve, I will
> include it in the next version of this RFC.
> 
> Madhavan
> 

I solved this by using existing functions logically instead of inventing a
dummy function. I initialize pt_regs->stackframe[1] to an existing function
so that the stack trace will not show a 0x0 entry, and so that the kernel
and gdb show identical stack traces.

For all task stack traces including the idle tasks, the stack trace will
end at copy_thread() as copy_thread() is the function that initializes the
pt_regs and the first stack frame for a task.

For EL0 exceptions, the stack trace will end with vectors() as vectors
entries call the EL0 handlers.

Here are sample stack traces (I only show the ending of each trace):

Idle task on primary CPU
========================

		 ...
[    0.022557]   start_kernel+0x5b8/0x5f4
[    0.022570]   __primary_switched+0xa8/0xb8
[    0.022578]   copy_thread+0x0/0x188

Idle task on secondary CPU
==========================

		 ...
[    0.023397]   secondary_start_kernel+0x188/0x1e0
[    0.023406]   __secondary_switched+0x40/0x88
[    0.023415]   copy_thread+0x0/0x188

All other kernel threads
========================

		 ...
[   13.501062]   ret_from_fork+0x10/0x18
[   13.507998]   copy_thread+0x0/0x188

User threads (EL0 exception)
============

write(2) system call example:

		 ...
[  521.686148]   vfs_write+0xc8/0x2c0
[  521.686156]   ksys_write+0x74/0x108
[  521.686161]   __arm64_sys_write+0x24/0x30
[  521.686166]   el0_svc_common.constprop.0+0x70/0x1a8
[  521.686175]   do_el0_svc+0x2c/0x98
[  521.686180]   el0_svc+0x2c/0x70
[  521.686188]   el0_sync_handler+0xb0/0xb8
[  521.686193]   el0_sync+0x17c/0x180
[  521.686198]   vectors+0x0/0x7d8

Here are the code changes:

========================================================================
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a31a0a713c85..514307e80b79 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -261,16 +261,19 @@ alternative_else_nop_endif
 	stp	lr, x21, [sp, #S_LR]
 
 	/*
-	 * For exceptions from EL0, terminate the callchain here.
+	 * For exceptions from EL0, terminate the callchain here at
+	 * task_pt_regs(current)->stackframe.
+	 *
 	 * For exceptions from EL1, create a synthetic frame record so the
 	 * interrupted code shows up in the backtrace.
 	 */
 	.if \el == 0
-	mov	x29, xzr
+	ldr	x17, =vectors
+	stp	xzr, x17, [sp, #S_STACKFRAME]
 	.else
 	stp	x29, x22, [sp, #S_STACKFRAME]
-	add	x29, sp, #S_STACKFRAME
 	.endif
+	add	x29, sp, #S_STACKFRAME
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 alternative_if_not ARM64_HAS_PAN
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 66b0e0b66e31..699e0dd313a1 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -393,6 +393,29 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	ret	x28
 SYM_FUNC_END(__create_page_tables)
 
+	/*
+	 * The boot task becomes the idle task for the primary CPU. The
+	 * CPU startup task on each secondary CPU becomes the idle task
+	 * for the secondary CPU.
+	 *
+	 * The idle task does not require pt_regs. But create a dummy
+	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
+	 * set up to be the last frame on the idle task stack just like
+	 * all the other kernel tasks. This helps the unwinder to
+	 * terminate the stack trace at a well-known stack offset.
+	 *
+	 * Also, set up the last return PC to be copy_thread() just
+	 * like all the other kernel tasks so that the stack trace of
+	 * all kernel tasks ends with the same function.
+	 */
+	.macro setup_last_frame
+	sub	sp, sp, #PT_REGS_SIZE
+	ldr	x17, =copy_thread
+	stp	xzr, x17, [sp, #S_STACKFRAME]
+	add	x29, sp, #S_STACKFRAME
+	adr	x30, #0
+	.endm
+
 /*
  * The following fragment of code is executed with the MMU enabled.
  *
@@ -447,8 +470,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 	bl	switch_to_vhe			// Prefer VHE if possible
 	add	sp, sp, #16
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 	b	start_kernel
 SYM_FUNC_END(__primary_switched)
 
@@ -606,8 +628,7 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
 	cbz	x2, __secondary_too_slow
 	msr	sp_el0, x2
 	scs_load x2, x3
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 
 #ifdef CONFIG_ARM64_PTR_AUTH
 	ptrauth_keys_init_cpu x2, x3, x4, x5
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 325c83b1a24d..9050699ff67c 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -437,6 +437,12 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	}
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
+	/*
+	 * For the benefit of the unwinder, set up childregs->stackframe
+	 * as the last frame for the new task.
+	 */
+	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
+	childregs->stackframe[1] = (unsigned long)copy_thread;
 
 	ptrace_hw_copy_thread(p);
 
======================================================================

If you approve, the above will become RFC Patch v3 1/8 in the next version.

Let me know.

Also, I could introduce an extra frame in the EL1 exception stack trace that
includes vectors so the stack trace would look like this (timer interrupt example):

call_timer_fn
run_timer_softirq
__do_softirq
irq_exit
__handle_domain_irq
gic_handle_irq
el1_irq
vectors

This way, if the unwinder finds vectors, it knows that it is an exception frame.

Madhavan


* Re: [RFC PATCH v2 1/8] arm64: Implement stack trace termination record
  2021-03-19 22:03               ` Madhavan T. Venkataraman
@ 2021-03-23 10:24                 ` Mark Rutland
  2021-03-23 12:39                   ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 10:24 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Mark Brown, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Fri, Mar 19, 2021 at 05:03:09PM -0500, Madhavan T. Venkataraman wrote:
> I solved this by using existing functions logically instead of inventing a
> dummy function. I initialize pt_regs->stackframe[1] to an existing function
> so that the stack trace will not show a 0x0 entry, and so that the kernel
> and gdb show identical stack traces.
> 
> For all task stack traces including the idle tasks, the stack trace will
> end at copy_thread() as copy_thread() is the function that initializes the
> pt_regs and the first stack frame for a task.

I don't think this is a good idea, as it will mean that copy_thread()
will appear to be live in every thread, and therefore will not be
patchable.

There are other things people need to be aware of when using an external
debugger (e.g. during EL0<->ELx transitions there are periods when X29
and X30 contain the EL0 values, and cannot be used to unwind), so I
don't think there's a strong need to make this look prettier to an
external debugger.

> For EL0 exceptions, the stack trace will end with vectors() as vectors
> entries call the EL0 handlers.
> 
> Here are sample stack traces (I only show the ending of each trace):
> 
> Idle task on primary CPU
> ========================
> 
> 		 ...
> [    0.022557]   start_kernel+0x5b8/0x5f4
> [    0.022570]   __primary_switched+0xa8/0xb8
> [    0.022578]   copy_thread+0x0/0x188
> 
> Idle task on secondary CPU
> ==========================
> 
> 		 ...
> [    0.023397]   secondary_start_kernel+0x188/0x1e0
> [    0.023406]   __secondary_switched+0x40/0x88
> [    0.023415]   copy_thread+0x0/0x188
> 
> All other kernel threads
> ========================
> 
> 		 ...
> [   13.501062]   ret_from_fork+0x10/0x18
> [   13.507998]   copy_thread+0x0/0x188
> 
> User threads (EL0 exception)
> ============
> 
> write(2) system call example:
> 
> 		 ...
> [  521.686148]   vfs_write+0xc8/0x2c0
> [  521.686156]   ksys_write+0x74/0x108
> [  521.686161]   __arm64_sys_write+0x24/0x30
> [  521.686166]   el0_svc_common.constprop.0+0x70/0x1a8
> [  521.686175]   do_el0_svc+0x2c/0x98
> [  521.686180]   el0_svc+0x2c/0x70
> [  521.686188]   el0_sync_handler+0xb0/0xb8
> [  521.686193]   el0_sync+0x17c/0x180
> [  521.686198]   vectors+0x0/0x7d8

[...]

> If you approve, the above will become RFC Patch v3 1/8 in the next version.

As above, I don't think we should repurpose an existing function here,
and my preference is to use 0x0.

> Let me know.
> 
> Also, I could introduce an extra frame in the EL1 exception stack trace that
> includes vectors so the stack trace would look like this (timer interrupt example):
> 
> call_timer_fn
> run_timer_softirq
> __do_softirq
> irq_exit
> __handle_domain_irq
> gic_handle_irq
> el1_irq
> vectors
> 
> This way, if the unwinder finds vectors, it knows that it is an exception frame.

I can see this might make it simpler to detect exception boundaries, but
I suspect that we need other information anyway, so this doesn't become
all that helpful. For EL0<->EL1 exception boundaries we want to
successfully terminate a robust stacktrace whereas for EL1<->EL1
exception boundaries we want to fail a robust stacktrace.

I reckon we have to figure that out from the el1_* and el0_* entry
points (which I am working to reduce/simplify as part of the entry
assembly conversion to C). With that we can terminate unwind at the
el0_* parts, and reject unwinding across any other bit of .entry.text.

Thanks,
Mark.



* Re: [RFC PATCH v2 2/8] arm64: Implement frame types
  2021-03-15 16:57   ` [RFC PATCH v2 2/8] arm64: Implement frame types madvenka
  2021-03-18 17:40     ` Mark Brown
@ 2021-03-23 10:34     ` Mark Rutland
  1 sibling, 0 replies; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 10:34 UTC (permalink / raw)
  To: madvenka
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Mon, Mar 15, 2021 at 11:57:54AM -0500, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> Apart from the task pt_regs, pt_regs is also created on the stack for
> other cases:
> 
> 	- EL1 exception. A pt_regs is created on the stack to save register
> 	  state. In addition, pt_regs->stackframe is set up for the
> 	  interrupted kernel function so that the function shows up in the
> 	  EL1 exception stack trace.
> 
> 	- When a traced function calls the ftrace infrastructure at the
> 	  beginning of the function, ftrace creates a pt_regs on the stack
> 	  at that point to save register state. In addition, it sets up
> 	  pt_regs->stackframe for the traced function so that the traced
> 	  function shows up in the stack trace taken from anywhere in the
> 	  ftrace code after that point. When the ftrace code returns to the
> 	  traced function, the pt_regs is removed from the stack.
> 
> To summarize, pt_regs->stackframe is used (or will be used) as a marker
> frame in stack traces. To enable the unwinder to detect these frames, tag
> each pt_regs->stackframe with a type. To record the type, use the unused2
> field in struct pt_regs and rename it to frame_type. The types are:
> 
> TASK_FRAME
> 	Terminating frame for a normal stack trace.
> EL0_FRAME
> 	Terminating frame for an EL0 exception.
> EL1_FRAME
> 	EL1 exception frame.
> FTRACE_FRAME
> 	FTRACE frame.
> 
> These frame types will be used by the unwinder later to validate frames.

I don't think that we need a marker in the pt_regs:

* For kernel tasks and user tasks we just need the terminal frame record
  to be at a known location. We don't need the pt_regs to determine
  this.

* For EL1<->EL1 exception boundaries, we already chain the frame records
  together, and we can identify the entry functions to see that there's
  an exception boundary. We don't need the pt_regs to determine this.

* For ftrace using patchable-function-entry, we can identify the
  trampoline function. I'm also hoping to move away from pt_regs to an
  ftrace_regs here, and I'd like to avoid more strongly coupling this to
  pt_regs.

  Maybe I'm missing something you need for this last case?

> 
> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> ---
>  arch/arm64/include/asm/ptrace.h | 15 +++++++++++++--
>  arch/arm64/kernel/asm-offsets.c |  1 +
>  arch/arm64/kernel/entry.S       |  4 ++++
>  arch/arm64/kernel/head.S        |  2 ++
>  arch/arm64/kernel/process.c     |  1 +
>  5 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index e58bca832dff..a75211ce009a 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -117,6 +117,17 @@
>   */
>  #define NO_SYSCALL (-1)
>  
> +/*
> + * pt_regs->stackframe is a marker frame that is used in different
> + * situations. These are the different types of frames. Use patterns
> + * for the frame types instead of (0, 1, 2, 3, ..) so that it is less
> + * likely to find them on the stack.
> + */
> +#define TASK_FRAME	0xDEADBEE0	/* Task stack termination frame */
> +#define EL0_FRAME	0xDEADBEE1	/* EL0 exception frame */
> +#define EL1_FRAME	0xDEADBEE2	/* EL1 exception frame */
> +#define FTRACE_FRAME	0xDEADBEE3	/* FTrace frame */

This sounds like we're using this as a heuristic, which I don't think we
should do. I'd strongly prefer to avoid magic values here, and if we
cannot be 100% certain of the stack contents, this is not reliable
anyway.

Thanks,
Mark.

>  #ifndef __ASSEMBLY__
>  #include <linux/bug.h>
>  #include <linux/types.h>
> @@ -187,11 +198,11 @@ struct pt_regs {
>  	};
>  	u64 orig_x0;
>  #ifdef __AARCH64EB__
> -	u32 unused2;
> +	u32 frame_type;
>  	s32 syscallno;
>  #else
>  	s32 syscallno;
> -	u32 unused2;
> +	u32 frame_type;
>  #endif
>  	u64 sdei_ttbr1;
>  	/* Only valid when ARM64_HAS_IRQ_PRIO_MASKING is enabled. */
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index a36e2fc330d4..43f97dbc7dfc 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -75,6 +75,7 @@ int main(void)
>    DEFINE(S_SDEI_TTBR1,		offsetof(struct pt_regs, sdei_ttbr1));
>    DEFINE(S_PMR_SAVE,		offsetof(struct pt_regs, pmr_save));
>    DEFINE(S_STACKFRAME,		offsetof(struct pt_regs, stackframe));
> +  DEFINE(S_FRAME_TYPE,		offsetof(struct pt_regs, frame_type));
>    DEFINE(PT_REGS_SIZE,		sizeof(struct pt_regs));
>    BLANK();
>  #ifdef CONFIG_COMPAT
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index e2dc2e998934..ecc3507d9cdd 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -269,8 +269,12 @@ alternative_else_nop_endif
>  	 */
>  	.if \el == 0
>  	stp	xzr, xzr, [sp, #S_STACKFRAME]
> +	ldr	w17, =EL0_FRAME
> +	str	w17, [sp, #S_FRAME_TYPE]
>  	.else
>  	stp	x29, x22, [sp, #S_STACKFRAME]
> +	ldr	w17, =EL1_FRAME
> +	str	w17, [sp, #S_FRAME_TYPE]
>  	.endif
>  	add	x29, sp, #S_STACKFRAME
>  
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 2769b20934d4..d2ee78f8f97f 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -410,6 +410,8 @@ SYM_FUNC_END(__create_page_tables)
>  	 */
>  	.macro setup_last_frame
>  	sub	sp, sp, #PT_REGS_SIZE
> +	ldr	w17, =TASK_FRAME
> +	str	w17, [sp, #S_FRAME_TYPE]
>  	stp	xzr, xzr, [sp, #S_STACKFRAME]
>  	add	x29, sp, #S_STACKFRAME
>  	ldr	x30, =ret_from_fork
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 7ffa689e8b60..5c152fd60503 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -442,6 +442,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  	 * as the last frame for the new task.
>  	 */
>  	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
> +	childregs->frame_type = TASK_FRAME;
>  
>  	ptrace_hw_copy_thread(p);
>  
> -- 
> 2.25.1
> 


* Re: [RFC PATCH v2 3/8] arm64: Terminate the stack trace at TASK_FRAME and EL0_FRAME
  2021-03-18 20:29       ` Madhavan T. Venkataraman
@ 2021-03-23 10:36         ` Mark Rutland
  2021-03-23 12:40           ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 10:36 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Mark Brown, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Thu, Mar 18, 2021 at 03:29:19PM -0500, Madhavan T. Venkataraman wrote:
> 
> 
> On 3/18/21 1:26 PM, Mark Brown wrote:
> > On Mon, Mar 15, 2021 at 11:57:55AM -0500, madvenka@linux.microsoft.com wrote:
> > 
> >> +	/* Terminal record, nothing to unwind */
> >> +	if (fp == (unsigned long) regs->stackframe) {
> >> +		if (regs->frame_type == TASK_FRAME ||
> >> +		    regs->frame_type == EL0_FRAME)
> >> +			return -ENOENT;
> >>  		return -EINVAL;
> >> +	}
> > 
> > This is conflating the reliable stacktrace checks (which your series
> > will later flag up with frame->reliable) with verifying that we found
> > the bottom of the stack by looking for this terminal stack frame record.
> > For the purposes of determining if the unwinder got to the bottom of the
> > stack we don't care what stack type we're looking at, we just care if it
> > managed to walk to this defined final record.  
> > 
> > At the minute nothing except reliable stack trace has any intention of
> > checking the specific return code but it's clearer to be consistent.
> > 
> 
> So, you are saying that the type check is redundant. OK. I will remove it
> and just return -ENOENT on reaching the final record.

Yes please; and please fold that into the same patch that adds the final
records.

Thanks,
Mark.


* Re: [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable
  2021-03-15 16:57   ` [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable madvenka
@ 2021-03-23 10:42     ` Mark Rutland
  2021-03-23 12:46       ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 10:42 UTC (permalink / raw)
  To: madvenka
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Mon, Mar 15, 2021 at 11:57:56AM -0500, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> EL1 exceptions can happen on any instruction including instructions in
> the frame pointer prolog or epilog. Depending on where exactly they happen,
> they could render the stack trace unreliable.
> 
> If an EL1 exception frame is found on the stack, mark the stack trace as
> unreliable.
> 
> Now, the EL1 exception frame is not at any well-known offset on the stack.
> It can be anywhere on the stack. In order to properly detect an EL1
> exception frame the following checks must be done:
> 
> 	- The frame type must be EL1_FRAME.
> 
> 	- When the register state is saved in the EL1 pt_regs, the frame
> 	  pointer x29 is saved in pt_regs->regs[29] and the return PC
> 	  is saved in pt_regs->pc. These must match with the current
> 	  frame.

Before you can do this, you need to reliably identify that you have a
pt_regs on the stack, but this patch uses a heuristic, which is not
reliable.

However, instead you can identify whether you're trying to unwind
through one of the EL1 entry functions, which tells you the same thing
without even having to look at the pt_regs.

We can do that based on the entry functions all being in .entry.text,
which we could further sub-divide to split the EL0 and EL1 entry
functions.

> 
> Interrupts encountered in kernel code are also EL1 exceptions. At the end
> of an interrupt, the interrupt handler checks if the current task must be
> preempted for any reason. If so, it calls the preemption code which takes
> the task off the CPU. A stack trace taken on the task after the preemption
> will show the EL1 frame and will be considered unreliable. This is correct
> behavior as preemption can happen practically at any point in code
> including the frame pointer prolog and epilog.
> 
> Breakpoints encountered in kernel code are also EL1 exceptions. The probing
> infrastructure uses breakpoints for executing probe code. While in the probe
> code, the stack trace will show an EL1 frame and will be considered
> unreliable. This is also correct behavior.
> 
> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> ---
>  arch/arm64/include/asm/stacktrace.h |  2 +
>  arch/arm64/kernel/stacktrace.c      | 57 +++++++++++++++++++++++++++++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index eb29b1fe8255..684f65808394 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -59,6 +59,7 @@ struct stackframe {
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  	int graph;
>  #endif
> +	bool reliable;
>  };
>  
>  extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
> @@ -169,6 +170,7 @@ static inline void start_backtrace(struct stackframe *frame,
>  	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
>  	frame->prev_fp = 0;
>  	frame->prev_type = STACK_TYPE_UNKNOWN;
> +	frame->reliable = true;
>  }
>  
>  #endif	/* __ASM_STACKTRACE_H */
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 504cd161339d..6ae103326f7b 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -18,6 +18,58 @@
>  #include <asm/stack_pointer.h>
>  #include <asm/stacktrace.h>
>  
> +static void check_if_reliable(unsigned long fp, struct stackframe *frame,
> +			      struct stack_info *info)
> +{
> +	struct pt_regs *regs;
> +	unsigned long regs_start, regs_end;
> +
> +	/*
> +	 * If the stack trace has already been marked unreliable, just
> +	 * return.
> +	 */
> +	if (!frame->reliable)
> +		return;
> +
> +	/*
> +	 * Assume that this is an intermediate marker frame inside a pt_regs
> +	 * structure created on the stack and get the pt_regs pointer. Other
> +	 * checks will be done below to make sure that this is a marker
> +	 * frame.
> +	 */

Sorry, but NAK to this approach specifically. This isn't reliable (since
it can be influenced by arbitrary data on the stack), and it's far more
complicated than identifying the entry functions specifically.

Thanks,
Mark.

> +	regs_start = fp - offsetof(struct pt_regs, stackframe);
> +	if (regs_start < info->low)
> +		return;
> +	regs_end = regs_start + sizeof(*regs);
> +	if (regs_end > info->high)
> +		return;
> +	regs = (struct pt_regs *) regs_start;
> +
> +	/*
> +	 * When an EL1 exception happens, a pt_regs structure is created
> +	 * on the stack and the register state is recorded. Part of the
> +	 * state is the FP and PC at the time of the exception.
> +	 *
> +	 * In addition, the FP and PC are also stored in pt_regs->stackframe
> +	 * and pt_regs->stackframe is chained with other frames on the stack.
> +	 * This is so that the interrupted function shows up in the stack
> +	 * trace.
> +	 *
> +	 * The exception could have happened during the frame pointer
> +	 * prolog or epilog. This could result in a missing frame in
> +	 * the stack trace so that the caller of the interrupted
> +	 * function does not show up in the stack trace.
> +	 *
> +	 * So, mark the stack trace as unreliable if an EL1 frame is
> +	 * detected.
> +	 */
> +	if (regs->frame_type == EL1_FRAME && regs->pc == frame->pc &&
> +	    regs->regs[29] == frame->fp) {
> +		frame->reliable = false;
> +		return;
> +	}
> +}
> +
>  /*
>   * AArch64 PCS assigns the frame pointer to x29.
>   *
> @@ -114,6 +166,11 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  
>  	frame->pc = ptrauth_strip_insn_pac(frame->pc);
>  
> +	/*
> +	 * Check for features that render the stack trace unreliable.
> +	 */
> +	check_if_reliable(fp, frame, &info);
> +
>  	return 0;
>  }
>  NOKPROBE_SYMBOL(unwind_frame);
> -- 
> 2.25.1
> 


* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-15 16:57   ` [RFC PATCH v2 5/8] arm64: Detect an FTRACE " madvenka
@ 2021-03-23 10:51     ` Mark Rutland
  2021-03-23 12:56       ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 10:51 UTC (permalink / raw)
  To: madvenka
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Mon, Mar 15, 2021 at 11:57:57AM -0500, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> When CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled and tracing is activated
> for a function, the ftrace infrastructure is called for the function at
> the very beginning. Ftrace creates two frames:
> 
> 	- One for the traced function
> 
> 	- One for the caller of the traced function
> 
> That gives a reliable stack trace while executing in the ftrace
> infrastructure code. When ftrace returns to the traced function, the frames
> are popped and everything is back to normal.
> 
> However, in cases like live patch, execution is redirected to a different
> function when ftrace returns. A stack trace taken while still in the ftrace
> infrastructure code will not show the target function. The target function
> is the real function that we want to track.
> 
> So, if an FTRACE frame is detected on the stack, just mark the stack trace
> as unreliable.

To identify this case, please identify the ftrace trampolines instead,
e.g. ftrace_regs_caller, return_to_handler.

It'd be good to check *exactly* when we need to reject, since IIUC when
we have a graph stack entry the unwind will be correct from livepatch's
PoV.

Thanks,
Mark.

> 
> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> ---
>  arch/arm64/kernel/entry-ftrace.S |  2 ++
>  arch/arm64/kernel/stacktrace.c   | 33 ++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> index b3e4f9a088b1..1ec8c5180fc0 100644
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -74,6 +74,8 @@
>  	/* Create our frame record within pt_regs. */
>  	stp	x29, x30, [sp, #S_STACKFRAME]
>  	add	x29, sp, #S_STACKFRAME
> +	ldr	w17, =FTRACE_FRAME
> +	str	w17, [sp, #S_FRAME_TYPE]
>  	.endm
>  
>  SYM_CODE_START(ftrace_regs_caller)
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 6ae103326f7b..594806a0c225 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -23,6 +23,7 @@ static void check_if_reliable(unsigned long fp, struct stackframe *frame,
>  {
>  	struct pt_regs *regs;
>  	unsigned long regs_start, regs_end;
> +	unsigned long caller_fp;
>  
>  	/*
>  	 * If the stack trace has already been marked unreliable, just
> @@ -68,6 +69,38 @@ static void check_if_reliable(unsigned long fp, struct stackframe *frame,
>  		frame->reliable = false;
>  		return;
>  	}
> +
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +	/*
> +	 * When tracing is active for a function, the ftrace code is called
> +	 * from the function even before the frame pointer prolog and
> +	 * epilog. ftrace creates a pt_regs structure on the stack to save
> +	 * register state.
> +	 *
> +	 * In addition, ftrace sets up two stack frames and chains them
> +	 * with other frames on the stack. One frame is pt_regs->stackframe
> +	 * that is for the traced function. The other frame is set up right
> +	 * after the pt_regs structure and it is for the caller of the
> +	 * traced function. This is done to ensure a proper stack trace.
> +	 *
> +	 * If the ftrace code returns to the traced function, then all is
> +	 * fine. But if it transfers control to a different function (like
> +	 * in livepatch), then a stack walk performed while still in the
> +	 * ftrace code will not find the target function.
> +	 *
> +	 * So, mark the stack trace as unreliable if an ftrace frame is
> +	 * detected.
> +	 */
> +	if (regs->frame_type == FTRACE_FRAME && frame->fp == regs_end &&
> +	    frame->fp < info->high) {
> +		/* Check the traced function's caller's frame. */
> +		caller_fp = READ_ONCE_NOCHECK(*(unsigned long *)(frame->fp));
> +		if (caller_fp == regs->regs[29]) {
> +			frame->reliable = false;
> +			return;
> +		}
> +	}
> +#endif
>  }
>  
>  /*
> -- 
> 2.25.1
> 


* Re: [RFC PATCH v2 1/8] arm64: Implement stack trace termination record
  2021-03-23 10:24                 ` Mark Rutland
@ 2021-03-23 12:39                   ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 12:39 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Mark Brown, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 5:24 AM, Mark Rutland wrote:
> On Fri, Mar 19, 2021 at 05:03:09PM -0500, Madhavan T. Venkataraman wrote:
>> I solved this by using existing functions logically instead of inventing a
>> dummy function. I initialize pt_regs->stackframe[1] to an existing function
>> so that the stack trace will not show a 0x0 entry as well as the kernel and
>> gdb will show identical stack traces.
>>
>> For all task stack traces including the idle tasks, the stack trace will
>> end at copy_thread() as copy_thread() is the function that initializes the
>> pt_regs and the first stack frame for a task.
> 
> I don't think this is a good idea, as it will mean that copy_thread()
> will appear to be live in every thread, and therefore will not be
> patchable.
> 
> There are other things people need to be aware of when using an external
> debugger (e.g. during EL0<->ELx transitions there are periods when X29
> and X30 contain the EL0 values, and cannot be used to unwind), so I
> don't think there's a strong need to make this look prettier to an
> external debugger.
> 

OK.

>> For EL0 exceptions, the stack trace will end with vectors() as vectors
>> entries call the EL0 handlers.
>>
>> Here are sample stack traces (I only show the ending of each trace):
>>
>> Idle task on primary CPU
>> ========================
>>
>> 		 ...
>> [    0.022557]   start_kernel+0x5b8/0x5f4
>> [    0.022570]   __primary_switched+0xa8/0xb8
>> [    0.022578]   copy_thread+0x0/0x188
>>
>> Idle task on secondary CPU
>> ==========================
>>
>> 		 ...
>> [    0.023397]   secondary_start_kernel+0x188/0x1e0
>> [    0.023406]   __secondary_switched+0x40/0x88
>> [    0.023415]   copy_thread+0x0/0x188
>>
>> All other kernel threads
>> ========================
>>
>> 		 ...
>> [   13.501062]   ret_from_fork+0x10/0x18
>> [   13.507998]   copy_thread+0x0/0x188
>>
>> User threads (EL0 exception)
>> ============
>>
>> write(2) system call example:
>>
>> 		 ...
>> [  521.686148]   vfs_write+0xc8/0x2c0
>> [  521.686156]   ksys_write+0x74/0x108
>> [  521.686161]   __arm64_sys_write+0x24/0x30
>> [  521.686166]   el0_svc_common.constprop.0+0x70/0x1a8
>> [  521.686175]   do_el0_svc+0x2c/0x98
>> [  521.686180]   el0_svc+0x2c/0x70
>> [  521.686188]   el0_sync_handler+0xb0/0xb8
>> [  521.686193]   el0_sync+0x17c/0x180
>> [  521.686198]   vectors+0x0/0x7d8
> 
> [...]
> 
>> If you approve, the above will become RFC Patch v3 1/8 in the next version.
> 
> As above, I don't think we should repurpose an existing function here,
> and my preference is to use 0x0.
> 

OK.

>> Let me know.
>>
>> Also, I could introduce an extra frame in the EL1 exception stack trace that
>> includes vectors so the stack trace would look like this (timer interrupt example):
>>
>> call_timer_fn
>> run_timer_softirq
>> __do_softirq
>> irq_exit
>> __handle_domain_irq
>> gic_handle_irq
>> el1_irq
>> vectors
>>
>> This way, if the unwinder finds vectors, it knows that it is an exception frame.
> 
> I can see this might make it simpler to detect exception boundaries, but
> I suspect that we need other information anyway, so this doesn't become
> all that helpful. For EL0<->EL1 exception boundaries we want to
> successfully terminate a robust stacktrace whereas for EL1<->EL1
> exception boundaries we want to fail a robust stacktrace.
> 
> I reckon we have to figure that out from the el1_* and el0_* entry
> points (which I am working to reduce/simplify as part of the entry
> assembly conversion to C). With that we can terminate unwind at the
> el0_* parts, and reject unwinding across any other bit of .entry.text.
> 

OK. That is fine.

Thanks.

Madhavan


* Re: [RFC PATCH v2 3/8] arm64: Terminate the stack trace at TASK_FRAME and EL0_FRAME
  2021-03-23 10:36         ` Mark Rutland
@ 2021-03-23 12:40           ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 12:40 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Mark Brown, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 5:36 AM, Mark Rutland wrote:
> On Thu, Mar 18, 2021 at 03:29:19PM -0500, Madhavan T. Venkataraman wrote:
>>
>>
>> On 3/18/21 1:26 PM, Mark Brown wrote:
>>> On Mon, Mar 15, 2021 at 11:57:55AM -0500, madvenka@linux.microsoft.com wrote:
>>>
>>>> +	/* Terminal record, nothing to unwind */
>>>> +	if (fp == (unsigned long) regs->stackframe) {
>>>> +		if (regs->frame_type == TASK_FRAME ||
>>>> +		    regs->frame_type == EL0_FRAME)
>>>> +			return -ENOENT;
>>>>  		return -EINVAL;
>>>> +	}
>>>
>>> This is conflating the reliable stacktrace checks (which your series
>>> will later flag up with frame->reliable) with verifying that we found
>>> the bottom of the stack by looking for this terminal stack frame record.
>>> For the purposes of determining if the unwinder got to the bottom of the
>>> stack we don't care what stack type we're looking at, we just care if it
>>> managed to walk to this defined final record.  
>>>
>>> At the minute nothing except reliable stack trace has any intention of
>>> checking the specific return code but it's clearer to be consistent.
>>>
>>
>> So, you are saying that the type check is redundant. OK. I will remove it
>> and just return -ENOENT on reaching the final record.
> 
> Yes please; and please fold that into the same patch that adds the final
> records.
> 

Will do.

Thanks.

Madhavan


* Re: [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable
  2021-03-23 10:42     ` Mark Rutland
@ 2021-03-23 12:46       ` Madhavan T. Venkataraman
  2021-03-23 13:04         ` Mark Rutland
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 12:46 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 5:42 AM, Mark Rutland wrote:
> On Mon, Mar 15, 2021 at 11:57:56AM -0500, madvenka@linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>
>> EL1 exceptions can happen on any instruction including instructions in
>> the frame pointer prolog or epilog. Depending on where exactly they happen,
>> they could render the stack trace unreliable.
>>
>> If an EL1 exception frame is found on the stack, mark the stack trace as
>> unreliable.
>>
>> Now, the EL1 exception frame is not at any well-known offset on the stack.
>> It can be anywhere on the stack. In order to properly detect an EL1
>> exception frame the following checks must be done:
>>
>> 	- The frame type must be EL1_FRAME.
>>
>> 	- When the register state is saved in the EL1 pt_regs, the frame
>> 	  pointer x29 is saved in pt_regs->regs[29] and the return PC
>> 	  is saved in pt_regs->pc. These must match with the current
>> 	  frame.
> 
> Before you can do this, you need to reliably identify that you have a
> pt_regs on the stack, but this patch uses a heuristic, which is not
> reliable.
> 
> However, instead you can identify whether you're trying to unwind
> through one of the EL1 entry functions, which tells you the same thing
> without even having to look at the pt_regs.
> 
> We can do that based on the entry functions all being in .entry.text,
> which we could further sub-divide to split the EL0 and EL1 entry
> functions.
> 

Yes. I will check the entry functions. But I still think that we should
not rely on just one check. The additional checks will make it robust.
So, I suggest that the return address be checked first. If that passes,
then we can be reasonably sure that there are pt_regs. Then, check
the other things in pt_regs.

>>
>> Interrupts encountered in kernel code are also EL1 exceptions. At the end
>> of an interrupt, the interrupt handler checks if the current task must be
>> preempted for any reason. If so, it calls the preemption code which takes
>> the task off the CPU. A stack trace taken on the task after the preemption
>> will show the EL1 frame and will be considered unreliable. This is correct
>> behavior as preemption can happen practically at any point in code
>> including the frame pointer prolog and epilog.
>>
>> Breakpoints encountered in kernel code are also EL1 exceptions. The probing
>> infrastructure uses breakpoints for executing probe code. While in the probe
>> code, the stack trace will show an EL1 frame and will be considered
>> unreliable. This is also correct behavior.
>>
>> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>> ---
>>  arch/arm64/include/asm/stacktrace.h |  2 +
>>  arch/arm64/kernel/stacktrace.c      | 57 +++++++++++++++++++++++++++++
>>  2 files changed, 59 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
>> index eb29b1fe8255..684f65808394 100644
>> --- a/arch/arm64/include/asm/stacktrace.h
>> +++ b/arch/arm64/include/asm/stacktrace.h
>> @@ -59,6 +59,7 @@ struct stackframe {
>>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>  	int graph;
>>  #endif
>> +	bool reliable;
>>  };
>>  
>>  extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
>> @@ -169,6 +170,7 @@ static inline void start_backtrace(struct stackframe *frame,
>>  	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
>>  	frame->prev_fp = 0;
>>  	frame->prev_type = STACK_TYPE_UNKNOWN;
>> +	frame->reliable = true;
>>  }
>>  
>>  #endif	/* __ASM_STACKTRACE_H */
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index 504cd161339d..6ae103326f7b 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -18,6 +18,58 @@
>>  #include <asm/stack_pointer.h>
>>  #include <asm/stacktrace.h>
>>  
>> +static void check_if_reliable(unsigned long fp, struct stackframe *frame,
>> +			      struct stack_info *info)
>> +{
>> +	struct pt_regs *regs;
>> +	unsigned long regs_start, regs_end;
>> +
>> +	/*
>> +	 * If the stack trace has already been marked unreliable, just
>> +	 * return.
>> +	 */
>> +	if (!frame->reliable)
>> +		return;
>> +
>> +	/*
>> +	 * Assume that this is an intermediate marker frame inside a pt_regs
>> +	 * structure created on the stack and get the pt_regs pointer. Other
>> +	 * checks will be done below to make sure that this is a marker
>> +	 * frame.
>> +	 */
> 
> Sorry, but NAK to this approach specifically. This isn't reliable (since
> it can be influenced by arbitrary data on the stack), and it's far more
> complicated than identifying the entry functions specifically.
> 

As I mentioned above, I agree that we should check the return address. But
just as a precaution, I think we should double check the pt_regs.

Is that OK with you? It does not take away anything or increase the risk in
any way. I think it makes it more robust.

Thanks.

Madhavan


* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 10:51     ` Mark Rutland
@ 2021-03-23 12:56       ` Madhavan T. Venkataraman
  2021-03-23 13:36         ` Mark Rutland
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 12:56 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 5:51 AM, Mark Rutland wrote:
> On Mon, Mar 15, 2021 at 11:57:57AM -0500, madvenka@linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>
>> When CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled and tracing is activated
>> for a function, the ftrace infrastructure is called for the function at
>> the very beginning. Ftrace creates two frames:
>>
>> 	- One for the traced function
>>
>> 	- One for the caller of the traced function
>>
>> That gives a reliable stack trace while executing in the ftrace
>> infrastructure code. When ftrace returns to the traced function, the frames
>> are popped and everything is back to normal.
>>
>> However, in cases like live patch, execution is redirected to a different
>> function when ftrace returns. A stack trace taken while still in the ftrace
>> infrastructure code will not show the target function. The target function
>> is the real function that we want to track.
>>
>> So, if an FTRACE frame is detected on the stack, just mark the stack trace
>> as unreliable.
> 
> To identify this case, please identify the ftrace trampolines instead,
> e.g. ftrace_regs_caller, return_to_handler.
> 

Yes. As part of the return address checking, I will check this. IIUC, I think that
I need to check for the inner labels that are defined at the point where the
instructions are patched for ftrace. E.g., ftrace_call and ftrace_graph_call.

SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
        bl      ftrace_stub	<====================================

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL) // ftrace_graph_caller();
        nop	<=======                // If enabled, this will be replaced
                                        // with "b ftrace_graph_caller"
#endif

For instance, the stack trace I got while tracing do_mmap() with the stack trace
tracer looks like this:

		 ...
[  338.911793]   trace_function+0xc4/0x160
[  338.911801]   function_stack_trace_call+0xac/0x130
[  338.911807]   ftrace_graph_call+0x0/0x4
[  338.911813]   do_mmap+0x8/0x598
[  338.911820]   vm_mmap_pgoff+0xf4/0x188
[  338.911826]   ksys_mmap_pgoff+0x1d8/0x220
[  338.911832]   __arm64_sys_mmap+0x38/0x50
[  338.911839]   el0_svc_common.constprop.0+0x70/0x1a8
[  338.911846]   do_el0_svc+0x2c/0x98
[  338.911851]   el0_svc+0x2c/0x70
[  338.911859]   el0_sync_handler+0xb0/0xb8
[  338.911864]   el0_sync+0x180/0x1c0

> It'd be good to check *exactly* when we need to reject, since IIUC when
> we have a graph stack entry the unwind will be correct from livepatch's
> PoV.
> 

The current unwinder already handles this like this:

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
        if (tsk->ret_stack &&
                (ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
                struct ftrace_ret_stack *ret_stack;
                /*
                 * This is a case where function graph tracer has
                 * modified a return address (LR) in a stack frame
                 * to hook a function return.
                 * So replace it to an original value.
                 */
                ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
                if (WARN_ON_ONCE(!ret_stack))
                        return -EINVAL;
                frame->pc = ret_stack->ret;
        }
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */

Is there anything else that needs handling here?

Thanks,

Madhavan


* Re: [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable
  2021-03-23 12:46       ` Madhavan T. Venkataraman
@ 2021-03-23 13:04         ` Mark Rutland
  2021-03-23 13:31           ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 13:04 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Tue, Mar 23, 2021 at 07:46:10AM -0500, Madhavan T. Venkataraman wrote:
> On 3/23/21 5:42 AM, Mark Rutland wrote:
> > On Mon, Mar 15, 2021 at 11:57:56AM -0500, madvenka@linux.microsoft.com wrote:
> >> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> >>
> >> EL1 exceptions can happen on any instruction including instructions in
> >> the frame pointer prolog or epilog. Depending on where exactly they happen,
> >> they could render the stack trace unreliable.
> >>
> >> If an EL1 exception frame is found on the stack, mark the stack trace as
> >> unreliable.
> >>
> >> Now, the EL1 exception frame is not at any well-known offset on the stack.
> >> It can be anywhere on the stack. In order to properly detect an EL1
> >> exception frame the following checks must be done:
> >>
> >> 	- The frame type must be EL1_FRAME.
> >>
> >> 	- When the register state is saved in the EL1 pt_regs, the frame
> >> 	  pointer x29 is saved in pt_regs->regs[29] and the return PC
> >> 	  is saved in pt_regs->pc. These must match with the current
> >> 	  frame.
> > 
> > Before you can do this, you need to reliably identify that you have a
> > pt_regs on the stack, but this patch uses a heuristic, which is not
> > reliable.
> > 
> > However, instead you can identify whether you're trying to unwind
> > through one of the EL1 entry functions, which tells you the same thing
> > without even having to look at the pt_regs.
> > 
> > We can do that based on the entry functions all being in .entry.text,
> > which we could further sub-divide to split the EL0 and EL1 entry
> > functions.
> 
> Yes. I will check the entry functions. But I still think that we should
> not rely on just one check. The additional checks will make it robust.
> So, I suggest that the return address be checked first. If that passes,
> then we can be reasonably sure that there are pt_regs. Then, check
> the other things in pt_regs.

What do you think this will catch?

The only way to correctly identify whether or not we have a pt_regs is
to check whether we're in specific portions of the EL1 entry assembly
where the regs exist. However, as any EL1<->EL1 transition cannot be
safely unwound, we'd mark any trace going through the EL1 entry assembly
as unreliable.

Given that, I don't think it's useful to check the regs, and I'd prefer
to avoid the subtleties involved in attempting to do so.

[...]

> >> +static void check_if_reliable(unsigned long fp, struct stackframe *frame,
> >> +			      struct stack_info *info)
> >> +{
> >> +	struct pt_regs *regs;
> >> +	unsigned long regs_start, regs_end;
> >> +
> >> +	/*
> >> +	 * If the stack trace has already been marked unreliable, just
> >> +	 * return.
> >> +	 */
> >> +	if (!frame->reliable)
> >> +		return;
> >> +
> >> +	/*
> >> +	 * Assume that this is an intermediate marker frame inside a pt_regs
> >> +	 * structure created on the stack and get the pt_regs pointer. Other
> >> +	 * checks will be done below to make sure that this is a marker
> >> +	 * frame.
> >> +	 */
> > 
> > Sorry, but NAK to this approach specifically. This isn't reliable (since
> > it can be influenced by arbitrary data on the stack), and it's far more
> > complicated than identifying the entry functions specifically.
> 
> As I mentioned above, I agree that we should check the return address. But
> just as a precaution, I think we should double check the pt_regs.
> 
> Is that OK with you? It does not take away anything or increase the risk in
> any way. I think it makes it more robust.

As above, I think that the work necessary to correctly access the regs
means that it's not helpful to check the regs themselves. If you have
something in mind where checking the regs is helpful I'm happy to
consider that, but my general preference would be to stay away from the
regs for now.

Thanks,
Mark.


* Re: [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable
  2021-03-23 13:04         ` Mark Rutland
@ 2021-03-23 13:31           ` Madhavan T. Venkataraman
  2021-03-23 14:33             ` Mark Rutland
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 13:31 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 8:04 AM, Mark Rutland wrote:
> On Tue, Mar 23, 2021 at 07:46:10AM -0500, Madhavan T. Venkataraman wrote:
>> On 3/23/21 5:42 AM, Mark Rutland wrote:
>>> On Mon, Mar 15, 2021 at 11:57:56AM -0500, madvenka@linux.microsoft.com wrote:
>>>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>>>
>>>> EL1 exceptions can happen on any instruction including instructions in
>>>> the frame pointer prolog or epilog. Depending on where exactly they happen,
>>>> they could render the stack trace unreliable.
>>>>
>>>> If an EL1 exception frame is found on the stack, mark the stack trace as
>>>> unreliable.
>>>>
>>>> Now, the EL1 exception frame is not at any well-known offset on the stack.
>>>> It can be anywhere on the stack. In order to properly detect an EL1
>>>> exception frame the following checks must be done:
>>>>
>>>> 	- The frame type must be EL1_FRAME.
>>>>
>>>> 	- When the register state is saved in the EL1 pt_regs, the frame
>>>> 	  pointer x29 is saved in pt_regs->regs[29] and the return PC
>>>> 	  is saved in pt_regs->pc. These must match with the current
>>>> 	  frame.
>>>
>>> Before you can do this, you need to reliably identify that you have a
>>> pt_regs on the stack, but this patch uses a heuristic, which is not
>>> reliable.
>>>
>>> However, instead you can identify whether you're trying to unwind
>>> through one of the EL1 entry functions, which tells you the same thing
>>> without even having to look at the pt_regs.
>>>
>>> We can do that based on the entry functions all being in .entry.text,
>>> which we could further sub-divide to split the EL0 and EL1 entry
>>> functions.
>>
>> Yes. I will check the entry functions. But I still think that we should
>> not rely on just one check. The additional checks will make it robust.
>> So, I suggest that the return address be checked first. If that passes,
>> then we can be reasonably sure that there are pt_regs. Then, check
>> the other things in pt_regs.
> 
> What do you think this will catch?
> 

I am not sure that I have an exact example to mention here. But I will attempt
one. Let us say that a task has called arch_stack_walk() in the recent past.
The unwinder may have copied a stack frame onto some location in the stack
with one of the return addresses we check. Let us assume that there is some
stack corruption that makes a frame pointer point to that exact record. Now,
we will get a match on the return address on the next unwind.

Pardon me if the example is somewhat crude. My point is that it is highly unlikely
but not impossible for the return address to be on the stack and for the unwinder to
get an unfortunate match.

> The only way to correctly identify whether or not we have a pt_regs is
> to check whether we're in specific portions of the EL1 entry assembly
> where the regs exist. However, as any EL1<->EL1 transition cannot be
> safely unwound, we'd mark any trace going through the EL1 entry assembly
> as unreliable.
> 
> Given that, I don't think it's useful to check the regs, and I'd prefer
> to avoid the subtleties involved in attempting to do so.
> 

I agree that the return address check is a good check. I would like to add
extra checks to be absolutely sure.

> [...]
> 
>>>> +static void check_if_reliable(unsigned long fp, struct stackframe *frame,
>>>> +			      struct stack_info *info)
>>>> +{
>>>> +	struct pt_regs *regs;
>>>> +	unsigned long regs_start, regs_end;
>>>> +
>>>> +	/*
>>>> +	 * If the stack trace has already been marked unreliable, just
>>>> +	 * return.
>>>> +	 */
>>>> +	if (!frame->reliable)
>>>> +		return;
>>>> +
>>>> +	/*
>>>> +	 * Assume that this is an intermediate marker frame inside a pt_regs
>>>> +	 * structure created on the stack and get the pt_regs pointer. Other
>>>> +	 * checks will be done below to make sure that this is a marker
>>>> +	 * frame.
>>>> +	 */
>>>
>>> Sorry, but NAK to this approach specifically. This isn't reliable (since
>>> it can be influenced by arbitrary data on the stack), and it's far more
>>> complicated than identifying the entry functions specifically.
>>
>> As I mentioned above, I agree that we should check the return address. But
>> just as a precaution, I think we should double check the pt_regs.
>>
>> Is that OK with you? It does not take away anything or increase the risk in
>> any way. I think it makes it more robust.
> 
> As above, I think that the work necessary to correctly access the regs
> means that it's not helpful to check the regs themselves. If you have
> something in mind where checking the regs is helpful I'm happy to
> consider that, but my general preference would be to stay away from the
> regs for now.
> 

I have mentioned a possibility above. Please take a look and let me know.

Thanks.

Madhavan

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 12:56       ` Madhavan T. Venkataraman
@ 2021-03-23 13:36         ` Mark Rutland
  2021-03-23 13:38           ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 13:36 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Tue, Mar 23, 2021 at 07:56:40AM -0500, Madhavan T. Venkataraman wrote:
> 
> 
> On 3/23/21 5:51 AM, Mark Rutland wrote:
> > On Mon, Mar 15, 2021 at 11:57:57AM -0500, madvenka@linux.microsoft.com wrote:
> >> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> >>
> >> When CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled and tracing is activated
> >> for a function, the ftrace infrastructure is called for the function at
> >> the very beginning. Ftrace creates two frames:
> >>
> >> 	- One for the traced function
> >>
> >> 	- One for the caller of the traced function
> >>
> >> That gives a reliable stack trace while executing in the ftrace
> >> infrastructure code. When ftrace returns to the traced function, the frames
> >> are popped and everything is back to normal.
> >>
> >> However, in cases like live patch, execution is redirected to a different
> >> function when ftrace returns. A stack trace taken while still in the ftrace
> >> infrastructure code will not show the target function. The target function
> >> is the real function that we want to track.
> >>
> >> So, if an FTRACE frame is detected on the stack, just mark the stack trace
> >> as unreliable.
> > 
> > To identify this case, please identify the ftrace trampolines instead,
> > e.g. ftrace_regs_caller, return_to_handler.
> > 
> 
> Yes. As part of the return address checking, I will check this. IIUC, I think that
> I need to check for the inner labels that are defined at the point where the
> instructions are patched for ftrace. E.g., ftrace_call and ftrace_graph_call.
> 
> SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>         bl      ftrace_stub	<====================================
> 
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL) // ftrace_graph_caller();
>         nop	<=======                // If enabled, this will be replaced
>                                         // with "b ftrace_graph_caller"
> #endif
> 
> For instance, the stack trace I got while tracing do_mmap() with the stack trace
> tracer looks like this:
> 
> 		 ...
> [  338.911793]   trace_function+0xc4/0x160
> [  338.911801]   function_stack_trace_call+0xac/0x130
> [  338.911807]   ftrace_graph_call+0x0/0x4
> [  338.911813]   do_mmap+0x8/0x598
> [  338.911820]   vm_mmap_pgoff+0xf4/0x188
> [  338.911826]   ksys_mmap_pgoff+0x1d8/0x220
> [  338.911832]   __arm64_sys_mmap+0x38/0x50
> [  338.911839]   el0_svc_common.constprop.0+0x70/0x1a8
> [  338.911846]   do_el0_svc+0x2c/0x98
> [  338.911851]   el0_svc+0x2c/0x70
> [  338.911859]   el0_sync_handler+0xb0/0xb8
> [  338.911864]   el0_sync+0x180/0x1c0
> 
> > It'd be good to check *exactly* when we need to reject, since IIUC when
> > we have a graph stack entry the unwind will be correct from livepatch's
> > PoV.
> > 
> 
> The current unwinder already handles this like this:
> 
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         if (tsk->ret_stack &&
>                 (ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
>                 struct ftrace_ret_stack *ret_stack;
>                 /*
>                  * This is a case where function graph tracer has
>                  * modified a return address (LR) in a stack frame
>                  * to hook a function return.
>                  * So replace it to an original value.
>                  */
>                 ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
>                 if (WARN_ON_ONCE(!ret_stack))
>                         return -EINVAL;
>                 frame->pc = ret_stack->ret;
>         }
> #endif /* CONFIG_FUNCTION_GRAPH_TRACER */

Beware that this handles the case where a function will return to
return_to_handler, but doesn't handle unwinding from *within*
return_to_handler, which we can't do reliably today, so that might need
special handling.

> Is there anything else that needs handling here?

I wrote up a few cases to consider in:

https://www.kernel.org/doc/html/latest/livepatch/reliable-stacktrace.html

... e.g. the "Obscuring of return addresses" case.

It might be that we're fine so long as we refuse to unwind across
exception boundaries, but it needs some thought. We probably need to go
over each of the trampolines instruction-by-instruction to consider
that.

As mentioned above, within return_to_handler when we call
ftrace_return_to_handler, there's a period where the real return address
has been removed from the ftrace return stack, but hasn't yet been
placed in x30, and wouldn't show up in a trace (e.g. if we could somehow
hook the return from ftrace_return_to_handler).

We might be saved by the fact we'll mark traces across exception
boundaries as unreliable, but I haven't thought very hard about it. We
might want to explicitly reject unwinds within return_to_handler in case
it's possible to interpose ftrace_return_to_handler somehow.

Thanks,
Mark.

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 13:36         ` Mark Rutland
@ 2021-03-23 13:38           ` Madhavan T. Venkataraman
  2021-03-23 14:15             ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 13:38 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 8:36 AM, Mark Rutland wrote:
> On Tue, Mar 23, 2021 at 07:56:40AM -0500, Madhavan T. Venkataraman wrote:
>>
>>
>> On 3/23/21 5:51 AM, Mark Rutland wrote:
>>> On Mon, Mar 15, 2021 at 11:57:57AM -0500, madvenka@linux.microsoft.com wrote:
>>>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>>>
>>>> When CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled and tracing is activated
>>>> for a function, the ftrace infrastructure is called for the function at
>>>> the very beginning. Ftrace creates two frames:
>>>>
>>>> 	- One for the traced function
>>>>
>>>> 	- One for the caller of the traced function
>>>>
>>>> That gives a reliable stack trace while executing in the ftrace
>>>> infrastructure code. When ftrace returns to the traced function, the frames
>>>> are popped and everything is back to normal.
>>>>
>>>> However, in cases like live patch, execution is redirected to a different
>>>> function when ftrace returns. A stack trace taken while still in the ftrace
>>>> infrastructure code will not show the target function. The target function
>>>> is the real function that we want to track.
>>>>
>>>> So, if an FTRACE frame is detected on the stack, just mark the stack trace
>>>> as unreliable.
>>>
>>> To identify this case, please identify the ftrace trampolines instead,
>>> e.g. ftrace_regs_caller, return_to_handler.
>>>
>>
>> Yes. As part of the return address checking, I will check this. IIUC, I think that
>> I need to check for the inner labels that are defined at the point where the
>> instructions are patched for ftrace. E.g., ftrace_call and ftrace_graph_call.
>>
>> SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>>         bl      ftrace_stub	<====================================
>>
>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL) // ftrace_graph_caller();
>>         nop	<=======                // If enabled, this will be replaced
>>                                         // with "b ftrace_graph_caller"
>> #endif
>>
>> For instance, the stack trace I got while tracing do_mmap() with the stack trace
>> tracer looks like this:
>>
>> 		 ...
>> [  338.911793]   trace_function+0xc4/0x160
>> [  338.911801]   function_stack_trace_call+0xac/0x130
>> [  338.911807]   ftrace_graph_call+0x0/0x4
>> [  338.911813]   do_mmap+0x8/0x598
>> [  338.911820]   vm_mmap_pgoff+0xf4/0x188
>> [  338.911826]   ksys_mmap_pgoff+0x1d8/0x220
>> [  338.911832]   __arm64_sys_mmap+0x38/0x50
>> [  338.911839]   el0_svc_common.constprop.0+0x70/0x1a8
>> [  338.911846]   do_el0_svc+0x2c/0x98
>> [  338.911851]   el0_svc+0x2c/0x70
>> [  338.911859]   el0_sync_handler+0xb0/0xb8
>> [  338.911864]   el0_sync+0x180/0x1c0
>>
>>> It'd be good to check *exactly* when we need to reject, since IIUC when
>>> we have a graph stack entry the unwind will be correct from livepatch's
>>> PoV.
>>>
>>
>> The current unwinder already handles this like this:
>>
>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>         if (tsk->ret_stack &&
>>                 (ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
>>                 struct ftrace_ret_stack *ret_stack;
>>                 /*
>>                  * This is a case where function graph tracer has
>>                  * modified a return address (LR) in a stack frame
>>                  * to hook a function return.
>>                  * So replace it to an original value.
>>                  */
>>                 ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
>>                 if (WARN_ON_ONCE(!ret_stack))
>>                         return -EINVAL;
>>                 frame->pc = ret_stack->ret;
>>         }
>> #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> 
> Beware that this handles the case where a function will return to
> return_to_handler, but doesn't handle unwinding from *within*
> return_to_handler, which we can't do reliably today, so that might need
> special handling.
> 

OK. I will take a look at this.

>> Is there anything else that needs handling here?
> 
> I wrote up a few cases to consider in:
> 
> https://www.kernel.org/doc/html/latest/livepatch/reliable-stacktrace.html
> 
> ... e.g. the "Obscuring of return addresses" case.
> 
> It might be that we're fine so long as we refuse to unwind across
> exception boundaries, but it needs some thought. We probably need to go
> over each of the trampolines instruction-by-instruction to consider
> that.
> 
> As mentioned above, within return_to_handler when we call
> ftrace_return_to_handler, there's a period where the real return address
> has been removed from the ftrace return stack, but hasn't yet been
> placed in x30, and wouldn't show up in a trace (e.g. if we could somehow
> hook the return from ftrace_return_to_handler).
> 
> We might be saved by the fact we'll mark traces across exception
> boundaries as unreliable, but I haven't thought very hard about it. We
> might want to explicitly reject unwinds within return_to_handler in case
> it's possible to interpose ftrace_return_to_handler somehow.
> 

OK. I will study the above.

Thanks.

Madhavan

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 13:38           ` Madhavan T. Venkataraman
@ 2021-03-23 14:15             ` Madhavan T. Venkataraman
  2021-03-23 14:57               ` Mark Rutland
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 14:15 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

Hi Mark,

I have a general question. When exceptions are nested, how does it work? Let us consider 2 cases:

1. Exception in a page fault handler itself. In this case, I guess one more pt_regs will get
   established in the task stack for the second exception.

2. Exception in an interrupt handler. Here the interrupt handler is running on the IRQ stack.
   Will the pt_regs get created on the IRQ stack?

Also, is there a maximum nesting for exceptions?

Madhavan

* Re: [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable
  2021-03-23 13:31           ` Madhavan T. Venkataraman
@ 2021-03-23 14:33             ` Mark Rutland
  2021-03-23 15:22               ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 14:33 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Tue, Mar 23, 2021 at 08:31:50AM -0500, Madhavan T. Venkataraman wrote:
> On 3/23/21 8:04 AM, Mark Rutland wrote:
> > On Tue, Mar 23, 2021 at 07:46:10AM -0500, Madhavan T. Venkataraman wrote:
> >> On 3/23/21 5:42 AM, Mark Rutland wrote:
> >>> On Mon, Mar 15, 2021 at 11:57:56AM -0500, madvenka@linux.microsoft.com wrote:
> >>>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> >>>>
> >>>> EL1 exceptions can happen on any instruction including instructions in
> >>>> the frame pointer prolog or epilog. Depending on where exactly they happen,
> >>>> they could render the stack trace unreliable.
> >>>>
> >>>> If an EL1 exception frame is found on the stack, mark the stack trace as
> >>>> unreliable.
> >>>>
> >>>> Now, the EL1 exception frame is not at any well-known offset on the stack.
> >>>> It can be anywhere on the stack. In order to properly detect an EL1
> >>>> exception frame the following checks must be done:
> >>>>
> >>>> 	- The frame type must be EL1_FRAME.
> >>>>
> >>>> 	- When the register state is saved in the EL1 pt_regs, the frame
> >>>> 	  pointer x29 is saved in pt_regs->regs[29] and the return PC
> >>>> 	  is saved in pt_regs->pc. These must match with the current
> >>>> 	  frame.
> >>>
> >>> Before you can do this, you need to reliably identify that you have a
> >>> pt_regs on the stack, but this patch uses a heuristic, which is not
> >>> reliable.
> >>>
> >>> However, instead you can identify whether you're trying to unwind
> >>> through one of the EL1 entry functions, which tells you the same thing
> >>> without even having to look at the pt_regs.
> >>>
> >>> We can do that based on the entry functions all being in .entry.text,
> >>> which we could further sub-divide to split the EL0 and EL1 entry
> >>> functions.
> >>
> >> Yes. I will check the entry functions. But I still think that we should
> >> not rely on just one check. The additional checks will make it robust.
> >> So, I suggest that the return address be checked first. If that passes,
> >> then we can be reasonably sure that there are pt_regs. Then, check
> >> the other things in pt_regs.
> > 
> > What do you think this will catch?
> 
> I am not sure that I have an exact example to mention here. But I will attempt
> one. Let us say that a task has called arch_stack_walk() in the recent past.
> The unwinder may have copied a stack frame onto some location in the stack
> with one of the return addresses we check. Let us assume that there is some
> stack corruption that makes a frame pointer point to that exact record. Now,
> we will get a match on the return address on the next unwind.

I don't see how this is material to the pt_regs case, as either:

* When the unwinder considers this frame, it appears to be in the middle
  of an EL1 entry function, and the unwinder must mark the unwinding as
  unreliable regardless of the contents of any regs (so there's no need
  to look at the regs).

* When the unwinder considers this frame, it does not appear to be in
  the middle of an EL1 entry function, so the unwinder does not think
  there are any regs to consider, and so we cannot detect this case.

... unless I've misunderstood the example?

There's a general problem that it's possible to corrupt any portion of
the chain to skip records, e.g.

  A -> B -> C -> D -> E -> F -> G -> H -> [final]

... could get corrupted to:

  A -> B -> D -> H -> [final]

... regardless of whether C/E/F/G had associated pt_regs. AFAICT there's
no good way to catch this generally unless we have additional metadata
to check the unwinding against.

The likelihood of this happening without triggering other checks is
vanishingly low, and as we don't have a reliable mechanism for detecting
this, I don't think it's worthwhile attempting to do so.

If and when we try to unwind across EL1 exception boundaries, the
potential mismatch between the frame record and regs will be more
significant, and I agree at that point this will need more thought.

> Pardon me if the example is somewhat crude. My point is that it is
> highly unlikely but not impossible for the return address to be on the
> stack and for the unwinder to get an unfortunate match.

I agree that this is possible in theory, but as above I don't think this
is a practical concern.

Thanks,
Mark.

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 14:15             ` Madhavan T. Venkataraman
@ 2021-03-23 14:57               ` Mark Rutland
  2021-03-23 15:26                 ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 14:57 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Tue, Mar 23, 2021 at 09:15:36AM -0500, Madhavan T. Venkataraman wrote:
> Hi Mark,
> 
> I have a general question. When exceptions are nested, how does it work? Let us consider 2 cases:
> 
> 1. Exception in a page fault handler itself. In this case, I guess one more pt_regs will get
>    established in the task stack for the second exception.

Generally (ignoring SDEI and stack overflow exceptions) the regs will be
placed on the stack that was in use when the exception occurred, e.g.

  task -> task
  irq -> irq
  overflow -> overflow

For SDEI and stack overflow, we'll place the regs on the relevant SDEI
or overflow stack, e.g.

  task -> overflow
  irq -> overflow

  task -> sdei
  irq -> sdei

I tried to explain the nesting rules in:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/stacktrace.c?h=v5.11#n59
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64/kernel/stacktrace.c?h=v5.11&id=592700f094be229b5c9cc1192d5cea46eb4c7afc

> 2. Exception in an interrupt handler. Here the interrupt handler is running on the IRQ stack.
>    Will the pt_regs get created on the IRQ stack?

For an interrupt the regs will be placed on the stack that was in use
when the interrupt was taken. The kernel switches to the IRQ stack
*after* stacking the registers. e.g.

  task -> task // subsequently switches to IRQ stack
  irq -> irq

> Also, is there a maximum nesting for exceptions?

In practice, yes, but the specific number isn't a constant, so in the
unwind code we have to act as if there is no limit other than stack
sizing.

We try to prevent certain exceptions from nesting (e.g. debug exceptions
cannot nest), but there are still several levels of nesting, and some
exceptions which can be nested safely (like faults). For example, it's
possible to have a chain:

 syscall -> fault -> interrupt -> fault -> pNMI -> fault -> SError -> fault -> watchpoint -> fault -> overflow -> fault -> BRK

... and potentially longer than that.

The practical limit is the size of all the stacks, and the unwinder's 
stack monotonicity checks ensure that an unwind will terminate.

Thanks,
Mark.

* Re: [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable
  2021-03-23 14:33             ` Mark Rutland
@ 2021-03-23 15:22               ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 15:22 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 9:33 AM, Mark Rutland wrote:
> On Tue, Mar 23, 2021 at 08:31:50AM -0500, Madhavan T. Venkataraman wrote:
>> On 3/23/21 8:04 AM, Mark Rutland wrote:
>>> On Tue, Mar 23, 2021 at 07:46:10AM -0500, Madhavan T. Venkataraman wrote:
>>>> On 3/23/21 5:42 AM, Mark Rutland wrote:
>>>>> On Mon, Mar 15, 2021 at 11:57:56AM -0500, madvenka@linux.microsoft.com wrote:
>>>>>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>>>>>
>>>>>> EL1 exceptions can happen on any instruction including instructions in
>>>>>> the frame pointer prolog or epilog. Depending on where exactly they happen,
>>>>>> they could render the stack trace unreliable.
>>>>>>
>>>>>> If an EL1 exception frame is found on the stack, mark the stack trace as
>>>>>> unreliable.
>>>>>>
>>>>>> Now, the EL1 exception frame is not at any well-known offset on the stack.
>>>>>> It can be anywhere on the stack. In order to properly detect an EL1
>>>>>> exception frame the following checks must be done:
>>>>>>
>>>>>> 	- The frame type must be EL1_FRAME.
>>>>>>
>>>>>> 	- When the register state is saved in the EL1 pt_regs, the frame
>>>>>> 	  pointer x29 is saved in pt_regs->regs[29] and the return PC
>>>>>> 	  is saved in pt_regs->pc. These must match with the current
>>>>>> 	  frame.
>>>>>
>>>>> Before you can do this, you need to reliably identify that you have a
>>>>> pt_regs on the stack, but this patch uses a heuristic, which is not
>>>>> reliable.
>>>>>
>>>>> However, instead you can identify whether you're trying to unwind
>>>>> through one of the EL1 entry functions, which tells you the same thing
>>>>> without even having to look at the pt_regs.
>>>>>
>>>>> We can do that based on the entry functions all being in .entry.text,
>>>>> which we could further sub-divide to split the EL0 and EL1 entry
>>>>> functions.
>>>>
>>>> Yes. I will check the entry functions. But I still think that we should
>>>> not rely on just one check. The additional checks will make it robust.
>>>> So, I suggest that the return address be checked first. If that passes,
>>>> then we can be reasonably sure that there are pt_regs. Then, check
>>>> the other things in pt_regs.
>>>
>>> What do you think this will catch?
>>
>> I am not sure that I have an exact example to mention here. But I will attempt
>> one. Let us say that a task has called arch_stack_walk() in the recent past.
>> The unwinder may have copied a stack frame onto some location in the stack
>> with one of the return addresses we check. Let us assume that there is some
>> stack corruption that makes a frame pointer point to that exact record. Now,
>> we will get a match on the return address on the next unwind.
> 
> I don't see how this is material to the pt_regs case, as either:
> 
> * When the unwinder considers this frame, it appears to be in the middle
>   of an EL1 entry function, and the unwinder must mark the unwinding as
>   unreliable regardless of the contents of any regs (so there's no need
>   to look at the regs).
> 
> * When the unwinder considers this frame, it does not appear to be in
>   the middle of an EL1 entry function, so the unwinder does not think
>   there are any regs to consider, and so we cannot detect this case.
> 
> ... unless I've misunderstood the example?
> 
> There's a general problem that it's possible to corrupt any portion of
> the chain to skip records, e.g.
> 
>   A -> B -> C -> D -> E -> F -> G -> H -> [final]
> 
> ... could get corrupted to:
> 
>   A -> B -> D -> H -> [final]
> 
> ... regardless of whether C/E/F/G had associated pt_regs. AFAICT there's
> no good way to catch this generally unless we have additional metadata
> to check the unwinding against.
> 
> The likelihood of this happening without triggering other checks is
> vanishingly low, and as we don't have a reliable mechanism for detecting
> this, I don't think it's worthwhile attempting to do so.
> 
> If and when we try to unwind across EL1 exception boundaries, the
> potential mismatch between the frame record and regs will be more
> significant, and I agree at that point thisd will need more thought.
> 
>> Pardon me if the example is somewhat crude. My point is that it is
>> highly unlikely but not impossible for the return address to be on the
>> stack and for the unwinder to get an unfortunate match.
> 
> I agree that this is possible in theory, but as above I don't think this
> is a practical concern.
> 

OK. What you say makes sense.

Thanks.

Madhavan

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 14:57               ` Mark Rutland
@ 2021-03-23 15:26                 ` Madhavan T. Venkataraman
  2021-03-23 16:20                   ` Madhavan T. Venkataraman
  2021-03-23 16:48                   ` Mark Rutland
  0 siblings, 2 replies; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 15:26 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 9:57 AM, Mark Rutland wrote:
> On Tue, Mar 23, 2021 at 09:15:36AM -0500, Madhavan T. Venkataraman wrote:
>> Hi Mark,
>>
>> I have a general question. When exceptions are nested, how does it work? Let us consider 2 cases:
>>
>> 1. Exception in a page fault handler itself. In this case, I guess one more pt_regs will get
>>    established in the task stack for the second exception.
> 
> Generally (ignoring SDEI and stack overflow exceptions) the regs will be
> placed on the stack that was in use when the exception occurred, e.g.
> 
>   task -> task
>   irq -> irq
>   overflow -> overflow
> 
> For SDEI and stack overflow, we'll place the regs on the relevant SDEI
> or overflow stack, e.g.
> 
>   task -> overflow
>   irq -> overflow
> 
>   task -> sdei
>   irq -> sdei
> 
> I tried to explain the nesting rules in:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/stacktrace.c?h=v5.11#n59
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64/kernel/stacktrace.c?h=v5.11&id=592700f094be229b5c9cc1192d5cea46eb4c7afc
> 
>> 2. Exception in an interrupt handler. Here the interrupt handler is running on the IRQ stack.
>>    Will the pt_regs get created on the IRQ stack?
> 
> For an interrupt the regs will be placed on the stack that was in use
> when the interrupt was taken. The kernel switches to the IRQ stack
> *after* stacking the registers. e.g.
> 
>   task -> task // subsequently switches to IRQ stack
>   irq -> irq
> 
>> Also, is there a maximum nesting for exceptions?
> 
> In practice, yes, but the specific number isn't a constant, so in the
> unwind code we have to act as if there is no limit other than stack
> sizing.
> 
> We try to prevent certain exceptions from nesting (e.g. debug exceptions
> cannot nest), but there are still several levels of nesting, and some
> exceptions which can be nested safely (like faults). For example, it's
> possible to have a chain:
> 
>  syscall -> fault -> interrupt -> fault -> pNMI -> fault -> SError -> fault -> watchpoint -> fault -> overflow -> fault -> BRK
> 
> ... and potentially longer than that.
> 
> The practical limit is the size of all the stacks, and the unwinder's 
> stack monotonicity checks ensure that an unwind will terminate.
> 

Thanks for explaining the nesting. It is now clear to me.

So, my next question is - can we define a practical limit for the nesting so that any nesting beyond that
is fatal? The reason I ask is - if there is a max, then we can allocate an array of stack frames out of
band for the special frames so they are not part of the stack and will not likely get corrupted.

Also, we don't have to do any special detection. If the number of out of band frames used is one or more
then we have exceptions and the stack trace is unreliable.
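
To make the termination argument above concrete, here is a simplified
userspace model of the monotonicity check Mark mentions (types and names
are mine for illustration, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the unwinder's monotonicity check:
 * on arm64 the stack grows down, so each successive frame record must
 * sit at a strictly higher address within the stack's bounds. This is
 * what guarantees an unwind takes a bounded number of steps. */
struct stack_bounds {
	uintptr_t low, high;
};

static bool frame_is_monotonic(const struct stack_bounds *stk,
			       uintptr_t fp, uintptr_t next_fp)
{
	if (fp < stk->low || fp >= stk->high)
		return false;		/* current frame not on this stack */
	if (next_fp <= fp)
		return false;		/* would loop or move backwards */
	return next_fp < stk->high;	/* next frame must stay in bounds */
}
```

(The real unwinder also handles transitions between stacks, e.g. IRQ to
task, which this sketch ignores.)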

Thanks.

Madhavan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 15:26                 ` Madhavan T. Venkataraman
@ 2021-03-23 16:20                   ` Madhavan T. Venkataraman
  2021-03-23 17:02                     ` Mark Rutland
  2021-03-23 16:48                   ` Mark Rutland
  1 sibling, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 16:20 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 10:26 AM, Madhavan T. Venkataraman wrote:
> 
> 
> On 3/23/21 9:57 AM, Mark Rutland wrote:
>> On Tue, Mar 23, 2021 at 09:15:36AM -0500, Madhavan T. Venkataraman wrote:
>>> Hi Mark,
>>>
>>> I have a general question. When exceptions are nested, how does it work? Let us consider 2 cases:
>>>
>>> 1. Exception in a page fault handler itself. In this case, I guess one more pt_regs will get
>>>    established in the task stack for the second exception.
>>
>> Generally (ignoring SDEI and stack overflow exceptions) the regs will be
>> placed on the stack that was in use when the exception occurred, e.g.
>>
>>   task -> task
>>   irq -> irq
>>   overflow -> overflow
>>
>> For SDEI and stack overflow, we'll place the regs on the relevant SDEI
>> or overflow stack, e.g.
>>
>>   task -> overflow
>>   irq -> overflow
>>
>>   task -> sdei
>>   irq -> sdei
>>
>> I tried to explain the nesting rules in:
>>
>>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/stacktrace.c?h=v5.11#n59
>>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64/kernel/stacktrace.c?h=v5.11&id=592700f094be229b5c9cc1192d5cea46eb4c7afc
>>
>>> 2. Exception in an interrupt handler. Here the interrupt handler is running on the IRQ stack.
>>>    Will the pt_regs get created on the IRQ stack?
>>
>> For an interrupt the regs will be placed on the stack that was in use
>> when the interrupt was taken. The kernel switches to the IRQ stack
>> *after* stacking the registers. e.g.
>>
>>   task -> task // subsequently switches to IRQ stack
>>   irq -> irq
>>
>>> Also, is there a maximum nesting for exceptions?
>>
>> In practice, yes, but the specific number isn't a constant, so in the
>> unwind code we have to act as if there is no limit other than stack
>> sizing.
>>
>> We try to prevent certain exceptions from nesting (e.g. debug exceptions
>> cannot nest), but there are still several levels of nesting, and some
>> exceptions which can be nested safely (like faults). For example, it's
>> possible to have a chain:
>>
>>  syscall -> fault -> interrupt -> fault -> pNMI -> fault -> SError -> fault -> watchpoint -> fault -> overflow -> fault -> BRK
>>
>> ... and potentially longer than that.
>>
>> The practical limit is the size of all the stacks, and the unwinder's 
>> stack monotonicity checks ensure that an unwind will terminate.
>>
> 
> Thanks for explaining the nesting. It is now clear to me.
> 
> So, my next question is - can we define a practical limit for the nesting so that any nesting beyond that
> is fatal? The reason I ask is - if there is a max, then we can allocate an array of stack frames out of
> band for the special frames so they are not part of the stack and will not likely get corrupted.
> 
> Also, we don't have to do any special detection. If the number of out of band frames used is one or more
> then we have exceptions and the stack trace is unreliable.
> 

Alternatively, if we can just increment a counter in the task structure when an exception is entered
and decrement it when an exception returns, that counter will tell us that the stack trace is
unreliable.

Is this feasible?

I think I have enough for v3 at this point. If you think that the counter idea is OK, I can
implement it in v3. Once you confirm, I will start working on v3.

Thanks for all the input.

Madhavan

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 15:26                 ` Madhavan T. Venkataraman
  2021-03-23 16:20                   ` Madhavan T. Venkataraman
@ 2021-03-23 16:48                   ` Mark Rutland
  2021-03-23 16:53                     ` Madhavan T. Venkataraman
  1 sibling, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 16:48 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Tue, Mar 23, 2021 at 10:26:50AM -0500, Madhavan T. Venkataraman wrote:
> On 3/23/21 9:57 AM, Mark Rutland wrote:
> Thanks for explaining the nesting. It is now clear to me.

No problem!

> So, my next question is - can we define a practical limit for the
> nesting so that any nesting beyond that is fatal? The reason I ask is
> - if there is a max, then we can allocate an array of stack frames out
> of band for the special frames so they are not part of the stack and
> will not likely get corrupted.

I suspect we can't define such a fatal limit without introducing a local
DoS vector on some otherwise legitimate workload, and I fear this will
further complicate the entry/exit logic, so I'd prefer to avoid
introducing a new limit.

What exactly do you mean by a "special frame", and why do those need
additional protection over regular frame records?

> Also, we don't have to do any special detection. If the number of out
> of band frames used is one or more then we have exceptions and the
> stack trace is unreliable.

What is this expected to protect against?

Thanks,
Mark.

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 16:48                   ` Mark Rutland
@ 2021-03-23 16:53                     ` Madhavan T. Venkataraman
  2021-03-23 17:09                       ` Mark Rutland
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 16:53 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 11:48 AM, Mark Rutland wrote:
> On Tue, Mar 23, 2021 at 10:26:50AM -0500, Madhavan T. Venkataraman wrote:
>> On 3/23/21 9:57 AM, Mark Rutland wrote:
>> Thanks for explaining the nesting. It is now clear to me.
> 
> No problem!
> 
>> So, my next question is - can we define a practical limit for the
>> nesting so that any nesting beyond that is fatal? The reason I ask is
>> - if there is a max, then we can allocate an array of stack frames out
>> of band for the special frames so they are not part of the stack and
>> will not likely get corrupted.
> 
> I suspect we can't define such a fatal limit without introducing a local
> DoS vector on some otherwise legitimate workload, and I fear this will
> further complicate the entry/exit logic, so I'd prefer to avoid
> introducing a new limit.
> 

I suspected as much. But I thought I will ask anyway.

> What exactly do you mean by a "special frame", and why do those need
> additional protection over regular frame records?
> 

Special frame just means pt_regs->stackframe that is used for exceptions.
No additional protection is needed. I just meant that since they are
out of band, we can reliably tell that there are exceptions without
examining the stack. That is all.

>> Also, we don't have to do any special detection. If the number of out
>> of band frames used is one or more then we have exceptions and the
>> stack trace is unreliable.
> 
> What is this expected to protect against?
> 

It is not a protection thing. I just wanted a reliable way to tell that there
is an exception without having to unwind the stack up to the exception frame.
That is all.

Thanks.

Madhavan

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 16:20                   ` Madhavan T. Venkataraman
@ 2021-03-23 17:02                     ` Mark Rutland
  2021-03-23 17:23                       ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 17:02 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Tue, Mar 23, 2021 at 11:20:44AM -0500, Madhavan T. Venkataraman wrote:
> On 3/23/21 10:26 AM, Madhavan T. Venkataraman wrote:
> > On 3/23/21 9:57 AM, Mark Rutland wrote:
> >> On Tue, Mar 23, 2021 at 09:15:36AM -0500, Madhavan T. Venkataraman wrote:
> > So, my next question is - can we define a practical limit for the
> > nesting so that any nesting beyond that is fatal? The reason I ask
> > is - if there is a max, then we can allocate an array of stack
> > frames out of band for the special frames so they are not part of
> > the stack and will not likely get corrupted.
> > 
> > Also, we don't have to do any special detection. If the number of
> > out of band frames used is one or more then we have exceptions and
> > the stack trace is unreliable.
> 
> Alternatively, if we can just increment a counter in the task
> structure when an exception is entered and decrement it when an
> exception returns, that counter will tell us that the stack trace is
> unreliable.

As I noted earlier, we must treat *any* EL1 exception boundary as
unreliable for unwinding, and per my other comments w.r.t.
corrupting the call chain I don't think we need additional protection on
exception boundaries specifically.

> Is this feasible?
> 
> I think I have enough for v3 at this point. If you think that the
> counter idea is OK, I can implement it in v3. Once you confirm, I will
> start working on v3.

Currently, I don't see a compelling reason to need this, and would
prefer to avoid it.

More generally, could we please break this work into smaller steps? I
reckon we can break this down into the following chunks:

1. Add the explicit final frame and associated handling. I suspect that
   this is complicated enough on its own to be an independent series,
   and it's something that we can merge without all the bits and pieces
   necessary for truly reliable stacktracing.

2. Figure out how we must handle kprobes and ftrace. That probably means
   rejecting unwinds from specific places, but we might also want to
   adjust the trampolines if that makes this easier.

3. Figure out exception boundary handling. I'm currently working to
   simplify the entry assembly down to a uniform set of stubs, and I'd
   prefer to get that sorted before we teach the unwinder about
   exception boundaries, as it'll be significantly simpler to reason
   about and won't end up clashing with the rework.

Thanks,
Mark.

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 16:53                     ` Madhavan T. Venkataraman
@ 2021-03-23 17:09                       ` Mark Rutland
  0 siblings, 0 replies; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 17:09 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Tue, Mar 23, 2021 at 11:53:04AM -0500, Madhavan T. Venkataraman wrote:
> On 3/23/21 11:48 AM, Mark Rutland wrote:
> > On Tue, Mar 23, 2021 at 10:26:50AM -0500, Madhavan T. Venkataraman wrote:
> >> So, my next question is - can we define a practical limit for the
> >> nesting so that any nesting beyond that is fatal? The reason I ask is
> >> - if there is a max, then we can allocate an array of stack frames out
> >> of band for the special frames so they are not part of the stack and
> >> will not likely get corrupted.

> >> Also, we don't have to do any special detection. If the number of out
> >> of band frames used is one or more then we have exceptions and the
> >> stack trace is unreliable.
> > 
> > What is this expected to protect against?
> 
> It is not a protection thing. I just wanted a reliable way to tell that there
> is an exception without having to unwind the stack up to the exception frame.
> That is all.

I see.

Given that's an optimization, we can consider doing something like that
after we have the functional bits in place, when we'll be in a
position to see whether this is even a measurable concern in practice.

I suspect that longer-term we'll end up trying to use metadata to unwind
across exception boundaries, since it's possible to get blocked within
those for long periods (e.g. for a uaccess fault), and the larger scale
optimization for patching is to not block the patch.

Thanks,
Mark.

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 17:02                     ` Mark Rutland
@ 2021-03-23 17:23                       ` Madhavan T. Venkataraman
  2021-03-23 17:27                         ` Madhavan T. Venkataraman
                                           ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 17:23 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 12:02 PM, Mark Rutland wrote:
> On Tue, Mar 23, 2021 at 11:20:44AM -0500, Madhavan T. Venkataraman wrote:
>> On 3/23/21 10:26 AM, Madhavan T. Venkataraman wrote:
>>> On 3/23/21 9:57 AM, Mark Rutland wrote:
>>>> On Tue, Mar 23, 2021 at 09:15:36AM -0500, Madhavan T. Venkataraman wrote:
>>> So, my next question is - can we define a practical limit for the
>>> nesting so that any nesting beyond that is fatal? The reason I ask
>>> is - if there is a max, then we can allocate an array of stack
>>> frames out of band for the special frames so they are not part of
>>> the stack and will not likely get corrupted.
>>>
>>> Also, we don't have to do any special detection. If the number of
>>> out of band frames used is one or more then we have exceptions and
>>> the stack trace is unreliable.
>>
>> Alternatively, if we can just increment a counter in the task
>> structure when an exception is entered and decrement it when an
>> exception returns, that counter will tell us that the stack trace is
>> unreliable.
> 
>> As I noted earlier, we must treat *any* EL1 exception boundary as
>> unreliable for unwinding, and per my other comments w.r.t.
> corrupting the call chain I don't think we need additional protection on
> exception boundaries specifically.
> 
>> Is this feasible?
>>
>> I think I have enough for v3 at this point. If you think that the
>> counter idea is OK, I can implement it in v3. Once you confirm, I will
>> start working on v3.
> 
> Currently, I don't see a compelling reason to need this, and would
> prefer to avoid it.
> 

I think that I did a bad job of explaining what I wanted to do. It is not
for any additional protection at all.

So, let us say we create a field in the task structure:

	u64		unreliable_stack;

Whenever an EL1 exception is entered or FTRACE is entered and pt_regs get
set up and pt_regs->stackframe gets chained, increment unreliable_stack.
On exiting the above, decrement unreliable_stack.

In arch_stack_walk_reliable(), simply do this check upfront:

	if (task->unreliable_stack)
		return -EINVAL;

This way, the function does not even bother unwinding the stack to find
exception frames or checking for different return addresses or anything.
We also don't have to worry about code being reorganized, functions
being renamed, etc. It also may help in debugging to know if a task is
experiencing an exception and the level of nesting, etc.
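
To be concrete, here is a minimal userspace model of the scheme (all
names are illustrative, not actual kernel API):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the proposed counter: entry code would
 * increment it whenever an EL1 exception (or the ftrace trampoline)
 * chains pt_regs->stackframe, and exit code would decrement it. */
struct task_model {
	unsigned long unreliable_stack;	/* nesting depth of exceptions */
};

static void model_exception_enter(struct task_model *t)
{
	t->unreliable_stack++;
}

static void model_exception_exit(struct task_model *t)
{
	t->unreliable_stack--;
}

/* The upfront check arch_stack_walk_reliable() could then make,
 * without walking the stack at all. */
static int model_stack_walk_reliable(const struct task_model *t)
{
	return t->unreliable_stack ? -EINVAL : 0;
}
```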

> More generally, could we please break this work into smaller steps? I
> reckon we can break this down into the following chunks:
> 
> 1. Add the explicit final frame and associated handling. I suspect that
>    this is complicated enough on its own to be an independent series,
>    and it's something that we can merge without all the bits and pieces
>    necessary for truly reliable stacktracing.
> 

OK. I can do that.

> 2. Figure out how we must handle kprobes and ftrace. That probably means
>    rejecting unwinds from specific places, but we might also want to
>    adjust the trampolines if that makes this easier.
> 

I think I am already doing all the checks except the one you mentioned
earlier. Yes, I can do this separately.

> 3. Figure out exception boundary handling. I'm currently working to
>    simplify the entry assembly down to a uniform set of stubs, and I'd
>    prefer to get that sorted before we teach the unwinder about
>    exception boundaries, as it'll be significantly simpler to reason
>    about and won't end up clashing with the rework.
> 

So, here is where I still have a question. Is it necessary for the unwinder
to know the exception boundaries? Is it not enough if it knows if there are
exceptions present? For instance, using something like num_special_frames
I suggested above?

Thanks,

Madhavan

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 17:23                       ` Madhavan T. Venkataraman
@ 2021-03-23 17:27                         ` Madhavan T. Venkataraman
  2021-03-23 18:27                         ` Mark Brown
  2021-03-23 18:30                         ` Mark Rutland
  2 siblings, 0 replies; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 17:27 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 12:23 PM, Madhavan T. Venkataraman wrote:
> 
> 
> On 3/23/21 12:02 PM, Mark Rutland wrote:
>> On Tue, Mar 23, 2021 at 11:20:44AM -0500, Madhavan T. Venkataraman wrote:
>>> On 3/23/21 10:26 AM, Madhavan T. Venkataraman wrote:
>>>> On 3/23/21 9:57 AM, Mark Rutland wrote:
>>>>> On Tue, Mar 23, 2021 at 09:15:36AM -0500, Madhavan T. Venkataraman wrote:
>>>> So, my next question is - can we define a practical limit for the
>>>> nesting so that any nesting beyond that is fatal? The reason I ask
>>>> is - if there is a max, then we can allocate an array of stack
>>>> frames out of band for the special frames so they are not part of
>>>> the stack and will not likely get corrupted.
>>>>
>>>> Also, we don't have to do any special detection. If the number of
>>>> out of band frames used is one or more then we have exceptions and
>>>> the stack trace is unreliable.
>>>
>>> Alternatively, if we can just increment a counter in the task
>>> structure when an exception is entered and decrement it when an
>>> exception returns, that counter will tell us that the stack trace is
>>> unreliable.
>>
>> As I noted earlier, we must treat *any* EL1 exception boundary as
>> unreliable for unwinding, and per my other comments w.r.t.
>> corrupting the call chain I don't think we need additional protection on
>> exception boundaries specifically.
>>
>>> Is this feasible?
>>>
>>> I think I have enough for v3 at this point. If you think that the
>>> counter idea is OK, I can implement it in v3. Once you confirm, I will
>>> start working on v3.
>>
>> Currently, I don't see a compelling reason to need this, and would
>> prefer to avoid it.
>>
> 
> I think that I did a bad job of explaining what I wanted to do. It is not
> for any additional protection at all.
> 
> So, let us say we create a field in the task structure:
> 
> 	u64		unreliable_stack;
> 
> Whenever an EL1 exception is entered or FTRACE is entered and pt_regs get
> set up and pt_regs->stackframe gets chained, increment unreliable_stack.
> On exiting the above, decrement unreliable_stack.
> 
> In arch_stack_walk_reliable(), simply do this check upfront:
> 
> 	if (task->unreliable_stack)
> 		return -EINVAL;
> 
> This way, the function does not even bother unwinding the stack to find
> exception frames or checking for different return addresses or anything.
> We also don't have to worry about code being reorganized, functions
> being renamed, etc. It also may help in debugging to know if a task is
> experiencing an exception and the level of nesting, etc.
> 
>> More generally, could we please break this work into smaller steps? I
>> reckon we can break this down into the following chunks:
>>
>> 1. Add the explicit final frame and associated handling. I suspect that
>>    this is complicated enough on its own to be an independent series,
>>    and it's something that we can merge without all the bits and pieces
>>    necessary for truly reliable stacktracing.
>>
> 
> OK. I can do that.
> 
>> 2. Figure out how we must handle kprobes and ftrace. That probably means
>>    rejecting unwinds from specific places, but we might also want to
>>    adjust the trampolines if that makes this easier.
>>
> 
> I think I am already doing all the checks except the one you mentioned
> earlier. Yes, I can do this separately.
> 
>> 3. Figure out exception boundary handling. I'm currently working to
>>    simplify the entry assembly down to a uniform set of stubs, and I'd
>>    prefer to get that sorted before we teach the unwinder about
>>    exception boundaries, as it'll be significantly simpler to reason
>>    about and won't end up clashing with the rework.
>>
> 
> So, here is where I still have a question. Is it necessary for the unwinder
> to know the exception boundaries? Is it not enough if it knows if there are
> exceptions present? For instance, using something like num_special_frames

Typo - num_special_frames should be unreliable_stack. That is the name of
the counter I used above.

Sorry about that.

Madhavan

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 17:23                       ` Madhavan T. Venkataraman
  2021-03-23 17:27                         ` Madhavan T. Venkataraman
@ 2021-03-23 18:27                         ` Mark Brown
  2021-03-23 20:23                           ` Madhavan T. Venkataraman
  2021-03-23 18:30                         ` Mark Rutland
  2 siblings, 1 reply; 55+ messages in thread
From: Mark Brown @ 2021-03-23 18:27 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Mark Rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Tue, Mar 23, 2021 at 12:23:34PM -0500, Madhavan T. Venkataraman wrote:
> On 3/23/21 12:02 PM, Mark Rutland wrote:

> > 3. Figure out exception boundary handling. I'm currently working to
> >    simplify the entry assembly down to a uniform set of stubs, and I'd
> >    prefer to get that sorted before we teach the unwinder about
> >    exception boundaries, as it'll be significantly simpler to reason
> >    about and won't end up clashing with the rework.

> So, here is where I still have a question. Is it necessary for the unwinder
> to know the exception boundaries? Is it not enough if it knows if there are
> exceptions present? For instance, using something like num_special_frames
> I suggested above?

For reliable stack trace we can live with just flagging things as
unreliable when we know there's an exception boundary somewhere but (as
Mark mentioned elsewhere) being able to actually go through a subset of
exception boundaries safely is likely to help usefully improve the
performance of live patching, and for defensiveness we want to try to
detect during an actual unwind anyway so it ends up being a performance
improvement and double check rather than saving us code.  Better
understanding of what's going on in the presence of exceptions may also
help other users of the unwinder which can use stacks which aren't
reliable get better results.


* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 17:23                       ` Madhavan T. Venkataraman
  2021-03-23 17:27                         ` Madhavan T. Venkataraman
  2021-03-23 18:27                         ` Mark Brown
@ 2021-03-23 18:30                         ` Mark Rutland
  2021-03-23 20:24                           ` Madhavan T. Venkataraman
  2 siblings, 1 reply; 55+ messages in thread
From: Mark Rutland @ 2021-03-23 18:30 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

On Tue, Mar 23, 2021 at 12:23:34PM -0500, Madhavan T. Venkataraman wrote:
> On 3/23/21 12:02 PM, Mark Rutland wrote:

[...]

> I think that I did a bad job of explaining what I wanted to do. It is not
> for any additional protection at all.
> 
> So, let us say we create a field in the task structure:
> 
> 	u64		unreliable_stack;
> 
> Whenever an EL1 exception is entered or FTRACE is entered and pt_regs get
> set up and pt_regs->stackframe gets chained, increment unreliable_stack.
> On exiting the above, decrement unreliable_stack.
> 
> In arch_stack_walk_reliable(), simply do this check upfront:
> 
> 	if (task->unreliable_stack)
> 		return -EINVAL;
> 
> This way, the function does not even bother unwinding the stack to find
> exception frames or checking for different return addresses or anything.
> We also don't have to worry about code being reorganized, functions
> being renamed, etc. It also may help in debugging to know if a task is
> experiencing an exception and the level of nesting, etc.

As in my other reply, since this is an optimization that is not
necessary for functional correctness, I would prefer to avoid this for
now. We can reconsider that in future if we encounter performance
problems.

Even with this there will be cases where we have to identify
non-unwindable functions explicitly (e.g. the patchable-function-entry
trampolines, where the real return address is in x9), and I'd prefer
that we use one mechanism consistently.

I suspect that in the future we'll need to unwind across exception
boundaries using metadata, and we can treat the non-unwindable metadata
in the same way.

[...]

> > 3. Figure out exception boundary handling. I'm currently working to
> >    simplify the entry assembly down to a uniform set of stubs, and I'd
> >    prefer to get that sorted before we teach the unwinder about
> >    exception boundaries, as it'll be significantly simpler to reason
> >    about and won't end up clashing with the rework.
> 
> So, here is where I still have a question. Is it necessary for the unwinder
> to know the exception boundaries? Is it not enough if it knows if there are
> exceptions present? For instance, using something like num_special_frames
> I suggested above?

I agree that it would be legitimate to bail out early if we knew there
was going to be an exception somewhere in the trace. Regardless, I think
it's simpler overall to identify non-unwindability during the trace, and
doing that during the trace aligns more closely with the structure that
we'll need to permit unwinding across these boundaries in future, so I'd
prefer we do that rather than trying to optimize for early returns
today.

Thanks,
Mark.

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 18:27                         ` Mark Brown
@ 2021-03-23 20:23                           ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 20:23 UTC (permalink / raw)
  To: Mark Brown
  Cc: Mark Rutland, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 1:27 PM, Mark Brown wrote:
> On Tue, Mar 23, 2021 at 12:23:34PM -0500, Madhavan T. Venkataraman wrote:
>> On 3/23/21 12:02 PM, Mark Rutland wrote:
> 
>>> 3. Figure out exception boundary handling. I'm currently working to
>>>    simplify the entry assembly down to a uniform set of stubs, and I'd
>>>    prefer to get that sorted before we teach the unwinder about
>>>    exception boundaries, as it'll be significantly simpler to reason
>>>    about and won't end up clashing with the rework.
> 
>> So, here is where I still have a question. Is it necessary for the unwinder
>> to know the exception boundaries? Is it not enough if it knows if there are
>> exceptions present? For instance, using something like num_special_frames
>> I suggested above?
> 
> For reliable stack trace we can live with just flagging things as
> unreliable when we know there's an exception boundary somewhere but (as
> Mark mentioned elsewhere) being able to actually go through a subset of
> exception boundaries safely is likely to help usefully improve the
> performance of live patching, and for defensiveness we want to try to
> detect during an actual unwind anyway so it ends up being a performance
> improvement and double check rather than saving us code.  Better
> understanding of what's going on in the presence of exceptions may also
> help other users of the unwinder which can use stacks which aren't
> reliable get better results.
> 

Actually, I was not suggesting that the counter replace the unwinder
intelligence to recognize exception boundaries. I was only suggesting
the use of the counter for arch_stack_walk_reliable().

But I am fine with not implementing the counter for now.

Madhavan

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 18:30                         ` Mark Rutland
@ 2021-03-23 20:24                           ` Madhavan T. Venkataraman
  2021-03-23 21:04                             ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 20:24 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel



On 3/23/21 1:30 PM, Mark Rutland wrote:
> On Tue, Mar 23, 2021 at 12:23:34PM -0500, Madhavan T. Venkataraman wrote:
>> On 3/23/21 12:02 PM, Mark Rutland wrote:
> 
> [...]
> 
>> I think that I did a bad job of explaining what I wanted to do. It is not
>> for any additional protection at all.
>>
>> So, let us say we create a field in the task structure:
>>
>> 	u64		unreliable_stack;
>>
>> Whenever an EL1 exception is entered or FTRACE is entered and pt_regs get
>> set up and pt_regs->stackframe gets chained, increment unreliable_stack.
>> On exiting the above, decrement unreliable_stack.
>>
>> In arch_stack_walk_reliable(), simply do this check upfront:
>>
>> 	if (task->unreliable_stack)
>> 		return -EINVAL;
>>
>> This way, the function does not even bother unwinding the stack to find
>> exception frames or checking for different return addresses or anything.
>> We also don't have to worry about code being reorganized, functions
>> being renamed, etc. It also may help in debugging to know if a task is
>> experiencing an exception and the level of nesting, etc.
> 
> As in my other reply, since this is an optimization that is not
> necessary for functional correctness, I would prefer to avoid this for
> now. We can reconsider that in future if we encounter performance
> problems.
> 
> Even with this there will be cases where we have to identify
> non-unwindable functions explicitly (e.g. the patchable-function-entry
> trampolines, where the real return address is in x9), and I'd prefer
> that we use one mechanism consistently.
> 
> I suspect that in the future we'll need to unwind across exception
> boundaries using metadata, and we can treat the non-unwindable metadata
> in the same way.
> 
> [...]
> 
>>> 3. Figure out exception boundary handling. I'm currently working to
>>>    simplify the entry assembly down to a uniform set of stubs, and I'd
>>>    prefer to get that sorted before we teach the unwinder about
>>>    exception boundaries, as it'll be significantly simpler to reason
>>>    about and won't end up clashing with the rework.
>>
>> So, here is where I still have a question. Is it necessary for the unwinder
>> to know the exception boundaries? Is it not enough if it knows if there are
>> exceptions present? For instance, using something like num_special_frames
>> I suggested above?
> 
> I agree that it would be legitimate to bail out early if we knew there
> was going to be an exception somewhere in the trace. Regardless, I think
> it's simpler overall to identify non-unwindability during the trace, and
> doing that during the trace aligns more closely with the structure that
> we'll need to permit unwinding across these boundaries in future, so I'd
> prefer we do that rather than trying to optimize for early returns
> today.
> 

OK. Fair enough.

Thanks.

Madhavan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable
  2021-03-23 20:24                           ` Madhavan T. Venkataraman
@ 2021-03-23 21:04                             ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 55+ messages in thread
From: Madhavan T. Venkataraman @ 2021-03-23 21:04 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, jthierry, catalin.marinas, will,
	linux-arm-kernel, live-patching, linux-kernel

Thanks for all the input - Mark Rutland and Mark Brown.

I will send out the stack termination patch next. Since I am splitting
the original series into 3 separate series, I will change the titles and
start with version 1 in each case, if there is no objection.

Again, Thanks.

Madhavan

[Full quote of the 2021-03-23 20:24 message trimmed; see above.]

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2021-03-23 21:05 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <5997dfe8d261a3a543667b83c902883c1e4bd270>
2021-03-15 16:57 ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace madvenka
2021-03-15 16:57   ` [RFC PATCH v2 1/8] arm64: Implement stack trace termination record madvenka
2021-03-18 15:09     ` Mark Brown
2021-03-18 20:26       ` Madhavan T. Venkataraman
2021-03-19 12:30         ` Mark Brown
2021-03-19 14:29           ` Madhavan T. Venkataraman
2021-03-19 18:19             ` Madhavan T. Venkataraman
2021-03-19 22:03               ` Madhavan T. Venkataraman
2021-03-23 10:24                 ` Mark Rutland
2021-03-23 12:39                   ` Madhavan T. Venkataraman
2021-03-15 16:57   ` [RFC PATCH v2 2/8] arm64: Implement frame types madvenka
2021-03-18 17:40     ` Mark Brown
2021-03-18 22:22       ` Madhavan T. Venkataraman
2021-03-19 13:22         ` Mark Brown
2021-03-19 14:40           ` Madhavan T. Venkataraman
2021-03-19 15:02             ` Madhavan T. Venkataraman
2021-03-19 16:20               ` Mark Brown
2021-03-19 16:27                 ` Madhavan T. Venkataraman
2021-03-23 10:34     ` Mark Rutland
2021-03-15 16:57   ` [RFC PATCH v2 3/8] arm64: Terminate the stack trace at TASK_FRAME and EL0_FRAME madvenka
2021-03-18 18:26     ` Mark Brown
2021-03-18 20:29       ` Madhavan T. Venkataraman
2021-03-23 10:36         ` Mark Rutland
2021-03-23 12:40           ` Madhavan T. Venkataraman
2021-03-15 16:57   ` [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable madvenka
2021-03-23 10:42     ` Mark Rutland
2021-03-23 12:46       ` Madhavan T. Venkataraman
2021-03-23 13:04         ` Mark Rutland
2021-03-23 13:31           ` Madhavan T. Venkataraman
2021-03-23 14:33             ` Mark Rutland
2021-03-23 15:22               ` Madhavan T. Venkataraman
2021-03-15 16:57   ` [RFC PATCH v2 5/8] arm64: Detect an FTRACE " madvenka
2021-03-23 10:51     ` Mark Rutland
2021-03-23 12:56       ` Madhavan T. Venkataraman
2021-03-23 13:36         ` Mark Rutland
2021-03-23 13:38           ` Madhavan T. Venkataraman
2021-03-23 14:15             ` Madhavan T. Venkataraman
2021-03-23 14:57               ` Mark Rutland
2021-03-23 15:26                 ` Madhavan T. Venkataraman
2021-03-23 16:20                   ` Madhavan T. Venkataraman
2021-03-23 17:02                     ` Mark Rutland
2021-03-23 17:23                       ` Madhavan T. Venkataraman
2021-03-23 17:27                         ` Madhavan T. Venkataraman
2021-03-23 18:27                         ` Mark Brown
2021-03-23 20:23                           ` Madhavan T. Venkataraman
2021-03-23 18:30                         ` Mark Rutland
2021-03-23 20:24                           ` Madhavan T. Venkataraman
2021-03-23 21:04                             ` Madhavan T. Venkataraman
2021-03-23 16:48                   ` Mark Rutland
2021-03-23 16:53                     ` Madhavan T. Venkataraman
2021-03-23 17:09                       ` Mark Rutland
2021-03-15 16:57   ` [RFC PATCH v2 6/8] arm64: Check the return PC of every stack frame madvenka
2021-03-15 16:57   ` [RFC PATCH v2 7/8] arm64: Detect kretprobed functions in stack trace madvenka
2021-03-15 16:58   ` [RFC PATCH v2 8/8] arm64: Implement arch_stack_walk_reliable() madvenka
2021-03-15 19:01   ` [RFC PATCH v2 0/8] arm64: Implement reliable stack trace Madhavan T. Venkataraman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).