linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v6 0/3] arm64: Implement stack trace reliability checks
       [not found] <3f2aab69a35c243c5e97f47c4ad84046355f5b90>
@ 2021-06-30 22:33 ` madvenka
  2021-06-30 22:33   ` [RFC PATCH v6 1/3] arm64: Improve the unwinder return value madvenka
                     ` (3 more replies)
  2021-08-12 13:24 ` [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement " madvenka
  2021-08-12 18:35 ` madvenka
  2 siblings, 4 replies; 28+ messages in thread
From: madvenka @ 2021-06-30 22:33 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Unwinder return value
=====================

Currently, the unwinder returns a tri-state return value:

	0		means "continue with the unwind"
	-ENOENT		means "successful termination of the stack trace"
	-EINVAL		means "fatal error, abort the stack trace"

This is confusing. To fix this, define an enumeration of return codes
that makes the intent explicit:

enum {
	UNWIND_CONTINUE,		/* No errors encountered */
	UNWIND_ABORT,			/* Fatal errors encountered */
	UNWIND_FINISH,			/* End of stack reached successfully */
};
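
As a rough illustration of how a caller consumes these codes, here is a
userspace sketch; sim_frame, unwind_frame_sim() and the termination logic
are simplified stand-ins, not the kernel implementation:

```c
#include <assert.h>

enum unwind_rc {
	UNWIND_CONTINUE,		/* No errors encountered */
	UNWIND_ABORT,			/* Fatal errors encountered */
	UNWIND_FINISH,			/* End of stack reached successfully */
};

/* Simulated frame record; fp == 0 is the terminal record. */
struct sim_frame {
	unsigned long fp;
	unsigned long pc;
};

/* Stand-in for unwind_frame(): step to the "next" record. */
static enum unwind_rc unwind_frame_sim(struct sim_frame *frame)
{
	if (!frame->fp)
		return UNWIND_FINISH;	/* terminal record, clean end */
	if (frame->fp & 0xf)
		return UNWIND_ABORT;	/* misaligned fp, fatal error */
	frame->fp -= 16;		/* pretend records are 16 bytes apart */
	frame->pc += 4;
	return UNWIND_CONTINUE;
}

/* Walk until the unwinder says stop; 0 means clean termination. */
static int walk_sim(struct sim_frame *frame)
{
	enum unwind_rc rc;

	do {
		rc = unwind_frame_sim(frame);
	} while (rc == UNWIND_CONTINUE);

	return rc == UNWIND_FINISH ? 0 : -1;
}
```

The caller continues only on UNWIND_CONTINUE; both UNWIND_FINISH and
UNWIND_ABORT stop the walk, and only the former counts as success.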

Reliability checks
==================

There are a number of places in kernel code where the stack trace is not
reliable. Enhance the unwinder to check for those cases.

Return address check
--------------------

Check the return PC of every stack frame to make sure that it is a valid
kernel text address (and not some generated code, for example).

Low-level function check
------------------------

Low-level assembly functions are, by nature, unreliable from an unwinder
perspective. The unwinder must check for them in a stack trace. See the
"Assembly Functions" section below.

Other checks
------------

Other checks may be added in the future. Once all of the checks are in place,
the unwinder can provide a reliable stack trace. But before this can be used
for livepatch, some other entity needs to validate the frame pointer in kernel
functions. objtool is currently being worked on to address that need.

Return code
-----------

If a reliability check fails, it is a non-fatal error. The unwinder needs to
return an appropriate code so the caller knows that some non-fatal error has
occurred. Add another code to the enumeration:

enum {
	UNWIND_CONTINUE,		/* No errors encountered */
	UNWIND_CONTINUE_WITH_RISK,	/* Non-fatal errors encountered */
	UNWIND_ABORT,			/* Fatal errors encountered */
	UNWIND_FINISH,			/* End of stack reached successfully */
};

When the unwinder returns UNWIND_CONTINUE_WITH_RISK:

	- Debug-type callers can choose to continue the unwind

	- Livepatch-type callers will stop unwinding

So, arch_stack_walk_reliable() (implemented in the future) will look like
this:

/*
 * Walk the stack like arch_stack_walk() but stop the walk as soon as
 * some unreliability is detected in the stack.
 */
int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
			      void *cookie, struct task_struct *task)
{
	struct stackframe frame;
	enum unwind_rc rc;

	if (task == current) {
		rc = start_backtrace(&frame,
				(unsigned long)__builtin_frame_address(0),
				(unsigned long)arch_stack_walk_reliable);
	} else {
		/*
		 * The task must not be running anywhere for the duration of
		 * arch_stack_walk_reliable(). The caller must guarantee
		 * this.
		 */
		rc = start_backtrace(&frame,
				     thread_saved_fp(task),
				     thread_saved_pc(task));
	}

	while (rc == UNWIND_CONTINUE) {
		if (!consume_entry(cookie, frame.pc))
			return -EINVAL;
		rc = unwind_frame(task, &frame);
	}

	return rc == UNWIND_FINISH ? 0 : -EINVAL;
}

Assembly functions
==================

There are a number of assembly functions in arm64. Except for a couple of
them, these functions do not have a frame pointer prolog or epilog. Also,
many of them manipulate low-level state such as registers. These functions
are, by definition, unreliable from a stack unwinding perspective. That is,
when these functions occur in a stack trace, the unwinder would not be able
to unwind through them reliably.

Assembly functions are defined as SYM_FUNC_*() functions or SYM_CODE_*()
functions. objtool performs static analysis of SYM_FUNC functions. It
ignores SYM_CODE functions because they contain low-level code that is
difficult to analyze. Eventually, when objtool is ready, SYM_FUNC
functions will be analyzed and "fixed" as necessary. So, they are not
"interesting" to the reliable unwinder.

That leaves SYM_CODE functions. It is up to the unwinder to deal with these
for a reliable stack trace. The unwinder needs to do the following:

	- Recognize SYM_CODE functions in a stack trace.

	- If a particular SYM_CODE function can be unwound through using
	  special logic, then do it. E.g., the return trampoline for
	  Function Graph Tracing.

	- Otherwise, return UNWIND_CONTINUE_WITH_RISK.
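
The three steps above can be sketched in userspace as follows; the address
bounds and the trampoline constant are illustrative assumptions, not real
kernel addresses:

```c
#include <assert.h>

enum unwind_rc {
	UNWIND_CONTINUE,
	UNWIND_CONTINUE_WITH_RISK,
	UNWIND_ABORT,
	UNWIND_FINISH,
};

#define FGRAPH_TRAMPOLINE 0x9000UL	/* illustrative trampoline address */

/* Return 1 if pc lies inside a SYM_CODE function (illustrative bounds). */
static int in_sym_code(unsigned long pc)
{
	return pc >= 0x9000 && pc < 0x9100;
}

/*
 * Recognize a SYM_CODE return PC. Translate the ones the unwinder has
 * special logic for; flag the rest as risky.
 */
static enum unwind_rc handle_sym_code_pc(unsigned long *pc,
					 unsigned long orig_pc)
{
	if (*pc == FGRAPH_TRAMPOLINE) {
		*pc = orig_pc;		/* special logic: translate the PC */
		return UNWIND_CONTINUE;
	}
	if (in_sym_code(*pc))
		return UNWIND_CONTINUE_WITH_RISK;
	return UNWIND_CONTINUE;
}
```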

Current approach
================

Define an ARM64 version of SYM_CODE_END() like this:

#define SYM_CODE_END(name)				\
	SYM_END(name, SYM_T_NONE)			;\
	99:						;\
	.pushsection "sym_code_functions", "aw"		;\
	.quad	name					;\
	.quad	99b					;\
	.popsection

The above macro does the usual SYM_END(). In addition, it records the
function's address range in a special section called "sym_code_functions".
This way, all SYM_CODE functions get recorded in the section automatically.

Implement an early_initcall() called init_sym_code_functions() that allocates
an array called sym_code_functions[] and copies the function ranges from the
section to the array.

Add a reliability check in unwind_check_frame() that compares every return
PC from unwind_frame() with sym_code_functions[]. If there is a match, then
return UNWIND_CONTINUE_WITH_RISK.
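
A userspace sketch of this range comparison; struct code_range and the
array contents are illustrative, since the real array is populated at boot
from the "sym_code_functions" section:

```c
#include <assert.h>
#include <stddef.h>

struct code_range {
	unsigned long start;
	unsigned long end;
};

/* Illustrative ranges; the kernel copies these from the section. */
static const struct code_range sym_code_functions[] = {
	{ 0x1000, 0x1080 },
	{ 0x2000, 0x2040 },
};

/* Return 1 if pc falls inside any recorded SYM_CODE function. */
static int unwinder_is_unreliable(unsigned long pc)
{
	size_t n = sizeof(sym_code_functions) / sizeof(sym_code_functions[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (pc >= sym_code_functions[i].start &&
		    pc < sym_code_functions[i].end)
			return 1;
	}
	return 0;
}
```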

Last stack frame
================

If a SYM_CODE function occurs in the very last frame in the stack trace,
then the stack trace is not considered unreliable. This is because there
is no more unwinding to do. Examples:

	- EL0 exception stack traces end in the top level EL0 exception
	  handlers.

	- All kernel thread stack traces end in ret_from_fork().

Special SYM_CODE functions
==========================

The return trampolines of the Function Graph Tracer and kretprobes can
be recognized by the unwinder. If the return PCs can be translated to the
original PCs, then the unwinder should perform that translation before
checking for reliability. The final PC that we end up with after all the
translations is the one that needs to be checked for reliability.

Accordingly, I have moved the reliability checks to after the logic that
handles the Function Graph Tracer.

So, the approach is: all SYM_CODE functions are unreliable. If a SYM_CODE
function is "fixed" to make it reliable, then it should become a SYM_FUNC
function. Or, if the unwinder has special logic to unwind through a SYM_CODE
function, then that can be done.

Special cases
=============

Some special cases need to be mentioned:

	- EL1 interrupt and exception handlers end up in sym_code_ranges[].
	  So, all EL1 interrupt and exception stack traces will be considered
	  unreliable. This is the correct behavior as interrupts and exceptions
	  can happen on any instruction including ones in the frame pointer
	  prolog and epilog. Unless objtool generates metadata so the unwinder
	  can unwind through these special cases, such stack traces will be
	  considered unreliable.

	- A task can get preempted at the end of an interrupt. Stack traces
	  of preempted tasks will show the interrupt frame in the stack trace
	  and will be considered unreliable.

	- Breakpoints are exceptions. So, all stack traces in the breakpoint
	  handler (including probes) will be considered unreliable.

	- All of the ftrace trampolines end up in sym_code_functions[]. All
	  stack traces taken from tracer functions will be considered
	  unreliable.
---
Changelog:

v6:
	From Mark Rutland:

	- The per-frame reliability concept and flag are acceptable. But more
	  work is needed to make the per-frame checks more accurate and more
	  complete. E.g., some code reorg is being worked on that will help.

	  I have now removed the frame->reliable flag and deleted the whole
	  concept of per-frame status. This is orthogonal to this patch series.
	  Instead, I have improved the unwinder to return proper return codes
	  so a caller can take appropriate action without needing per-frame
	  status.

	- Remove the mention of PLTs and update the comment.

	  I have replaced the comment above the call to __kernel_text_address()
	  with the comment suggested by Mark Rutland.

	Other comments:

	- Other comments on the per-frame stuff are not relevant because
	  that approach is not there anymore.

v5:
	From Keiya Nobuta:
	
	- The term blacklist(ed) is not to be used anymore. I have changed it
	  to unreliable. So, the function unwinder_blacklisted() has been
	  changed to unwinder_is_unreliable().

	From Mark Brown:

	- Add a comment for the "reliable" flag in struct stackframe. The
	  reliability attribute is not complete until all the checks are
	  in place. Added a comment above struct stackframe.

	- Include some of the comments in the cover letter in the actual
	  code so that we can compare it with the reliable stack trace
	  requirements document for completeness. I have added a comment:

	  	- above unwinder_is_unreliable() that lists the requirements
		  that are addressed by the function.

		- above the __kernel_text_address() call about all the cases
		  the call covers.

v4:
	From Mark Brown:

	- I was checking the return PC with __kernel_text_address() before
	  the Function Graph trace handling. Mark Brown felt that all the
	  reliability checks should be performed on the original return PC
	  once that is obtained. So, I have moved all the reliability checks
	  to after the Function Graph Trace handling code in the unwinder.
	  Basically, the unwinder should perform PC translations first (for
	  the return trampoline for Function Graph Tracing, Kretprobes, etc).
	  Then, the reliability checks should be applied to the resulting
	  PC.

	- Mark said to improve the naming of the new functions so they don't
	  collide with existing ones. I have used a prefix "unwinder_" for
	  all the new functions.

	From Josh Poimboeuf:

	- In the error scenarios in the unwinder, the reliable flag in the
	  stack frame should be set. Implemented this.

	- Some of the other comments are not relevant to the new code as
	  I have taken a different approach in the new code. That is why
	  I have not made those changes. E.g., Ard wanted me to add the
	  "const" keyword to the global section array. That array does not
	  exist in v4. Similarly, Mark Brown said to use ARRAY_SIZE() for
	  the same array in a for loop.

	Other changes:

	- Add a new definition for SYM_CODE_END() that adds the address
	  range of the function to a special section called
	  "sym_code_functions".

	- Include the new section under initdata in vmlinux.lds.S.

	- Define an early_initcall() to copy the contents of the
	  "sym_code_functions" section to an array by the same name.

	- Define a function unwinder_blacklisted() that compares a return
	  PC against sym_code_sections[]. If there is a match, mark the
	  stack trace unreliable. Call this from unwind_frame().

v3:
	- Implemented a sym_code_ranges[] array to contain section bounds
	  for text sections that contain SYM_CODE_*() functions. The unwinder
	  checks each return PC against the sections. If it falls in any of
	  the sections, the stack trace is marked unreliable.

	- Moved SYM_CODE functions from .text and .init.text into a new
	  text section called ".code.text". Added this section to
	  vmlinux.lds.S and sym_code_ranges[].

	- Fixed the logic in the unwinder that handles Function Graph
	  Tracer return trampoline.

	- Removed all the previous code that handles:
		- ftrace entry code for traced function
		- special_functions[] array that lists individual functions
		- kretprobe_trampoline() special case

v2:
	- Removed the terminating entry { 0, 0 } in special_functions[]
	  and replaced it with the idiom { /* sentinel */ }.

	- Change the ftrace trampoline entry ftrace_graph_call in
	  special_functions[] to ftrace_call + 4 and added explanatory
	  comments.

	- Unnested #ifdefs in special_functions[] for FTRACE.

v1:
	- Define a bool field in struct stackframe. This will indicate if
	  a stack trace is reliable.

	- Implement a special_functions[] array that will be populated
	  with special functions in which the stack trace is considered
	  unreliable.
	
	- Using kallsyms_lookup(), get the address ranges for the special
	  functions and record them.

	- Implement an is_reliable_function(pc). This function will check
	  if a given return PC falls in any of the special functions. If
	  it does, the stack trace is unreliable.

	- Implement check_reliability() function that will check if a
	  stack frame is reliable. Call is_reliable_function() from
	  check_reliability().

	- Before a return PC is checked against special_functions[], it
	  must be validated as a proper kernel text address. Call
	  __kernel_text_address() from check_reliability().

	- Finally, call check_reliability() from unwind_frame() for
	  each stack frame.

	- Add EL1 exception handlers to special_functions[].

		el1_sync();
		el1_irq();
		el1_error();
		el1_sync_invalid();
		el1_irq_invalid();
		el1_fiq_invalid();
		el1_error_invalid();

	- The above functions are currently defined as LOCAL symbols.
	  Make them global so that they can be referenced from the
	  unwinder code.

	- Add FTRACE trampolines to special_functions[]:

		ftrace_graph_call()
		ftrace_graph_caller()
		return_to_handler()

	- Add the kretprobe trampoline to special functions[]:

		kretprobe_trampoline()

Previous versions and discussion
================================

v5: https://lore.kernel.org/linux-arm-kernel/20210526214917.20099-1-madvenka@linux.microsoft.com/
v4: https://lore.kernel.org/linux-arm-kernel/20210516040018.128105-1-madvenka@linux.microsoft.com/
v3: https://lore.kernel.org/linux-arm-kernel/20210503173615.21576-1-madvenka@linux.microsoft.com/
v2: https://lore.kernel.org/linux-arm-kernel/20210405204313.21346-1-madvenka@linux.microsoft.com/
v1: https://lore.kernel.org/linux-arm-kernel/20210330190955.13707-1-madvenka@linux.microsoft.com/
Madhavan T. Venkataraman (3):
  arm64: Improve the unwinder return value
  arm64: Introduce stack trace reliability checks in the unwinder
  arm64: Create a list of SYM_CODE functions, check return PC against
    list

 arch/arm64/include/asm/linkage.h    |  12 ++
 arch/arm64/include/asm/sections.h   |   1 +
 arch/arm64/include/asm/stacktrace.h |  16 ++-
 arch/arm64/kernel/perf_callchain.c  |   5 +-
 arch/arm64/kernel/process.c         |   8 +-
 arch/arm64/kernel/return_address.c  |  10 +-
 arch/arm64/kernel/stacktrace.c      | 180 ++++++++++++++++++++++++----
 arch/arm64/kernel/time.c            |   9 +-
 arch/arm64/kernel/vmlinux.lds.S     |   7 ++
 9 files changed, 213 insertions(+), 35 deletions(-)


base-commit: bf05bf16c76bb44ab5156223e1e58e26dfe30a88
-- 
2.25.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH v6 1/3] arm64: Improve the unwinder return value
  2021-06-30 22:33 ` [RFC PATCH v6 0/3] arm64: Implement stack trace reliability checks madvenka
@ 2021-06-30 22:33   ` madvenka
  2021-07-28 16:56     ` Mark Rutland
  2021-06-30 22:33   ` [RFC PATCH v6 2/3] arm64: Introduce stack trace reliability checks in the unwinder madvenka
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: madvenka @ 2021-06-30 22:33 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Currently, the unwinder returns a tri-state return value:

	0		means "continue with the unwind"
	-ENOENT		means "successful termination of the stack trace"
	-EINVAL		means "fatal error, abort the stack trace"

This is confusing. To fix this, define an enumeration of return codes
that makes the intent explicit. Handle the return codes in all of the
unwind consumers.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h | 14 ++++++--
 arch/arm64/kernel/perf_callchain.c  |  5 ++-
 arch/arm64/kernel/process.c         |  8 +++--
 arch/arm64/kernel/return_address.c  | 10 ++++--
 arch/arm64/kernel/stacktrace.c      | 53 ++++++++++++++++-------------
 arch/arm64/kernel/time.c            |  9 +++--
 6 files changed, 64 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index eb29b1fe8255..6fcd58553fb1 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -30,6 +30,12 @@ struct stack_info {
 	enum stack_type type;
 };
 
+enum unwind_rc {
+	UNWIND_CONTINUE,		/* No errors encountered */
+	UNWIND_ABORT,			/* Fatal errors encountered */
+	UNWIND_FINISH,			/* End of stack reached successfully */
+};
+
 /*
  * A snapshot of a frame record or fp/lr register values, along with some
  * accounting information necessary for robust unwinding.
@@ -61,7 +67,8 @@ struct stackframe {
 #endif
 };
 
-extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
+extern enum unwind_rc unwind_frame(struct task_struct *tsk,
+				   struct stackframe *frame);
 extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 			    bool (*fn)(void *, unsigned long), void *data);
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
@@ -148,8 +155,8 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
 	return false;
 }
 
-static inline void start_backtrace(struct stackframe *frame,
-				   unsigned long fp, unsigned long pc)
+static inline enum unwind_rc start_backtrace(struct stackframe *frame,
+					     unsigned long fp, unsigned long pc)
 {
 	frame->fp = fp;
 	frame->pc = pc;
@@ -169,6 +176,7 @@ static inline void start_backtrace(struct stackframe *frame,
 	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
+	return UNWIND_CONTINUE;
 }
 
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 88ff471b0bce..f459208149ae 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -148,13 +148,16 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
 	struct stackframe frame;
+	enum unwind_rc rc;
 
 	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
 
-	start_backtrace(&frame, regs->regs[29], regs->pc);
+	rc = start_backtrace(&frame, regs->regs[29], regs->pc);
+	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
+		return;
 	walk_stackframe(current, &frame, callchain_trace, entry);
 }
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6e60aa3b5ea9..e9c763b44fd4 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -573,6 +573,7 @@ unsigned long get_wchan(struct task_struct *p)
 	struct stackframe frame;
 	unsigned long stack_page, ret = 0;
 	int count = 0;
+	enum unwind_rc rc;
 	if (!p || p == current || p->state == TASK_RUNNING)
 		return 0;
 
@@ -580,10 +581,13 @@ unsigned long get_wchan(struct task_struct *p)
 	if (!stack_page)
 		return 0;
 
-	start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
+	rc = start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
+	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
+		return 0;
 
 	do {
-		if (unwind_frame(p, &frame))
+		rc = unwind_frame(p, &frame);
+		if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
 			goto out;
 		if (!in_sched_functions(frame.pc)) {
 			ret = frame.pc;
diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
index a6d18755652f..1224e043e98f 100644
--- a/arch/arm64/kernel/return_address.c
+++ b/arch/arm64/kernel/return_address.c
@@ -36,13 +36,17 @@ void *return_address(unsigned int level)
 {
 	struct return_address_data data;
 	struct stackframe frame;
+	enum unwind_rc rc;
 
 	data.level = level + 2;
 	data.addr = NULL;
 
-	start_backtrace(&frame,
-			(unsigned long)__builtin_frame_address(0),
-			(unsigned long)return_address);
+	rc = start_backtrace(&frame,
+			     (unsigned long)__builtin_frame_address(0),
+			     (unsigned long)return_address);
+	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
+		return NULL;
+
 	walk_stackframe(current, &frame, save_return_addr, &data);
 
 	if (!data.level)
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d55bdfb7789c..e9c2c1fa9dde 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -39,26 +39,27 @@
  * records (e.g. a cycle), determined based on the location and fp value of A
  * and the location (but not the fp value) of B.
  */
-int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
+enum unwind_rc notrace unwind_frame(struct task_struct *tsk,
+					struct stackframe *frame)
 {
 	unsigned long fp = frame->fp;
 	struct stack_info info;
 
 	/* Terminal record; nothing to unwind */
 	if (!fp)
-		return -ENOENT;
+		return UNWIND_FINISH;
 
 	if (fp & 0xf)
-		return -EINVAL;
+		return UNWIND_ABORT;
 
 	if (!tsk)
 		tsk = current;
 
 	if (!on_accessible_stack(tsk, fp, &info))
-		return -EINVAL;
+		return UNWIND_ABORT;
 
 	if (test_bit(info.type, frame->stacks_done))
-		return -EINVAL;
+		return UNWIND_ABORT;
 
 	/*
 	 * As stacks grow downward, any valid record on the same stack must be
@@ -75,7 +76,7 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	 */
 	if (info.type == frame->prev_type) {
 		if (fp <= frame->prev_fp)
-			return -EINVAL;
+			return UNWIND_ABORT;
 	} else {
 		set_bit(frame->prev_type, frame->stacks_done);
 	}
@@ -101,14 +102,14 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 		 */
 		ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
 		if (WARN_ON_ONCE(!ret_stack))
-			return -EINVAL;
+			return UNWIND_ABORT;
 		frame->pc = ret_stack->ret;
 	}
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
 
-	return 0;
+	return UNWIND_CONTINUE;
 }
 NOKPROBE_SYMBOL(unwind_frame);
 
@@ -116,12 +117,12 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 			     bool (*fn)(void *, unsigned long), void *data)
 {
 	while (1) {
-		int ret;
+		enum unwind_rc rc;
 
 		if (!fn(data, frame->pc))
 			break;
-		ret = unwind_frame(tsk, frame);
-		if (ret < 0)
+		rc = unwind_frame(tsk, frame);
+		if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
 			break;
 	}
 }
@@ -137,6 +138,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 {
 	struct stackframe frame;
 	int skip = 0;
+	enum unwind_rc rc;
 
 	pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
 
@@ -153,17 +155,19 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 		return;
 
 	if (tsk == current) {
-		start_backtrace(&frame,
-				(unsigned long)__builtin_frame_address(0),
-				(unsigned long)dump_backtrace);
+		rc = start_backtrace(&frame,
+				     (unsigned long)__builtin_frame_address(0),
+				     (unsigned long)dump_backtrace);
 	} else {
 		/*
 		 * task blocked in __switch_to
 		 */
-		start_backtrace(&frame,
-				thread_saved_fp(tsk),
-				thread_saved_pc(tsk));
+		rc = start_backtrace(&frame,
+				     thread_saved_fp(tsk),
+				     thread_saved_pc(tsk));
 	}
+	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
+		return;
 
 	printk("%sCall trace:\n", loglvl);
 	do {
@@ -181,7 +185,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 			 */
 			dump_backtrace_entry(regs->pc, loglvl);
 		}
-	} while (!unwind_frame(tsk, &frame));
+		rc = unwind_frame(tsk, &frame);
+	} while (rc != UNWIND_FINISH && rc != UNWIND_ABORT);
 
 	put_task_stack(tsk);
 }
@@ -199,17 +204,19 @@ noinline void arch_stack_walk(stack_trace_consume_fn consume_entry,
 			      struct pt_regs *regs)
 {
 	struct stackframe frame;
+	enum unwind_rc rc;
 
 	if (regs)
-		start_backtrace(&frame, regs->regs[29], regs->pc);
+		rc = start_backtrace(&frame, regs->regs[29], regs->pc);
 	else if (task == current)
-		start_backtrace(&frame,
+		rc = start_backtrace(&frame,
 				(unsigned long)__builtin_frame_address(1),
 				(unsigned long)__builtin_return_address(0));
 	else
-		start_backtrace(&frame, thread_saved_fp(task),
-				thread_saved_pc(task));
-
+		rc = start_backtrace(&frame, thread_saved_fp(task),
+				     thread_saved_pc(task));
+	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
+		return;
 	walk_stackframe(task, &frame, consume_entry, cookie);
 }
 
diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index eebbc8d7123e..eb50218ec9a4 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -35,15 +35,18 @@
 unsigned long profile_pc(struct pt_regs *regs)
 {
 	struct stackframe frame;
+	enum unwind_rc rc;
 
 	if (!in_lock_functions(regs->pc))
 		return regs->pc;
 
-	start_backtrace(&frame, regs->regs[29], regs->pc);
+	rc = start_backtrace(&frame, regs->regs[29], regs->pc);
+	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
+		return 0;
 
 	do {
-		int ret = unwind_frame(NULL, &frame);
-		if (ret < 0)
+		rc = unwind_frame(NULL, &frame);
+		if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
 			return 0;
 	} while (in_lock_functions(frame.pc));
 
-- 
2.25.1



* [RFC PATCH v6 2/3] arm64: Introduce stack trace reliability checks in the unwinder
  2021-06-30 22:33 ` [RFC PATCH v6 0/3] arm64: Implement stack trace reliability checks madvenka
  2021-06-30 22:33   ` [RFC PATCH v6 1/3] arm64: Improve the unwinder return value madvenka
@ 2021-06-30 22:33   ` madvenka
  2021-06-30 22:33   ` [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list madvenka
  2021-07-26 13:49   ` [RFC PATCH v6 0/3] arm64: Implement stack trace reliability checks Madhavan T. Venkataraman
  3 siblings, 0 replies; 28+ messages in thread
From: madvenka @ 2021-06-30 22:33 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

The unwinder should check for the presence of various features and
conditions that can render the stack trace unreliable. Introduce a
function unwind_check_frame() for this purpose.

Introduce the first reliability check in unwind_check_frame() - If
a return PC is not a valid kernel text address, consider the stack
trace unreliable. It could be some generated code.

Other reliability checks will be added in the future.

If a reliability check fails, it is a non-fatal error. Introduce a new
return code, UNWIND_CONTINUE_WITH_RISK, for non-fatal errors.

Call unwind_check_frame() from unwind_frame(). Also, call it from
start_backtrace() to remove the current assumption that the starting
frame is reliable.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h |  4 +++-
 arch/arm64/kernel/stacktrace.c      | 17 ++++++++++++++++-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 6fcd58553fb1..d1625d55b980 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -32,6 +32,7 @@ struct stack_info {
 
 enum unwind_rc {
 	UNWIND_CONTINUE,		/* No errors encountered */
+	UNWIND_CONTINUE_WITH_RISK,	/* Non-fatal errors encountered */
 	UNWIND_ABORT,			/* Fatal errors encountered */
 	UNWIND_FINISH,			/* End of stack reached successfully */
 };
@@ -73,6 +74,7 @@ extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 			    bool (*fn)(void *, unsigned long), void *data);
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 			   const char *loglvl);
+extern enum unwind_rc unwind_check_frame(struct stackframe *frame);
 
 DECLARE_PER_CPU(unsigned long *, irq_stack_ptr);
 
@@ -176,7 +178,7 @@ static inline enum unwind_rc start_backtrace(struct stackframe *frame,
 	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
-	return UNWIND_CONTINUE;
+	return unwind_check_frame(frame);
 }
 
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index e9c2c1fa9dde..ba7b97b119e4 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,21 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+/*
+ * Check the stack frame for conditions that make unwinding unreliable.
+ */
+enum unwind_rc unwind_check_frame(struct stackframe *frame)
+{
+	/*
+	 * If the PC is not a known kernel text address, then we cannot
+	 * be sure that a subsequent unwind will be reliable, as we
+	 * don't know that the code follows our unwind requirements.
+	 */
+	if (!__kernel_text_address(frame->pc))
+		return UNWIND_CONTINUE_WITH_RISK;
+	return UNWIND_CONTINUE;
+}
+
 /*
  * AArch64 PCS assigns the frame pointer to x29.
  *
@@ -109,7 +124,7 @@ enum unwind_rc notrace unwind_frame(struct task_struct *tsk,
 
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
 
-	return UNWIND_CONTINUE;
+	return unwind_check_frame(frame);
 }
 NOKPROBE_SYMBOL(unwind_frame);
 
-- 
2.25.1



* [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list
  2021-06-30 22:33 ` [RFC PATCH v6 0/3] arm64: Implement stack trace reliability checks madvenka
  2021-06-30 22:33   ` [RFC PATCH v6 1/3] arm64: Improve the unwinder return value madvenka
  2021-06-30 22:33   ` [RFC PATCH v6 2/3] arm64: Introduce stack trace reliability checks in the unwinder madvenka
@ 2021-06-30 22:33   ` madvenka
  2021-07-28 17:25     ` Mark Rutland
  2021-07-26 13:49   ` [RFC PATCH v6 0/3] arm64: Implement stack trace reliability checks Madhavan T. Venkataraman
  3 siblings, 1 reply; 28+ messages in thread
From: madvenka @ 2021-06-30 22:33 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

The unwinder should check if the return PC falls in any function that
is considered unreliable from an unwinding perspective. If it does,
return UNWIND_CONTINUE_WITH_RISK.

Function types
==============

The compiler generates code for C functions and assigns the type STT_FUNC
to them.

Assembly functions are manually assigned a type:

	- STT_FUNC for functions defined with SYM_FUNC*() macros

	- STT_NONE for functions defined with SYM_CODE*() macros

In the future, STT_FUNC functions will be analyzed by objtool and "fixed"
as necessary. So, they are not "interesting" to the reliable unwinder in
the kernel.

That leaves SYM_CODE*() functions. These contain low-level code that is
difficult or impossible for objtool to analyze. So, objtool ignores them,
leaving them to the reliable unwinder. These functions must be considered
unreliable from an unwinding perspective.

Define a special section for unreliable functions
=================================================

Define a SYM_CODE_END() macro for arm64 that adds the function address
range to a new section called "sym_code_functions".

Linker file
===========

Include the "sym_code_functions" section under initdata in vmlinux.lds.S.

Initialization
==============

Define an early_initcall() to copy the function address ranges from the
"sym_code_functions" section to an array by the same name.

Unwinder check
==============

Add a reliability check in unwind_check_frame() that compares a return
PC with sym_code_functions[]. If there is a match, then return
UNWIND_CONTINUE_WITH_RISK.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/linkage.h  |  12 ++++
 arch/arm64/include/asm/sections.h |   1 +
 arch/arm64/kernel/stacktrace.c    | 112 ++++++++++++++++++++++++++++++
 arch/arm64/kernel/vmlinux.lds.S   |   7 ++
 4 files changed, 132 insertions(+)

diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
index ba89a9af820a..3b5f1fd332b0 100644
--- a/arch/arm64/include/asm/linkage.h
+++ b/arch/arm64/include/asm/linkage.h
@@ -60,4 +60,16 @@
 		SYM_FUNC_END(x);		\
 		SYM_FUNC_END_ALIAS(__pi_##x)
 
+/*
+ * Record the address range of each SYM_CODE function in a struct code_range
+ * in a special section.
+ */
+#define SYM_CODE_END(name)				\
+	SYM_END(name, SYM_T_NONE)			;\
+	99:						;\
+	.pushsection "sym_code_functions", "aw"		;\
+	.quad	name					;\
+	.quad	99b					;\
+	.popsection
+
 #endif
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index 2f36b16a5b5d..29cb566f65ec 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -20,5 +20,6 @@ extern char __exittext_begin[], __exittext_end[];
 extern char __irqentry_text_start[], __irqentry_text_end[];
 extern char __mmuoff_data_start[], __mmuoff_data_end[];
 extern char __entry_tramp_text_start[], __entry_tramp_text_end[];
+extern char __sym_code_functions_start[], __sym_code_functions_end[];
 
 #endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ba7b97b119e4..5d5728c3088e 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,11 +18,43 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+struct code_range {
+	unsigned long	start;
+	unsigned long	end;
+};
+
+static struct code_range	*sym_code_functions;
+static int			num_sym_code_functions;
+
+int __init init_sym_code_functions(void)
+{
+	size_t size;
+
+	size = (unsigned long)__sym_code_functions_end -
+	       (unsigned long)__sym_code_functions_start;
+
+	sym_code_functions = kmalloc(size, GFP_KERNEL);
+	if (!sym_code_functions)
+		return -ENOMEM;
+
+	memcpy(sym_code_functions, __sym_code_functions_start, size);
+	/* Update num_sym_code_functions after copying sym_code_functions. */
+	smp_mb();
+	num_sym_code_functions = size / sizeof(struct code_range);
+
+	return 0;
+}
+early_initcall(init_sym_code_functions);
+
 /*
  * Check the stack frame for conditions that make unwinding unreliable.
  */
 enum unwind_rc unwind_check_frame(struct stackframe *frame)
 {
+	const struct code_range *range;
+	unsigned long pc;
+	int i;
+
 	/*
 	 * If the PC is not a known kernel text address, then we cannot
 	 * be sure that a subsequent unwind will be reliable, as we
@@ -30,6 +62,86 @@ enum unwind_rc unwind_check_frame(struct stackframe *frame)
 	 */
 	if (!__kernel_text_address(frame->pc))
 		return UNWIND_CONTINUE_WITH_RISK;
+
+	/*
+	 * If the final frame has been reached, there is no more unwinding
+	 * to do. There is no need to check if the return PC is considered
+	 * unreliable by the unwinder.
+	 */
+	if (!frame->fp)
+		return UNWIND_CONTINUE;
+
+	/*
+	 * Check the return PC against sym_code_functions[]. If there is a
+	 * match, then consider the stack frame unreliable. These functions
+	 * contain low-level code where the frame pointer and/or the return
+	 * address register cannot be relied upon. This addresses the following
+	 * situations:
+	 *
+	 *  - Exception handlers and entry assembly
+	 *  - Trampoline assembly (e.g., ftrace, kprobes)
+	 *  - Hypervisor-related assembly
+	 *  - Hibernation-related assembly
+	 *  - CPU start-stop, suspend-resume assembly
+	 *  - Kernel relocation assembly
+	 *
+	 * Some special cases covered by sym_code_functions[] deserve a mention
+	 * here:
+	 *
+	 *  - All EL1 interrupt and exception stack traces will be considered
+	 *    unreliable. This is the correct behavior as interrupts and
+	 *    exceptions can happen on any instruction including ones in the
+	 *    frame pointer prolog and epilog. Unless stack metadata is
+	 *    available so the unwinder can unwind through these special
+	 *    cases, such stack traces will be considered unreliable.
+	 *
+	 *  - A task can get preempted at the end of an interrupt. Stack
+	 *    traces of preempted tasks will show the interrupt frame in the
+	 *    stack trace and will be considered unreliable.
+	 *
+	 *  - Breakpoints are exceptions. So, all stack traces in the
+	 *    breakpoint handler (including probes) will be considered unreliable.
+	 *
+	 *  - All of the ftrace entry trampolines are considered unreliable.
+	 *    So, all stack traces taken from tracer functions will be
+	 *    considered unreliable.
+	 *
+	 *  - The Function Graph Tracer return trampoline (return_to_handler)
+	 *    and the Kretprobe return trampoline (kretprobe_trampoline) are
+	 *    also considered unreliable.
+	 *
+	 * Some of the special cases above can be unwound through using
+	 * special logic in unwind_frame().
+	 *
+	 *  - return_to_handler() is handled by the unwinder by attempting
+	 *    to retrieve the original return address from the per-task
+	 *    return address stack.
+	 *
+	 *  - kretprobe_trampoline() can be handled in a similar fashion by
+	 *    attempting to retrieve the original return address from the
+	 *    per-task kretprobe instance list.
+	 *
+	 *  - Optprobes could likely be handled in a similar fashion in the
+	 *    future.
+	 *
+	 *  - Stack traces taken from the FTrace tracer functions can be
+	 *    handled as well. ftrace_call is an inner label defined in the
+	 *    Ftrace entry trampoline. This is the location where the call
+	 *    to a tracer function is patched. So, if the return PC equals
+	 *    ftrace_call+4, it is reliable. At that point, proper stack
+	 *    frames have already been set up for the traced function and
+	 *    its caller.
+	 *
+	 * NOTE:
+	 *   If sym_code_functions[] were sorted, a binary search could be
+	 *   done to make this more performant.
+	 */
+	pc = frame->pc;
+	for (i = 0; i < num_sym_code_functions; i++) {
+		range = &sym_code_functions[i];
+		if (pc >= range->start && pc < range->end)
+			return UNWIND_CONTINUE_WITH_RISK;
+	}
 	return UNWIND_CONTINUE;
 }
 
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 7eea7888bb02..ee203f7ca084 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -103,6 +103,12 @@ jiffies = jiffies_64;
 #define TRAMP_TEXT
 #endif
 
+#define SYM_CODE_FUNCTIONS				\
+	. = ALIGN(16);					\
+	__sym_code_functions_start = .;			\
+	KEEP(*(sym_code_functions))			\
+	__sym_code_functions_end = .;
+
 /*
  * The size of the PE/COFF section that covers the kernel image, which
  * runs from _stext to _edata, must be a round multiple of the PE/COFF
@@ -218,6 +224,7 @@ SECTIONS
 		CON_INITCALL
 		INIT_RAM_FS
 		*(.init.altinstructions .init.bss)	/* from the EFI stub */
+		SYM_CODE_FUNCTIONS
 	}
 	.exit.data : {
 		EXIT_DATA
-- 
2.25.1



* Re: [RFC PATCH v6 0/3] arm64: Implement stack trace reliability checks
  2021-06-30 22:33 ` [RFC PATCH v6 0/3] arm64: Implement stack trace reliability checks madvenka
                     ` (2 preceding siblings ...)
  2021-06-30 22:33   ` [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list madvenka
@ 2021-07-26 13:49   ` Madhavan T. Venkataraman
  3 siblings, 0 replies; 28+ messages in thread
From: Madhavan T. Venkataraman @ 2021-07-26 13:49 UTC (permalink / raw)
  To: broonie, mark.rutland, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel

Hi Mark Brown, Mark Rutland,

Could you please review this version of reliable stack trace?

Thanks.

Madhavan

On 6/30/21 5:33 PM, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> Unwinder return value
> =====================
> 
> Currently, the unwinder returns a tri-state return value:
> 
> 	0		means "continue with the unwind"
> 	-ENOENT		means "successful termination of the stack trace"
> 	-EINVAL		means "fatal error, abort the stack trace"
> 
> This is confusing. To fix this, define an enumeration of different return
> codes to make it clear.
> 
> enum {
> 	UNWIND_CONTINUE,		/* No errors encountered */
> 	UNWIND_ABORT,			/* Fatal errors encountered */
> 	UNWIND_FINISH,			/* End of stack reached successfully */
> };
> 
> Reliability checks
> ==================
> 
> There are a number of places in kernel code where the stack trace is not
> reliable. Enhance the unwinder to check for those cases.
> 
> Return address check
> --------------------
> 
> Check the return PC of every stack frame to make sure that it is a valid
> kernel text address (and not some generated code, for example).
> 
> Low-level function check
> ------------------------
> 
> Low-level assembly functions are, by nature, unreliable from an unwinder
> perspective. The unwinder must check for them in a stacktrace. See the
> "Assembly Functions" section below.
> 
> Other checks
> ------------
> 
> Other checks may be added in the future. Once all of the checks are in place,
> the unwinder can provide a reliable stack trace. But before this can be used
> for livepatch, some other entity needs to validate the frame pointer in kernel
> functions. objtool is currently being worked on to address that need.
> 
> Return code
> -----------
> 
> If a reliability check fails, it is a non-fatal error. The unwinder needs to
> return an appropriate code so the caller knows that some non-fatal error has
> occurred. Add another code to the enumeration:
> 
> enum {
> 	UNWIND_CONTINUE,		/* No errors encountered */
> 	UNWIND_CONTINUE_WITH_RISK,	/* Non-fatal errors encountered */
> 	UNWIND_ABORT,			/* Fatal errors encountered */
> 	UNWIND_FINISH,			/* End of stack reached successfully */
> };
> 
> When the unwinder returns UNWIND_CONTINUE_WITH_RISK:
> 
> 	- Debug-type callers can choose to continue the unwind
> 
> 	- Livepatch-type callers will stop unwinding
> 
> So, arch_stack_walk_reliable() (implemented in the future) will look like
> this:
> 
> /*
>  * Walk the stack like arch_stack_walk() but stop the walk as soon as
>  * some unreliability is detected in the stack.
>  */
> int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
> 			      void *cookie, struct task_struct *task)
> {
> 	struct stackframe frame;
> 	enum unwind_rc rc;
> 
> 	if (task == current) {
> 		rc = start_backtrace(&frame,
> 				(unsigned long)__builtin_frame_address(0),
> 				(unsigned long)arch_stack_walk_reliable);
> 	} else {
> 		/*
> 		 * The task must not be running anywhere for the duration of
> 		 * arch_stack_walk_reliable(). The caller must guarantee
> 		 * this.
> 		 */
> 		rc = start_backtrace(&frame,
> 				     thread_saved_fp(task),
> 				     thread_saved_pc(task));
> 	}
> 
> 	while (rc == UNWIND_CONTINUE) {
> 		if (!consume_entry(cookie, frame.pc))
> 			return -EINVAL;
> 		rc = unwind_frame(task, &frame);
> 	}
> 
> 	return rc == UNWIND_FINISH ? 0 : -EINVAL;
> }
> 
> Assembly functions
> ==================
> 
> There are a number of assembly functions in arm64. Except for a couple of
> them, these functions do not have a frame pointer prolog or epilog. Also,
> many of them manipulate low-level state such as registers. These functions
> are, by definition, unreliable from a stack unwinding perspective. That is,
> when these functions occur in a stack trace, the unwinder would not be able
> to unwind through them reliably.
> 
> Assembly functions are defined as SYM_FUNC_*() functions or SYM_CODE_*()
> functions. objtool performs static analysis of SYM_FUNC functions. It ignores
> SYM_CODE functions because they contain low-level code that is difficult to
> analyze. When objtool becomes ready eventually, SYM_FUNC functions will
> be analyzed and "fixed" as necessary. So, they are not "interesting" for
> the reliable unwinder.
> 
> That leaves SYM_CODE functions. It is for the unwinder to deal with these
> for reliable stack trace. The unwinder needs to do the following:
> 
> 	- Recognize SYM_CODE functions in a stack trace.
> 
> 	- If a particular SYM_CODE function can be unwound through using
> 	  some special logic, then do it. E.g., the return trampoline for
> 	  Function Graph Tracing.
> 
> 	- Otherwise, return UNWIND_CONTINUE_WITH_RISK.
> 
> Current approach
> ================
> 
> Define an ARM64 version of SYM_CODE_END() like this:
> 
> #define SYM_CODE_END(name)				\
> 	SYM_END(name, SYM_T_NONE)			;\
> 	99:						;\
> 	.pushsection "sym_code_functions", "aw"		;\
> 	.quad	name					;\
> 	.quad	99b					;\
> 	.popsection
> 
> The above macro does the usual SYM_END(). In addition, it records the
> function's address range in a special section called "sym_code_functions".
> This way, all SYM_CODE functions get recorded in the section automatically.
> 
> Implement an early_initcall() called init_sym_code_functions() that allocates
> an array called sym_code_functions[] and copies the function ranges from the
> section to the array.
> 
> Add a reliability check in unwind_check_frame() that compares a return
> PC with sym_code_functions[]. If there is a match, then return
> UNWIND_CONTINUE_WITH_RISK. This check is applied to every return PC
> produced by unwind_frame().
> 
> Last stack frame
> ================
> 
> If a SYM_CODE function occurs in the very last frame in the stack trace,
> then the stack trace is not considered unreliable. This is because there
> is no more unwinding to do. Examples:
> 
> 	- EL0 exception stack traces end in the top level EL0 exception
> 	  handlers.
> 
> 	- All kernel thread stack traces end in ret_from_fork().
> 
> Special SYM_CODE functions
> ==========================
> 
> The return trampolines of the Function Graph Tracer and Kretprobe can
> be recognized by the unwinder. If the return PCs can be translated to the
> original PCs, then, the unwinder should perform that translation before
> checking for reliability. The final PC that we end up with after all the
> translations is the one we need to check for reliability.
> 
> Accordingly, I have moved the reliability checks to after the logic that
> handles the Function Graph Tracer.
> 
> So, the approach is - all SYM_CODE functions are unreliable. If a SYM_CODE
> function is "fixed" to make it reliable, then it should become a SYM_FUNC
> function. Or, if the unwinder has special logic to unwind through a SYM_CODE
> function, then that can be done.
> 
> Special cases
> =============
> 
> Some special cases need to be mentioned:
> 
> 	- EL1 interrupt and exception handlers end up in sym_code_ranges[].
> 	  So, all EL1 interrupt and exception stack traces will be considered
> 	  unreliable. This is the correct behavior as interrupts and exceptions
> 	  can happen on any instruction including ones in the frame pointer
> 	  prolog and epilog. Unless objtool generates metadata so the unwinder
> 	  can unwind through these special cases, such stack traces will be
> 	  considered unreliable.
> 
> 	- A task can get preempted at the end of an interrupt. Stack traces
> 	  of preempted tasks will show the interrupt frame in the stack trace
> 	  and will be considered unreliable.
> 
> 	- Breakpoints are exceptions. So, all stack traces in the breakpoint
> 	  handler (including probes) will be considered unreliable.
> 
> 	- All of the ftrace trampolines end up in sym_code_functions[]. All
> 	  stack traces taken from tracer functions will be considered
> 	  unreliable.
> ---
> Changelog:
> 
> v6:
> 	From Mark Rutland:
> 
> 	- The per-frame reliability concept and flag are acceptable. But more
> 	  work is needed to make the per-frame checks more accurate and more
> 	  complete. E.g., some code reorg is being worked on that will help.
> 
> 	  I have now removed the frame->reliable flag and deleted the whole
> 	  concept of per-frame status. This is orthogonal to this patch series.
> 	  Instead, I have improved the unwinder to return proper return codes
> 	  so a caller can take appropriate action without needing per-frame
> 	  status.
> 
> 	- Remove the mention of PLTs and update the comment.
> 
> 	  I have replaced the comment above the call to __kernel_text_address()
> 	  with the comment suggested by Mark Rutland.
> 
> 	Other comments:
> 
> 	- Other comments on the per-frame stuff are not relevant because
> 	  that approach is not there anymore.
> 
> v5:
> 	From Keiya Nobuta:
> 	
> 	- The term blacklist(ed) is not to be used anymore. I have changed it
> 	  to unreliable. So, the function unwinder_blacklisted() has been
> 	  changed to unwinder_is_unreliable().
> 
> 	From Mark Brown:
> 
> 	- Add a comment for the "reliable" flag in struct stackframe. The
> 	  reliability attribute is not complete until all the checks are
> 	  in place. Added a comment above struct stackframe.
> 
> 	- Include some of the comments in the cover letter in the actual
> 	  code so that we can compare it with the reliable stack trace
> 	  requirements document for completeness. I have added a comment:
> 
> 	  	- above unwinder_is_unreliable() that lists the requirements
> 		  that are addressed by the function.
> 
> 		- above the __kernel_text_address() call about all the cases
> 		  the call covers.
> 
> v4:
> 	From Mark Brown:
> 
> 	- I was checking the return PC with __kernel_text_address() before
> 	  the Function Graph trace handling. Mark Brown felt that all the
> 	  reliability checks should be performed on the original return PC
> 	  once that is obtained. So, I have moved all the reliability checks
> 	  to after the Function Graph Trace handling code in the unwinder.
> 	  Basically, the unwinder should perform PC translations first (for
> 	  the return trampoline for Function Graph Tracing, Kretprobes, etc).
> 	  Then, the reliability checks should be applied to the resulting
> 	  PC.
> 
> 	- Mark said to improve the naming of the new functions so they don't
> 	  collide with existing ones. I have used a prefix "unwinder_" for
> 	  all the new functions.
> 
> 	From Josh Poimboeuf:
> 
> 	- In the error scenarios in the unwinder, the reliable flag in the
> 	  stack frame should be set. Implemented this.
> 
> 	- Some of the other comments are not relevant to the new code as
> 	  I have taken a different approach in the new code. That is why
> 	  I have not made those changes. E.g., Ard wanted me to add the
> 	  "const" keyword to the global section array. That array does not
> 	  exist in v4. Similarly, Mark Brown said to use ARRAY_SIZE() for
> 	  the same array in a for loop.
> 
> 	Other changes:
> 
> 	- Add a new definition for SYM_CODE_END() that adds the address
> 	  range of the function to a special section called
> 	  "sym_code_functions".
> 
> 	- Include the new section under initdata in vmlinux.lds.S.
> 
> 	- Define an early_initcall() to copy the contents of the
> 	  "sym_code_functions" section to an array by the same name.
> 
> 	- Define a function unwinder_blacklisted() that compares a return
> 	  PC against sym_code_sections[]. If there is a match, mark the
> 	  stack trace unreliable. Call this from unwind_frame().
> 
> v3:
> 	- Implemented a sym_code_ranges[] array to contain section bounds
> 	  for text sections that contain SYM_CODE_*() functions. The unwinder
> 	  checks each return PC against the sections. If it falls in any of
> 	  the sections, the stack trace is marked unreliable.
> 
> 	- Moved SYM_CODE functions from .text and .init.text into a new
> 	  text section called ".code.text". Added this section to
> 	  vmlinux.lds.S and sym_code_ranges[].
> 
> 	- Fixed the logic in the unwinder that handles Function Graph
> 	  Tracer return trampoline.
> 
> 	- Removed all the previous code that handles:
> 		- ftrace entry code for traced function
> 		- special_functions[] array that lists individual functions
> 		- kretprobe_trampoline() special case
> 
> v2
> 	- Removed the terminating entry { 0, 0 } in special_functions[]
> 	  and replaced it with the idiom { /* sentinel */ }.
> 
> 	- Change the ftrace trampoline entry ftrace_graph_call in
> 	  special_functions[] to ftrace_call + 4 and added explanatory
> 	  comments.
> 
> 	- Unnested #ifdefs in special_functions[] for FTRACE.
> 
> v1
> 	- Define a bool field in struct stackframe. This will indicate if
> 	  a stack trace is reliable.
> 
> 	- Implement a special_functions[] array that will be populated
> 	  with special functions in which the stack trace is considered
> 	  unreliable.
> 	
> 	- Using kallsyms_lookup(), get the address ranges for the special
> 	  functions and record them.
> 
> 	- Implement an is_reliable_function(pc). This function will check
> 	  if a given return PC falls in any of the special functions. If
> 	  it does, the stack trace is unreliable.
> 
> 	- Implement check_reliability() function that will check if a
> 	  stack frame is reliable. Call is_reliable_function() from
> 	  check_reliability().
> 
> 	- Before a return PC is checked against special_functions[], it
> 	  must be validated as a proper kernel text address. Call
> 	  __kernel_text_address() from check_reliability().
> 
> 	- Finally, call check_reliability() from unwind_frame() for
> 	  each stack frame.
> 
> 	- Add EL1 exception handlers to special_functions[].
> 
> 		el1_sync();
> 		el1_irq();
> 		el1_error();
> 		el1_sync_invalid();
> 		el1_irq_invalid();
> 		el1_fiq_invalid();
> 		el1_error_invalid();
> 
> 	- The above functions are currently defined as LOCAL symbols.
> 	  Make them global so that they can be referenced from the
> 	  unwinder code.
> 
> 	- Add FTRACE trampolines to special_functions[]:
> 
> 		ftrace_graph_call()
> 		ftrace_graph_caller()
> 		return_to_handler()
> 
> 	- Add the kretprobe trampoline to special_functions[]:
> 
> 		kretprobe_trampoline()
> 
> Previous versions and discussion
> ================================
> 
> v5: https://lore.kernel.org/linux-arm-kernel/20210526214917.20099-1-madvenka@linux.microsoft.com/
> v4: https://lore.kernel.org/linux-arm-kernel/20210516040018.128105-1-madvenka@linux.microsoft.com/
> v3: https://lore.kernel.org/linux-arm-kernel/20210503173615.21576-1-madvenka@linux.microsoft.com/
> v2: https://lore.kernel.org/linux-arm-kernel/20210405204313.21346-1-madvenka@linux.microsoft.com/
> v1: https://lore.kernel.org/linux-arm-kernel/20210330190955.13707-1-madvenka@linux.microsoft.com/
> Madhavan T. Venkataraman (3):
>   arm64: Improve the unwinder return value
>   arm64: Introduce stack trace reliability checks in the unwinder
>   arm64: Create a list of SYM_CODE functions, check return PC against
>     list
> 
>  arch/arm64/include/asm/linkage.h    |  12 ++
>  arch/arm64/include/asm/sections.h   |   1 +
>  arch/arm64/include/asm/stacktrace.h |  16 ++-
>  arch/arm64/kernel/perf_callchain.c  |   5 +-
>  arch/arm64/kernel/process.c         |   8 +-
>  arch/arm64/kernel/return_address.c  |  10 +-
>  arch/arm64/kernel/stacktrace.c      | 180 ++++++++++++++++++++++++----
>  arch/arm64/kernel/time.c            |   9 +-
>  arch/arm64/kernel/vmlinux.lds.S     |   7 ++
>  9 files changed, 213 insertions(+), 35 deletions(-)
> 
> 
> base-commit: bf05bf16c76bb44ab5156223e1e58e26dfe30a88
> 


* Re: [RFC PATCH v6 1/3] arm64: Improve the unwinder return value
  2021-06-30 22:33   ` [RFC PATCH v6 1/3] arm64: Improve the unwinder return value madvenka
@ 2021-07-28 16:56     ` Mark Rutland
  2021-07-29 13:54       ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2021-07-28 16:56 UTC (permalink / raw)
  To: madvenka
  Cc: broonie, jpoimboe, ardb, nobuta.keiya, sjitindarsingh,
	catalin.marinas, will, jmorris, pasha.tatashin, jthierry,
	linux-arm-kernel, live-patching, linux-kernel

On Wed, Jun 30, 2021 at 05:33:54PM -0500, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> Currently, the unwinder returns a tri-state return value:
> 
> 	0		means "continue with the unwind"
> 	-ENOENT		means "successful termination of the stack trace"
> 	-EINVAL		means "fatal error, abort the stack trace"
> 
> This is confusing. To fix this, define an enumeration of different return
> codes to make it clear. Handle the return codes in all of the unwind
> consumers.

I agree the tri-state is confusing, and I also generally agree that
enums are preferable to a set of error codes. However, I don't think
this is quite the right abstraction; more on that below.

> 
> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> ---
>  arch/arm64/include/asm/stacktrace.h | 14 ++++++--
>  arch/arm64/kernel/perf_callchain.c  |  5 ++-
>  arch/arm64/kernel/process.c         |  8 +++--
>  arch/arm64/kernel/return_address.c  | 10 ++++--
>  arch/arm64/kernel/stacktrace.c      | 53 ++++++++++++++++-------------
>  arch/arm64/kernel/time.c            |  9 +++--
>  6 files changed, 64 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index eb29b1fe8255..6fcd58553fb1 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -30,6 +30,12 @@ struct stack_info {
>  	enum stack_type type;
>  };
>  
> +enum unwind_rc {
> +	UNWIND_CONTINUE,		/* No errors encountered */
> +	UNWIND_ABORT,			/* Fatal errors encountered */
> +	UNWIND_FINISH,			/* End of stack reached successfully */
> +};

Generally, there are a bunch of properties we might need to check for an
unwind step relating to reliability (e.g. as you add
UNWIND_CONTINUE_WITH_RISK in the next patch), and I'd prefer that we
check those properties on the struct stackframe, and simplify
unwind_frame() to return a bool.

Something akin to the x86 unwinders, where the main loop looks like:

for (unwind_start(&state, ...);
     !unwind_done(&state) && !unwind_error(&state);
     unwind_next_frame(&state) {
	...
}

That way we don't have to grow the enum to handle every variation that
we can think of, and it's simple enough for users to check the
properties with the helpers.

> +
>  /*
>   * A snapshot of a frame record or fp/lr register values, along with some
>   * accounting information necessary for robust unwinding.
> @@ -61,7 +67,8 @@ struct stackframe {
>  #endif
>  };
>  
> -extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
> +extern enum unwind_rc unwind_frame(struct task_struct *tsk,
> +				   struct stackframe *frame);
>  extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
>  			    bool (*fn)(void *, unsigned long), void *data);
>  extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
> @@ -148,8 +155,8 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
>  	return false;
>  }
>  
> -static inline void start_backtrace(struct stackframe *frame,
> -				   unsigned long fp, unsigned long pc)
> +static inline enum unwind_rc start_backtrace(struct stackframe *frame,
> +					     unsigned long fp, unsigned long pc)
>  {
>  	frame->fp = fp;
>  	frame->pc = pc;
> @@ -169,6 +176,7 @@ static inline void start_backtrace(struct stackframe *frame,
>  	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
>  	frame->prev_fp = 0;
>  	frame->prev_type = STACK_TYPE_UNKNOWN;
> +	return UNWIND_CONTINUE;
>  }
>  
>  #endif	/* __ASM_STACKTRACE_H */
> diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
> index 88ff471b0bce..f459208149ae 100644
> --- a/arch/arm64/kernel/perf_callchain.c
> +++ b/arch/arm64/kernel/perf_callchain.c
> @@ -148,13 +148,16 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
>  			   struct pt_regs *regs)
>  {
>  	struct stackframe frame;
> +	enum unwind_rc rc;
>  
>  	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
>  		/* We don't support guest os callchain now */
>  		return;
>  	}
>  
> -	start_backtrace(&frame, regs->regs[29], regs->pc);
> +	rc = start_backtrace(&frame, regs->regs[29], regs->pc);
> +	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
> +		return;
>  	walk_stackframe(current, &frame, callchain_trace, entry);

As a first step, could we convert this over to arch_stack_walk()?

>  }
>  
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 6e60aa3b5ea9..e9c763b44fd4 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -573,6 +573,7 @@ unsigned long get_wchan(struct task_struct *p)
>  	struct stackframe frame;
>  	unsigned long stack_page, ret = 0;
>  	int count = 0;
> +	enum unwind_rc rc;
>  	if (!p || p == current || p->state == TASK_RUNNING)
>  		return 0;
>  
> @@ -580,10 +581,13 @@ unsigned long get_wchan(struct task_struct *p)
>  	if (!stack_page)
>  		return 0;
>  
> -	start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
> +	rc = start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
> +	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
> +		return 0;
>  
>  	do {
> -		if (unwind_frame(p, &frame))
> +		rc = unwind_frame(p, &frame);
> +		if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
>  			goto out;
>  		if (!in_sched_functions(frame.pc)) {
>  			ret = frame.pc;

Likewise, can we convert this to use arch_stack_walk()?

> diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
> index a6d18755652f..1224e043e98f 100644
> --- a/arch/arm64/kernel/return_address.c
> +++ b/arch/arm64/kernel/return_address.c
> @@ -36,13 +36,17 @@ void *return_address(unsigned int level)
>  {
>  	struct return_address_data data;
>  	struct stackframe frame;
> +	enum unwind_rc rc;
>  
>  	data.level = level + 2;
>  	data.addr = NULL;
>  
> -	start_backtrace(&frame,
> -			(unsigned long)__builtin_frame_address(0),
> -			(unsigned long)return_address);
> +	rc = start_backtrace(&frame,
> +			     (unsigned long)__builtin_frame_address(0),
> +			     (unsigned long)return_address);
> +	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
> +		return NULL;
> +
>  	walk_stackframe(current, &frame, save_return_addr, &data);

Likewise, can we convert this to use arch_stack_walk()?

Thanks,
Mark.

>  
>  	if (!data.level)
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index d55bdfb7789c..e9c2c1fa9dde 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -39,26 +39,27 @@
>   * records (e.g. a cycle), determined based on the location and fp value of A
>   * and the location (but not the fp value) of B.
>   */
> -int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> +enum unwind_rc notrace unwind_frame(struct task_struct *tsk,
> +					struct stackframe *frame)
>  {
>  	unsigned long fp = frame->fp;
>  	struct stack_info info;
>  
>  	/* Terminal record; nothing to unwind */
>  	if (!fp)
> -		return -ENOENT;
> +		return UNWIND_FINISH;
>  
>  	if (fp & 0xf)
> -		return -EINVAL;
> +		return UNWIND_ABORT;
>  
>  	if (!tsk)
>  		tsk = current;
>  
>  	if (!on_accessible_stack(tsk, fp, &info))
> -		return -EINVAL;
> +		return UNWIND_ABORT;
>  
>  	if (test_bit(info.type, frame->stacks_done))
> -		return -EINVAL;
> +		return UNWIND_ABORT;
>  
>  	/*
>  	 * As stacks grow downward, any valid record on the same stack must be
> @@ -75,7 +76,7 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  	 */
>  	if (info.type == frame->prev_type) {
>  		if (fp <= frame->prev_fp)
> -			return -EINVAL;
> +			return UNWIND_ABORT;
>  	} else {
>  		set_bit(frame->prev_type, frame->stacks_done);
>  	}
> @@ -101,14 +102,14 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  		 */
>  		ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
>  		if (WARN_ON_ONCE(!ret_stack))
> -			return -EINVAL;
> +			return UNWIND_ABORT;
>  		frame->pc = ret_stack->ret;
>  	}
>  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
>  
>  	frame->pc = ptrauth_strip_insn_pac(frame->pc);
>  
> -	return 0;
> +	return UNWIND_CONTINUE;
>  }
>  NOKPROBE_SYMBOL(unwind_frame);
>  
> @@ -116,12 +117,12 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
>  			     bool (*fn)(void *, unsigned long), void *data)
>  {
>  	while (1) {
> -		int ret;
> +		enum unwind_rc rc;
>  
>  		if (!fn(data, frame->pc))
>  			break;
> -		ret = unwind_frame(tsk, frame);
> -		if (ret < 0)
> +		rc = unwind_frame(tsk, frame);
> +		if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
>  			break;
>  	}
>  }
> @@ -137,6 +138,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>  {
>  	struct stackframe frame;
>  	int skip = 0;
> +	enum unwind_rc rc;
>  
>  	pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
>  
> @@ -153,17 +155,19 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>  		return;
>  
>  	if (tsk == current) {
> -		start_backtrace(&frame,
> -				(unsigned long)__builtin_frame_address(0),
> -				(unsigned long)dump_backtrace);
> +		rc = start_backtrace(&frame,
> +				     (unsigned long)__builtin_frame_address(0),
> +				     (unsigned long)dump_backtrace);
>  	} else {
>  		/*
>  		 * task blocked in __switch_to
>  		 */
> -		start_backtrace(&frame,
> -				thread_saved_fp(tsk),
> -				thread_saved_pc(tsk));
> +		rc = start_backtrace(&frame,
> +				     thread_saved_fp(tsk),
> +				     thread_saved_pc(tsk));
>  	}
> +	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
> +		return;
>  
>  	printk("%sCall trace:\n", loglvl);
>  	do {
> @@ -181,7 +185,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>  			 */
>  			dump_backtrace_entry(regs->pc, loglvl);
>  		}
> -	} while (!unwind_frame(tsk, &frame));
> +		rc = unwind_frame(tsk, &frame);
> +	} while (rc != UNWIND_FINISH && rc != UNWIND_ABORT);
>  
>  	put_task_stack(tsk);
>  }
> @@ -199,17 +204,19 @@ noinline void arch_stack_walk(stack_trace_consume_fn consume_entry,
>  			      struct pt_regs *regs)
>  {
>  	struct stackframe frame;
> +	enum unwind_rc rc;
>  
>  	if (regs)
> -		start_backtrace(&frame, regs->regs[29], regs->pc);
> +		rc = start_backtrace(&frame, regs->regs[29], regs->pc);
>  	else if (task == current)
> -		start_backtrace(&frame,
> +		rc = start_backtrace(&frame,
>  				(unsigned long)__builtin_frame_address(1),
>  				(unsigned long)__builtin_return_address(0));
>  	else
> -		start_backtrace(&frame, thread_saved_fp(task),
> -				thread_saved_pc(task));
> -
> +		rc = start_backtrace(&frame, thread_saved_fp(task),
> +				     thread_saved_pc(task));
> +	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
> +		return;
>  	walk_stackframe(task, &frame, consume_entry, cookie);
>  }
>  
> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
> index eebbc8d7123e..eb50218ec9a4 100644
> --- a/arch/arm64/kernel/time.c
> +++ b/arch/arm64/kernel/time.c
> @@ -35,15 +35,18 @@
>  unsigned long profile_pc(struct pt_regs *regs)
>  {
>  	struct stackframe frame;
> +	enum unwind_rc rc;
>  
>  	if (!in_lock_functions(regs->pc))
>  		return regs->pc;
>  
> -	start_backtrace(&frame, regs->regs[29], regs->pc);
> +	rc = start_backtrace(&frame, regs->regs[29], regs->pc);
> +	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
> +		return 0;
>  
>  	do {
> -		int ret = unwind_frame(NULL, &frame);
> -		if (ret < 0)
> +		rc = unwind_frame(NULL, &frame);
> +		if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
>  			return 0;
>  	} while (in_lock_functions(frame.pc));
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list
  2021-06-30 22:33   ` [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list madvenka
@ 2021-07-28 17:25     ` Mark Rutland
  2021-07-29 14:06       ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2021-07-28 17:25 UTC (permalink / raw)
  To: madvenka
  Cc: broonie, jpoimboe, ardb, nobuta.keiya, sjitindarsingh,
	catalin.marinas, will, jmorris, pasha.tatashin, jthierry,
	linux-arm-kernel, live-patching, linux-kernel

On Wed, Jun 30, 2021 at 05:33:56PM -0500, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> The unwinder should check if the return PC falls in any function that
> is considered unreliable from an unwinding perspective. If it does,
> return UNWIND_CONTINUE_WITH_RISK.
> 
> Function types
> ==============
> 
> The compiler generates code for C functions and assigns the type STT_FUNC
> to them.
> 
> Assembly functions are manually assigned a type:
> 
> 	- STT_FUNC for functions defined with SYM_FUNC*() macros
> 
> 	- STT_NONE for functions defined with SYM_CODE*() macros
> 
> In the future, STT_FUNC functions will be analyzed by objtool and "fixed"
> as necessary. So, they are not "interesting" to the reliable unwinder in
> the kernel.
> 
> That leaves SYM_CODE*() functions. These contain low-level code that is
> difficult or impossible for objtool to analyze. So, objtool ignores them
> leaving them to the reliable unwinder. These functions must be considered
> unreliable from an unwinding perspective.
> 
> Define a special section for unreliable functions
> =================================================
> 
> Define a SYM_CODE_END() macro for arm64 that adds the function address
> range to a new section called "sym_code_functions".
> 
> Linker file
> ===========
> 
> Include the "sym_code_functions" section under initdata in vmlinux.lds.S.
> 
> Initialization
> ==============
> 
> Define an early_initcall() to copy the function address ranges from the
> "sym_code_functions" section to an array by the same name.
> 
> Unwinder check
> ==============
> 
> Add a reliability check in unwind_check_frame() that compares a return
> PC with sym_code_functions[]. If there is a match, then return
> UNWIND_CONTINUE_WITH_RISK.
> 
> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> ---
>  arch/arm64/include/asm/linkage.h  |  12 ++++
>  arch/arm64/include/asm/sections.h |   1 +
>  arch/arm64/kernel/stacktrace.c    | 112 ++++++++++++++++++++++++++++++
>  arch/arm64/kernel/vmlinux.lds.S   |   7 ++
>  4 files changed, 132 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
> index ba89a9af820a..3b5f1fd332b0 100644
> --- a/arch/arm64/include/asm/linkage.h
> +++ b/arch/arm64/include/asm/linkage.h
> @@ -60,4 +60,16 @@
>  		SYM_FUNC_END(x);		\
>  		SYM_FUNC_END_ALIAS(__pi_##x)
>  
> +/*
> + * Record the address range of each SYM_CODE function in a struct code_range
> + * in a special section.
> + */
> +#define SYM_CODE_END(name)				\
> +	SYM_END(name, SYM_T_NONE)			;\
> +	99:						;\
> +	.pushsection "sym_code_functions", "aw"		;\
> +	.quad	name					;\
> +	.quad	99b					;\
> +	.popsection
> +
>  #endif
> diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
> index 2f36b16a5b5d..29cb566f65ec 100644
> --- a/arch/arm64/include/asm/sections.h
> +++ b/arch/arm64/include/asm/sections.h
> @@ -20,5 +20,6 @@ extern char __exittext_begin[], __exittext_end[];
>  extern char __irqentry_text_start[], __irqentry_text_end[];
>  extern char __mmuoff_data_start[], __mmuoff_data_end[];
>  extern char __entry_tramp_text_start[], __entry_tramp_text_end[];
> +extern char __sym_code_functions_start[], __sym_code_functions_end[];
>  
>  #endif /* __ASM_SECTIONS_H */
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index ba7b97b119e4..5d5728c3088e 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -18,11 +18,43 @@
>  #include <asm/stack_pointer.h>
>  #include <asm/stacktrace.h>
>  
> +struct code_range {
> +	unsigned long	start;
> +	unsigned long	end;
> +};
> +
> +static struct code_range	*sym_code_functions;
> +static int			num_sym_code_functions;
> +
> +int __init init_sym_code_functions(void)
> +{
> +	size_t size;
> +
> +	size = (unsigned long)__sym_code_functions_end -
> +	       (unsigned long)__sym_code_functions_start;
> +
> +	sym_code_functions = kmalloc(size, GFP_KERNEL);
> +	if (!sym_code_functions)
> +		return -ENOMEM;
> +
> +	memcpy(sym_code_functions, __sym_code_functions_start, size);
> +	/* Update num_sym_code_functions after copying sym_code_functions. */
> +	smp_mb();
> +	num_sym_code_functions = size / sizeof(struct code_range);
> +
> +	return 0;
> +}
> +early_initcall(init_sym_code_functions);

What's the point of copying this, given we don't even sort it?

If we need to keep it around, it would be nicer to leave it where the
linker put it, but make it rodata or ro_after_init.

> +
>  /*
>   * Check the stack frame for conditions that make unwinding unreliable.
>   */
>  enum unwind_rc unwind_check_frame(struct stackframe *frame)
>  {
> +	const struct code_range *range;
> +	unsigned long pc;
> +	int i;
> +
>  	/*
>  	 * If the PC is not a known kernel text address, then we cannot
>  	 * be sure that a subsequent unwind will be reliable, as we
> @@ -30,6 +62,86 @@ enum unwind_rc unwind_check_frame(struct stackframe *frame)
>  	 */
>  	if (!__kernel_text_address(frame->pc))
>  		return UNWIND_CONTINUE_WITH_RISK;

As per patch 1, I'd prefer we had something like an
unwind_is_unreliable() helper, which can return a boolean in this case.

> +
> +	/*
> +	 * If the final frame has been reached, there is no more unwinding
> +	 * to do. There is no need to check if the return PC is considered
> +	 * unreliable by the unwinder.
> +	 */
> +	if (!frame->fp)
> +		return UNWIND_CONTINUE;

As mentioned on patch 1, I'd rather the main unwind loop checked for the
final frame specifically before trying to unwind. With that in mind, we
should never try to unwind to a NULL fp.

> +
> +	/*
> +	 * Check the return PC against sym_code_functions[]. If there is a
> +	 * match, then consider the stack frame unreliable. These functions
> +	 * contain low-level code where the frame pointer and/or the return
> +	 * address register cannot be relied upon. This addresses the following
> +	 * situations:
> +	 *
> +	 *  - Exception handlers and entry assembly
> +	 *  - Trampoline assembly (e.g., ftrace, kprobes)
> +	 *  - Hypervisor-related assembly
> +	 *  - Hibernation-related assembly
> +	 *  - CPU start-stop, suspend-resume assembly
> +	 *  - Kernel relocation assembly
> +	 *
> +	 * Some special cases covered by sym_code_functions[] deserve a mention
> +	 * here:
> +	 *
> +	 *  - All EL1 interrupt and exception stack traces will be considered
> +	 *    unreliable. This is the correct behavior as interrupts and
> +	 *    exceptions can happen on any instruction including ones in the
> +	 *    frame pointer prolog and epilog. Unless stack metadata is
> +	 *    available so the unwinder can unwind through these special
> +	 *    cases, such stack traces will be considered unreliable.

As mentioned previously, we *can* reliably unwind precisely one step
across an exception boundary, as we can be certain of the PC value at
the time the exception was taken, but past this we can't be certain
whether the LR is legitimate.

I'd like that we capture that precisely in the unwinder, and I'm
currently reworking the entry assembly to make that possible.

> +	 *
> +	 *  - A task can get preempted at the end of an interrupt. Stack
> +	 *    traces of preempted tasks will show the interrupt frame in the
> +	 *    stack trace and will be considered unreliable.
> +	 *
> +	 *  - Breakpoints are exceptions. So, all stack traces in the
> +	 *    breakpoint handler (including probes) will be considered unreliable.
> +	 *
> +	 *  - All of the ftrace entry trampolines are considered unreliable.
> +	 *    So, all stack traces taken from tracer functions will be
> +	 *    considered unreliable.
> +	 *
> +	 *  - The Function Graph Tracer return trampoline (return_to_handler)
> +	 *    and the Kretprobe return trampoline (kretprobe_trampoline) are
> +	 *    also considered unreliable.

We should be able to unwind these reliably if we specifically identify
them. I think we need a two-step check here; we should assume that
SYM_CODE() is unreliable by default, but in specific cases we should
unwind that reliably.

> +	 * Some of the special cases above can be unwound using special
> +	 * logic in unwind_frame().
> +	 *
> +	 *  - return_to_handler() is handled by the unwinder by attempting
> +	 *    to retrieve the original return address from the per-task
> +	 *    return address stack.
> +	 *
> +	 *  - kretprobe_trampoline() can be handled in a similar fashion by
> +	 *    attempting to retrieve the original return address from the
> +	 *    per-task kretprobe instance list.

We don't do this today,

> +	 *
> +	 *  - I reckon optprobes can be handled in a similar fashion in the
> +	 *    future?
> +	 *
> +	 *  - Stack traces taken from the FTrace tracer functions can be
> +	 *    handled as well. ftrace_call is an inner label defined in the
> +	 *    Ftrace entry trampoline. This is the location where the call
> +	 *    to a tracer function is patched. So, if the return PC equals
> +	 *    ftrace_call+4, it is reliable. At that point, proper stack
> +	 *    frames have already been set up for the traced function and
> +	 *    its caller.
> +	 *
> +	 * NOTE:
> +	 *   If sym_code_functions[] were sorted, a binary search could be
> +	 *   done to make this more performant.
> +	 */

Since some of the above is speculative (e.g. the bit about optprobes),
and as code will change over time, I think we should have a much terser
comment, e.g.

	/*
	 * As SYM_CODE functions don't follow the usual calling
	 * conventions, we assume by default that any SYM_CODE function
	 * cannot be unwound reliably.
	 *
	 * Note that this includes exception entry/return sequences and
	 * trampolines for ftrace and kprobes.
	 */

... and then if/when we try to unwind a specific SYM_CODE function
reliably, we add the comment for that specifically.

Thanks,
Mark.

> +	pc = frame->pc;
> +	for (i = 0; i < num_sym_code_functions; i++) {
> +		range = &sym_code_functions[i];
> +		if (pc >= range->start && pc < range->end)
> +			return UNWIND_CONTINUE_WITH_RISK;
> +	}
>  	return UNWIND_CONTINUE;
>  }
>  
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 7eea7888bb02..ee203f7ca084 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -103,6 +103,12 @@ jiffies = jiffies_64;
>  #define TRAMP_TEXT
>  #endif
>  
> +#define SYM_CODE_FUNCTIONS				\
> +	. = ALIGN(16);					\
> +	__sym_code_functions_start = .;			\
> +	KEEP(*(sym_code_functions))			\
> +	__sym_code_functions_end = .;
> +
>  /*
>   * The size of the PE/COFF section that covers the kernel image, which
>   * runs from _stext to _edata, must be a round multiple of the PE/COFF
> @@ -218,6 +224,7 @@ SECTIONS
>  		CON_INITCALL
>  		INIT_RAM_FS
>  		*(.init.altinstructions .init.bss)	/* from the EFI stub */
> +		SYM_CODE_FUNCTIONS
>  	}
>  	.exit.data : {
>  		EXIT_DATA
> -- 
> 2.25.1
> 


* Re: [RFC PATCH v6 1/3] arm64: Improve the unwinder return value
  2021-07-28 16:56     ` Mark Rutland
@ 2021-07-29 13:54       ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 28+ messages in thread
From: Madhavan T. Venkataraman @ 2021-07-29 13:54 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, ardb, nobuta.keiya, sjitindarsingh,
	catalin.marinas, will, jmorris, pasha.tatashin, jthierry,
	linux-arm-kernel, live-patching, linux-kernel

Thanks for the review. Responses inline...

On 7/28/21 11:56 AM, Mark Rutland wrote:
> On Wed, Jun 30, 2021 at 05:33:54PM -0500, madvenka@linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>
>> Currently, the unwinder returns a tri-state return value:
>>
>> 	0		means "continue with the unwind"
>> 	-ENOENT		means "successful termination of the stack trace"
>> 	-EINVAL		means "fatal error, abort the stack trace"
>>
>> This is confusing. To fix this, define an enumeration of different return
>> codes to make it clear. Handle the return codes in all of the unwind
>> consumers.
> 
> I agree the tri-state is confusing, and I also generally agree that
> enums are preferable to a set of error codes. However, I don't think
> this is quite the right abstraction; more on that below.
> 

OK.

>>
>> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>> ---
>>  arch/arm64/include/asm/stacktrace.h | 14 ++++++--
>>  arch/arm64/kernel/perf_callchain.c  |  5 ++-
>>  arch/arm64/kernel/process.c         |  8 +++--
>>  arch/arm64/kernel/return_address.c  | 10 ++++--
>>  arch/arm64/kernel/stacktrace.c      | 53 ++++++++++++++++-------------
>>  arch/arm64/kernel/time.c            |  9 +++--
>>  6 files changed, 64 insertions(+), 35 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
>> index eb29b1fe8255..6fcd58553fb1 100644
>> --- a/arch/arm64/include/asm/stacktrace.h
>> +++ b/arch/arm64/include/asm/stacktrace.h
>> @@ -30,6 +30,12 @@ struct stack_info {
>>  	enum stack_type type;
>>  };
>>  
>> +enum unwind_rc {
>> +	UNWIND_CONTINUE,		/* No errors encountered */
>> +	UNWIND_ABORT,			/* Fatal errors encountered */
>> +	UNWIND_FINISH,			/* End of stack reached successfully */
>> +};
> 
> Generally, there are a bunch of properties we might need to check for an
> unwind step relating to reliability (e.g. as you add
> UNWIND_CONTINUE_WITH_RISK in the next patch), and I'd prefer that we
> check those properties on the struct stackframe, and simplify
> unwind_frame() to return a bool.
> 
> Something akin to the x86 unwinders, where the main loop looks like:
> 
> for (unwind_start(&state, ...);
>      !unwind_done(&state) && !unwind_error(&state);
>      unwind_next_frame(&state) {
> 	...
> }
> 
> That way we don't have to grow the enum to handle every variation that
> we can think of, and it's simple enough for users to check the
> properties with the helpers.
> 

I can do that.

>> +
>>  /*
>>   * A snapshot of a frame record or fp/lr register values, along with some
>>   * accounting information necessary for robust unwinding.
>> @@ -61,7 +67,8 @@ struct stackframe {
>>  #endif
>>  };
>>  
>> -extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
>> +extern enum unwind_rc unwind_frame(struct task_struct *tsk,
>> +				   struct stackframe *frame);
>>  extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
>>  			    bool (*fn)(void *, unsigned long), void *data);
>>  extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>> @@ -148,8 +155,8 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
>>  	return false;
>>  }
>>  
>> -static inline void start_backtrace(struct stackframe *frame,
>> -				   unsigned long fp, unsigned long pc)
>> +static inline enum unwind_rc start_backtrace(struct stackframe *frame,
>> +					     unsigned long fp, unsigned long pc)
>>  {
>>  	frame->fp = fp;
>>  	frame->pc = pc;
>> @@ -169,6 +176,7 @@ static inline void start_backtrace(struct stackframe *frame,
>>  	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
>>  	frame->prev_fp = 0;
>>  	frame->prev_type = STACK_TYPE_UNKNOWN;
>> +	return UNWIND_CONTINUE;
>>  }
>>  
>>  #endif	/* __ASM_STACKTRACE_H */
>> diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
>> index 88ff471b0bce..f459208149ae 100644
>> --- a/arch/arm64/kernel/perf_callchain.c
>> +++ b/arch/arm64/kernel/perf_callchain.c
>> @@ -148,13 +148,16 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
>>  			   struct pt_regs *regs)
>>  {
>>  	struct stackframe frame;
>> +	enum unwind_rc rc;
>>  
>>  	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
>>  		/* We don't support guest os callchain now */
>>  		return;
>>  	}
>>  
>> -	start_backtrace(&frame, regs->regs[29], regs->pc);
>> +	rc = start_backtrace(&frame, regs->regs[29], regs->pc);
>> +	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
>> +		return;
>>  	walk_stackframe(current, &frame, callchain_trace, entry);
> 
> As a first step, could we convert this over to arch_stack_walk()?
> 

OK.

>>  }
>>  
>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> index 6e60aa3b5ea9..e9c763b44fd4 100644
>> --- a/arch/arm64/kernel/process.c
>> +++ b/arch/arm64/kernel/process.c
>> @@ -573,6 +573,7 @@ unsigned long get_wchan(struct task_struct *p)
>>  	struct stackframe frame;
>>  	unsigned long stack_page, ret = 0;
>>  	int count = 0;
>> +	enum unwind_rc rc;
>>  	if (!p || p == current || p->state == TASK_RUNNING)
>>  		return 0;
>>  
>> @@ -580,10 +581,13 @@ unsigned long get_wchan(struct task_struct *p)
>>  	if (!stack_page)
>>  		return 0;
>>  
>> -	start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
>> +	rc = start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
>> +	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
>> +		return 0;
>>  
>>  	do {
>> -		if (unwind_frame(p, &frame))
>> +		rc = unwind_frame(p, &frame);
>> +		if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
>>  			goto out;
>>  		if (!in_sched_functions(frame.pc)) {
>>  			ret = frame.pc;
> 
> Likewise, can we convert this to use arch_stack_walk()?
> 

OK.

>> diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
>> index a6d18755652f..1224e043e98f 100644
>> --- a/arch/arm64/kernel/return_address.c
>> +++ b/arch/arm64/kernel/return_address.c
>> @@ -36,13 +36,17 @@ void *return_address(unsigned int level)
>>  {
>>  	struct return_address_data data;
>>  	struct stackframe frame;
>> +	enum unwind_rc rc;
>>  
>>  	data.level = level + 2;
>>  	data.addr = NULL;
>>  
>> -	start_backtrace(&frame,
>> -			(unsigned long)__builtin_frame_address(0),
>> -			(unsigned long)return_address);
>> +	rc = start_backtrace(&frame,
>> +			     (unsigned long)__builtin_frame_address(0),
>> +			     (unsigned long)return_address);
>> +	if (rc == UNWIND_FINISH || rc == UNWIND_ABORT)
>> +		return NULL;
>> +
>>  	walk_stackframe(current, &frame, save_return_addr, &data);
> 
> Likewise, can we convert this to use arch_stack_walk()?
> 

OK.

Thanks.

Madhavan


* Re: [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list
  2021-07-28 17:25     ` Mark Rutland
@ 2021-07-29 14:06       ` Madhavan T. Venkataraman
  2021-07-29 14:52         ` Mark Brown
  2021-07-29 15:48         ` Mark Rutland
  0 siblings, 2 replies; 28+ messages in thread
From: Madhavan T. Venkataraman @ 2021-07-29 14:06 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, ardb, nobuta.keiya, sjitindarsingh,
	catalin.marinas, will, jmorris, pasha.tatashin, jthierry,
	linux-arm-kernel, live-patching, linux-kernel

Responses inline...

On 7/28/21 12:25 PM, Mark Rutland wrote:
> On Wed, Jun 30, 2021 at 05:33:56PM -0500, madvenka@linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>> ... <snip> ...
>> +static struct code_range	*sym_code_functions;
>> +static int			num_sym_code_functions;
>> +
>> +int __init init_sym_code_functions(void)
>> +{
>> +	size_t size;
>> +
>> +	size = (unsigned long)__sym_code_functions_end -
>> +	       (unsigned long)__sym_code_functions_start;
>> +
>> +	sym_code_functions = kmalloc(size, GFP_KERNEL);
>> +	if (!sym_code_functions)
>> +		return -ENOMEM;
>> +
>> +	memcpy(sym_code_functions, __sym_code_functions_start, size);
>> +	/* Update num_sym_code_functions after copying sym_code_functions. */
>> +	smp_mb();
>> +	num_sym_code_functions = size / sizeof(struct code_range);
>> +
>> +	return 0;
>> +}
>> +early_initcall(init_sym_code_functions);
> 
> What's the point of copying this, given we don't even sort it?
> 
> If we need to keep it around, it would be nicer to leave it where the
> linker put it, but make it rodata or ro_after_init.
> 

I was planning to sort it for performance. I have a comment to that effect.
But I can remove the copy and retain the info in linker data.

>> +
>>  /*
>>   * Check the stack frame for conditions that make unwinding unreliable.
>>   */
>>  enum unwind_rc unwind_check_frame(struct stackframe *frame)
>>  {
>> +	const struct code_range *range;
>> +	unsigned long pc;
>> +	int i;
>> +
>>  	/*
>>  	 * If the PC is not a known kernel text address, then we cannot
>>  	 * be sure that a subsequent unwind will be reliable, as we
>> @@ -30,6 +62,86 @@ enum unwind_rc unwind_check_frame(struct stackframe *frame)
>>  	 */
>>  	if (!__kernel_text_address(frame->pc))
>>  		return UNWIND_CONTINUE_WITH_RISK;
> 
> As per patch 1, I'd prefer we had something like an
> unwind_is_unreliable() helper, which can return a boolean in this case.
> 

I will look into this.

>> +
>> +	/*
>> +	 * If the final frame has been reached, there is no more unwinding
>> +	 * to do. There is no need to check if the return PC is considered
>> +	 * unreliable by the unwinder.
>> +	 */
>> +	if (!frame->fp)
>> +		return UNWIND_CONTINUE;
> 
> As mentioned on patch 1, I'd rather the main unwind loop checked for the
> final frame specifically before trying to unwind. With that in mind, we
> should never try to unwind to a NULL fp.
> 

OK.

>> +
>> +	/*
>> +	 * Check the return PC against sym_code_functions[]. If there is a
>> +	 * match, then consider the stack frame unreliable. These functions
>> +	 * contain low-level code where the frame pointer and/or the return
>> +	 * address register cannot be relied upon. This addresses the following
>> +	 * situations:
>> +	 *
>> +	 *  - Exception handlers and entry assembly
>> +	 *  - Trampoline assembly (e.g., ftrace, kprobes)
>> +	 *  - Hypervisor-related assembly
>> +	 *  - Hibernation-related assembly
>> +	 *  - CPU start-stop, suspend-resume assembly
>> +	 *  - Kernel relocation assembly
>> +	 *
>> +	 * Some special cases covered by sym_code_functions[] deserve a mention
>> +	 * here:
>> +	 *
>> +	 *  - All EL1 interrupt and exception stack traces will be considered
>> +	 *    unreliable. This is the correct behavior as interrupts and
>> +	 *    exceptions can happen on any instruction including ones in the
>> +	 *    frame pointer prolog and epilog. Unless stack metadata is
>> +	 *    available so the unwinder can unwind through these special
>> +	 *    cases, such stack traces will be considered unreliable.
> 
> As mentioned previously, we *can* reliably unwind precisely one step
> across an exception boundary, as we can be certain of the PC value at
> the time the exception was taken, but past this we can't be certain
> whether the LR is legitimate.
> 
> I'd like that we capture that precisely in the unwinder, and I'm
> currently reworking the entry assembly to make that possible.
> 
>> +	 *
>> +	 *  - A task can get preempted at the end of an interrupt. Stack
>> +	 *    traces of preempted tasks will show the interrupt frame in the
>> +	 *    stack trace and will be considered unreliable.
>> +	 *
>> +	 *  - Breakpoints are exceptions. So, all stack traces in the
>> +	 *    breakpoint handler (including probes) will be considered unreliable.
>> +	 *
>> +	 *  - All of the ftrace entry trampolines are considered unreliable.
>> +	 *    So, all stack traces taken from tracer functions will be
>> +	 *    considered unreliable.
>> +	 *
>> +	 *  - The Function Graph Tracer return trampoline (return_to_handler)
>> +	 *    and the Kretprobe return trampoline (kretprobe_trampoline) are
>> +	 *    also considered unreliable.
> 
> We should be able to unwind these reliably if we specifically identify
> them. I think we need a two-step check here; we should assume that
> SYM_CODE() is unreliable by default, but in specific cases we should
> unwind that reliably.
> 
>> +	 * Some of the special cases above can be unwound using special
>> +	 * logic in unwind_frame().
>> +	 *
>> +	 *  - return_to_handler() is handled by the unwinder by attempting
>> +	 *    to retrieve the original return address from the per-task
>> +	 *    return address stack.
>> +	 *
>> +	 *  - kretprobe_trampoline() can be handled in a similar fashion by
>> +	 *    attempting to retrieve the original return address from the
>> +	 *    per-task kretprobe instance list.
> 
> We don't do this today,
> 
>> +	 *
>> +	 *  - I reckon optprobes can be handled in a similar fashion in the
>> +	 *    future?
>> +	 *
>> +	 *  - Stack traces taken from the FTrace tracer functions can be
>> +	 *    handled as well. ftrace_call is an inner label defined in the
>> +	 *    Ftrace entry trampoline. This is the location where the call
>> +	 *    to a tracer function is patched. So, if the return PC equals
>> +	 *    ftrace_call+4, it is reliable. At that point, proper stack
>> +	 *    frames have already been set up for the traced function and
>> +	 *    its caller.
>> +	 *
>> +	 * NOTE:
>> +	 *   If sym_code_functions[] were sorted, a binary search could be
>> +	 *   done to make this more performant.
>> +	 */
> 
> Since some of the above is speculative (e.g. the bit about optprobes),
> and as code will change over time, I think we should have a much terser
> comment, e.g.
> 
> 	/*
> 	 * As SYM_CODE functions don't follow the usual calling
> 	 * conventions, we assume by default that any SYM_CODE function
> 	 * cannot be unwound reliably.
> 	 *
> 	 * Note that this includes exception entry/return sequences and
> > 	 * trampolines for ftrace and kprobes.
> 	 */
> 
> ... and then if/when we try to unwind a specific SYM_CODE function
> reliably, we add the comment for that specifically.
> 

Just to confirm, are you suggesting that I remove the entire large comment
detailing the various cases and replace the whole thing with the terse comment?
I did the large comment because of Mark Brown's input that we must be verbose
about all the cases so that it is clear in the future what the different
cases are and how we handle them in this code. As the code evolves, the comments
would evolve.

I can replace the comment if you want. Please confirm.

Thanks.

Madhavan


* Re: [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list
  2021-07-29 14:06       ` Madhavan T. Venkataraman
@ 2021-07-29 14:52         ` Mark Brown
  2021-07-29 17:07           ` Madhavan T. Venkataraman
  2021-07-29 15:48         ` Mark Rutland
  1 sibling, 1 reply; 28+ messages in thread
From: Mark Brown @ 2021-07-29 14:52 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Mark Rutland, jpoimboe, ardb, nobuta.keiya, sjitindarsingh,
	catalin.marinas, will, jmorris, pasha.tatashin, jthierry,
	linux-arm-kernel, live-patching, linux-kernel


On Thu, Jul 29, 2021 at 09:06:26AM -0500, Madhavan T. Venkataraman wrote:
> On 7/28/21 12:25 PM, Mark Rutland wrote:
> > On Wed, Jun 30, 2021 at 05:33:56PM -0500, madvenka@linux.microsoft.com wrote:

> > Since some of the above is speculative (e.g. the bit about optprobes),
> > and as code will change over time, I think we should have a much terser
> > comment, e.g.

> > 	/*
> > 	 * As SYM_CODE functions don't follow the usual calling
> > 	 * conventions, we assume by default that any SYM_CODE function
> > 	 * cannot be unwound reliably.
> > 	 *
> > 	 * Note that this includes exception entry/return sequences and
> > 	 * trampoline for ftrace and kprobes.
> > 	 */

> Just to confirm, are you suggesting that I remove the entire large comment
> detailing the various cases and replace the whole thing with the terse comment?
> I did the large comment because of Mark Brown's input that we must be verbose
> about all the cases so that it is clear in the future what the different
> cases are and how we handle them in this code. As the code evolves, the comments
> would evolve.

I do agree with Mark that this has probably gone from one extreme to the
other and could be cut back a lot - originally it didn't reference there
being complicated cases like the trampoline at all IIRC so you needed
external knowledge to figure out that those cases were handled.

* Re: [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list
  2021-07-29 14:06       ` Madhavan T. Venkataraman
  2021-07-29 14:52         ` Mark Brown
@ 2021-07-29 15:48         ` Mark Rutland
  2021-07-29 16:27           ` Mark Brown
  2021-07-29 17:09           ` Madhavan T. Venkataraman
  1 sibling, 2 replies; 28+ messages in thread
From: Mark Rutland @ 2021-07-29 15:48 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: broonie, jpoimboe, ardb, nobuta.keiya, sjitindarsingh,
	catalin.marinas, will, jmorris, pasha.tatashin, jthierry,
	linux-arm-kernel, live-patching, linux-kernel

On Thu, Jul 29, 2021 at 09:06:26AM -0500, Madhavan T. Venkataraman wrote:
> Responses inline...
> 
> On 7/28/21 12:25 PM, Mark Rutland wrote:
> > On Wed, Jun 30, 2021 at 05:33:56PM -0500, madvenka@linux.microsoft.com wrote:
> >> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> >> ... <snip> ...
> >> +static struct code_range	*sym_code_functions;
> >> +static int			num_sym_code_functions;
> >> +
> >> +int __init init_sym_code_functions(void)
> >> +{
> >> +	size_t size;
> >> +
> >> +	size = (unsigned long)__sym_code_functions_end -
> >> +	       (unsigned long)__sym_code_functions_start;
> >> +
> >> +	sym_code_functions = kmalloc(size, GFP_KERNEL);
> >> +	if (!sym_code_functions)
> >> +		return -ENOMEM;
> >> +
> >> +	memcpy(sym_code_functions, __sym_code_functions_start, size);
> >> +	/* Update num_sym_code_functions after copying sym_code_functions. */
> >> +	smp_mb();
> >> +	num_sym_code_functions = size / sizeof(struct code_range);
> >> +
> >> +	return 0;
> >> +}
> >> +early_initcall(init_sym_code_functions);
> > 
> > What's the point of copying this, given we don't even sort it?
> > 
> > If we need to keep it around, it would be nicer to leave it where the
> > linker put it, but make it rodata or ro_after_init.
> > 
> 
> I was planning to sort it for performance. I have a comment to that effect.
> But I can remove the copy and retain the info in linker data.

I think for now it's better to place it in .rodata. If we need to sort
this, we can rework that later, preferably sorting at compile time as
with extable entries.

That way this is *always* in a usable state, and there's a much lower
risk of this being corrupted by a stray write.

> >> +	/*
> >> +	 * Check the return PC against sym_code_functions[]. If there is a
> >> +	 * match, then consider the stack frame unreliable. These functions
> >> +	 * contain low-level code where the frame pointer and/or the return
> >> +	 * address register cannot be relied upon. This addresses the following
> >> +	 * situations:
> >> +	 *
> >> +	 *  - Exception handlers and entry assembly
> >> +	 *  - Trampoline assembly (e.g., ftrace, kprobes)
> >> +	 *  - Hypervisor-related assembly
> >> +	 *  - Hibernation-related assembly
> >> +	 *  - CPU start-stop, suspend-resume assembly
> >> +	 *  - Kernel relocation assembly
> >> +	 *
> >> +	 * Some special cases covered by sym_code_functions[] deserve a mention
> >> +	 * here:
> >> +	 *
> >> +	 *  - All EL1 interrupt and exception stack traces will be considered
> >> +	 *    unreliable. This is the correct behavior as interrupts and
> >> +	 *    exceptions can happen on any instruction including ones in the
> >> +	 *    frame pointer prolog and epilog. Unless stack metadata is
> >> +	 *    available so the unwinder can unwind through these special
> >> +	 *    cases, such stack traces will be considered unreliable.
> > 
> > As mentioned previously, we *can* reliably unwind precisely one step
> > across an exception boundary, as we can be certain of the PC value at
> > the time the exception was taken, but past this we can't be certain
> > whether the LR is legitimate.
> > 
> > I'd like that we capture that precisely in the unwinder, and I'm
> > currently reworking the entry assembly to make that possible.
> > 
> >> +	 *
> >> +	 *  - A task can get preempted at the end of an interrupt. Stack
> >> +	 *    traces of preempted tasks will show the interrupt frame in the
> >> +	 *    stack trace and will be considered unreliable.
> >> +	 *
> >> +	 *  - Breakpoints are exceptions. So, all stack traces in the break
> >> +	 *    point handler (including probes) will be considered unreliable.
> >> +	 *
> >> +	 *  - All of the ftrace entry trampolines are considered unreliable.
> >> +	 *    So, all stack traces taken from tracer functions will be
> >> +	 *    considered unreliable.
> >> +	 *
> >> +	 *  - The Function Graph Tracer return trampoline (return_to_handler)
> >> +	 *    and the Kretprobe return trampoline (kretprobe_trampoline) are
> >> +	 *    also considered unreliable.
> > 
> > We should be able to unwind these reliably if we specifically identify
> > them. I think we need a two-step check here; we should assume that
> > SYM_CODE() is unreliable by default, but in specific cases we should
> > unwind that reliably.
> > 
> >> +	 * Some of the special cases above can be unwound through using
> >> +	 * special logic in unwind_frame().
> >> +	 *
> >> +	 *  - return_to_handler() is handled by the unwinder by attempting
> >> +	 *    to retrieve the original return address from the per-task
> >> +	 *    return address stack.
> >> +	 *
> >> +	 *  - kretprobe_trampoline() can be handled in a similar fashion by
> >> +	 *    attempting to retrieve the original return address from the
> >> +	 *    per-task kretprobe instance list.
> > 
> > We don't do this today,
> > 
> >> +	 *
> >> +	 *  - I reckon optprobes can be handled in a similar fashion in the
> >> +	 *    future?
> >> +	 *
> >> +	 *  - Stack traces taken from the FTrace tracer functions can be
> >> +	 *    handled as well. ftrace_call is an inner label defined in the
> >> +	 *    Ftrace entry trampoline. This is the location where the call
> >> +	 *    to a tracer function is patched. So, if the return PC equals
> >> +	 *    ftrace_call+4, it is reliable. At that point, proper stack
> >> +	 *    frames have already been set up for the traced function and
> >> +	 *    its caller.
> >> +	 *
> >> +	 * NOTE:
> >> +	 *   If sym_code_functions[] were sorted, a binary search could be
> >> +	 *   done to make this more performant.
> >> +	 */
> > 
> > Since some of the above is speculative (e.g. the bit about optprobes),
> > and as code will change over time, I think we should have a much terser
> > comment, e.g.
> > 
> > 	/*
> > 	 * As SYM_CODE functions don't follow the usual calling
> > 	 * conventions, we assume by default that any SYM_CODE function
> > 	 * cannot be unwound reliably.
> > 	 *
> > 	 * Note that this includes exception entry/return sequences and
> > 	 * trampoline for ftrace and kprobes.
> > 	 */
> > 
> > ... and then if/when we try to unwind a specific SYM_CODE function
> > reliably, we add the comment for that specifically.
> > 
> 
> Just to confirm, are you suggesting that I remove the entire large comment
> detailing the various cases and replace the whole thing with the terse comment?

Yes.

For clarity, let's take your bullet-point list above as a list of
examples, and make that:

	/*
	 * As SYM_CODE functions don't follow the usual calling
	 * conventions, we assume by default that any SYM_CODE function
	 * cannot be unwound reliably.
	 *
	 * Note that this includes:
	 *
	 * - Exception handlers and entry assembly
	 * - Trampoline assembly (e.g., ftrace, kprobes)
	 * - Hypervisor-related assembly
	 * - Hibernation-related assembly
	 * - CPU start-stop, suspend-resume assembly
	 * - Kernel relocation assembly
	 */

> I did the large comment because of Mark Brown's input that we must be
> verbose about all the cases so that it is clear in the future what the
> different cases are and how we handle them in this code. As the code
> evolves, the comments would evolve.

The bulk of the comment just enumerates cases and says we treat them as
unreliable, which I think is already clear from the terser comment with
the list. The cases which mention special treatment (e.g. for unwinding
through return_to_handler) aren't actually handled here (and the
kretprobes case isn't handled at all today), so this isn't the right
place for those -- they'll inevitably drift from the implementation.

> I can replace the comment if you want. Please confirm.

Yes please. If you can use the wording I've suggested immediately above
(with your list folded in), that would be great.

Thanks,
Mark.

> 
> Thanks.
> 
> Madhavan

* Re: [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list
  2021-07-29 15:48         ` Mark Rutland
@ 2021-07-29 16:27           ` Mark Brown
  2021-07-29 17:09           ` Madhavan T. Venkataraman
  1 sibling, 0 replies; 28+ messages in thread
From: Mark Brown @ 2021-07-29 16:27 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Madhavan T. Venkataraman, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel

On Thu, Jul 29, 2021 at 04:48:04PM +0100, Mark Rutland wrote:

> For clarity, let's take your bullet-point list above as a list of
> examples, and make that:

> 	/*
> 	 * As SYM_CODE functions don't follow the usual calling
> 	 * conventions, we assume by default that any SYM_CODE function
> 	 * cannot be unwound reliably.
> 	 *
> 	 * Note that this includes:
> 	 *
> 	 * - Exception handlers and entry assembly
> 	 * - Trampoline assembly (e.g., ftrace, kprobes)
> 	 * - Hypervisor-related assembly
> 	 * - Hibernation-related assembly
> 	 * - CPU start-stop, suspend-resume assembly
> 	 * - Kernel relocation assembly
> 	 */

This looks good to me too.

* Re: [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list
  2021-07-29 14:52         ` Mark Brown
@ 2021-07-29 17:07           ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 28+ messages in thread
From: Madhavan T. Venkataraman @ 2021-07-29 17:07 UTC (permalink / raw)
  To: Mark Brown
  Cc: Mark Rutland, jpoimboe, ardb, nobuta.keiya, sjitindarsingh,
	catalin.marinas, will, jmorris, pasha.tatashin, jthierry,
	linux-arm-kernel, live-patching, linux-kernel



On 7/29/21 9:52 AM, Mark Brown wrote:
> On Thu, Jul 29, 2021 at 09:06:26AM -0500, Madhavan T. Venkataraman wrote:
>> On 7/28/21 12:25 PM, Mark Rutland wrote:
>>> On Wed, Jun 30, 2021 at 05:33:56PM -0500, madvenka@linux.microsoft.com wrote:
> 
>>> Since some of the above is speculative (e.g. the bit about optprobes),
>>> and as code will change over time, I think we should have a much terser
>>> comment, e.g.
> 
>>> 	/*
>>> 	 * As SYM_CODE functions don't follow the usual calling
>>> 	 * conventions, we assume by default that any SYM_CODE function
>>> 	 * cannot be unwound reliably.
>>> 	 *
>>> 	 * Note that this includes exception entry/return sequences and
>>> 	 * trampoline for ftrace and kprobes.
>>> 	 */
> 
>> Just to confirm, are you suggesting that I remove the entire large comment
>> detailing the various cases and replace the whole thing with the terse comment?
>> I did the large comment because of Mark Brown's input that we must be verbose
>> about all the cases so that it is clear in the future what the different
>> cases are and how we handle them in this code. As the code evolves, the comments
>> would evolve.
> 
> I do agree with Mark that this has probably gone from one extreme to the
> other and could be cut back a lot - originally it didn't reference there
> being complicated cases like the trampoline at all IIRC so you needed
> external knowledge to figure out that those cases were handled.
> 

OK.

Madhavan

* Re: [RFC PATCH v6 3/3] arm64: Create a list of SYM_CODE functions, check return PC against list
  2021-07-29 15:48         ` Mark Rutland
  2021-07-29 16:27           ` Mark Brown
@ 2021-07-29 17:09           ` Madhavan T. Venkataraman
  1 sibling, 0 replies; 28+ messages in thread
From: Madhavan T. Venkataraman @ 2021-07-29 17:09 UTC (permalink / raw)
  To: Mark Rutland
  Cc: broonie, jpoimboe, ardb, nobuta.keiya, sjitindarsingh,
	catalin.marinas, will, jmorris, pasha.tatashin, jthierry,
	linux-arm-kernel, live-patching, linux-kernel



On 7/29/21 10:48 AM, Mark Rutland wrote:
> On Thu, Jul 29, 2021 at 09:06:26AM -0500, Madhavan T. Venkataraman wrote:
>> Responses inline...
>>
>> On 7/28/21 12:25 PM, Mark Rutland wrote:
>>> On Wed, Jun 30, 2021 at 05:33:56PM -0500, madvenka@linux.microsoft.com wrote:
>>>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>>> ... <snip> ...
>>>> +static struct code_range	*sym_code_functions;
>>>> +static int			num_sym_code_functions;
>>>> +
>>>> +int __init init_sym_code_functions(void)
>>>> +{
>>>> +	size_t size;
>>>> +
>>>> +	size = (unsigned long)__sym_code_functions_end -
>>>> +	       (unsigned long)__sym_code_functions_start;
>>>> +
>>>> +	sym_code_functions = kmalloc(size, GFP_KERNEL);
>>>> +	if (!sym_code_functions)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	memcpy(sym_code_functions, __sym_code_functions_start, size);
>>>> +	/* Update num_sym_code_functions after copying sym_code_functions. */
>>>> +	smp_mb();
>>>> +	num_sym_code_functions = size / sizeof(struct code_range);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +early_initcall(init_sym_code_functions);
>>>
>>> What's the point of copying this, given we don't even sort it?
>>>
>>> If we need to keep it around, it would be nicer to leave it where the
>>> linker put it, but make it rodata or ro_after_init.
>>>
>>
>> I was planning to sort it for performance. I have a comment to that effect.
>> But I can remove the copy and retain the info in linker data.
> 
> I think for now it's better to place it in .rodata. If we need to sort
> this, we can rework that later, preferably sorting at compile time as
> with extable entries.
> 
> That way this is *always* in a usable state, and there's a much lower
> risk of this being corrupted by a stray write.
> 

OK.

>>>> +	/*
>>>> +	 * Check the return PC against sym_code_functions[]. If there is a
>>>> +	 * match, then consider the stack frame unreliable. These functions
>>>> +	 * contain low-level code where the frame pointer and/or the return
>>>> +	 * address register cannot be relied upon. This addresses the following
>>>> +	 * situations:
>>>> +	 *
>>>> +	 *  - Exception handlers and entry assembly
>>>> +	 *  - Trampoline assembly (e.g., ftrace, kprobes)
>>>> +	 *  - Hypervisor-related assembly
>>>> +	 *  - Hibernation-related assembly
>>>> +	 *  - CPU start-stop, suspend-resume assembly
>>>> +	 *  - Kernel relocation assembly
>>>> +	 *
>>>> +	 * Some special cases covered by sym_code_functions[] deserve a mention
>>>> +	 * here:
>>>> +	 *
>>>> +	 *  - All EL1 interrupt and exception stack traces will be considered
>>>> +	 *    unreliable. This is the correct behavior as interrupts and
>>>> +	 *    exceptions can happen on any instruction including ones in the
>>>> +	 *    frame pointer prolog and epilog. Unless stack metadata is
>>>> +	 *    available so the unwinder can unwind through these special
>>>> +	 *    cases, such stack traces will be considered unreliable.
>>>
>>> As mentioned previously, we *can* reliably unwind precisely one step
>>> across an exception boundary, as we can be certain of the PC value at
>>> the time the exception was taken, but past this we can't be certain
>>> whether the LR is legitimate.
>>>
>>> I'd like that we capture that precisely in the unwinder, and I'm
>>> currently reworking the entry assembly to make that possible.
>>>
>>>> +	 *
>>>> +	 *  - A task can get preempted at the end of an interrupt. Stack
>>>> +	 *    traces of preempted tasks will show the interrupt frame in the
>>>> +	 *    stack trace and will be considered unreliable.
>>>> +	 *
>>>> +	 *  - Breakpoints are exceptions. So, all stack traces in the break
>>>> +	 *    point handler (including probes) will be considered unreliable.
>>>> +	 *
>>>> +	 *  - All of the ftrace entry trampolines are considered unreliable.
>>>> +	 *    So, all stack traces taken from tracer functions will be
>>>> +	 *    considered unreliable.
>>>> +	 *
>>>> +	 *  - The Function Graph Tracer return trampoline (return_to_handler)
>>>> +	 *    and the Kretprobe return trampoline (kretprobe_trampoline) are
>>>> +	 *    also considered unreliable.
>>>
>>> We should be able to unwind these reliably if we specifically identify
>>> them. I think we need a two-step check here; we should assume that
>>> SYM_CODE() is unreliable by default, but in specific cases we should
>>> unwind that reliably.
>>>
>>>> +	 * Some of the special cases above can be unwound through using
>>>> +	 * special logic in unwind_frame().
>>>> +	 *
>>>> +	 *  - return_to_handler() is handled by the unwinder by attempting
>>>> +	 *    to retrieve the original return address from the per-task
>>>> +	 *    return address stack.
>>>> +	 *
>>>> +	 *  - kretprobe_trampoline() can be handled in a similar fashion by
>>>> +	 *    attempting to retrieve the original return address from the
>>>> +	 *    per-task kretprobe instance list.
>>>
>>> We don't do this today,
>>>
>>>> +	 *
>>>> +	 *  - I reckon optprobes can be handled in a similar fashion in the
>>>> +	 *    future?
>>>> +	 *
>>>> +	 *  - Stack traces taken from the FTrace tracer functions can be
>>>> +	 *    handled as well. ftrace_call is an inner label defined in the
>>>> +	 *    Ftrace entry trampoline. This is the location where the call
>>>> +	 *    to a tracer function is patched. So, if the return PC equals
>>>> +	 *    ftrace_call+4, it is reliable. At that point, proper stack
>>>> +	 *    frames have already been set up for the traced function and
>>>> +	 *    its caller.
>>>> +	 *
>>>> +	 * NOTE:
>>>> +	 *   If sym_code_functions[] were sorted, a binary search could be
>>>> +	 *   done to make this more performant.
>>>> +	 */
>>>
>>> Since some of the above is speculative (e.g. the bit about optprobes),
>>> and as code will change over time, I think we should have a much terser
>>> comment, e.g.
>>>
>>> 	/*
>>> 	 * As SYM_CODE functions don't follow the usual calling
>>> 	 * conventions, we assume by default that any SYM_CODE function
>>> 	 * cannot be unwound reliably.
>>> 	 *
>>> 	 * Note that this includes exception entry/return sequences and
>>> 	 * trampoline for ftrace and kprobes.
>>> 	 */
>>>
>>> ... and then if/when we try to unwind a specific SYM_CODE function
>>> reliably, we add the comment for that specifically.
>>>
>>
>> Just to confirm, are you suggesting that I remove the entire large comment
>> detailing the various cases and replace the whole thing with the terse comment?
> 
> Yes.
> 
> For clarity, let's take your bullet-point list above as a list of
> examples, and make that:
> 
> 	/*
> 	 * As SYM_CODE functions don't follow the usual calling
> 	 * conventions, we assume by default that any SYM_CODE function
> 	 * cannot be unwound reliably.
> 	 *
> 	 * Note that this includes:
> 	 *
> 	 * - Exception handlers and entry assembly
> 	 * - Trampoline assembly (e.g., ftrace, kprobes)
> 	 * - Hypervisor-related assembly
> 	 * - Hibernation-related assembly
> 	 * - CPU start-stop, suspend-resume assembly
> 	 * - Kernel relocation assembly
> 	 */
> 

OK.

>> I did the large comment because of Mark Brown's input that we must be
>> verbose about all the cases so that it is clear in the future what the
>> different cases are and how we handle them in this code. As the code
>> evolves, the comments would evolve.
> 
> The bulk of the comment just enumerates cases and says we treat them as
> unreliable, which I think is already clear from the terser comment with
> the list. The cases which mention special treatment (e.g. for unwinding
> through return_to_handler) aren't actually handled here (and the
> kretprobes case isn't handled at all today), so this isn't the right
> place for those -- they'll inevitably drift from the implementation.
> 
>> I can replace the comment if you want. Please confirm.
> 
> Yes please. If you can use the wording I've suggested immediately above
> (with your list folded in), that would be great.
> 

OK. I will use your suggested text.

Thanks.

Madhavan

* [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement stack trace reliability checks
       [not found] <3f2aab69a35c243c5e97f47c4ad84046355f5b90>
  2021-06-30 22:33 ` [RFC PATCH v6 0/3] arm64: Implement stack trace reliability checks madvenka
@ 2021-08-12 13:24 ` madvenka
  2021-08-12 13:24   ` [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk() madvenka
                     ` (4 more replies)
  2021-08-12 18:35 ` madvenka
  2 siblings, 5 replies; 28+ messages in thread
From: madvenka @ 2021-08-12 13:24 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Make all stack walking functions use arch_stack_walk()
======================================================

Currently, there are multiple functions in ARM64 code that walk the
stack using start_backtrace() and unwind_frame(). Convert all of
them to use arch_stack_walk(). This makes maintenance easier.

Reorganize the unwinder code for better consistency and maintenance
===================================================================

Rename unwinder functions to unwind_*() similar to other architectures
for naming consistency.

Annotate all of the unwind_*() functions with notrace so they cannot be
ftraced and NOKPROBE_SYMBOL() so they cannot be kprobed. Ftrace and Kprobe
code can call the unwinder.

Redefine the unwinder loop and make it similar to other architectures.
Define the following:

	unwind_start(&frame, task, fp, pc);
	while (unwind_consume(&frame, consume_entry, cookie))
		unwind_next(&frame);
	return !unwind_failed(&frame);

unwind_start()
	Same as the original start_backtrace().

unwind_consume()
	This new function does two things:

	- Calls consume_entry() to consume the return PC.

	- Implements checks to determine whether the unwind should continue
	  or terminate.

unwind_next()
	Same as the original unwind_frame() except:

	- the stack trace termination check has been moved from here to
	  unwind_consume(). So, unwind_next() assumes that the fp is valid.

	- unwind_frame() used to return an error value. This function only
	  sets internal state and does not return anything. The state is
	  retrieved via a helper. See next.

unwind_failed()
	Return a boolean to indicate whether the stack trace completed
	successfully or failed. arch_stack_walk() ignores the return
	value. But arch_stack_walk_reliable() in the future will look
	at the return value.

Unwind status
	Introduce a new flag called "failed" in struct stackframe. Set this
	flag when an error is encountered. If this flag is set, terminate
	the unwind. Also, let the unwinder return the status to the caller.

Reliability checks
==================

There are some kernel features and conditions that make a stack trace
unreliable. Callers may require the unwinder to detect these cases.
E.g., livepatch.

Introduce a new function called unwind_is_reliable() that will detect
these cases and return a boolean.

Introduce a new argument to unwind() called "need_reliable" so a caller
can tell unwind() that it requires a reliable stack trace. For such a
caller, any unreliability in the stack trace must be treated as a fatal
error and the unwind must be aborted.

Call unwind_is_reliable() from unwind_consume() like this:

	if (frame->need_reliable && !unwind_is_reliable(frame)) {
		frame->failed = true;
		return false;
	}

arch_stack_walk() passes "false" for need_reliable because its callers
don't care about reliability. arch_stack_walk() is used for debug and
test purposes.

Introduce arch_stack_walk_reliable() for ARM64. This works like
arch_stack_walk() except for two things:

	- It passes "true" for need_reliable.

	- It returns -EINVAL if unwind() aborts.

Introduce the first reliability check in unwind_is_reliable() - If
a return PC is not a valid kernel text address, consider the stack
trace unreliable. It could be some generated code.

Other reliability checks will be added in the future. Until all of the
checks are in place, arch_stack_walk_reliable() may not be used by
livepatch. But it may be used by debug and test code.

SYM_CODE check
==============

SYM_CODE functions do not follow normal calling conventions. They cannot
be unwound reliably using the frame pointer. Collect the address ranges
of these functions in a special section called "sym_code_functions".

In unwind_is_reliable(), check the return PC against these ranges. If a
match is found, then consider the stack trace unreliable. This is the
second reliability check introduced by this work.

Last stack frame
----------------

If a SYM_CODE function occurs in the very last frame in the stack trace,
then the stack trace is not considered unreliable. This is because there
is no more unwinding to do. Examples:

	- EL0 exception stack traces end in the top level EL0 exception
	  handlers.

	- All kernel thread stack traces end in ret_from_fork().
---
Changelog:

v7:
	From Mark Rutland:

	- Make the unwinder loop similar to other architectures.

	- Keep details to within the unwinder functions and return a simple
	  boolean to the caller.

	- Convert some of the current code that contains unwinder logic to
	  simply use arch_stack_walk(). I have converted all of them.

	- Do not copy sym_code_functions[]. Just place it in rodata for now.

	- Have the main loop check for termination conditions rather than
	  having unwind_frame() check for them. In other words, let
	  unwind_frame() assume that the fp is valid.

	- Replace the big comment for SYM_CODE functions with a shorter
	  comment.

		/*
		 * As SYM_CODE functions don't follow the usual calling
		 * conventions, we assume by default that any SYM_CODE function
		 * cannot be unwound reliably.
		 *
		 * Note that this includes:
		 *
		 * - Exception handlers and entry assembly
		 * - Trampoline assembly (e.g., ftrace, kprobes)
		 * - Hypervisor-related assembly
		 * - Hibernation-related assembly
		 * - CPU start-stop, suspend-resume assembly
		 * - Kernel relocation assembly
		 */

v6:
	From Mark Rutland:

	- The per-frame reliability concept and flag are acceptable. But more
	  work is needed to make the per-frame checks more accurate and more
	  complete. E.g., some code reorg is being worked on that will help.

	  I have now removed the frame->reliable flag and deleted the whole
	  concept of per-frame status. This is orthogonal to this patch series.
	  Instead, I have improved the unwinder to return proper return codes
	  so a caller can take appropriate action without needing per-frame
	  status.

	- Remove the mention of PLTs and update the comment.

	  I have replaced the comment above the call to __kernel_text_address()
	  with the comment suggested by Mark Rutland.

	Other comments:

	- Other comments on the per-frame stuff are not relevant because
	  that approach is not there anymore.

v5:
	From Keiya Nobuta:
	
	- The term blacklist(ed) is not to be used anymore. I have changed it
	  to unreliable. So, the function unwinder_blacklisted() has been
	  changed to unwinder_is_unreliable().

	From Mark Brown:

	- Add a comment for the "reliable" flag in struct stackframe. The
	  reliability attribute is not complete until all the checks are
	  in place. Added a comment above struct stackframe.

	- Include some of the comments in the cover letter in the actual
	  code so that we can compare it with the reliable stack trace
	  requirements document for completeness. I have added a comment:

	  	- above unwinder_is_unreliable() that lists the requirements
		  that are addressed by the function.

		- above the __kernel_text_address() call about all the cases
		  the call covers.

v4:
	From Mark Brown:

	- I was checking the return PC with __kernel_text_address() before
	  the Function Graph trace handling. Mark Brown felt that all the
	  reliability checks should be performed on the original return PC
	  once that is obtained. So, I have moved all the reliability checks
	  to after the Function Graph Trace handling code in the unwinder.
	  Basically, the unwinder should perform PC translations first (for
	  the return trampoline for Function Graph Tracing, Kretprobes, etc).
	  Then, the reliability checks should be applied to the resulting
	  PC.

	- Mark said to improve the naming of the new functions so they don't
	  collide with existing ones. I have used a prefix "unwinder_" for
	  all the new functions.

	From Josh Poimboeuf:

	- In the error scenarios in the unwinder, the reliable flag in the
	  stack frame should be set. Implemented this.

	- Some of the other comments are not relevant to the new code as
	  I have taken a different approach in the new code. That is why
	  I have not made those changes. E.g., Ard wanted me to add the
	  "const" keyword to the global section array. That array does not
	  exist in v4. Similarly, Mark Brown said to use ARRAY_SIZE() for
	  the same array in a for loop.

	Other changes:

	- Add a new definition for SYM_CODE_END() that adds the address
	  range of the function to a special section called
	  "sym_code_functions".

	- Include the new section under initdata in vmlinux.lds.S.

	- Define an early_initcall() to copy the contents of the
	  "sym_code_functions" section to an array by the same name.

	- Define a function unwinder_blacklisted() that compares a return
	  PC against sym_code_sections[]. If there is a match, mark the
	  stack trace unreliable. Call this from unwind_frame().

v3:
	- Implemented a sym_code_ranges[] array to contain section bounds
	  for text sections that contain SYM_CODE_*() functions. The unwinder
	  checks each return PC against the sections. If it falls in any of
	  the sections, the stack trace is marked unreliable.

	- Moved SYM_CODE functions from .text and .init.text into a new
	  text section called ".code.text". Added this section to
	  vmlinux.lds.S and sym_code_ranges[].

	- Fixed the logic in the unwinder that handles Function Graph
	  Tracer return trampoline.

	- Removed all the previous code that handles:
		- ftrace entry code for traced function
		- special_functions[] array that lists individual functions
		- kretprobe_trampoline() special case

v2
	- Removed the terminating entry { 0, 0 } in special_functions[]
	  and replaced it with the idiom { /* sentinel */ }.
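
For illustration: with GCC's empty-initializer extension (standardized in C23), { /* sentinel */ } zero-fills the entry, so it is equivalent to { 0, 0 } while making the intent explicit. The entries below are made up:

```c
struct function_range {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical table; a loop stops at the zero-filled sentinel entry. */
static struct function_range special_functions[] = {
	{ 0x1000UL, 0x1040UL },
	{ 0x2000UL, 0x2010UL },
	{ /* sentinel */ }
};

/* Count entries by walking until the sentinel's zero start address. */
static int count_entries(const struct function_range *table)
{
	int n = 0;

	while (table[n].start)
		n++;
	return n;
}
```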

	- Change the ftrace trampoline entry ftrace_graph_call in
	  special_functions[] to ftrace_call + 4 and added explanatory
	  comments.

	- Unnested #ifdefs in special_functions[] for FTRACE.

v1
	- Define a bool field in struct stackframe. This will indicate if
	  a stack trace is reliable.

	- Implement a special_functions[] array that will be populated
	  with special functions in which the stack trace is considered
	  unreliable.
	
	- Using kallsyms_lookup(), get the address ranges for the special
	  functions and record them.

	- Implement an is_reliable_function(pc). This function will check
	  if a given return PC falls in any of the special functions. If
	  it does, the stack trace is unreliable.

	- Implement check_reliability() function that will check if a
	  stack frame is reliable. Call is_reliable_function() from
	  check_reliability().

	- Before a return PC is checked against special_functions[], it
	  must be validated as a proper kernel text address. Call
	  __kernel_text_address() from check_reliability().

	- Finally, call check_reliability() from unwind_frame() for
	  each stack frame.

	- Add EL1 exception handlers to special_functions[].

		el1_sync();
		el1_irq();
		el1_error();
		el1_sync_invalid();
		el1_irq_invalid();
		el1_fiq_invalid();
		el1_error_invalid();

	- The above functions are currently defined as LOCAL symbols.
	  Make them global so that they can be referenced from the
	  unwinder code.

	- Add FTRACE trampolines to special_functions[]:

		ftrace_graph_call()
		ftrace_graph_caller()
		return_to_handler()

	- Add the kretprobe trampoline to special_functions[]:

		kretprobe_trampoline()

Previous versions and discussion
================================

v6: https://lore.kernel.org/linux-arm-kernel/20210630223356.58714-1-madvenka@linux.microsoft.com/
v5: https://lore.kernel.org/linux-arm-kernel/20210526214917.20099-1-madvenka@linux.microsoft.com/
v4: https://lore.kernel.org/linux-arm-kernel/20210516040018.128105-1-madvenka@linux.microsoft.com/
v3: https://lore.kernel.org/linux-arm-kernel/20210503173615.21576-1-madvenka@linux.microsoft.com/
v2: https://lore.kernel.org/linux-arm-kernel/20210405204313.21346-1-madvenka@linux.microsoft.com/
v1: https://lore.kernel.org/linux-arm-kernel/20210330190955.13707-1-madvenka@linux.microsoft.com/

Madhavan T. Venkataraman (4):
  arm64: Make all stack walking functions use arch_stack_walk()
  arm64: Reorganize the unwinder code for better consistency and
    maintenance
  arm64: Introduce stack trace reliability checks in the unwinder
  arm64: Create a list of SYM_CODE functions, check return PC against
    list

 arch/arm64/include/asm/linkage.h    |  12 ++
 arch/arm64/include/asm/sections.h   |   1 +
 arch/arm64/include/asm/stacktrace.h |  16 +-
 arch/arm64/kernel/perf_callchain.c  |   5 +-
 arch/arm64/kernel/process.c         |  39 ++--
 arch/arm64/kernel/return_address.c  |   6 +-
 arch/arm64/kernel/stacktrace.c      | 291 ++++++++++++++++++++--------
 arch/arm64/kernel/time.c            |  22 ++-
 arch/arm64/kernel/vmlinux.lds.S     |  10 +
 9 files changed, 277 insertions(+), 125 deletions(-)


base-commit: 36a21d51725af2ce0700c6ebcb6b9594aac658a6
-- 
2.25.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk()
  2021-08-12 13:24 ` [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement " madvenka
@ 2021-08-12 13:24   ` madvenka
  2021-08-12 15:23     ` Mark Brown
  2021-08-12 13:24   ` [RFC PATCH v7 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance madvenka
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: madvenka @ 2021-08-12 13:24 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Currently, there are multiple functions in ARM64 code that walk the
stack using start_backtrace() and unwind_frame(). Convert all of
them to use arch_stack_walk(). This makes maintenance easier.

Here is the list of functions:

	perf_callchain_kernel()
	get_wchan()
	return_address()
	dump_backtrace()
	profile_pc()

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h |  3 ---
 arch/arm64/kernel/perf_callchain.c  |  5 +---
 arch/arm64/kernel/process.c         | 39 ++++++++++++++++++-----------
 arch/arm64/kernel/return_address.c  |  6 +----
 arch/arm64/kernel/stacktrace.c      | 38 +++-------------------------
 arch/arm64/kernel/time.c            | 22 +++++++++-------
 6 files changed, 43 insertions(+), 70 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 8aebc00c1718..e43dea1c6b41 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -61,9 +61,6 @@ struct stackframe {
 #endif
 };
 
-extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
-extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
-			    bool (*fn)(void *, unsigned long), void *data);
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 			   const char *loglvl);
 
diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 4a72c2727309..2f289013c9c9 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -147,15 +147,12 @@ static bool callchain_trace(void *data, unsigned long pc)
 void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
-	struct stackframe frame;
-
 	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
 
-	start_backtrace(&frame, regs->regs[29], regs->pc);
-	walk_stackframe(current, &frame, callchain_trace, entry);
+	arch_stack_walk(callchain_trace, entry, current, regs);
 }
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index c8989b999250..52c12fd26407 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -544,11 +544,28 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	return last;
 }
 
+struct wchan_info {
+	unsigned long	pc;
+	int		count;
+};
+
+static bool get_wchan_cb(void *arg, unsigned long pc)
+{
+	struct wchan_info *wchan_info = arg;
+
+	if (!in_sched_functions(pc)) {
+		wchan_info->pc = pc;
+		return false;
+	}
+	wchan_info->count--;
+	return !!wchan_info->count;
+}
+
 unsigned long get_wchan(struct task_struct *p)
 {
-	struct stackframe frame;
-	unsigned long stack_page, ret = 0;
-	int count = 0;
+	unsigned long stack_page;
+	struct wchan_info wchan_info;
+
 	if (!p || p == current || task_is_running(p))
 		return 0;
 
@@ -556,20 +573,12 @@ unsigned long get_wchan(struct task_struct *p)
 	if (!stack_page)
 		return 0;
 
-	start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
+	wchan_info.pc = 0;
+	wchan_info.count = 16;
+	arch_stack_walk(get_wchan_cb, &wchan_info, p, NULL);
 
-	do {
-		if (unwind_frame(p, &frame))
-			goto out;
-		if (!in_sched_functions(frame.pc)) {
-			ret = frame.pc;
-			goto out;
-		}
-	} while (count++ < 16);
-
-out:
 	put_task_stack(p);
-	return ret;
+	return wchan_info.pc;
 }
 
 unsigned long arch_align_stack(unsigned long sp)
diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
index a6d18755652f..92a0f4d434e4 100644
--- a/arch/arm64/kernel/return_address.c
+++ b/arch/arm64/kernel/return_address.c
@@ -35,15 +35,11 @@ NOKPROBE_SYMBOL(save_return_addr);
 void *return_address(unsigned int level)
 {
 	struct return_address_data data;
-	struct stackframe frame;
 
 	data.level = level + 2;
 	data.addr = NULL;
 
-	start_backtrace(&frame,
-			(unsigned long)__builtin_frame_address(0),
-			(unsigned long)return_address);
-	walk_stackframe(current, &frame, save_return_addr, &data);
+	arch_stack_walk(save_return_addr, &data, current, NULL);
 
 	if (!data.level)
 		return data.addr;
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 8982a2b78acf..1800310f92be 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -151,23 +151,21 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 }
 NOKPROBE_SYMBOL(walk_stackframe);
 
-static void dump_backtrace_entry(unsigned long where, const char *loglvl)
+static bool dump_backtrace_entry(void *arg, unsigned long where)
 {
+	char *loglvl = arg;
 	printk("%s %pSb\n", loglvl, (void *)where);
+	return true;
 }
 
 void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 		    const char *loglvl)
 {
-	struct stackframe frame;
-	int skip = 0;
-
 	pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
 
 	if (regs) {
 		if (user_mode(regs))
 			return;
-		skip = 1;
 	}
 
 	if (!tsk)
@@ -176,36 +174,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 	if (!try_get_task_stack(tsk))
 		return;
 
-	if (tsk == current) {
-		start_backtrace(&frame,
-				(unsigned long)__builtin_frame_address(0),
-				(unsigned long)dump_backtrace);
-	} else {
-		/*
-		 * task blocked in __switch_to
-		 */
-		start_backtrace(&frame,
-				thread_saved_fp(tsk),
-				thread_saved_pc(tsk));
-	}
-
 	printk("%sCall trace:\n", loglvl);
-	do {
-		/* skip until specified stack frame */
-		if (!skip) {
-			dump_backtrace_entry(frame.pc, loglvl);
-		} else if (frame.fp == regs->regs[29]) {
-			skip = 0;
-			/*
-			 * Mostly, this is the case where this function is
-			 * called in panic/abort. As exception handler's
-			 * stack frame does not contain the corresponding pc
-			 * at which an exception has taken place, use regs->pc
-			 * instead.
-			 */
-			dump_backtrace_entry(regs->pc, loglvl);
-		}
-	} while (!unwind_frame(tsk, &frame));
+	arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);
 
 	put_task_stack(tsk);
 }
diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index eebbc8d7123e..671b3038a772 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -32,22 +32,26 @@
 #include <asm/stacktrace.h>
 #include <asm/paravirt.h>
 
+static bool profile_pc_cb(void *arg, unsigned long pc)
+{
+	unsigned long *prof_pc = arg;
+
+	if (in_lock_functions(pc))
+		return true;
+	*prof_pc = pc;
+	return false;
+}
+
 unsigned long profile_pc(struct pt_regs *regs)
 {
-	struct stackframe frame;
+	unsigned long prof_pc = 0;
 
 	if (!in_lock_functions(regs->pc))
 		return regs->pc;
 
-	start_backtrace(&frame, regs->regs[29], regs->pc);
-
-	do {
-		int ret = unwind_frame(NULL, &frame);
-		if (ret < 0)
-			return 0;
-	} while (in_lock_functions(frame.pc));
+	arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
 
-	return frame.pc;
+	return prof_pc;
 }
 EXPORT_SYMBOL(profile_pc);
 
-- 
2.25.1



* [RFC PATCH v7 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance
  2021-08-12 13:24 ` [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement " madvenka
  2021-08-12 13:24   ` [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk() madvenka
@ 2021-08-12 13:24   ` madvenka
  2021-08-12 13:24   ` [RFC PATCH v7 3/4] arm64: Introduce stack trace reliability checks in the unwinder madvenka
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: madvenka @ 2021-08-12 13:24 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Renaming of unwinder functions
==============================

Rename unwinder functions to unwind_*() similar to other architectures
for naming consistency. More on this below.

unwind function attributes
==========================

Mark all of the unwind_*() functions notrace so they cannot be ftraced,
and mark them with NOKPROBE_SYMBOL() so they cannot be kprobed. This is
necessary because ftrace and kprobe code can themselves call the unwinder.

start_backtrace()
=================

start_backtrace() is only called by arch_stack_walk(). Make it static.
Rename start_backtrace() to unwind_start() for naming consistency.

unwind_frame()
==============

Rename this to unwind_next() for naming consistency.

Replace walk_stackframe() with unwind()
=======================================

walk_stackframe() contains the unwinder loop that walks the stack
frames. Currently, start_backtrace() and walk_stackframe() are called
separately. They should be combined in the same function. Also, the
loop in walk_stackframe() should be simplified and should look like
the unwind loops in other architectures such as x86 and s390.

Remove walk_stackframe(). Define a new function called "unwind()" in
its place. Define the following unwinder loop:

	unwind_start(&frame, task, fp, pc);
	while (unwind_consume(&frame, consume_entry, cookie))
		unwind_next(&frame);
	return !unwind_failed(&frame);

unwind_start()
	Same as the original start_backtrace().

unwind_consume()
	This is a new function that calls the callback function to
	consume the PC in a stackframe. Do it this way so that checks
	can be performed before and after the callback to determine
	whether the unwind should continue or terminate.

unwind_next()
	Same as the original unwind_frame() except for two things:

		- the stack trace termination check has been moved from
		  here to unwind_consume(). So, unwind_next() is always
		  called on a valid fp.

		- unwind_frame() used to return an error value. This
		  function does not return anything.

unwind_failed()
	Return a boolean to indicate if the stack trace completed
	successfully or failed. arch_stack_walk() ignores the return
	value. But arch_stack_walk_reliable() in the future will look
	at the return value.

Unwind status
=============

Introduce a new flag called "failed" in struct stackframe. unwind_next()
and unwind_consume() will set this flag when an error is encountered and
unwind_consume() will check this flag. This is in keeping with other
architectures.

The failed flag is accessed via the helper unwind_failed().

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h |   9 +-
 arch/arm64/kernel/stacktrace.c      | 145 ++++++++++++++++++----------
 2 files changed, 99 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index e43dea1c6b41..407007376e97 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -34,6 +34,8 @@ struct stack_info {
  * A snapshot of a frame record or fp/lr register values, along with some
  * accounting information necessary for robust unwinding.
  *
+ * @task:        The task whose stack is being unwound.
+ *
  * @fp:          The fp value in the frame record (or the real fp)
  * @pc:          The lr value in the frame record (or the real lr)
  *
@@ -49,8 +51,11 @@ struct stack_info {
  *
  * @graph:       When FUNCTION_GRAPH_TRACER is selected, holds the index of a
  *               replacement lr value in the ftrace graph stack.
+ *
+ * @failed:      Unwind failed.
  */
 struct stackframe {
+	struct task_struct *task;
 	unsigned long fp;
 	unsigned long pc;
 	DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
@@ -59,6 +64,7 @@ struct stackframe {
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	int graph;
 #endif
+	bool failed;
 };
 
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
@@ -145,7 +151,4 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
 	return false;
 }
 
-void start_backtrace(struct stackframe *frame, unsigned long fp,
-		     unsigned long pc);
-
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 1800310f92be..ec8f5163c4d0 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -32,10 +32,11 @@
  *	add	sp, sp, #0x10
  */
 
-
-void start_backtrace(struct stackframe *frame, unsigned long fp,
-		     unsigned long pc)
+static void notrace unwind_start(struct stackframe *frame,
+				 struct task_struct *task,
+				 unsigned long fp, unsigned long pc)
 {
+	frame->task = task;
 	frame->fp = fp;
 	frame->pc = pc;
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -45,7 +46,7 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
 	/*
 	 * Prime the first unwind.
 	 *
-	 * In unwind_frame() we'll check that the FP points to a valid stack,
+	 * In unwind_next() we'll check that the FP points to a valid stack,
 	 * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
 	 * treated as a transition to whichever stack that happens to be. The
 	 * prev_fp value won't be used, but we set it to 0 such that it is
@@ -54,8 +55,11 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
 	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
+	frame->failed = false;
 }
 
+NOKPROBE_SYMBOL(unwind_start);
+
 /*
  * Unwind from one frame record (A) to the next frame record (B).
  *
@@ -63,26 +67,26 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
  * records (e.g. a cycle), determined based on the location and fp value of A
  * and the location (but not the fp value) of B.
  */
-int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
+static void notrace unwind_next(struct stackframe *frame)
 {
 	unsigned long fp = frame->fp;
 	struct stack_info info;
+	struct task_struct *tsk = frame->task;
 
-	if (!tsk)
-		tsk = current;
-
-	/* Final frame; nothing to unwind */
-	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
-		return -ENOENT;
-
-	if (fp & 0x7)
-		return -EINVAL;
+	if (fp & 0x7) {
+		frame->failed = true;
+		return;
+	}
 
-	if (!on_accessible_stack(tsk, fp, 16, &info))
-		return -EINVAL;
+	if (!on_accessible_stack(tsk, fp, 16, &info)) {
+		frame->failed = true;
+		return;
+	}
 
-	if (test_bit(info.type, frame->stacks_done))
-		return -EINVAL;
+	if (test_bit(info.type, frame->stacks_done)) {
+		frame->failed = true;
+		return;
+	}
 
 	/*
 	 * As stacks grow downward, any valid record on the same stack must be
@@ -98,15 +102,17 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	 * stack.
 	 */
 	if (info.type == frame->prev_type) {
-		if (fp <= frame->prev_fp)
-			return -EINVAL;
+		if (fp <= frame->prev_fp) {
+			frame->failed = true;
+			return;
+		}
 	} else {
 		set_bit(frame->prev_type, frame->stacks_done);
 	}
 
 	/*
 	 * Record this frame record's values and location. The prev_fp and
-	 * prev_type are only meaningful to the next unwind_frame() invocation.
+	 * prev_type are only meaningful to the next unwind_next() invocation.
 	 */
 	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
 	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
@@ -124,32 +130,18 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 		 * So replace it to an original value.
 		 */
 		ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
-		if (WARN_ON_ONCE(!ret_stack))
-			return -EINVAL;
+		if (WARN_ON_ONCE(!ret_stack)) {
+			frame->failed = true;
+			return;
+		}
 		frame->pc = ret_stack->ret;
 	}
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
-
-	return 0;
 }
-NOKPROBE_SYMBOL(unwind_frame);
 
-void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
-			     bool (*fn)(void *, unsigned long), void *data)
-{
-	while (1) {
-		int ret;
-
-		if (!fn(data, frame->pc))
-			break;
-		ret = unwind_frame(tsk, frame);
-		if (ret < 0)
-			break;
-	}
-}
-NOKPROBE_SYMBOL(walk_stackframe);
+NOKPROBE_SYMBOL(unwind_next);
 
 static bool dump_backtrace_entry(void *arg, unsigned long where)
 {
@@ -186,25 +178,74 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
 	barrier();
 }
 
+static bool notrace unwind_consume(struct stackframe *frame,
+				   stack_trace_consume_fn consume_entry,
+				   void *cookie)
+{
+	if (frame->failed) {
+		/* PC is suspect. Cannot consume it. */
+		return false;
+	}
+
+	if (!consume_entry(cookie, frame->pc)) {
+		/* Caller terminated the unwind. */
+		frame->failed = true;
+		return false;
+	}
+
+	if (frame->fp == (unsigned long)task_pt_regs(frame->task)->stackframe) {
+		/* Final frame; nothing to unwind */
+		return false;
+	}
+	return true;
+}
+
+NOKPROBE_SYMBOL(unwind_consume);
+
+static inline bool unwind_failed(struct stackframe *frame)
+{
+	return frame->failed;
+}
+
+/* Core unwind function */
+static bool notrace unwind(stack_trace_consume_fn consume_entry, void *cookie,
+			   struct task_struct *task,
+			   unsigned long fp, unsigned long pc)
+{
+	struct stackframe frame;
+
+	unwind_start(&frame, task, fp, pc);
+	while (unwind_consume(&frame, consume_entry, cookie))
+		unwind_next(&frame);
+	return !unwind_failed(&frame);
+}
+
+NOKPROBE_SYMBOL(unwind);
+
 #ifdef CONFIG_STACKTRACE
 
 noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
 			      void *cookie, struct task_struct *task,
 			      struct pt_regs *regs)
 {
-	struct stackframe frame;
+	unsigned long fp, pc;
+
+	if (!task)
+		task = current;
 
-	if (regs)
-		start_backtrace(&frame, regs->regs[29], regs->pc);
-	else if (task == current)
-		start_backtrace(&frame,
-				(unsigned long)__builtin_frame_address(1),
-				(unsigned long)__builtin_return_address(0));
-	else
-		start_backtrace(&frame, thread_saved_fp(task),
-				thread_saved_pc(task));
-
-	walk_stackframe(task, &frame, consume_entry, cookie);
+	if (regs) {
+		fp = regs->regs[29];
+		pc = regs->pc;
+	} else if (task == current) {
+		/* Skip arch_stack_walk() in the stack trace. */
+		fp = (unsigned long)__builtin_frame_address(1);
+		pc = (unsigned long)__builtin_return_address(0);
+	} else {
+		/* Caller guarantees that the task is not running. */
+		fp = thread_saved_fp(task);
+		pc = thread_saved_pc(task);
+	}
+	unwind(consume_entry, cookie, task, fp, pc);
 }
 
 #endif
-- 
2.25.1



* [RFC PATCH v7 3/4] arm64: Introduce stack trace reliability checks in the unwinder
  2021-08-12 13:24 ` [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement " madvenka
  2021-08-12 13:24   ` [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk() madvenka
  2021-08-12 13:24   ` [RFC PATCH v7 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance madvenka
@ 2021-08-12 13:24   ` madvenka
  2021-08-12 13:24   ` [RFC PATCH v7 4/4] arm64: Create a list of SYM_CODE functions, check return PC against list madvenka
  2021-08-12 18:31   ` [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement stack trace reliability checks Madhavan T. Venkataraman
  4 siblings, 0 replies; 28+ messages in thread
From: madvenka @ 2021-08-12 13:24 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

There are some kernel features and conditions that make a stack trace
unreliable. Callers may require the unwinder to detect these cases.
E.g., livepatch.

Introduce a new function called unwind_is_reliable() that will detect
these cases and return a boolean.

Introduce a new argument to unwind() called "need_reliable" so a caller
can tell unwind() that it requires a reliable stack trace. For such a
caller, any unreliability in the stack trace must be treated as a fatal
error and the unwind must be aborted.

Call unwind_is_reliable() from unwind_consume() like this:

	if (frame->need_reliable && !unwind_is_reliable(frame)) {
		frame->failed = true;
		return false;
	}

In other words, if the return PC in the stackframe falls in unreliable code,
then it cannot be unwound reliably.

arch_stack_walk() will pass "false" for need_reliable because its callers
don't care about reliability. arch_stack_walk() is used for debug and
test purposes.

Introduce arch_stack_walk_reliable() for ARM64. This works like
arch_stack_walk() except for two things:

	- It passes "true" for need_reliable.

	- It returns -EINVAL if unwind() says that the stack trace is
	  unreliable.

Introduce the first reliability check in unwind_is_reliable() - If
a return PC is not a valid kernel text address, consider the stack
trace unreliable. It could be some generated code.

Other reliability checks will be added in the future. Until all of the
checks are in place, arch_stack_walk_reliable() may not be used by
livepatch. But it may be used by debug and test code.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h |  4 ++
 arch/arm64/kernel/stacktrace.c      | 63 +++++++++++++++++++++++++++--
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 407007376e97..65ea151da5da 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -53,6 +53,9 @@ struct stack_info {
  *               replacement lr value in the ftrace graph stack.
  *
  * @failed:      Unwind failed.
+ *
+ * @need_reliable The caller needs a reliable stack trace. Treat any
+ *                unreliability as a fatal error.
  */
 struct stackframe {
 	struct task_struct *task;
@@ -65,6 +68,7 @@ struct stackframe {
 	int graph;
 #endif
 	bool failed;
+	bool need_reliable;
 };
 
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ec8f5163c4d0..b60f8a20ba64 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -34,7 +34,8 @@
 
 static void notrace unwind_start(struct stackframe *frame,
 				 struct task_struct *task,
-				 unsigned long fp, unsigned long pc)
+				 unsigned long fp, unsigned long pc,
+				 bool need_reliable)
 {
 	frame->task = task;
 	frame->fp = fp;
@@ -56,6 +57,7 @@ static void notrace unwind_start(struct stackframe *frame,
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
 	frame->failed = false;
+	frame->need_reliable = need_reliable;
 }
 
 NOKPROBE_SYMBOL(unwind_start);
@@ -178,6 +180,23 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
 	barrier();
 }
 
+/*
+ * Check the stack frame for conditions that make further unwinding unreliable.
+ */
+static bool notrace unwind_is_reliable(struct stackframe *frame)
+{
+	/*
+	 * If the PC is not a known kernel text address, then we cannot
+	 * be sure that a subsequent unwind will be reliable, as we
+	 * don't know that the code follows our unwind requirements.
+	 */
+	if (!__kernel_text_address(frame->pc))
+		return false;
+	return true;
+}
+
+NOKPROBE_SYMBOL(unwind_is_reliable);
+
 static bool notrace unwind_consume(struct stackframe *frame,
 				   stack_trace_consume_fn consume_entry,
 				   void *cookie)
@@ -197,6 +216,12 @@ static bool notrace unwind_consume(struct stackframe *frame,
 		/* Final frame; nothing to unwind */
 		return false;
 	}
+
+	if (frame->need_reliable && !unwind_is_reliable(frame)) {
+		/* Cannot unwind to the next frame reliably. */
+		frame->failed = true;
+		return false;
+	}
 	return true;
 }
 
@@ -210,11 +235,12 @@ static inline bool unwind_failed(struct stackframe *frame)
 /* Core unwind function */
 static bool notrace unwind(stack_trace_consume_fn consume_entry, void *cookie,
 			   struct task_struct *task,
-			   unsigned long fp, unsigned long pc)
+			   unsigned long fp, unsigned long pc,
+			   bool need_reliable)
 {
 	struct stackframe frame;
 
-	unwind_start(&frame, task, fp, pc);
+	unwind_start(&frame, task, fp, pc, need_reliable);
 	while (unwind_consume(&frame, consume_entry, cookie))
 		unwind_next(&frame);
 	return !unwind_failed(&frame);
@@ -245,7 +271,36 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
 		fp = thread_saved_fp(task);
 		pc = thread_saved_pc(task);
 	}
-	unwind(consume_entry, cookie, task, fp, pc);
+	unwind(consume_entry, cookie, task, fp, pc, false);
+}
+
+/*
+ * arch_stack_walk_reliable() may not be used for livepatch until all of
+ * the reliability checks are in place in unwind_consume(). However,
+ * debug and test code can choose to use it even if all the checks are not
+ * in place.
+ */
+noinline int notrace arch_stack_walk_reliable(stack_trace_consume_fn consume_fn,
+					      void *cookie,
+					      struct task_struct *task)
+{
+	unsigned long fp, pc;
+
+	if (!task)
+		task = current;
+
+	if (task == current) {
+		/* Skip arch_stack_walk_reliable() in the stack trace. */
+		fp = (unsigned long)__builtin_frame_address(1);
+		pc = (unsigned long)__builtin_return_address(0);
+	} else {
+		/* Caller guarantees that the task is not running. */
+		fp = thread_saved_fp(task);
+		pc = thread_saved_pc(task);
+	}
+	if (unwind(consume_fn, cookie, task, fp, pc, true))
+		return 0;
+	return -EINVAL;
 }
 
 #endif
-- 
2.25.1



* [RFC PATCH v7 4/4] arm64: Create a list of SYM_CODE functions, check return PC against list
  2021-08-12 13:24 ` [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement " madvenka
                     ` (2 preceding siblings ...)
  2021-08-12 13:24   ` [RFC PATCH v7 3/4] arm64: Introduce stack trace reliability checks in the unwinder madvenka
@ 2021-08-12 13:24   ` madvenka
  2021-08-12 18:31   ` [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement stack trace reliability checks Madhavan T. Venkataraman
  4 siblings, 0 replies; 28+ messages in thread
From: madvenka @ 2021-08-12 13:24 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

SYM_CODE functions don't follow the usual calling conventions. Check if the
return PC in a stack frame falls in any of these. If it does, consider the
stack trace unreliable.

Define a special section for unreliable functions
=================================================

Define a SYM_CODE_END() macro for arm64 that adds the function address
range to a new section called "sym_code_functions".

Linker file
===========

Include the "sym_code_functions" section under read-only data in
vmlinux.lds.S.

Initialization
==============

Define an early_initcall() to create a sym_code_functions[] array from
the linker data.

Unwinder check
==============

Add a reliability check in unwind_is_reliable() that compares a return
PC with sym_code_functions[]. If there is a match, then return failure.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/linkage.h  | 12 +++++++
 arch/arm64/include/asm/sections.h |  1 +
 arch/arm64/kernel/stacktrace.c    | 53 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/vmlinux.lds.S   | 10 ++++++
 4 files changed, 76 insertions(+)

diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
index 9906541a6861..616bad74e297 100644
--- a/arch/arm64/include/asm/linkage.h
+++ b/arch/arm64/include/asm/linkage.h
@@ -68,4 +68,16 @@
 		SYM_FUNC_END_ALIAS(x);		\
 		SYM_FUNC_END_ALIAS(__pi_##x)
 
+/*
+ * Record the address range of each SYM_CODE function in a struct code_range
+ * in a special section.
+ */
+#define SYM_CODE_END(name)				\
+	SYM_END(name, SYM_T_NONE)			;\
+	99:						;\
+	.pushsection "sym_code_functions", "aw"		;\
+	.quad	name					;\
+	.quad	99b					;\
+	.popsection
+
 #endif
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index e4ad9db53af1..c84c71063d6e 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -21,5 +21,6 @@ extern char __exittext_begin[], __exittext_end[];
 extern char __irqentry_text_start[], __irqentry_text_end[];
 extern char __mmuoff_data_start[], __mmuoff_data_end[];
 extern char __entry_tramp_text_start[], __entry_tramp_text_end[];
+extern char __sym_code_functions_start[], __sym_code_functions_end[];
 
 #endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index b60f8a20ba64..26dbdd4fff77 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,31 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+struct code_range {
+	unsigned long	start;
+	unsigned long	end;
+};
+
+static struct code_range	*sym_code_functions;
+static int			num_sym_code_functions;
+
+int __init init_sym_code_functions(void)
+{
+	size_t size = (unsigned long)__sym_code_functions_end -
+		      (unsigned long)__sym_code_functions_start;
+
+	sym_code_functions = (struct code_range *)__sym_code_functions_start;
+	/*
+	 * Order it so that sym_code_functions is not visible before
+	 * num_sym_code_functions.
+	 */
+	smp_mb();
+	num_sym_code_functions = size / sizeof(struct code_range);
+
+	return 0;
+}
+early_initcall(init_sym_code_functions);
+
 /*
  * AArch64 PCS assigns the frame pointer to x29.
  *
@@ -185,6 +210,10 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
  */
 static bool notrace unwind_is_reliable(struct stackframe *frame)
 {
+	const struct code_range *range;
+	unsigned long pc;
+	int i;
+
 	/*
 	 * If the PC is not a known kernel text address, then we cannot
 	 * be sure that a subsequent unwind will be reliable, as we
@@ -192,6 +221,30 @@ static bool notrace unwind_is_reliable(struct stackframe *frame)
 	 */
 	if (!__kernel_text_address(frame->pc))
 		return false;
+
+	/*
+	 * Check the return PC against sym_code_functions[]. If there is a
 +	 * match, then consider the stack frame unreliable.
+	 *
+	 * As SYM_CODE functions don't follow the usual calling conventions,
+	 * we assume by default that any SYM_CODE function cannot be unwound
+	 * reliably.
+	 *
+	 * Note that this includes:
+	 *
+	 * - Exception handlers and entry assembly
+	 * - Trampoline assembly (e.g., ftrace, kprobes)
+	 * - Hypervisor-related assembly
+	 * - Hibernation-related assembly
+	 * - CPU start-stop, suspend-resume assembly
+	 * - Kernel relocation assembly
+	 */
+	pc = frame->pc;
+	for (i = 0; i < num_sym_code_functions; i++) {
+		range = &sym_code_functions[i];
+		if (pc >= range->start && pc < range->end)
+			return false;
+	}
 	return true;
 }
 
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 709d2c433c5e..2bf769f45b54 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -111,6 +111,14 @@ jiffies = jiffies_64;
 #define TRAMP_TEXT
 #endif
 
+#define SYM_CODE_FUNCTIONS				\
+	. = ALIGN(16);					\
+	.symcode : AT(ADDR(.symcode) - LOAD_OFFSET) {	\
+		__sym_code_functions_start = .;		\
+		KEEP(*(sym_code_functions))		\
+		__sym_code_functions_end = .;		\
+	}
+
 /*
  * The size of the PE/COFF section that covers the kernel image, which
  * runs from _stext to _edata, must be a round multiple of the PE/COFF
@@ -196,6 +204,8 @@ SECTIONS
 	swapper_pg_dir = .;
 	. += PAGE_SIZE;
 
+	SYM_CODE_FUNCTIONS
+
 	. = ALIGN(SEGMENT_ALIGN);
 	__init_begin = .;
 	__inittext_begin = .;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk()
  2021-08-12 13:24   ` [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk() madvenka
@ 2021-08-12 15:23     ` Mark Brown
  2021-08-12 16:30       ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Brown @ 2021-08-12 15:23 UTC (permalink / raw)
  To: madvenka
  Cc: mark.rutland, jpoimboe, ardb, nobuta.keiya, sjitindarsingh,
	catalin.marinas, will, jmorris, pasha.tatashin, jthierry,
	linux-arm-kernel, live-patching, linux-kernel

On Thu, Aug 12, 2021 at 08:24:32AM -0500, madvenka@linux.microsoft.com wrote:

> Here is the list of functions:
> 
> 	perf_callchain_kernel()
> 	get_wchan()
> 	return_address()
> 	dump_backtrace()
> 	profile_pc()

I've not actually gone through this properly yet but my first thought is
that for clarity this should be split out into a patch per user plus one
to delete the old interface  - I'd not worry about it unless it needs to
get resubmitted though.

It'll definitely be good to get this done!


* Re: [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk()
  2021-08-12 15:23     ` Mark Brown
@ 2021-08-12 16:30       ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 28+ messages in thread
From: Madhavan T. Venkataraman @ 2021-08-12 16:30 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland, jpoimboe, ardb, nobuta.keiya, sjitindarsingh,
	catalin.marinas, will, jmorris, pasha.tatashin, jthierry,
	linux-arm-kernel, live-patching, linux-kernel



On 8/12/21 10:23 AM, Mark Brown wrote:
> On Thu, Aug 12, 2021 at 08:24:32AM -0500, madvenka@linux.microsoft.com wrote:
> 
>> Here is the list of functions:
>>
>> 	perf_callchain_kernel()
>> 	get_wchan()
>> 	return_address()
>> 	dump_backtrace()
>> 	profile_pc()
> 
> I've not actually gone through this properly yet but my first thought is
> that for clarity this should be split out into a patch per user plus one
> to delete the old interface  - I'd not worry about it unless it needs to
> get resubmitted though.
> 

OK. I will address it in the next version.

> It'll definitely be good to get this done!

Yes. Thanks!

Madhavan

* Re: [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement stack trace reliability checks
  2021-08-12 13:24 ` [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement " madvenka
                     ` (3 preceding siblings ...)
  2021-08-12 13:24   ` [RFC PATCH v7 4/4] arm64: Create a list of SYM_CODE functions, check return PC against list madvenka
@ 2021-08-12 18:31   ` Madhavan T. Venkataraman
  2021-08-12 18:45     ` Madhavan T. Venkataraman
  4 siblings, 1 reply; 28+ messages in thread
From: Madhavan T. Venkataraman @ 2021-08-12 18:31 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel

The messages are not threaded properly.

I will resend the whole series with proper threading.

I apologize.

Madhavan

On 8/12/21 8:24 AM, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> Make all stack walking functions use arch_stack_walk()
> ======================================================
> 
> Currently, there are multiple functions in ARM64 code that walk the
> stack using start_backtrace() and unwind_frame(). Convert all of
> them to use arch_stack_walk(). This makes maintenance easier.
> 
> Reorganize the unwinder code for better consistency and maintenance
> ===================================================================
> 
> Rename unwinder functions to unwind_*() similar to other architectures
> for naming consistency.
> 
> Annotate all of the unwind_*() functions with notrace so they cannot be
> ftraced and NOKPROBE_SYMBOL() so they cannot be kprobed. Ftrace and Kprobe
> code can call the unwinder.
> 
> Redefine the unwinder loop and make it similar to other architectures.
> Define the following:
> 
> 	unwind_start(&frame, task, fp, pc);
> 	while (unwind_consume(&frame, consume_entry, cookie))
> 		unwind_next(&frame);
> 	return !unwind_failed(&frame);
> 
> unwind_start()
> 	Same as the original start_backtrace().
> 
> unwind_consume()
> 	This new function does two things:
> 
> 	- Calls consume_entry() to consume the return PC.
> 
> 	- Implements checks to determine whether the unwind should continue
> 	  or terminate.
> 
> unwind_next()
> 	Same as the original unwind_frame() except:
> 
> 	- the stack trace termination check has been moved from here to
> 	  unwind_consume(). So, unwind_next() assumes that the fp is valid.
> 
> 	- unwind_frame() used to return an error value. This function only
> 	  sets internal state and does not return anything. The state is
> 	  retrieved via a helper. See next.
> 
> unwind_failed()
> 	Return a boolean to indicate whether the stack trace completed
> 	successfully or failed. arch_stack_walk() ignores the return
> 	value. But arch_stack_walk_reliable() in the future will look
> 	at the return value.
> 
> Unwind status
> 	Introduce a new flag called "failed" in struct stackframe. Set this
> 	flag when an error is encountered. If this flag is set, terminate
> 	the unwind. Also, let the unwinder return the status to the caller.
> 
> Reliability checks
> ==================
> 
> There are some kernel features and conditions that make a stack trace
> unreliable. Callers may require the unwinder to detect these cases.
> E.g., livepatch.
> 
> Introduce a new function called unwind_is_reliable() that will detect
> these cases and return a boolean.
> 
> Introduce a new argument to unwind() called "need_reliable" so a caller
> can tell unwind() that it requires a reliable stack trace. For such a
> caller, any unreliability in the stack trace must be treated as a fatal
> error and the unwind must be aborted.
> 
> Call unwind_is_reliable() from unwind_consume() like this:
> 
> 	if (frame->need_reliable && !unwind_is_reliable(frame)) {
> 		frame->failed = true;
> 		return false;
> 	}
> 
> arch_stack_walk() passes "false" for need_reliable because its callers
> don't care about reliability. arch_stack_walk() is used for debug and
> test purposes.
> 
> Introduce arch_stack_walk_reliable() for ARM64. This works like
> arch_stack_walk() except for two things:
> 
> 	- It passes "true" for need_reliable.
> 
> 	- It returns -EINVAL if unwind() aborts.
> 
> Introduce the first reliability check in unwind_is_reliable() - If
> a return PC is not a valid kernel text address, consider the stack
> trace unreliable. It could be some generated code.
> 
> Other reliability checks will be added in the future. Until all of the
> checks are in place, arch_stack_walk_reliable() may not be used by
> livepatch. But it may be used by debug and test code.
> 
> SYM_CODE check
> ==============
> 
> SYM_CODE functions do not follow normal calling conventions. They cannot
> be unwound reliably using the frame pointer. Collect the address ranges
> of these functions in a special section called "sym_code_functions".
> 
> In unwind_is_reliable(), check the return PC against these ranges. If a
> match is found, then consider the stack trace unreliable. This is the
> second reliability check introduced by this work.
> 
> Last stack frame
> ----------------
> 
> If a SYM_CODE function occurs in the very last frame in the stack trace,
> then the stack trace is not considered unreliable. This is because there
> is no more unwinding to do. Examples:
> 
> 	- EL0 exception stack traces end in the top level EL0 exception
> 	  handlers.
> 
> 	- All kernel thread stack traces end in ret_from_fork().
> ---
> Changelog:
> 
> v7:
> 	From Mark Rutland:
> 
> 	- Make the unwinder loop similar to other architectures.
> 
> 	- Keep details to within the unwinder functions and return a simple
> 	  boolean to the caller.
> 
> 	- Convert some of the current code that contains unwinder logic to
> 	  simply use arch_stack_walk(). I have converted all of them.
> 
> 	- Do not copy sym_code_functions[]. Just place it in rodata for now.
> 
> 	- Have the main loop check for termination conditions rather than
> 	  having unwind_frame() check for them. In other words, let
> 	  unwind_frame() assume that the fp is valid.
> 
> 	- Replace the big comment for SYM_CODE functions with a shorter
> 	  comment.
> 
> 		/*
> 		 * As SYM_CODE functions don't follow the usual calling
> 		 * conventions, we assume by default that any SYM_CODE function
> 		 * cannot be unwound reliably.
> 		 *
> 		 * Note that this includes:
> 		 *
> 		 * - Exception handlers and entry assembly
> 		 * - Trampoline assembly (e.g., ftrace, kprobes)
> 		 * - Hypervisor-related assembly
> 		 * - Hibernation-related assembly
> 		 * - CPU start-stop, suspend-resume assembly
> 		 * - Kernel relocation assembly
> 		 */
> 
> v6:
> 	From Mark Rutland:
> 
> 	- The per-frame reliability concept and flag are acceptable. But more
> 	  work is needed to make the per-frame checks more accurate and more
> 	  complete. E.g., some code reorg is being worked on that will help.
> 
> 	  I have now removed the frame->reliable flag and deleted the whole
> 	  concept of per-frame status. This is orthogonal to this patch series.
> 	  Instead, I have improved the unwinder to return proper return codes
> 	  so a caller can take appropriate action without needing per-frame
> 	  status.
> 
> 	- Remove the mention of PLTs and update the comment.
> 
> 	  I have replaced the comment above the call to __kernel_text_address()
> 	  with the comment suggested by Mark Rutland.
> 
> 	Other comments:
> 
> 	- Other comments on the per-frame stuff are not relevant because
> 	  that approach is not there anymore.
> 
> v5:
> 	From Keiya Nobuta:
> 	
> 	- The term blacklist(ed) is not to be used anymore. I have changed it
> 	  to unreliable. So, the function unwinder_blacklisted() has been
> 	  changed to unwinder_is_unreliable().
> 
> 	From Mark Brown:
> 
> 	- Add a comment for the "reliable" flag in struct stackframe. The
> 	  reliability attribute is not complete until all the checks are
> 	  in place. Added a comment above struct stackframe.
> 
> 	- Include some of the comments in the cover letter in the actual
> 	  code so that we can compare it with the reliable stack trace
> 	  requirements document for completeness. I have added a comment:
> 
> 	  	- above unwinder_is_unreliable() that lists the requirements
> 		  that are addressed by the function.
> 
> 		- above the __kernel_text_address() call about all the cases
> 		  the call covers.
> 
> v4:
> 	From Mark Brown:
> 
> 	- I was checking the return PC with __kernel_text_address() before
> 	  the Function Graph trace handling. Mark Brown felt that all the
> 	  reliability checks should be performed on the original return PC
> 	  once that is obtained. So, I have moved all the reliability checks
> 	  to after the Function Graph Trace handling code in the unwinder.
> 	  Basically, the unwinder should perform PC translations first (for
> 	  the return trampoline for Function Graph Tracing, Kretprobes, etc).
> 	  Then, the reliability checks should be applied to the resulting
> 	  PC.
> 
> 	- Mark said to improve the naming of the new functions so they don't
> 	  collide with existing ones. I have used a prefix "unwinder_" for
> 	  all the new functions.
> 
> 	From Josh Poimboeuf:
> 
> 	- In the error scenarios in the unwinder, the reliable flag in the
> 	  stack frame should be set. Implemented this.
> 
> 	- Some of the other comments are not relevant to the new code as
> 	  I have taken a different approach in the new code. That is why
> 	  I have not made those changes. E.g., Ard wanted me to add the
> 	  "const" keyword to the global section array. That array does not
> 	  exist in v4. Similarly, Mark Brown said to use ARRAY_SIZE() for
> 	  the same array in a for loop.
> 
> 	Other changes:
> 
> 	- Add a new definition for SYM_CODE_END() that adds the address
> 	  range of the function to a special section called
> 	  "sym_code_functions".
> 
> 	- Include the new section under initdata in vmlinux.lds.S.
> 
> 	- Define an early_initcall() to copy the contents of the
> 	  "sym_code_functions" section to an array by the same name.
> 
> 	- Define a function unwinder_blacklisted() that compares a return
> 	  PC against sym_code_sections[]. If there is a match, mark the
> 	  stack trace unreliable. Call this from unwind_frame().
> 
> v3:
> 	- Implemented a sym_code_ranges[] array to contain section bounds
> 	  for text sections that contain SYM_CODE_*() functions. The unwinder
> 	  checks each return PC against the sections. If it falls in any of
> 	  the sections, the stack trace is marked unreliable.
> 
> 	- Moved SYM_CODE functions from .text and .init.text into a new
> 	  text section called ".code.text". Added this section to
> 	  vmlinux.lds.S and sym_code_ranges[].
> 
> 	- Fixed the logic in the unwinder that handles Function Graph
> 	  Tracer return trampoline.
> 
> 	- Removed all the previous code that handles:
> 		- ftrace entry code for traced function
> 		- special_functions[] array that lists individual functions
> 		- kretprobe_trampoline() special case
> 
> v2
> 	- Removed the terminating entry { 0, 0 } in special_functions[]
> 	  and replaced it with the idiom { /* sentinel */ }.
> 
> 	- Change the ftrace trampoline entry ftrace_graph_call in
> 	  special_functions[] to ftrace_call + 4 and added explanatory
> 	  comments.
> 
> 	- Unnested #ifdefs in special_functions[] for FTRACE.
> 
> v1
> 	- Define a bool field in struct stackframe. This will indicate if
> 	  a stack trace is reliable.
> 
> 	- Implement a special_functions[] array that will be populated
> 	  with special functions in which the stack trace is considered
> 	  unreliable.
> 	
> 	- Using kallsyms_lookup(), get the address ranges for the special
> 	  functions and record them.
> 
> 	- Implement an is_reliable_function(pc). This function will check
> 	  if a given return PC falls in any of the special functions. If
> 	  it does, the stack trace is unreliable.
> 
> 	- Implement check_reliability() function that will check if a
> 	  stack frame is reliable. Call is_reliable_function() from
> 	  check_reliability().
> 
> 	- Before a return PC is checked against special_functions[], it
> 	  must be validated as a proper kernel text address. Call
> 	  __kernel_text_address() from check_reliability().
> 
> 	- Finally, call check_reliability() from unwind_frame() for
> 	  each stack frame.
> 
> 	- Add EL1 exception handlers to special_functions[].
> 
> 		el1_sync();
> 		el1_irq();
> 		el1_error();
> 		el1_sync_invalid();
> 		el1_irq_invalid();
> 		el1_fiq_invalid();
> 		el1_error_invalid();
> 
> 	- The above functions are currently defined as LOCAL symbols.
> 	  Make them global so that they can be referenced from the
> 	  unwinder code.
> 
> 	- Add FTRACE trampolines to special_functions[]:
> 
> 		ftrace_graph_call()
> 		ftrace_graph_caller()
> 		return_to_handler()
> 
> 	- Add the kretprobe trampoline to special_functions[]:
> 
> 		kretprobe_trampoline()
> 
> Previous versions and discussion
> ================================
> 
> v6: https://lore.kernel.org/linux-arm-kernel/20210630223356.58714-1-madvenka@linux.microsoft.com/
> v5: https://lore.kernel.org/linux-arm-kernel/20210526214917.20099-1-madvenka@linux.microsoft.com/
> v4: https://lore.kernel.org/linux-arm-kernel/20210516040018.128105-1-madvenka@linux.microsoft.com/
> v3: https://lore.kernel.org/linux-arm-kernel/20210503173615.21576-1-madvenka@linux.microsoft.com/
> v2: https://lore.kernel.org/linux-arm-kernel/20210405204313.21346-1-madvenka@linux.microsoft.com/
> v1: https://lore.kernel.org/linux-arm-kernel/20210330190955.13707-1-madvenka@linux.microsoft.com/
> Madhavan T. Venkataraman (4):
>   arm64: Make all stack walking functions use arch_stack_walk()
>   arm64: Reorganize the unwinder code for better consistency and
>     maintenance
>   arm64: Introduce stack trace reliability checks in the unwinder
>   arm64: Create a list of SYM_CODE functions, check return PC against
>     list
> 
>  arch/arm64/include/asm/linkage.h    |  12 ++
>  arch/arm64/include/asm/sections.h   |   1 +
>  arch/arm64/include/asm/stacktrace.h |  16 +-
>  arch/arm64/kernel/perf_callchain.c  |   5 +-
>  arch/arm64/kernel/process.c         |  39 ++--
>  arch/arm64/kernel/return_address.c  |   6 +-
>  arch/arm64/kernel/stacktrace.c      | 291 ++++++++++++++++++++--------
>  arch/arm64/kernel/time.c            |  22 ++-
>  arch/arm64/kernel/vmlinux.lds.S     |  10 +
>  9 files changed, 277 insertions(+), 125 deletions(-)
> 
> 
> base-commit: 36a21d51725af2ce0700c6ebcb6b9594aac658a6
> 

* [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement stack trace reliability checks
       [not found] <3f2aab69a35c243c5e97f47c4ad84046355f5b90>
  2021-06-30 22:33 ` [RFC PATCH v6 0/3] arm64: Implement stack trace reliability checks madvenka
  2021-08-12 13:24 ` [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement " madvenka
@ 2021-08-12 18:35 ` madvenka
  2021-08-12 18:35   ` [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk() madvenka
                     ` (3 more replies)
  2 siblings, 4 replies; 28+ messages in thread
From: madvenka @ 2021-08-12 18:35 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Make all stack walking functions use arch_stack_walk()
======================================================

Currently, there are multiple functions in ARM64 code that walk the
stack using start_backtrace() and unwind_frame(). Convert all of
them to use arch_stack_walk(). This makes maintenance easier.

Reorganize the unwinder code for better consistency and maintenance
===================================================================

Rename unwinder functions to unwind_*() similar to other architectures
for naming consistency.

Annotate all of the unwind_*() functions with notrace so they cannot be
ftraced and NOKPROBE_SYMBOL() so they cannot be kprobed. Ftrace and Kprobe
code can call the unwinder.

Redefine the unwinder loop and make it similar to other architectures.
Define the following:

	unwind_start(&frame, task, fp, pc);
	while (unwind_consume(&frame, consume_entry, cookie))
		unwind_next(&frame);
	return !unwind_failed(&frame);

unwind_start()
	Same as the original start_backtrace().

unwind_consume()
	This new function does two things:

	- Calls consume_entry() to consume the return PC.

	- Implements checks to determine whether the unwind should continue
	  or terminate.

unwind_next()
	Same as the original unwind_frame() except:

	- the stack trace termination check has been moved from here to
	  unwind_consume(). So, unwind_next() assumes that the fp is valid.

	- unwind_frame() used to return an error value. This function only
	  sets internal state and does not return anything. The state is
	  retrieved via a helper. See next.

unwind_failed()
	Return a boolean to indicate whether the stack trace completed
	successfully or failed. arch_stack_walk() ignores the return
	value. But arch_stack_walk_reliable() in the future will look
	at the return value.

Unwind status
	Introduce a new flag called "failed" in struct stackframe. Set this
	flag when an error is encountered. If this flag is set, terminate
	the unwind. Also, let the unwinder return the status to the caller.

Reliability checks
==================

There are some kernel features and conditions that make a stack trace
unreliable. Callers may require the unwinder to detect these cases.
E.g., livepatch.

Introduce a new function called unwind_is_reliable() that will detect
these cases and return a boolean.

Introduce a new argument to unwind() called "need_reliable" so a caller
can tell unwind() that it requires a reliable stack trace. For such a
caller, any unreliability in the stack trace must be treated as a fatal
error and the unwind must be aborted.

Call unwind_is_reliable() from unwind_consume() like this:

	if (frame->need_reliable && !unwind_is_reliable(frame)) {
		frame->failed = true;
		return false;
	}

arch_stack_walk() passes "false" for need_reliable because its callers
don't care about reliability. arch_stack_walk() is used for debug and
test purposes.

Introduce arch_stack_walk_reliable() for ARM64. This works like
arch_stack_walk() except for two things:

	- It passes "true" for need_reliable.

	- It returns -EINVAL if unwind() aborts.

Introduce the first reliability check in unwind_is_reliable() - If
a return PC is not a valid kernel text address, consider the stack
trace unreliable. It could be some generated code.

Other reliability checks will be added in the future. Until all of the
checks are in place, arch_stack_walk_reliable() may not be used by
livepatch. But it may be used by debug and test code.

SYM_CODE check
==============

SYM_CODE functions do not follow normal calling conventions. They cannot
be unwound reliably using the frame pointer. Collect the address ranges
of these functions in a special section called "sym_code_functions".

In unwind_is_reliable(), check the return PC against these ranges. If a
match is found, then consider the stack trace unreliable. This is the
second reliability check introduced by this work.

Last stack frame
----------------

If a SYM_CODE function occurs in the very last frame in the stack trace,
then the stack trace is not considered unreliable. This is because there
is no more unwinding to do. Examples:

	- EL0 exception stack traces end in the top level EL0 exception
	  handlers.

	- All kernel thread stack traces end in ret_from_fork().
---
Changelog:

v7:
	From Mark Rutland:

	- Make the unwinder loop similar to other architectures.

	- Keep details to within the unwinder functions and return a simple
	  boolean to the caller.

	- Convert some of the current code that contains unwinder logic to
	  simply use arch_stack_walk(). I have converted all of them.

	- Do not copy sym_code_functions[]. Just place it in rodata for now.

	- Have the main loop check for termination conditions rather than
	  having unwind_frame() check for them. In other words, let
	  unwind_frame() assume that the fp is valid.

	- Replace the big comment for SYM_CODE functions with a shorter
	  comment.

		/*
		 * As SYM_CODE functions don't follow the usual calling
		 * conventions, we assume by default that any SYM_CODE function
		 * cannot be unwound reliably.
		 *
		 * Note that this includes:
		 *
		 * - Exception handlers and entry assembly
		 * - Trampoline assembly (e.g., ftrace, kprobes)
		 * - Hypervisor-related assembly
		 * - Hibernation-related assembly
		 * - CPU start-stop, suspend-resume assembly
		 * - Kernel relocation assembly
		 */

v6:
	From Mark Rutland:

	- The per-frame reliability concept and flag are acceptable. But more
	  work is needed to make the per-frame checks more accurate and more
	  complete. E.g., some code reorg is being worked on that will help.

	  I have now removed the frame->reliable flag and deleted the whole
	  concept of per-frame status. This is orthogonal to this patch series.
	  Instead, I have improved the unwinder to return proper return codes
	  so a caller can take appropriate action without needing per-frame
	  status.

	- Remove the mention of PLTs and update the comment.

	  I have replaced the comment above the call to __kernel_text_address()
	  with the comment suggested by Mark Rutland.

	Other comments:

	- Other comments on the per-frame stuff are not relevant because
	  that approach is not there anymore.

v5:
	From Keiya Nobuta:
	
	- The term blacklist(ed) is not to be used anymore. I have changed it
	  to unreliable. So, the function unwinder_blacklisted() has been
	  changed to unwinder_is_unreliable().

	From Mark Brown:

	- Add a comment for the "reliable" flag in struct stackframe. The
	  reliability attribute is not complete until all the checks are
	  in place. Added a comment above struct stackframe.

	- Include some of the comments in the cover letter in the actual
	  code so that we can compare it with the reliable stack trace
	  requirements document for completeness. I have added a comment:

	  	- above unwinder_is_unreliable() that lists the requirements
		  that are addressed by the function.

		- above the __kernel_text_address() call about all the cases
		  the call covers.

v4:
	From Mark Brown:

	- I was checking the return PC with __kernel_text_address() before
	  the Function Graph trace handling. Mark Brown felt that all the
	  reliability checks should be performed on the original return PC
	  once that is obtained. So, I have moved all the reliability checks
	  to after the Function Graph Trace handling code in the unwinder.
	  Basically, the unwinder should perform PC translations first (for
	  the return trampoline for Function Graph Tracing, Kretprobes, etc).
	  Then, the reliability checks should be applied to the resulting
	  PC.

	- Mark said to improve the naming of the new functions so they don't
	  collide with existing ones. I have used a prefix "unwinder_" for
	  all the new functions.

	From Josh Poimboeuf:

	- In the error scenarios in the unwinder, the reliable flag in the
	  stack frame should be set. Implemented this.

	- Some of the other comments are not relevant to the new code as
	  I have taken a different approach in the new code. That is why
	  I have not made those changes. E.g., Ard wanted me to add the
	  "const" keyword to the global section array. That array does not
	  exist in v4. Similarly, Mark Brown said to use ARRAY_SIZE() for
	  the same array in a for loop.

	Other changes:

	- Add a new definition for SYM_CODE_END() that adds the address
	  range of the function to a special section called
	  "sym_code_functions".

	- Include the new section under initdata in vmlinux.lds.S.

	- Define an early_initcall() to copy the contents of the
	  "sym_code_functions" section to an array by the same name.

	- Define a function unwinder_blacklisted() that compares a return
	  PC against sym_code_sections[]. If there is a match, mark the
	  stack trace unreliable. Call this from unwind_frame().

v3:
	- Implemented a sym_code_ranges[] array to contain section bounds
	  for text sections that contain SYM_CODE_*() functions. The unwinder
	  checks each return PC against the sections. If it falls in any of
	  the sections, the stack trace is marked unreliable.

	- Moved SYM_CODE functions from .text and .init.text into a new
	  text section called ".code.text". Added this section to
	  vmlinux.lds.S and sym_code_ranges[].

	- Fixed the logic in the unwinder that handles Function Graph
	  Tracer return trampoline.

	- Removed all the previous code that handles:
		- ftrace entry code for traced function
		- special_functions[] array that lists individual functions
		- kretprobe_trampoline() special case

v2
	- Removed the terminating entry { 0, 0 } in special_functions[]
	  and replaced it with the idiom { /* sentinel */ }.

	- Change the ftrace trampoline entry ftrace_graph_call in
	  special_functions[] to ftrace_call + 4 and added explanatory
	  comments.

	- Unnested #ifdefs in special_functions[] for FTRACE.

v1
	- Define a bool field in struct stackframe. This will indicate if
	  a stack trace is reliable.

	- Implement a special_functions[] array that will be populated
	  with special functions in which the stack trace is considered
	  unreliable.
	
	- Using kallsyms_lookup(), get the address ranges for the special
	  functions and record them.

	- Implement an is_reliable_function(pc). This function will check
	  if a given return PC falls in any of the special functions. If
	  it does, the stack trace is unreliable.

	- Implement check_reliability() function that will check if a
	  stack frame is reliable. Call is_reliable_function() from
	  check_reliability().

	- Before a return PC is checked against special_functions[], it
	  must be validated as a proper kernel text address. Call
	  __kernel_text_address() from check_reliability().

	- Finally, call check_reliability() from unwind_frame() for
	  each stack frame.

	- Add EL1 exception handlers to special_functions[].

		el1_sync();
		el1_irq();
		el1_error();
		el1_sync_invalid();
		el1_irq_invalid();
		el1_fiq_invalid();
		el1_error_invalid();

	- The above functions are currently defined as LOCAL symbols.
	  Make them global so that they can be referenced from the
	  unwinder code.

	- Add FTRACE trampolines to special_functions[]:

		ftrace_graph_call()
		ftrace_graph_caller()
		return_to_handler()

	- Add the kretprobe trampoline to special_functions[]:

		kretprobe_trampoline()
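The v1 scheme above can be sketched as a small standalone C model. The struct name and the address ranges below are invented placeholders for illustration only; in the kernel, each range would be obtained from kallsyms_lookup() at boot.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of special_functions[] and is_reliable_function(pc).
 * The address ranges are made up; real entries would be filled in
 * from kallsyms.
 */
struct function_range {
	unsigned long start;	/* first byte of the function */
	unsigned long end;	/* one past the last byte */
};

static const struct function_range special_functions[] = {
	{ 0x1000, 0x1080 },	/* e.g. an EL1 exception handler */
	{ 0x2000, 0x2040 },	/* e.g. an ftrace trampoline */
	{ 0, 0 },		/* sentinel */
};

/* Return true if pc does NOT fall inside any special function. */
static bool is_reliable_function(unsigned long pc)
{
	const struct function_range *func;

	for (func = special_functions; func->start; func++) {
		if (pc >= func->start && pc < func->end)
			return false;
	}
	return true;
}
```

Note the half-open ranges: a PC equal to a function's end address belongs to the next function, not this one.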

Previous versions and discussion
================================

v6: https://lore.kernel.org/linux-arm-kernel/20210630223356.58714-1-madvenka@linux.microsoft.com/
v5: https://lore.kernel.org/linux-arm-kernel/20210526214917.20099-1-madvenka@linux.microsoft.com/
v4: https://lore.kernel.org/linux-arm-kernel/20210516040018.128105-1-madvenka@linux.microsoft.com/
v3: https://lore.kernel.org/linux-arm-kernel/20210503173615.21576-1-madvenka@linux.microsoft.com/
v2: https://lore.kernel.org/linux-arm-kernel/20210405204313.21346-1-madvenka@linux.microsoft.com/
v1: https://lore.kernel.org/linux-arm-kernel/20210330190955.13707-1-madvenka@linux.microsoft.com/
Madhavan T. Venkataraman (4):
  arm64: Make all stack walking functions use arch_stack_walk()
  arm64: Reorganize the unwinder code for better consistency and
    maintenance
  arm64: Introduce stack trace reliability checks in the unwinder
  arm64: Create a list of SYM_CODE functions, check return PC against
    list

 arch/arm64/include/asm/linkage.h    |  12 ++
 arch/arm64/include/asm/sections.h   |   1 +
 arch/arm64/include/asm/stacktrace.h |  16 +-
 arch/arm64/kernel/perf_callchain.c  |   5 +-
 arch/arm64/kernel/process.c         |  39 ++--
 arch/arm64/kernel/return_address.c  |   6 +-
 arch/arm64/kernel/stacktrace.c      | 291 ++++++++++++++++++++--------
 arch/arm64/kernel/time.c            |  22 ++-
 arch/arm64/kernel/vmlinux.lds.S     |  10 +
 9 files changed, 277 insertions(+), 125 deletions(-)


base-commit: 36a21d51725af2ce0700c6ebcb6b9594aac658a6
-- 
2.25.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk()
  2021-08-12 18:35 ` madvenka
@ 2021-08-12 18:35   ` madvenka
  2021-08-12 18:35   ` [RFC PATCH v7 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance madvenka
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: madvenka @ 2021-08-12 18:35 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Currently, there are multiple functions in ARM64 code that walk the
stack using start_backtrace() and unwind_frame(). Convert all of
them to use arch_stack_walk(). This makes maintenance easier.

Here is the list of functions:

	perf_callchain_kernel()
	get_wchan()
	return_address()
	dump_backtrace()
	profile_pc()
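All of the converted callers follow the same callback protocol, which can be modelled in plain C. The frame array and callback below are invented stand-ins (the real walker derives each PC by unwinding frame records):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The callback shape used by arch_stack_walk()-style walkers. */
typedef bool (*consume_fn)(void *cookie, unsigned long pc);

/*
 * Minimal stand-in for the walker: feed one PC per frame to the
 * callback; a 'false' return from the callback stops the walk.
 */
static void walk_pcs(const unsigned long *pcs, size_t n,
		     consume_fn fn, void *cookie)
{
	for (size_t i = 0; i < n; i++) {
		if (!fn(cookie, pcs[i]))
			break;	/* callback terminated the walk */
	}
}

/* Example callback in the style of get_wchan_cb()/profile_pc_cb(). */
struct search_state {
	unsigned long target;
	int frames_seen;
};

static bool stop_at_target(void *cookie, unsigned long pc)
{
	struct search_state *s = cookie;

	s->frames_seen++;
	return pc != s->target;	/* keep walking until the target PC */
}
```

This is why the conversions shrink each caller to one arch_stack_walk() call plus a small callback: all loop and termination logic lives in the walker.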

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h |  3 ---
 arch/arm64/kernel/perf_callchain.c  |  5 +---
 arch/arm64/kernel/process.c         | 39 ++++++++++++++++++-----------
 arch/arm64/kernel/return_address.c  |  6 +----
 arch/arm64/kernel/stacktrace.c      | 38 +++-------------------------
 arch/arm64/kernel/time.c            | 22 +++++++++-------
 6 files changed, 43 insertions(+), 70 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 8aebc00c1718..e43dea1c6b41 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -61,9 +61,6 @@ struct stackframe {
 #endif
 };
 
-extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
-extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
-			    bool (*fn)(void *, unsigned long), void *data);
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 			   const char *loglvl);
 
diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 4a72c2727309..2f289013c9c9 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -147,15 +147,12 @@ static bool callchain_trace(void *data, unsigned long pc)
 void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
-	struct stackframe frame;
-
 	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
 
-	start_backtrace(&frame, regs->regs[29], regs->pc);
-	walk_stackframe(current, &frame, callchain_trace, entry);
+	arch_stack_walk(callchain_trace, entry, current, regs);
 }
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index c8989b999250..52c12fd26407 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -544,11 +544,28 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	return last;
 }
 
+struct wchan_info {
+	unsigned long	pc;
+	int		count;
+};
+
+static bool get_wchan_cb(void *arg, unsigned long pc)
+{
+	struct wchan_info *wchan_info = arg;
+
+	if (!in_sched_functions(pc)) {
+		wchan_info->pc = pc;
+		return false;
+	}
+	wchan_info->count--;
+	return !!wchan_info->count;
+}
+
 unsigned long get_wchan(struct task_struct *p)
 {
-	struct stackframe frame;
-	unsigned long stack_page, ret = 0;
-	int count = 0;
+	unsigned long stack_page;
+	struct wchan_info wchan_info;
+
 	if (!p || p == current || task_is_running(p))
 		return 0;
 
@@ -556,20 +573,12 @@ unsigned long get_wchan(struct task_struct *p)
 	if (!stack_page)
 		return 0;
 
-	start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
+	wchan_info.pc = 0;
+	wchan_info.count = 16;
+	arch_stack_walk(get_wchan_cb, &wchan_info, p, NULL);
 
-	do {
-		if (unwind_frame(p, &frame))
-			goto out;
-		if (!in_sched_functions(frame.pc)) {
-			ret = frame.pc;
-			goto out;
-		}
-	} while (count++ < 16);
-
-out:
 	put_task_stack(p);
-	return ret;
+	return wchan_info.pc;
 }
 
 unsigned long arch_align_stack(unsigned long sp)
diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
index a6d18755652f..92a0f4d434e4 100644
--- a/arch/arm64/kernel/return_address.c
+++ b/arch/arm64/kernel/return_address.c
@@ -35,15 +35,11 @@ NOKPROBE_SYMBOL(save_return_addr);
 void *return_address(unsigned int level)
 {
 	struct return_address_data data;
-	struct stackframe frame;
 
 	data.level = level + 2;
 	data.addr = NULL;
 
-	start_backtrace(&frame,
-			(unsigned long)__builtin_frame_address(0),
-			(unsigned long)return_address);
-	walk_stackframe(current, &frame, save_return_addr, &data);
+	arch_stack_walk(save_return_addr, &data, current, NULL);
 
 	if (!data.level)
 		return data.addr;
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 8982a2b78acf..1800310f92be 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -151,23 +151,21 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 }
 NOKPROBE_SYMBOL(walk_stackframe);
 
-static void dump_backtrace_entry(unsigned long where, const char *loglvl)
+static bool dump_backtrace_entry(void *arg, unsigned long where)
 {
+	char *loglvl = arg;
 	printk("%s %pSb\n", loglvl, (void *)where);
+	return true;
 }
 
 void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 		    const char *loglvl)
 {
-	struct stackframe frame;
-	int skip = 0;
-
 	pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
 
 	if (regs) {
 		if (user_mode(regs))
 			return;
-		skip = 1;
 	}
 
 	if (!tsk)
@@ -176,36 +174,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 	if (!try_get_task_stack(tsk))
 		return;
 
-	if (tsk == current) {
-		start_backtrace(&frame,
-				(unsigned long)__builtin_frame_address(0),
-				(unsigned long)dump_backtrace);
-	} else {
-		/*
-		 * task blocked in __switch_to
-		 */
-		start_backtrace(&frame,
-				thread_saved_fp(tsk),
-				thread_saved_pc(tsk));
-	}
-
 	printk("%sCall trace:\n", loglvl);
-	do {
-		/* skip until specified stack frame */
-		if (!skip) {
-			dump_backtrace_entry(frame.pc, loglvl);
-		} else if (frame.fp == regs->regs[29]) {
-			skip = 0;
-			/*
-			 * Mostly, this is the case where this function is
-			 * called in panic/abort. As exception handler's
-			 * stack frame does not contain the corresponding pc
-			 * at which an exception has taken place, use regs->pc
-			 * instead.
-			 */
-			dump_backtrace_entry(regs->pc, loglvl);
-		}
-	} while (!unwind_frame(tsk, &frame));
+	arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);
 
 	put_task_stack(tsk);
 }
diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index eebbc8d7123e..671b3038a772 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -32,22 +32,26 @@
 #include <asm/stacktrace.h>
 #include <asm/paravirt.h>
 
+static bool profile_pc_cb(void *arg, unsigned long pc)
+{
+	unsigned long *prof_pc = arg;
+
+	if (in_lock_functions(pc))
+		return true;
+	*prof_pc = pc;
+	return false;
+}
+
 unsigned long profile_pc(struct pt_regs *regs)
 {
-	struct stackframe frame;
+	unsigned long prof_pc = 0;
 
 	if (!in_lock_functions(regs->pc))
 		return regs->pc;
 
-	start_backtrace(&frame, regs->regs[29], regs->pc);
-
-	do {
-		int ret = unwind_frame(NULL, &frame);
-		if (ret < 0)
-			return 0;
-	} while (in_lock_functions(frame.pc));
+	arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
 
-	return frame.pc;
+	return prof_pc;
 }
 EXPORT_SYMBOL(profile_pc);
 
-- 
2.25.1



* [RFC PATCH v7 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance
  2021-08-12 18:35 ` madvenka
  2021-08-12 18:35   ` [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk() madvenka
@ 2021-08-12 18:35   ` madvenka
  2021-08-12 18:35   ` [RFC PATCH v7 3/4] arm64: Introduce stack trace reliability checks in the unwinder madvenka
  2021-08-12 18:35   ` [RFC PATCH v7 4/4] arm64: Create a list of SYM_CODE functions, check return PC against list madvenka
  3 siblings, 0 replies; 28+ messages in thread
From: madvenka @ 2021-08-12 18:35 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Renaming of unwinder functions
==============================

Rename unwinder functions to unwind_*() similar to other architectures
for naming consistency. More on this below.

unwind function attributes
==========================

Mark all of the unwind_*() functions with notrace so they cannot be
ftraced, and with NOKPROBE_SYMBOL() so they cannot be kprobed. This is
necessary because ftrace and kprobe code can itself call the unwinder.

start_backtrace()
=================

start_backtrace() is only called by arch_stack_walk(). Make it static.
Rename start_backtrace() to unwind_start() for naming consistency.

unwind_frame()
==============

Rename this to unwind_next() for naming consistency.

Replace walk_stackframe() with unwind()
=======================================

walk_stackframe() contains the unwinder loop that walks the stack
frames. Currently, start_backtrace() and walk_stackframe() are called
separately. They should be combined in the same function. Also, the
loop in walk_stackframe() should be simplified and should look like
the unwind loops in other architectures such as x86 and s390.

Remove walk_stackframe(). Define a new function called "unwind()" in
its place. Define the following unwinder loop:

	unwind_start(&frame, task, fp, pc);
	while (unwind_consume(&frame, consume_entry, cookie))
		unwind_next(&frame);
	return !unwind_failed(&frame);

unwind_start()
	Same as the original start_backtrace().

unwind_consume()
	This is a new function that calls the callback function to
	consume the PC in a stackframe. Do it this way so that checks
	can be performed before and after the callback to determine
	whether the unwind should continue or terminate.

unwind_next()
	Same as the original unwind_frame() except for two things:

		- the stack trace termination check has been moved from
		  here to unwind_consume(). So, unwind_next() is always
		  called on a valid fp.

		- unwind_frame() used to return an error value. This
		  function does not return anything.

unwind_failed()
	Return a boolean to indicate if the stack trace completed
	successfully or failed. arch_stack_walk() ignores the return
	value. But arch_stack_walk_reliable() in the future will look
	at the return value.

Unwind status
=============

Introduce a new flag called "failed" in struct stackframe. unwind_next()
and unwind_consume() will set this flag when an error is encountered and
unwind_consume() will check this flag. This is in keeping with other
architectures.

The failed flag is accessed via the helper unwind_failed().
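The loop structure above can be modelled with a self-contained C sketch. The "stack" here is just a depth counter with invented FINAL_DEPTH/BAD_DEPTH constants; the point is the control flow of unwind_start()/unwind_consume()/unwind_next() and the failed flag, not real frame-record parsing.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented bounds standing in for the final frame and a bad fp. */
enum { FINAL_DEPTH = 3, BAD_DEPTH = 100 };

struct frame {
	int depth;		/* stands in for the fp/pc state */
	bool failed;
};

static void unwind_start(struct frame *f, int depth)
{
	f->depth = depth;
	f->failed = false;
}

static void unwind_next(struct frame *f)
{
	if (f->depth > FINAL_DEPTH)
		f->failed = true;	/* e.g. a misaligned or bogus fp */
	else
		f->depth++;
}

static bool unwind_consume(struct frame *f,
			   bool (*consume)(void *, int), void *cookie)
{
	if (f->failed)
		return false;		/* state is suspect: stop */
	if (!consume(cookie, f->depth)) {
		f->failed = true;	/* caller terminated the unwind */
		return false;
	}
	if (f->depth == FINAL_DEPTH)
		return false;		/* final frame: done, no failure */
	return true;
}

static bool unwind(int start_depth, bool (*consume)(void *, int),
		   void *cookie)
{
	struct frame f;

	unwind_start(&f, start_depth);
	while (unwind_consume(&f, consume, cookie))
		unwind_next(&f);
	return !f.failed;	/* i.e. !unwind_failed(&f) */
}

static bool count_frames(void *cookie, int depth)
{
	(void)depth;
	++*(int *)cookie;
	return true;
}
```

A successful walk consumes every frame including the final one and returns true; any failure set by unwind_next() or unwind_consume() makes unwind() return false.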

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h |   9 +-
 arch/arm64/kernel/stacktrace.c      | 145 ++++++++++++++++++----------
 2 files changed, 99 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index e43dea1c6b41..407007376e97 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -34,6 +34,8 @@ struct stack_info {
  * A snapshot of a frame record or fp/lr register values, along with some
  * accounting information necessary for robust unwinding.
  *
+ * @task:        The task whose stack is being unwound.
+ *
  * @fp:          The fp value in the frame record (or the real fp)
  * @pc:          The lr value in the frame record (or the real lr)
  *
@@ -49,8 +51,11 @@ struct stack_info {
  *
  * @graph:       When FUNCTION_GRAPH_TRACER is selected, holds the index of a
  *               replacement lr value in the ftrace graph stack.
+ *
+ * @failed:      Unwind failed.
  */
 struct stackframe {
+	struct task_struct *task;
 	unsigned long fp;
 	unsigned long pc;
 	DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
@@ -59,6 +64,7 @@ struct stackframe {
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	int graph;
 #endif
+	bool failed;
 };
 
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
@@ -145,7 +151,4 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
 	return false;
 }
 
-void start_backtrace(struct stackframe *frame, unsigned long fp,
-		     unsigned long pc);
-
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 1800310f92be..ec8f5163c4d0 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -32,10 +32,11 @@
  *	add	sp, sp, #0x10
  */
 
-
-void start_backtrace(struct stackframe *frame, unsigned long fp,
-		     unsigned long pc)
+static void notrace unwind_start(struct stackframe *frame,
+				 struct task_struct *task,
+				 unsigned long fp, unsigned long pc)
 {
+	frame->task = task;
 	frame->fp = fp;
 	frame->pc = pc;
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -45,7 +46,7 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
 	/*
 	 * Prime the first unwind.
 	 *
-	 * In unwind_frame() we'll check that the FP points to a valid stack,
+	 * In unwind_next() we'll check that the FP points to a valid stack,
 	 * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
 	 * treated as a transition to whichever stack that happens to be. The
 	 * prev_fp value won't be used, but we set it to 0 such that it is
@@ -54,8 +55,11 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
 	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
+	frame->failed = false;
 }
 
+NOKPROBE_SYMBOL(unwind_start);
+
 /*
  * Unwind from one frame record (A) to the next frame record (B).
  *
@@ -63,26 +67,26 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
  * records (e.g. a cycle), determined based on the location and fp value of A
  * and the location (but not the fp value) of B.
  */
-int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
+static void notrace unwind_next(struct stackframe *frame)
 {
 	unsigned long fp = frame->fp;
 	struct stack_info info;
+	struct task_struct *tsk = frame->task;
 
-	if (!tsk)
-		tsk = current;
-
-	/* Final frame; nothing to unwind */
-	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
-		return -ENOENT;
-
-	if (fp & 0x7)
-		return -EINVAL;
+	if (fp & 0x7) {
+		frame->failed = true;
+		return;
+	}
 
-	if (!on_accessible_stack(tsk, fp, 16, &info))
-		return -EINVAL;
+	if (!on_accessible_stack(tsk, fp, 16, &info)) {
+		frame->failed = true;
+		return;
+	}
 
-	if (test_bit(info.type, frame->stacks_done))
-		return -EINVAL;
+	if (test_bit(info.type, frame->stacks_done)) {
+		frame->failed = true;
+		return;
+	}
 
 	/*
 	 * As stacks grow downward, any valid record on the same stack must be
@@ -98,15 +102,17 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	 * stack.
 	 */
 	if (info.type == frame->prev_type) {
-		if (fp <= frame->prev_fp)
-			return -EINVAL;
+		if (fp <= frame->prev_fp) {
+			frame->failed = true;
+			return;
+		}
 	} else {
 		set_bit(frame->prev_type, frame->stacks_done);
 	}
 
 	/*
 	 * Record this frame record's values and location. The prev_fp and
-	 * prev_type are only meaningful to the next unwind_frame() invocation.
+	 * prev_type are only meaningful to the next unwind_next() invocation.
 	 */
 	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
 	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
@@ -124,32 +130,18 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 		 * So replace it to an original value.
 		 */
 		ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
-		if (WARN_ON_ONCE(!ret_stack))
-			return -EINVAL;
+		if (WARN_ON_ONCE(!ret_stack)) {
+			frame->failed = true;
+			return;
+		}
 		frame->pc = ret_stack->ret;
 	}
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
-
-	return 0;
 }
-NOKPROBE_SYMBOL(unwind_frame);
 
-void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
-			     bool (*fn)(void *, unsigned long), void *data)
-{
-	while (1) {
-		int ret;
-
-		if (!fn(data, frame->pc))
-			break;
-		ret = unwind_frame(tsk, frame);
-		if (ret < 0)
-			break;
-	}
-}
-NOKPROBE_SYMBOL(walk_stackframe);
+NOKPROBE_SYMBOL(unwind_next);
 
 static bool dump_backtrace_entry(void *arg, unsigned long where)
 {
@@ -186,25 +178,74 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
 	barrier();
 }
 
+static bool notrace unwind_consume(struct stackframe *frame,
+				   stack_trace_consume_fn consume_entry,
+				   void *cookie)
+{
+	if (frame->failed) {
+		/* PC is suspect. Cannot consume it. */
+		return false;
+	}
+
+	if (!consume_entry(cookie, frame->pc)) {
+		/* Caller terminated the unwind. */
+		frame->failed = true;
+		return false;
+	}
+
+	if (frame->fp == (unsigned long)task_pt_regs(frame->task)->stackframe) {
+		/* Final frame; nothing to unwind */
+		return false;
+	}
+	return true;
+}
+
+NOKPROBE_SYMBOL(unwind_consume);
+
+static inline bool unwind_failed(struct stackframe *frame)
+{
+	return frame->failed;
+}
+
+/* Core unwind function */
+static bool notrace unwind(stack_trace_consume_fn consume_entry, void *cookie,
+			   struct task_struct *task,
+			   unsigned long fp, unsigned long pc)
+{
+	struct stackframe frame;
+
+	unwind_start(&frame, task, fp, pc);
+	while (unwind_consume(&frame, consume_entry, cookie))
+		unwind_next(&frame);
+	return !unwind_failed(&frame);
+}
+
+NOKPROBE_SYMBOL(unwind);
+
 #ifdef CONFIG_STACKTRACE
 
 noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
 			      void *cookie, struct task_struct *task,
 			      struct pt_regs *regs)
 {
-	struct stackframe frame;
+	unsigned long fp, pc;
+
+	if (!task)
+		task = current;
 
-	if (regs)
-		start_backtrace(&frame, regs->regs[29], regs->pc);
-	else if (task == current)
-		start_backtrace(&frame,
-				(unsigned long)__builtin_frame_address(1),
-				(unsigned long)__builtin_return_address(0));
-	else
-		start_backtrace(&frame, thread_saved_fp(task),
-				thread_saved_pc(task));
-
-	walk_stackframe(task, &frame, consume_entry, cookie);
+	if (regs) {
+		fp = regs->regs[29];
+		pc = regs->pc;
+	} else if (task == current) {
+		/* Skip arch_stack_walk() in the stack trace. */
+		fp = (unsigned long)__builtin_frame_address(1);
+		pc = (unsigned long)__builtin_return_address(0);
+	} else {
+		/* Caller guarantees that the task is not running. */
+		fp = thread_saved_fp(task);
+		pc = thread_saved_pc(task);
+	}
+	unwind(consume_entry, cookie, task, fp, pc);
 }
 
 #endif
-- 
2.25.1



* [RFC PATCH v7 3/4] arm64: Introduce stack trace reliability checks in the unwinder
  2021-08-12 18:35 ` madvenka
  2021-08-12 18:35   ` [RFC PATCH v7 1/4] arm64: Make all stack walking functions use arch_stack_walk() madvenka
  2021-08-12 18:35   ` [RFC PATCH v7 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance madvenka
@ 2021-08-12 18:35   ` madvenka
  2021-08-12 18:35   ` [RFC PATCH v7 4/4] arm64: Create a list of SYM_CODE functions, check return PC against list madvenka
  3 siblings, 0 replies; 28+ messages in thread
From: madvenka @ 2021-08-12 18:35 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

There are some kernel features and conditions that make a stack trace
unreliable. Callers may require the unwinder to detect these cases.
E.g., livepatch.

Introduce a new function called unwind_is_reliable() that will detect
these cases and return a boolean.

Introduce a new argument to unwind() called "need_reliable" so a caller
can tell unwind() that it requires a reliable stack trace. For such a
caller, any unreliability in the stack trace must be treated as a fatal
error and the unwind must be aborted.

Call unwind_is_reliable() from unwind_consume() like this:

	if (frame->need_reliable && !unwind_is_reliable(frame)) {
		frame->failed = true;
		return false;
	}

In other words, if the return PC in the stackframe falls in unreliable code,
then it cannot be unwound reliably.

arch_stack_walk() will pass "false" for need_reliable because its callers
don't care about reliability. arch_stack_walk() is used for debug and
test purposes.

Introduce arch_stack_walk_reliable() for ARM64. This works like
arch_stack_walk() except for two things:

	- It passes "true" for need_reliable.

	- It returns -EINVAL if unwind() says that the stack trace is
	  unreliable.

Introduce the first reliability check in unwind_is_reliable() - If
a return PC is not a valid kernel text address, consider the stack
trace unreliable. It could be some generated code.

Other reliability checks will be added in the future. Until all of the
checks are in place, arch_stack_walk_reliable() may not be used by
livepatch. But it may be used by debug and test code.
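A self-contained C sketch of the need_reliable gating follows. The text-segment bounds are invented constants standing in for what __kernel_text_address() checks in the real code:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented bounds standing in for the kernel text segment. */
#define TEXT_START	0x1000UL
#define TEXT_END	0x9000UL

struct frame {
	unsigned long pc;
	bool need_reliable;
	bool failed;
};

/* Model of unwind_is_reliable(): only known text addresses qualify. */
static bool unwind_is_reliable(const struct frame *frame)
{
	return frame->pc >= TEXT_START && frame->pc < TEXT_END;
}

/*
 * Model of the check added to unwind_consume(): unreliability is
 * fatal only for callers that asked for a reliable trace.
 */
static bool unwind_continue(struct frame *frame)
{
	if (frame->need_reliable && !unwind_is_reliable(frame)) {
		frame->failed = true;	/* PC is suspect: abort */
		return false;
	}
	return true;
}
```

This captures the split between the two entry points: arch_stack_walk() passes need_reliable as false and keeps going, while arch_stack_walk_reliable() passes true and aborts with an error.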

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h |  4 ++
 arch/arm64/kernel/stacktrace.c      | 63 +++++++++++++++++++++++++++--
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 407007376e97..65ea151da5da 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -53,6 +53,9 @@ struct stack_info {
  *               replacement lr value in the ftrace graph stack.
  *
  * @failed:      Unwind failed.
+ *
+ * @need_reliable: The caller needs a reliable stack trace. Treat any
+ *                unreliability as a fatal error.
  */
 struct stackframe {
 	struct task_struct *task;
@@ -65,6 +68,7 @@ struct stackframe {
 	int graph;
 #endif
 	bool failed;
+	bool need_reliable;
 };
 
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ec8f5163c4d0..b60f8a20ba64 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -34,7 +34,8 @@
 
 static void notrace unwind_start(struct stackframe *frame,
 				 struct task_struct *task,
-				 unsigned long fp, unsigned long pc)
+				 unsigned long fp, unsigned long pc,
+				 bool need_reliable)
 {
 	frame->task = task;
 	frame->fp = fp;
@@ -56,6 +57,7 @@ static void notrace unwind_start(struct stackframe *frame,
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
 	frame->failed = false;
+	frame->need_reliable = need_reliable;
 }
 
 NOKPROBE_SYMBOL(unwind_start);
@@ -178,6 +180,23 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
 	barrier();
 }
 
+/*
+ * Check the stack frame for conditions that make further unwinding unreliable.
+ */
+static bool notrace unwind_is_reliable(struct stackframe *frame)
+{
+	/*
+	 * If the PC is not a known kernel text address, then we cannot
+	 * be sure that a subsequent unwind will be reliable, as we
+	 * don't know that the code follows our unwind requirements.
+	 */
+	if (!__kernel_text_address(frame->pc))
+		return false;
+	return true;
+}
+
+NOKPROBE_SYMBOL(unwind_is_reliable);
+
 static bool notrace unwind_consume(struct stackframe *frame,
 				   stack_trace_consume_fn consume_entry,
 				   void *cookie)
@@ -197,6 +216,12 @@ static bool notrace unwind_consume(struct stackframe *frame,
 		/* Final frame; nothing to unwind */
 		return false;
 	}
+
+	if (frame->need_reliable && !unwind_is_reliable(frame)) {
+		/* Cannot unwind to the next frame reliably. */
+		frame->failed = true;
+		return false;
+	}
 	return true;
 }
 
@@ -210,11 +235,12 @@ static inline bool unwind_failed(struct stackframe *frame)
 /* Core unwind function */
 static bool notrace unwind(stack_trace_consume_fn consume_entry, void *cookie,
 			   struct task_struct *task,
-			   unsigned long fp, unsigned long pc)
+			   unsigned long fp, unsigned long pc,
+			   bool need_reliable)
 {
 	struct stackframe frame;
 
-	unwind_start(&frame, task, fp, pc);
+	unwind_start(&frame, task, fp, pc, need_reliable);
 	while (unwind_consume(&frame, consume_entry, cookie))
 		unwind_next(&frame);
 	return !unwind_failed(&frame);
@@ -245,7 +271,36 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
 		fp = thread_saved_fp(task);
 		pc = thread_saved_pc(task);
 	}
-	unwind(consume_entry, cookie, task, fp, pc);
+	unwind(consume_entry, cookie, task, fp, pc, false);
+}
+
+/*
+ * arch_stack_walk_reliable() may not be used for livepatch until all of
+ * the reliability checks are in place in unwind_consume(). However,
+ * debug and test code can choose to use it even if all the checks are not
+ * in place.
+ */
+noinline int notrace arch_stack_walk_reliable(stack_trace_consume_fn consume_fn,
+					      void *cookie,
+					      struct task_struct *task)
+{
+	unsigned long fp, pc;
+
+	if (!task)
+		task = current;
+
+	if (task == current) {
+		/* Skip arch_stack_walk_reliable() in the stack trace. */
+		fp = (unsigned long)__builtin_frame_address(1);
+		pc = (unsigned long)__builtin_return_address(0);
+	} else {
+		/* Caller guarantees that the task is not running. */
+		fp = thread_saved_fp(task);
+		pc = thread_saved_pc(task);
+	}
+	if (unwind(consume_fn, cookie, task, fp, pc, true))
+		return 0;
+	return -EINVAL;
 }
 
 #endif
-- 
2.25.1



* [RFC PATCH v7 4/4] arm64: Create a list of SYM_CODE functions, check return PC against list
  2021-08-12 18:35 ` madvenka
                     ` (2 preceding siblings ...)
  2021-08-12 18:35   ` [RFC PATCH v7 3/4] arm64: Introduce stack trace reliability checks in the unwinder madvenka
@ 2021-08-12 18:35   ` madvenka
  3 siblings, 0 replies; 28+ messages in thread
From: madvenka @ 2021-08-12 18:35 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

SYM_CODE functions don't follow the usual calling conventions. Check if the
return PC in a stack frame falls in any of these. If it does, consider the
stack trace unreliable.

Define a special section for unreliable functions
=================================================

Define a SYM_CODE_END() macro for arm64 that adds the function address
range to a new section called "sym_code_functions".

Linker file
===========

Include the "sym_code_functions" section under read-only data in
vmlinux.lds.S.

Initialization
==============

Define an early_initcall() to create a sym_code_functions[] array from
the linker data.

Unwinder check
==============

Add a reliability check in unwind_is_reliable() that compares a return
PC with sym_code_functions[]. If there is a match, then return failure.
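The init and lookup steps above can be modelled in standalone C. A static array stands in for the bytes of the "sym_code_functions" linker section, and the start/end pointers play the role of __sym_code_functions_start/__sym_code_functions_end; all addresses are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct code_range {
	unsigned long start;
	unsigned long end;
};

/* Stand-in for the contents of the "sym_code_functions" section. */
static const struct code_range section[] = {
	{ 0x1000, 0x1040 },	/* e.g. an exception vector */
	{ 0x2000, 0x2100 },	/* e.g. an ftrace trampoline */
};

static const struct code_range *sym_code_functions;
static int num_sym_code_functions;

/* Mirrors init_sym_code_functions(): derive the count from the span. */
static void init_code_ranges(const void *start, const void *end)
{
	size_t size = (size_t)((const char *)end - (const char *)start);

	sym_code_functions = start;
	num_sym_code_functions = (int)(size / sizeof(struct code_range));
}

/* Mirrors the range check added to unwind_is_reliable(). */
static bool pc_is_sym_code(unsigned long pc)
{
	for (int i = 0; i < num_sym_code_functions; i++) {
		const struct code_range *range = &sym_code_functions[i];

		if (pc >= range->start && pc < range->end)
			return true;
	}
	return false;
}
```

Because SYM_CODE_END() emits one {start, end} pair per function, the element count falls out of simple pointer arithmetic over the section span, with no terminator entry needed.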

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/linkage.h  | 12 +++++++
 arch/arm64/include/asm/sections.h |  1 +
 arch/arm64/kernel/stacktrace.c    | 53 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/vmlinux.lds.S   | 10 ++++++
 4 files changed, 76 insertions(+)

diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
index 9906541a6861..616bad74e297 100644
--- a/arch/arm64/include/asm/linkage.h
+++ b/arch/arm64/include/asm/linkage.h
@@ -68,4 +68,16 @@
 		SYM_FUNC_END_ALIAS(x);		\
 		SYM_FUNC_END_ALIAS(__pi_##x)
 
+/*
+ * Record the address range of each SYM_CODE function in a struct code_range
+ * in a special section.
+ */
+#define SYM_CODE_END(name)				\
+	SYM_END(name, SYM_T_NONE)			;\
+	99:						;\
+	.pushsection "sym_code_functions", "aw"		;\
+	.quad	name					;\
+	.quad	99b					;\
+	.popsection
+
 #endif
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index e4ad9db53af1..c84c71063d6e 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -21,5 +21,6 @@ extern char __exittext_begin[], __exittext_end[];
 extern char __irqentry_text_start[], __irqentry_text_end[];
 extern char __mmuoff_data_start[], __mmuoff_data_end[];
 extern char __entry_tramp_text_start[], __entry_tramp_text_end[];
+extern char __sym_code_functions_start[], __sym_code_functions_end[];
 
 #endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index b60f8a20ba64..26dbdd4fff77 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,31 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+struct code_range {
+	unsigned long	start;
+	unsigned long	end;
+};
+
+static struct code_range	*sym_code_functions;
+static int			num_sym_code_functions;
+
+int __init init_sym_code_functions(void)
+{
+	size_t size = (unsigned long)__sym_code_functions_end -
+		      (unsigned long)__sym_code_functions_start;
+
+	sym_code_functions = (struct code_range *)__sym_code_functions_start;
+	/*
+	 * Order it so that sym_code_functions is not visible before
+	 * num_sym_code_functions.
+	 */
+	smp_mb();
+	num_sym_code_functions = size / sizeof(struct code_range);
+
+	return 0;
+}
+early_initcall(init_sym_code_functions);
+
 /*
  * AArch64 PCS assigns the frame pointer to x29.
  *
@@ -185,6 +210,10 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
  */
 static bool notrace unwind_is_reliable(struct stackframe *frame)
 {
+	const struct code_range *range;
+	unsigned long pc;
+	int i;
+
 	/*
 	 * If the PC is not a known kernel text address, then we cannot
 	 * be sure that a subsequent unwind will be reliable, as we
@@ -192,6 +221,30 @@ static bool notrace unwind_is_reliable(struct stackframe *frame)
 	 */
 	if (!__kernel_text_address(frame->pc))
 		return false;
+
+	/*
+	 * Check the return PC against sym_code_functions[]. If there is a
+	 * match, then consider the stack frame unreliable.
+	 *
+	 * As SYM_CODE functions don't follow the usual calling conventions,
+	 * we assume by default that any SYM_CODE function cannot be unwound
+	 * reliably.
+	 *
+	 * Note that this includes:
+	 *
+	 * - Exception handlers and entry assembly
+	 * - Trampoline assembly (e.g., ftrace, kprobes)
+	 * - Hypervisor-related assembly
+	 * - Hibernation-related assembly
+	 * - CPU start-stop, suspend-resume assembly
+	 * - Kernel relocation assembly
+	 */
+	pc = frame->pc;
+	for (i = 0; i < num_sym_code_functions; i++) {
+		range = &sym_code_functions[i];
+		if (pc >= range->start && pc < range->end)
+			return false;
+	}
 	return true;
 }
 
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 709d2c433c5e..2bf769f45b54 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -111,6 +111,14 @@ jiffies = jiffies_64;
 #define TRAMP_TEXT
 #endif
 
+#define SYM_CODE_FUNCTIONS				\
+	. = ALIGN(16);					\
+	.symcode : AT(ADDR(.symcode) - LOAD_OFFSET) {	\
+		__sym_code_functions_start = .;		\
+		KEEP(*(sym_code_functions))		\
+		__sym_code_functions_end = .;		\
+	}
+
 /*
  * The size of the PE/COFF section that covers the kernel image, which
  * runs from _stext to _edata, must be a round multiple of the PE/COFF
@@ -196,6 +204,8 @@ SECTIONS
 	swapper_pg_dir = .;
 	. += PAGE_SIZE;
 
+	SYM_CODE_FUNCTIONS
+
 	. = ALIGN(SEGMENT_ALIGN);
 	__init_begin = .;
 	__inittext_begin = .;
-- 
2.25.1



* Re: [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement stack trace reliability checks
  2021-08-12 18:31   ` [RFC PATCH v7 0/4] arm64: Reorganize the unwinder and implement stack trace reliability checks Madhavan T. Venkataraman
@ 2021-08-12 18:45     ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 28+ messages in thread
From: Madhavan T. Venkataraman @ 2021-08-12 18:45 UTC (permalink / raw)
  To: mark.rutland, broonie, jpoimboe, ardb, nobuta.keiya,
	sjitindarsingh, catalin.marinas, will, jmorris, pasha.tatashin,
	jthierry, linux-arm-kernel, live-patching, linux-kernel

My mailer is screwing up.

I will resend the whole series as version 8 instead of version 7 to avoid further
confusion.

Thunderbird, sometimes! Again, I am so sorry.

Madhavan

On 8/12/21 1:31 PM, Madhavan T. Venkataraman wrote:
> The messages are not threaded properly.
> 
> I will resend the whole series with proper threading.
> 
> I apologize.
> 
> Madhavan
> 
> On 8/12/21 8:24 AM, madvenka@linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>
>> Make all stack walking functions use arch_stack_walk()
>> ======================================================
>>
>> Currently, there are multiple functions in ARM64 code that walk the
>> stack using start_backtrace() and unwind_frame(). Convert all of
>> them to use arch_stack_walk(). This makes maintenance easier.
>>
>> Reorganize the unwinder code for better consistency and maintenance
>> ===================================================================
>>
>> Rename unwinder functions to unwind_*() similar to other architectures
>> for naming consistency.
>>
>> Annotate all of the unwind_*() functions with notrace so they cannot be
>> ftraced and NOKPROBE_SYMBOL() so they cannot be kprobed. Ftrace and Kprobe
>> code can call the unwinder.
>>
>> Redefine the unwinder loop and make it similar to other architectures.
>> Define the following:
>>
>> 	unwind_start(&frame, task, fp, pc);
>> 	while (unwind_consume(&frame, consume_entry, cookie))
>> 		unwind_next(&frame);
>> 	return !unwind_failed(&frame);
>>
>> unwind_start()
>> 	Same as the original start_backtrace().
>>
>> unwind_consume()
>> 	This new function does two things:
>>
>> 	- Calls consume_entry() to consume the return PC.
>>
>> 	- Implements checks to determine whether the unwind should continue
>> 	  or terminate.
>>
>> unwind_next()
>> 	Same as the original unwind_frame() except:
>>
>> 	- the stack trace termination check has been moved from here to
>> 	  unwind_consume(). So, unwind_next() assumes that the fp is valid.
>>
>> 	- unwind_frame() used to return an error value. This function only
>> 	  sets internal state and does not return anything. The state is
>> 	  retrieved via a helper. See next.
>>
>> unwind_failed()
>> 	Return a boolean to indicate whether the stack trace completed
>> 	successfully or failed. arch_stack_walk() ignores the return
>> 	value. But arch_stack_walk_reliable() in the future will look
>> 	at the return value.
>>
>> Unwind status
>> 	Introduce a new flag called "failed" in struct stackframe. Set this
>> 	flag when an error is encountered. If this flag is set, terminate
>> 	the unwind. Also, let the unwinder return the status to the caller.
>>
>> Reliability checks
>> ==================
>>
>> There are some kernel features and conditions that make a stack trace
>> unreliable. Callers may require the unwinder to detect these cases.
>> E.g., livepatch.
>>
>> Introduce a new function called unwind_is_reliable() that will detect
>> these cases and return a boolean.
>>
>> Introduce a new argument to unwind() called "need_reliable" so a caller
>> can tell unwind() that it requires a reliable stack trace. For such a
>> caller, any unreliability in the stack trace must be treated as a fatal
>> error and the unwind must be aborted.
>>
>> Call unwind_is_reliable() from unwind_consume() like this:
>>
>> 	if (frame->need_reliable && !unwind_is_reliable(frame)) {
>> 		frame->failed = true;
>> 		return false;
>> 	}
>>
>> arch_stack_walk() passes "false" for need_reliable because its callers
>> don't care about reliability. arch_stack_walk() is used for debug and
>> test purposes.
>>
>> Introduce arch_stack_walk_reliable() for ARM64. This works like
>> arch_stack_walk() except for two things:
>>
>> 	- It passes "true" for need_reliable.
>>
>> 	- It returns -EINVAL if unwind() aborts.
>>
>> Introduce the first reliability check in unwind_is_reliable() - If
>> a return PC is not a valid kernel text address, consider the stack
>> trace unreliable. It could be some generated code.
>>
>> Other reliability checks will be added in the future. Until all of the
>> checks are in place, arch_stack_walk_reliable() may not be used by
>> livepatch. But it may be used by debug and test code.
>>
>> SYM_CODE check
>> ==============
>>
>> SYM_CODE functions do not follow normal calling conventions. They cannot
>> be unwound reliably using the frame pointer. Collect the address ranges
>> of these functions in a special section called "sym_code_functions".
>>
>> In unwind_is_reliable(), check the return PC against these ranges. If a
>> match is found, then consider the stack trace unreliable. This is the
>> second reliability check introduced by this work.
>>
>> Last stack frame
>> ----------------
>>
>> If a SYM_CODE function occurs in the very last frame in the stack trace,
>> then the stack trace is not considered unreliable. This is because there
>> is no more unwinding to do. Examples:
>>
>> 	- EL0 exception stack traces end in the top level EL0 exception
>> 	  handlers.
>>
>> 	- All kernel thread stack traces end in ret_from_fork().
>> ---
>> Changelog:
>>
>> v7:
>> 	From Mark Rutland:
>>
>> 	- Make the unwinder loop similar to other architectures.
>>
>> 	- Keep details to within the unwinder functions and return a simple
>> 	  boolean to the caller.
>>
>> 	- Convert some of the current code that contains unwinder logic to
>> 	  simply use arch_stack_walk(). I have converted all of them.
>>
>> 	- Do not copy sym_code_functions[]. Just place it in rodata for now.
>>
>> 	- Have the main loop check for termination conditions rather than
>> 	  having unwind_frame() check for them. In other words, let
>> 	  unwind_frame() assume that the fp is valid.
>>
>> 	- Replace the big comment for SYM_CODE functions with a shorter
>> 	  comment.
>>
>> 		/*
>> 		 * As SYM_CODE functions don't follow the usual calling
>> 		 * conventions, we assume by default that any SYM_CODE function
>> 		 * cannot be unwound reliably.
>> 		 *
>> 		 * Note that this includes:
>> 		 *
>> 		 * - Exception handlers and entry assembly
>> 		 * - Trampoline assembly (e.g., ftrace, kprobes)
>> 		 * - Hypervisor-related assembly
>> 		 * - Hibernation-related assembly
>> 		 * - CPU start-stop, suspend-resume assembly
>> 		 * - Kernel relocation assembly
>> 		 */
>>
>> v6:
>> 	From Mark Rutland:
>>
>> 	- The per-frame reliability concept and flag are acceptable. But more
>> 	  work is needed to make the per-frame checks more accurate and more
>> 	  complete. E.g., some code reorg is being worked on that will help.
>>
>> 	  I have now removed the frame->reliable flag and deleted the whole
>> 	  concept of per-frame status. This is orthogonal to this patch series.
>> 	  Instead, I have improved the unwinder to return proper return codes
>> 	  so a caller can take appropriate action without needing per-frame
>> 	  status.
>>
>> 	- Remove the mention of PLTs and update the comment.
>>
>> 	  I have replaced the comment above the call to __kernel_text_address()
>> 	  with the comment suggested by Mark Rutland.
>>
>> 	Other comments:
>>
>> 	- Other comments on the per-frame stuff are not relevant because
>> 	  that approach is not there anymore.
>>
>> v5:
>> 	From Keiya Nobuta:
>> 	
>> 	- The term blacklist(ed) is not to be used anymore. I have changed it
>> 	  to unreliable. So, the function unwinder_blacklisted() has been
>> 	  changed to unwinder_is_unreliable().
>>
>> 	From Mark Brown:
>>
>> 	- Add a comment for the "reliable" flag in struct stackframe. The
>> 	  reliability attribute is not complete until all the checks are
>> 	  in place. Added a comment above struct stackframe.
>>
>> 	- Include some of the comments in the cover letter in the actual
>> 	  code so that we can compare it with the reliable stack trace
>> 	  requirements document for completeness. I have added a comment:
>>
>> 	  	- above unwinder_is_unreliable() that lists the requirements
>> 		  that are addressed by the function.
>>
>> 		- above the __kernel_text_address() call about all the cases
>> 		  the call covers.
>>
>> v4:
>> 	From Mark Brown:
>>
>> 	- I was checking the return PC with __kernel_text_address() before
>> 	  the Function Graph trace handling. Mark Brown felt that all the
>> 	  reliability checks should be performed on the original return PC
>> 	  once that is obtained. So, I have moved all the reliability checks
>> 	  to after the Function Graph Trace handling code in the unwinder.
>> 	  Basically, the unwinder should perform PC translations first (for
>> 	  the return trampoline for Function Graph Tracing, Kretprobes, etc).
>> 	  Then, the reliability checks should be applied to the resulting
>> 	  PC.
>>
>> 	- Mark said to improve the naming of the new functions so they don't
>> 	  collide with existing ones. I have used a prefix "unwinder_" for
>> 	  all the new functions.
>>
>> 	From Josh Poimboeuf:
>>
>> 	- In the error scenarios in the unwinder, the reliable flag in the
>> 	  stack frame should be set. Implemented this.
>>
>> 	- Some of the other comments are not relevant to the new code as
>> 	  I have taken a different approach in the new code. That is why
>> 	  I have not made those changes. E.g., Ard wanted me to add the
>> 	  "const" keyword to the global section array. That array does not
>> 	  exist in v4. Similarly, Mark Brown said to use ARRAY_SIZE() for
>> 	  the same array in a for loop.
>>
>> 	Other changes:
>>
>> 	- Add a new definition for SYM_CODE_END() that adds the address
>> 	  range of the function to a special section called
>> 	  "sym_code_functions".
>>
>> 	- Include the new section under initdata in vmlinux.lds.S.
>>
>> 	- Define an early_initcall() to copy the contents of the
>> 	  "sym_code_functions" section to an array by the same name.
>>
>> 	- Define a function unwinder_blacklisted() that compares a return
>> 	  PC against sym_code_sections[]. If there is a match, mark the
>> 	  stack trace unreliable. Call this from unwind_frame().
>>
>> v3:
>> 	- Implemented a sym_code_ranges[] array to contain section bounds
>> 	  for text sections that contain SYM_CODE_*() functions. The unwinder
>> 	  checks each return PC against the sections. If it falls in any of
>> 	  the sections, the stack trace is marked unreliable.
>>
>> 	- Moved SYM_CODE functions from .text and .init.text into a new
>> 	  text section called ".code.text". Added this section to
>> 	  vmlinux.lds.S and sym_code_ranges[].
>>
>> 	- Fixed the logic in the unwinder that handles Function Graph
>> 	  Tracer return trampoline.
>>
>> 	- Removed all the previous code that handles:
>> 		- ftrace entry code for traced function
>> 		- special_functions[] array that lists individual functions
>> 		- kretprobe_trampoline() special case
>>
>> v2
>> 	- Removed the terminating entry { 0, 0 } in special_functions[]
>> 	  and replaced it with the idiom { /* sentinel */ }.
>>
>> 	- Change the ftrace trampoline entry ftrace_graph_call in
>> 	  special_functions[] to ftrace_call + 4 and added explanatory
>> 	  comments.
>>
>> 	- Unnested #ifdefs in special_functions[] for FTRACE.
>>
>> v1
>> 	- Define a bool field in struct stackframe. This will indicate if
>> 	  a stack trace is reliable.
>>
>> 	- Implement a special_functions[] array that will be populated
>> 	  with special functions in which the stack trace is considered
>> 	  unreliable.
>> 	
>> 	- Using kallsyms_lookup(), get the address ranges for the special
>> 	  functions and record them.
>>
>> 	- Implement an is_reliable_function(pc). This function will check
>> 	  if a given return PC falls in any of the special functions. If
>> 	  it does, the stack trace is unreliable.
>>
>> 	- Implement check_reliability() function that will check if a
>> 	  stack frame is reliable. Call is_reliable_function() from
>> 	  check_reliability().
>>
>> 	- Before a return PC is checked against special_functions[], it
>> 	  must be validated as a proper kernel text address. Call
>> 	  __kernel_text_address() from check_reliability().
>>
>> 	- Finally, call check_reliability() from unwind_frame() for
>> 	  each stack frame.
>>
>> 	- Add EL1 exception handlers to special_functions[].
>>
>> 		el1_sync();
>> 		el1_irq();
>> 		el1_error();
>> 		el1_sync_invalid();
>> 		el1_irq_invalid();
>> 		el1_fiq_invalid();
>> 		el1_error_invalid();
>>
>> 	- The above functions are currently defined as LOCAL symbols.
>> 	  Make them global so that they can be referenced from the
>> 	  unwinder code.
>>
>> 	- Add FTRACE trampolines to special_functions[]:
>>
>> 		ftrace_graph_call()
>> 		ftrace_graph_caller()
>> 		return_to_handler()
>>
>> 	- Add the kretprobe trampoline to special functions[]:
>>
>> 		kretprobe_trampoline()
>>
>> Previous versions and discussion
>> ================================
>>
>> v6: https://lore.kernel.org/linux-arm-kernel/20210630223356.58714-1-madvenka@linux.microsoft.com/
>> v5: https://lore.kernel.org/linux-arm-kernel/20210526214917.20099-1-madvenka@linux.microsoft.com/
>> v4: https://lore.kernel.org/linux-arm-kernel/20210516040018.128105-1-madvenka@linux.microsoft.com/
>> v3: https://lore.kernel.org/linux-arm-kernel/20210503173615.21576-1-madvenka@linux.microsoft.com/
>> v2: https://lore.kernel.org/linux-arm-kernel/20210405204313.21346-1-madvenka@linux.microsoft.com/
>> v1: https://lore.kernel.org/linux-arm-kernel/20210330190955.13707-1-madvenka@linux.microsoft.com/
>> Madhavan T. Venkataraman (4):
>>   arm64: Make all stack walking functions use arch_stack_walk()
>>   arm64: Reorganize the unwinder code for better consistency and
>>     maintenance
>>   arm64: Introduce stack trace reliability checks in the unwinder
>>   arm64: Create a list of SYM_CODE functions, check return PC against
>>     list
>>
>>  arch/arm64/include/asm/linkage.h    |  12 ++
>>  arch/arm64/include/asm/sections.h   |   1 +
>>  arch/arm64/include/asm/stacktrace.h |  16 +-
>>  arch/arm64/kernel/perf_callchain.c  |   5 +-
>>  arch/arm64/kernel/process.c         |  39 ++--
>>  arch/arm64/kernel/return_address.c  |   6 +-
>>  arch/arm64/kernel/stacktrace.c      | 291 ++++++++++++++++++++--------
>>  arch/arm64/kernel/time.c            |  22 ++-
>>  arch/arm64/kernel/vmlinux.lds.S     |  10 +
>>  9 files changed, 277 insertions(+), 125 deletions(-)
>>
>>
>> base-commit: 36a21d51725af2ce0700c6ebcb6b9594aac658a6
>>

