* [PATCH] Documentation: livepatch: document reliable stacktrace
@ 2021-01-13 16:57 Mark Brown
  2021-01-13 19:33 ` Josh Poimboeuf
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Brown @ 2021-01-13 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Rutland, Jiri Kosina, Joe Lawrence, Jonathan Corbet,
	Josh Poimboeuf, Mark Brown, Miroslav Benes, Petr Mladek,
	linux-doc, live-patching

From: Mark Rutland <mark.rutland@arm.com>

Add documentation for reliable stacktrace. This is intended to describe
the semantics and to be an aid for implementing architecture support for
HAVE_RELIABLE_STACKTRACE.

Unwinding is a subtle area, and architectures vary greatly in both
implementation and the set of concerns that affect them, so I've tried
to avoid making this too specific to any given architecture. I've used
examples from both x86_64 and arm64 to explain corner cases in more
detail, but I've tried to keep the descriptions sufficient for those who
are unfamiliar with the particular architecture.

I've tried to give rationale for all the recommendations/requirements,
since that makes it easier to spot nearby issues, or when a check
happens to catch a few things at once. I believe what I have written is
sound, but as some of this was reverse-engineered I may have missed
things worth noting.

I've made a few assumptions about preferred behaviour, notably:

* If you can reliably unwind through exceptions, you should (as x86_64
  does).

* It's fine to omit ftrace_return_to_handler and other return
  trampolines so long as these are not subject to patching and the
  original return address is reported. Most architectures do this for
  ftrace_return_to_handler, but not other return trampolines.

* For cases where link register unreliability could result in duplicate
  entries in the trace or an inverted trace, I've assumed this should be
  treated as unreliable. This specific case shouldn't matter to
  livepatching, but I assume that we want a reliable trace to have
  the correct order.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: linux-doc@vger.kernel.org
Cc: live-patching@vger.kernel.org
[Updates following review -- broonie]
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/livepatch/index.rst             |   1 +
 .../livepatch/reliable-stacktrace.rst         | 304 ++++++++++++++++++
 2 files changed, 305 insertions(+)
 create mode 100644 Documentation/livepatch/reliable-stacktrace.rst

diff --git a/Documentation/livepatch/index.rst b/Documentation/livepatch/index.rst
index 525944063be7..43cce5fad705 100644
--- a/Documentation/livepatch/index.rst
+++ b/Documentation/livepatch/index.rst
@@ -13,6 +13,7 @@ Kernel Livepatching
     module-elf-format
     shadow-vars
     system-state
+    reliable-stacktrace
 
 .. only::  subproject and html
 
diff --git a/Documentation/livepatch/reliable-stacktrace.rst b/Documentation/livepatch/reliable-stacktrace.rst
new file mode 100644
index 000000000000..a72f26344544
--- /dev/null
+++ b/Documentation/livepatch/reliable-stacktrace.rst
@@ -0,0 +1,304 @@
+===================
+Reliable Stacktrace
+===================
+
+This document outlines basic information about reliable stacktracing.
+
+.. Table of Contents:
+
+    1. Introduction
+    2. Requirements
+    3. Considerations
+       3.1 Identifying successful termination
+       3.2 Identifying unwindable code
+       3.3 Unwinding across interrupts and exceptions
+       3.4 Rewriting of return addresses
+       3.5 Obscuring of return addresses
+       3.6 Link register unreliability
+
+1. Introduction
+===============
+
+The kernel livepatch consistency model relies on accurately identifying which
+functions may have live state and therefore may not be safe to patch. One way
+to identify which functions are live is to use a stacktrace.
+
+Existing stacktrace code may not always give an accurate picture of all
+functions with live state, and best-effort approaches which can be helpful for
+debugging are unsound for livepatching. Livepatching depends on architectures
+to provide a *reliable* stacktrace which ensures it never omits any live
+functions from a trace.
+
+
+2. Requirements
+===============
+
+Architectures must implement one of the reliable stacktrace functions.
+Architectures using CONFIG_ARCH_STACKWALK must implement
+'arch_stack_walk_reliable', and other architectures must implement
+'save_stack_trace_tsk_reliable'.
+
+Principally, the reliable stacktrace function must ensure that either:
+
+* The trace includes all functions that the task may be returned to, and the
+  return code is zero to indicate that the trace is reliable.
+
+* The return code is non-zero to indicate that the trace is not reliable.
+
+.. note::
+   In some cases it is legitimate to omit specific functions from the trace,
+   but all other functions must be reported. These cases are described in
+   further detail below.
+
+Secondly, the reliable stacktrace function must be robust to cases where
+the stack or other unwind state is corrupt or otherwise unreliable. The
+function should attempt to detect such cases and return a non-zero error
+code, and should not get stuck in an infinite loop or access memory in
+an unsafe way.  Specific cases are described in further detail below.
+
+
+3. Considerations
+=================
+
+The unwinding process varies across architectures, their respective procedure
+call standards, and kernel configurations. This section describes common
+details that architectures should consider.
+
+3.1 Identifying successful termination
+--------------------------------------
+
+Unwinding may terminate early for a number of reasons, including:
+
+* Stack or frame pointer corruption.
+
+* Missing unwind support for an uncommon scenario, or a bug in the unwinder.
+
+* Dynamically generated code (e.g. eBPF) or foreign code (e.g. EFI runtime
+  services) not following the conventions expected by the unwinder.
+
+To ensure that this does not result in functions being omitted from the trace,
+even if not caught by other checks, it is strongly recommended that
+architectures verify that a stacktrace ends at an expected location, e.g.
+
+* Within a specific function that is an entry point to the kernel.
+
+* At a specific location on a stack expected for a kernel entry point.
+
+* On a specific stack expected for a kernel entry point (e.g. if the
+  architecture has separate task and IRQ stacks).
+
+3.2 Identifying unwindable code
+-------------------------------
+
+Unwinding typically relies on code following specific conventions (e.g.
+manipulating a frame pointer), but there can be code which may not follow these
+conventions and may require special handling in the unwinder, e.g.
+
+* Exception vectors and entry assembly.
+
+* Procedure Linkage Table (PLT) entries and veneer functions.
+
+* Trampoline assembly (e.g. ftrace, kprobes).
+
+* Dynamically generated code (e.g. eBPF, optprobe trampolines).
+
+* Foreign code (e.g. EFI runtime services).
+
+To ensure that such cases do not result in functions being omitted from a
+trace, it is strongly recommended that architectures positively identify code
+which is known to be reliable to unwind from, and reject unwinding from all
+other code.
+
+Kernel code including modules and eBPF can be distinguished from foreign code
+using '__kernel_text_address()'. Checking for this also helps to detect stack
+corruption.
+
+There are several ways an architecture may identify kernel code which is deemed
+unreliable to unwind from, e.g.
+
+* Using metadata created by objtool, with such code annotated with
+  SYM_CODE_{START,END} or STACKFRAME_NON_STANDARD().
+
+* Placing such code into special linker sections, and rejecting unwinding from
+  any code in these sections.
+
+* Identifying specific portions of code using bounds information.
+
+3.3 Unwinding across interrupts and exceptions
+----------------------------------------------
+
+At function call boundaries the stack and other unwind state is expected to be
+in a consistent state suitable for reliable unwinding, but this may not be the
+case part-way through a function. For example, during a function prologue or
+epilogue a frame pointer may be transiently invalid, or during the function
+body the return address may be held in an arbitrary general purpose register.
+For some architectures this may change at runtime as a result of dynamic
+instrumentation.
+
+If an interrupt or other exception is taken while the stack or other unwind
+state is in an inconsistent state, it may not be possible to reliably unwind,
+and it may not be possible to identify whether such unwinding will be reliable.
+See below for examples.
+
+Architectures which cannot identify when it is reliable to unwind such cases
+(or where it is never reliable) must reject unwinding across exception
+boundaries. Note that it may be reliable to unwind across certain
+exceptions (e.g. IRQ) but unreliable to unwind across other exceptions
+(e.g. NMI).
+
+Architectures which can identify when it is reliable to unwind such cases (or
+have no such cases) should attempt to unwind across exception boundaries, as
+doing so can prevent unnecessarily stalling livepatch consistency checks and
+permits livepatch transitions to complete more quickly.
+
+3.4 Rewriting of return addresses
+---------------------------------
+
+Some trampolines temporarily modify the return address of a function in order
+to intercept when that function returns with a return trampoline, e.g.
+
+* An ftrace trampoline may modify the return address so that function graph
+  tracing can intercept returns.
+
+* A kprobes (or optprobes) trampoline may modify the return address so that
+  kretprobes can intercept returns.
+
+When this happens, the original return address will not be in its usual
+location. For trampolines which are not subject to live patching, where an
+unwinder can reliably determine the original return address and no unwind state
+is altered by the trampoline, the unwinder may report the original return
+address in place of the trampoline and report this as reliable. Otherwise, an
+unwinder must report these cases as unreliable.
+
+Special care is required when identifying the original return address, as this
+information is not in a consistent location for the duration of the entry
+trampoline or return trampoline. For example, considering the x86_64
+'return_to_handler' return trampoline:
+
+.. code-block:: none
+
+   SYM_CODE_START(return_to_handler)
+           UNWIND_HINT_EMPTY
+           subq  $24, %rsp
+
+           /* Save the return values */
+           movq %rax, (%rsp)
+           movq %rdx, 8(%rsp)
+           movq %rbp, %rdi
+
+           call ftrace_return_to_handler
+
+           movq %rax, %rdi
+           movq 8(%rsp), %rdx
+           movq (%rsp), %rax
+           addq $24, %rsp
+           JMP_NOSPEC rdi
+   SYM_CODE_END(return_to_handler)
+
+While the traced function runs, its return address on the stack points to
+the start of return_to_handler, and the original return address is stored in
+the task's cur_ret_stack. During this time the unwinder can find the return
+address using ftrace_graph_ret_addr().
+
+When the traced function returns to return_to_handler, there is no longer a
+return address on the stack, though the original return address is still stored
+in the task's cur_ret_stack. Within ftrace_return_to_handler(), the original
+return address is removed from cur_ret_stack and is transiently moved
+arbitrarily by the compiler before being returned in rax. The return_to_handler
+trampoline moves this into rdi before jumping to it.
+
+Architectures might not always be able to unwind such sequences, such as when
+ftrace_return_to_handler() has removed the address from cur_ret_stack, and the
+location of the return address cannot be reliably determined.
+
+It is recommended that architectures unwind cases where return_to_handler has
+not yet been returned to, but architectures are not required to unwind from the
+middle of return_to_handler and can report this as unreliable. Architectures
+are not required to unwind from other trampolines which modify the return
+address.
+
+3.5 Obscuring of return addresses
+---------------------------------
+
+Some trampolines do not rewrite the return address in order to intercept
+returns, but do transiently clobber the return address or other unwind state.
+
+For example, the x86_64 implementation of optprobes patches the probed function
+with a JMP instruction which targets the associated optprobe trampoline. When
+the probe is hit, the CPU will branch to the optprobe trampoline, and the
+address of the probed function is not held in any register or on the stack.
+
+Similarly, the arm64 implementation of DYNAMIC_FTRACE_WITH_REGS patches traced
+functions with the following:
+
+.. code-block:: none
+
+   MOV X9, X30
+   BL <trampoline>
+
+The MOV saves the link register (X30) into X9 to preserve the return address
+before the BL clobbers the link register and branches to the trampoline. At the
+start of the trampoline, the address of the traced function is in X9 rather
+than the link register as would usually be the case.
+
+Architectures must ensure that unwinders either reliably unwind such cases, or
+report the unwinding as unreliable.
+
+3.6 Link register unreliability
+-------------------------------
+
+On some other architectures, 'call' instructions place the return address into a
+link register, and 'return' instructions consume the return address from the
+link register without modifying the register. On these architectures software
+must save the return address to the stack prior to making a function call. Over
+the duration of a function call, the return address may be held in the link
+register alone, on the stack alone, or in both locations.
+
+Unwinders typically assume the link register is always live, but this
+assumption can lead to unreliable stack traces. For example, consider the
+following arm64 assembly for a simple function:
+
+.. code-block:: none
+
+   function:
+           STP X29, X30, [SP, -16]!
+           MOV X29, SP
+           BL <other_function>
+           LDP X29, X30, [SP], #16
+           RET
+
+At entry to the function, the link register (X30) points to the caller, and the
+frame pointer (X29) points to the caller's frame including the caller's return
+address. The first two instructions create a new stackframe and update the
+frame pointer, and at this point the link register and the frame pointer both
+describe this function's return address. A trace at this point may describe
+this function twice, and if the function return is being traced, the unwinder
+may consume two entries from the fgraph return stack rather than one entry.
+
+The BL invokes 'other_function' with the link register pointing to this
+function's LDP and the frame pointer pointing to this function's stackframe.
+When 'other_function' returns, the link register is left pointing at the LDP
+following the BL, and so a trace at this point could result in 'function'
+appearing twice in the backtrace.
+
+Similarly, a function may deliberately clobber the LR, e.g.
+
+.. code-block:: none
+
+   caller:
+           STP X29, X30, [SP, -16]!
+           MOV X29, SP
+           ADR LR, <callee>
+           BLR LR
+           LDP X29, X30, [SP], #16
+           RET
+
+The ADR places the address of 'callee' into the LR, before the BLR branches to
+this address. If a trace is made immediately after the ADR, 'callee' will
+appear to be the parent of 'caller', rather than the child.
+
+Due to cases such as the above, it may only be possible to reliably consume a
+link register value at a function call boundary. Architectures where this is
+the case must reject unwinding across exception boundaries unless they can
+reliably identify when the LR or stack value should be used (e.g. using
+metadata generated by objtool).
-- 
2.20.1
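
As a reading aid for sections 2, 3.1 and 3.2 of the patch above, the following
is a minimal userspace sketch of the contract it describes: return zero only
when every frame was checked and the walk ended at the expected place, and
return an error otherwise. It is not kernel code. The names frame_record,
unwind_ctx and walk_reliable() are hypothetical, the stack is modelled as an
array indexed by slot number, and the text range check merely stands in for
'__kernel_text_address()'.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record: link to the caller's record plus a return address. */
struct frame_record {
	size_t next_fp;		/* slot index of the caller's frame record */
	uint64_t lr;		/* return address into the caller */
};

/* Hypothetical description of one task's stack and of valid text. */
struct unwind_ctx {
	const struct frame_record *stack;	/* simulated stack memory */
	size_t nr_slots;			/* number of slots in the stack */
	size_t final_slot;			/* record set up by kernel entry code */
	uint64_t text_start, text_end;		/* stand-in for __kernel_text_address() */
};

/*
 * Walk frame records starting at slot 'fp', innermost frame first.
 * Returns 0 only if the walk reached ctx->final_slot; any other way of
 * stopping is reported as an error so the trace is treated as unreliable.
 */
int walk_reliable(const struct unwind_ctx *ctx, size_t fp,
		  uint64_t *trace, size_t max, size_t *nr)
{
	*nr = 0;

	for (;;) {
		const struct frame_record *rec;

		/* Reject frame pointers that leave the stack. */
		if (fp >= ctx->nr_slots)
			return -EINVAL;

		/* The only successful exit: the expected final record. */
		if (fp == ctx->final_slot)
			return 0;

		rec = &ctx->stack[fp];

		/* Reject return addresses outside known text. */
		if (rec->lr < ctx->text_start || rec->lr >= ctx->text_end)
			return -EINVAL;

		if (*nr == max)
			return -ENOSPC;
		trace[(*nr)++] = rec->lr;

		/* Require strict progress so a corrupt chain cannot loop. */
		if (rec->next_fp <= fp)
			return -EINVAL;
		fp = rec->next_fp;
	}
}
```

Because the frame pointer must strictly increase and is bounded by nr_slots,
the loop always terminates, which covers the "no infinite loop" requirement. A
chain that stops anywhere other than final_slot is rejected even when no
individual check caught the corruption, which is the point of section 3.1.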


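Section 3.4 of the patch describes reporting the original return address in
place of a return trampoline. A rough userspace sketch of that substitution
follows. It is only an illustration under stated assumptions: ret_stack,
fixup_return_addresses() and the RETURN_TRAMPOLINE value are hypothetical, and
the kernel's real helper for this is ftrace_graph_ret_addr(), whose interface
is not reproduced here. Treating leftover saved addresses as unreliable is a
conservative choice made for this sketch, on the assumption that a leftover
entry means a return is in flight.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address of the return trampoline (e.g. return_to_handler). */
#define RETURN_TRAMPOLINE UINT64_C(0xffff800010001000)

/* Hypothetical per-task record of rewritten return addresses. */
struct ret_stack {
	const uint64_t *orig;	/* original return addresses, oldest first */
	size_t depth;		/* number of valid entries in orig[] */
};

/*
 * 'trace' holds return addresses as read from the stack, innermost first.
 * Each trampoline entry is replaced with the matching saved address. The
 * innermost trampoline corresponds to the most recently saved address, so
 * saved addresses are consumed from the top of the return stack.
 *
 * Returns 0 if every trampoline entry was matched and nothing was left
 * over, or -EINVAL otherwise. On failure the trace may be partly rewritten
 * and must be discarded.
 */
int fixup_return_addresses(uint64_t *trace, size_t nr,
			   const struct ret_stack *rs)
{
	size_t remaining = rs->depth;

	for (size_t i = 0; i < nr; i++) {
		if (trace[i] != RETURN_TRAMPOLINE)
			continue;

		/* More trampolines on the stack than saved addresses. */
		if (remaining == 0)
			return -EINVAL;

		trace[i] = rs->orig[--remaining];
	}

	/* Assumption: an unmatched saved address means a return is in flight. */
	return remaining ? -EINVAL : 0;
}
```

The leftover check corresponds to the window the patch describes, where the
traced function has already returned to the trampoline and there is no longer
a return address on the stack even though one is still saved.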

* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2021-01-13 16:57 [PATCH] Documentation: livepatch: document reliable stacktrace Mark Brown
@ 2021-01-13 19:33 ` Josh Poimboeuf
  2021-01-13 20:23   ` Mark Brown
  2021-01-14 11:54   ` Mark Rutland
  0 siblings, 2 replies; 13+ messages in thread
From: Josh Poimboeuf @ 2021-01-13 19:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: linux-kernel, Mark Rutland, Jiri Kosina, Joe Lawrence,
	Jonathan Corbet, Miroslav Benes, Petr Mladek, linux-doc,
	live-patching

On Wed, Jan 13, 2021 at 04:57:43PM +0000, Mark Brown wrote:
> From: Mark Rutland <mark.rutland@arm.com>
> 
> Add documentation for reliable stacktrace. This is intended to describe
> the semantics and to be an aid for implementing architecture support for
> HAVE_RELIABLE_STACKTRACE.
> 
> Unwinding is a subtle area, and architectures vary greatly in both
> implementation and the set of concerns that affect them, so I've tried
> to avoid making this too specific to any given architecture. I've used
> examples from both x86_64 and arm64 to explain corner cases in more
> detail, but I've tried to keep the descriptions sufficient for those who
> are unfamiliar with the particular architecture.
> 
> I've tried to give rationale for all the recommendations/requirements,
> since that makes it easier to spot nearby issues, or when a check
> happens to catch a few things at once. I believe what I have written is
> sound, but as some of this was reverse-engineered I may have missed
> things worth noting.
> 
> I've made a few assumptions about preferred behaviour, notably:
> 
> * If you can reliably unwind through exceptions, you should (as x86_64
>   does).
> 
> * It's fine to omit ftrace_return_to_handler and other return
>   trampolines so long as these are not subject to patching and the
>   original return address is reported. Most architectures do this for
>   ftrace_return_to_handler, but not other return trampolines.
> 
> * For cases where link register unreliability could result in duplicate
>   entries in the trace or an inverted trace, I've assumed this should be
>   treated as unreliable. This specific case shouldn't matter to
>   livepatching, but I assume that we want a reliable trace to have
>   the correct order.

Thanks to you and Mark for getting this documented properly!

I think it's worth mentioning a little more about objtool.  There are a
few passing mentions of objtool's generation of metadata (i.e. ORC), but
objtool has another relevant purpose: stack validation.  That's
particularly important when it comes to frame pointers.

For some architectures like x86_64 and arm64 (but not powerpc/s390),
it's far too easy for a human to write asm and/or inline asm which
violates frame pointer protocol, silently causing the violator's callee
to get skipped in the unwind.  Such architectures need objtool
implemented for CONFIG_STACK_VALIDATION.

> +There are several ways an architecture may identify kernel code which is deemed
> +unreliable to unwind from, e.g.
> +
> +* Using metadata created by objtool, with such code annotated with
> +  SYM_CODE_{START,END} or STACKFRAME_NON_STANDARD().

I'm not sure why SYM_CODE_{START,END} is mentioned here, but it doesn't
necessarily mean the code is unreliable, and objtool doesn't treat it as
such.  Its mention can probably be removed unless there was some other
point I'm missing.

Also, s/STACKFRAME/STACK_FRAME/

-- 
Josh



* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2021-01-13 19:33 ` Josh Poimboeuf
@ 2021-01-13 20:23   ` Mark Brown
  2021-01-13 22:25     ` Josh Poimboeuf
  2021-01-14 11:54   ` Mark Rutland
  1 sibling, 1 reply; 13+ messages in thread
From: Mark Brown @ 2021-01-13 20:23 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, Mark Rutland, Jiri Kosina, Joe Lawrence,
	Jonathan Corbet, Miroslav Benes, Petr Mladek, linux-doc,
	live-patching


On Wed, Jan 13, 2021 at 01:33:13PM -0600, Josh Poimboeuf wrote:

> I think it's worth mentioning a little more about objtool.  There are a
> few passing mentions of objtool's generation of metadata (i.e. ORC), but
> objtool has another relevant purpose: stack validation.  That's
> particularly important when it comes to frame pointers.

> For some architectures like x86_64 and arm64 (but not powerpc/s390),
> it's far too easy for a human to write asm and/or inline asm which
> violates frame pointer protocol, silently causing the violator's callee
> to get skipped in the unwind.  Such architectures need objtool
> implemented for CONFIG_STACK_VALIDATION.

This basically boils down to just adding a statement saying "you may
need to depend on objtool" I think?

> > +There are several ways an architecture may identify kernel code which is deemed
> > +unreliable to unwind from, e.g.

> > +* Using metadata created by objtool, with such code annotated with
> > +  SYM_CODE_{START,END} or STACKFRAME_NON_STANDARD().

> I'm not sure why SYM_CODE_{START,END} is mentioned here, but it doesn't
> necessarily mean the code is unreliable, and objtool doesn't treat it as
> such.  Its mention can probably be removed unless there was some other
> point I'm missing.

I was reading that as being a thing that the architecture could possibly
do, especially as a first step - it does seem like a reasonable thing to
consider using anyway.  I guess you could also use it the other way
around and do additional checks for things that are supposed to be
regular functions that you relax for SYM_CODE() sections.



* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2021-01-13 20:23   ` Mark Brown
@ 2021-01-13 22:25     ` Josh Poimboeuf
  2021-01-14 18:10       ` Mark Rutland
  0 siblings, 1 reply; 13+ messages in thread
From: Josh Poimboeuf @ 2021-01-13 22:25 UTC (permalink / raw)
  To: Mark Brown
  Cc: linux-kernel, Mark Rutland, Jiri Kosina, Joe Lawrence,
	Jonathan Corbet, Miroslav Benes, Petr Mladek, linux-doc,
	live-patching

On Wed, Jan 13, 2021 at 08:23:15PM +0000, Mark Brown wrote:
> On Wed, Jan 13, 2021 at 01:33:13PM -0600, Josh Poimboeuf wrote:
> 
> > I think it's worth mentioning a little more about objtool.  There are a
> > few passing mentions of objtool's generation of metadata (i.e. ORC), but
> > objtool has another relevant purpose: stack validation.  That's
> > particularly important when it comes to frame pointers.
> 
> > For some architectures like x86_64 and arm64 (but not powerpc/s390),
> > it's far too easy for a human to write asm and/or inline asm which
> > violates frame pointer protocol, silently causing the violator's callee
> > to get skipped in the unwind.  Such architectures need objtool
> > implemented for CONFIG_STACK_VALIDATION.
> 
> This basically boils down to just adding a statement saying "you may
> need to depend on objtool" I think?

Right, but maybe it would be a short paragraph or two.

> > > +There are several ways an architecture may identify kernel code which is deemed
> > > +unreliable to unwind from, e.g.
> 
> > > +* Using metadata created by objtool, with such code annotated with
> > > +  SYM_CODE_{START,END} or STACKFRAME_NON_STANDARD().
> 
> > I'm not sure why SYM_CODE_{START,END} is mentioned here, but it doesn't
> > necessarily mean the code is unreliable, and objtool doesn't treat it as
> > such.  Its mention can probably be removed unless there was some other
> > point I'm missing.
> 
> I was reading that as being a thing that the architecture could possibly
> do, especially as a first step - it does seem like a reasonable thing to
> consider using anyway.  I guess you could also use it the other way
> around and do additional checks for things that are supposed to be
> regular functions that you relax for SYM_CODE() sections.

Makes sense, but we have to be careful not to imply that objtool already
does something like that :-)

-- 
Josh



* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2021-01-13 19:33 ` Josh Poimboeuf
  2021-01-13 20:23   ` Mark Brown
@ 2021-01-14 11:54   ` Mark Rutland
  2021-01-14 14:36     ` Josh Poimboeuf
  1 sibling, 1 reply; 13+ messages in thread
From: Mark Rutland @ 2021-01-14 11:54 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Mark Brown, linux-kernel, Jiri Kosina, Joe Lawrence,
	Jonathan Corbet, Miroslav Benes, Petr Mladek, linux-doc,
	live-patching

On Wed, Jan 13, 2021 at 01:33:13PM -0600, Josh Poimboeuf wrote:
> On Wed, Jan 13, 2021 at 04:57:43PM +0000, Mark Brown wrote:
> > From: Mark Rutland <mark.rutland@arm.com>
> > +There are several ways an architecture may identify kernel code which is deemed
> > +unreliable to unwind from, e.g.
> > +
> > +* Using metadata created by objtool, with such code annotated with
> > +  SYM_CODE_{START,END} or STACKFRAME_NON_STANDARD().
> 
> I'm not sure why SYM_CODE_{START,END} is mentioned here, but it doesn't
> necessarily mean the code is unreliable, and objtool doesn't treat it as
> such.  Its mention can probably be removed unless there was some other
> point I'm missing.
> 
> Also, s/STACKFRAME/STACK_FRAME/

When I wrote this, I was under the impression that (for x86) code marked
as SYM_CODE_{START,END} wouldn't be considered as a function by objtool.
Specifically SYM_FUNC_END() marks the function with SYM_T_FUNC whereas
SYM_CODE_END() marks it with SYM_T_NONE, and IIRC I thought that objtool
only generated ORC for SYM_T_FUNC functions, and hence anything else
would be considered not unwindable due to the absence of ORC.

Just to check, is that understanding for x86 correct, or did I get that
wrong?

If that's right, it might be worth splitting this into two points, e.g.

| * Using metadata created by objtool, with such code annotated with
|   STACKFRAME_NON_STANDARD().
|
|
| * Using ELF symbol attributes, with such code annotated with
|   SYM_CODE_{START,END}, and not having a function type.

If that's wrong, I suspect there are latent issues here?

Thanks,
Mark.


* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2021-01-14 11:54   ` Mark Rutland
@ 2021-01-14 14:36     ` Josh Poimboeuf
  2021-01-14 17:49       ` Mark Rutland
  0 siblings, 1 reply; 13+ messages in thread
From: Josh Poimboeuf @ 2021-01-14 14:36 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Mark Brown, linux-kernel, Jiri Kosina, Joe Lawrence,
	Jonathan Corbet, Miroslav Benes, Petr Mladek, linux-doc,
	live-patching

On Thu, Jan 14, 2021 at 11:54:18AM +0000, Mark Rutland wrote:
> On Wed, Jan 13, 2021 at 01:33:13PM -0600, Josh Poimboeuf wrote:
> > On Wed, Jan 13, 2021 at 04:57:43PM +0000, Mark Brown wrote:
> > > From: Mark Rutland <mark.rutland@arm.com>
> > > +There are several ways an architecture may identify kernel code which is deemed
> > > +unreliable to unwind from, e.g.
> > > +
> > > +* Using metadata created by objtool, with such code annotated with
> > > +  SYM_CODE_{START,END} or STACKFRAME_NON_STANDARD().
> > 
> > I'm not sure why SYM_CODE_{START,END} is mentioned here, but it doesn't
> > necessarily mean the code is unreliable, and objtool doesn't treat it as
> > such.  Its mention can probably be removed unless there was some other
> > point I'm missing.
> > 
> > Also, s/STACKFRAME/STACK_FRAME/
> 
> When I wrote this, I was under the impression that (for x86) code marked
> as SYM_CODE_{START,END} wouldn't be considered as a function by objtool.
> Specifically SYM_FUNC_END() marks the function with SYM_T_FUNC whereas
> SYM_CODE_END() marks it with SYM_T_NONE, and IIRC I thought that objtool
> only generated ORC for SYM_T_FUNC functions, and hence anything else
> would be considered not unwindable due to the absence of ORC.
> 
> Just to check, is that understanding for x86 correct, or did I get that
> wrong?

Doh, I suppose you read the documentation ;-)

I realize your understanding is pretty much consistent with
tools/objtool/Documentation/stack-validation.txt:

2. Conversely, each section of code which is *not* callable should *not*
   be annotated as an ELF function.  The ENDPROC macro shouldn't be used
   in this case.

   This rule is needed so that objtool can ignore non-callable code.
   Such code doesn't have to follow any of the other rules.

But this statement is no longer true:

  **This rule is needed so that objtool can ignore non-callable code.**

[ and it looks like the ENDPROC reference is also out of date ]

Since that document was written, around the time ORC was written we
realized objtool shouldn't ignore SYM_CODE after all.  That way we can
get full coverage for ORC (including interrupts/exceptions), as well as
some of the other validations like retpoline, uaccess, noinstr, etc.

Though it's still true that SYM_CODE doesn't have to follow the
function-specific rules, e.g. frame pointers.

So now objtool requires that it be able to traverse and understand *all*
code, otherwise it will spit out "unreachable instruction" warnings.
But since SYM_CODE isn't a normal callable function, objtool doesn't
know to interpret it directly.  Therefore all SYM_CODE must be reachable
by objtool in some other way:

- either indirectly, via a jump from a SYM_FUNC; or

- via an UNWIND_HINT

(And that's true for both ORC and frame pointers.)

If you look closely at arch/x86/entry/entry_64.S you should be able to
see that's the case.

> If that's right, it might be worth splitting this into two points, e.g.
> 
> | * Using metadata created by objtool, with such code annotated with
> |   STACKFRAME_NON_STANDARD().
> |
> |
> | * Using ELF symbol attributes, with such code annotated with
> |   SYM_CODE_{START,END}, and not having a function type.
> 
> If that's wrong, I suspect there are latent issues here?

For ORC, UNWIND_HINT_EMPTY is used to annotate that some code is
non-unwindable.  (Note I have plans to split that into UNWIND_HINT_ENTRY
and UNWIND_HINT_UNDEFINED.)

For frame pointers, the hints aren't used, other than by objtool to
follow the code flow as described above.  But objtool doesn't produce
any metadata for the FP unwinder.  Instead the FP unwinder makes such
determinations about unwindability at runtime.

-- 
Josh
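
For readers following the point above about annotating code as non-unwindable
for a metadata-based unwinder, here is a small userspace sketch of a
bounds-table lookup that only accepts addresses it can positively identify as
unwindable. It does not reproduce the ORC format or objtool's output. The
names unwind_kind, unwind_range, find_range() and pc_is_unwindable() are
hypothetical, and the only assumption carried over is that entries are sorted
by start address and each covers addresses up to the next entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-range annotation. */
enum unwind_kind {
	UNWIND_OK,		/* unwind state is described for this range */
	UNWIND_UNDEFINED,	/* annotated as not unwindable */
};

/* Hypothetical table entry; covers [start, next entry's start). */
struct unwind_range {
	uint64_t start;
	enum unwind_kind kind;
};

/* Binary search for the last entry with start <= pc, or NULL if none. */
static const struct unwind_range *
find_range(const struct unwind_range *tbl, size_t n, uint64_t pc)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].start <= pc)
			lo = mid + 1;
		else
			hi = mid;
	}

	return lo ? &tbl[lo - 1] : NULL;
}

/*
 * Positive identification: an address is unwindable only if an entry
 * covers it and that entry says so. Missing metadata means "unreliable".
 */
bool pc_is_unwindable(const struct unwind_range *tbl, size_t n, uint64_t pc)
{
	const struct unwind_range *r = find_range(tbl, n, pc);

	return r && r->kind == UNWIND_OK;
}
```

A reliable unwinder built this way would report the whole trace as unreliable
as soon as any frame's address fails the check, rather than skipping the frame.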



* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2021-01-14 14:36     ` Josh Poimboeuf
@ 2021-01-14 17:49       ` Mark Rutland
  2021-01-14 20:03         ` Josh Poimboeuf
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Rutland @ 2021-01-14 17:49 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Mark Brown, linux-kernel, Jiri Kosina, Joe Lawrence,
	Jonathan Corbet, Miroslav Benes, Petr Mladek, linux-doc,
	live-patching

On Thu, Jan 14, 2021 at 08:36:50AM -0600, Josh Poimboeuf wrote:
> On Thu, Jan 14, 2021 at 11:54:18AM +0000, Mark Rutland wrote:
> > On Wed, Jan 13, 2021 at 01:33:13PM -0600, Josh Poimboeuf wrote:
> > > On Wed, Jan 13, 2021 at 04:57:43PM +0000, Mark Brown wrote:
> > > > From: Mark Rutland <mark.rutland@arm.com>
> > > > +There are several ways an architecture may identify kernel code which is deemed
> > > > +unreliable to unwind from, e.g.
> > > > +
> > > > +* Using metadata created by objtool, with such code annotated with
> > > > +  SYM_CODE_{START,END} or STACKFRAME_NON_STANDARD().
> > > 
> > > I'm not sure why SYM_CODE_{START,END} is mentioned here, but it doesn't
> > > necessarily mean the code is unreliable, and objtool doesn't treat it as
> > > such.  Its mention can probably be removed unless there was some other
> > > point I'm missing.
> >
> > > Also, s/STACKFRAME/STACK_FRAME/

Given that (per the discussion below) STACK_FRAME_NON_STANDARD() also
doesn't result in objtool producing any special metadata (and such
code is still expected to be unwindable), I believe we can delete this
bullet point outright?

> > When I wrote this, I was under the impression that (for x86) code marked
> > as SYM_CODE_{START,END} wouldn't be considered as a function by objtool.
> > Specifically SYM_FUNC_END() marks the function with SYM_T_FUNC whereas
> > SYM_CODE_END() marks it with SYM_T_NONE, and IIRC I thought that objtool
> > only generated ORC for SYM_T_FUNC functions, and hence anything else
> > would be considered not unwindable due to the absence of ORC.
> > 
> > Just to check, is that understanding for x86 correct, or did I get that
> > wrong?
> 
> Doh, I suppose you read the documentation ;-)

I think I skimmed the objtool source, too, but it was a while back. ;)

> I realize your understanding is pretty much consistent with
> tools/objtool/Documentation/stack-validation.txt:
> 
> 2. Conversely, each section of code which is *not* callable should *not*
>    be annotated as an ELF function.  The ENDPROC macro shouldn't be used
>    in this case.
> 
>    This rule is needed so that objtool can ignore non-callable code.
>    Such code doesn't have to follow any of the other rules.
> 
> But this statement is no longer true:
> 
>   **This rule is needed so that objtool can ignore non-callable code.**
> 
> [ and it looks like the ENDPROC reference is also out of date ]

Ok -- looks like that needs an update!

> Since that document was written, around the time ORC was written we
> realized objtool shouldn't ignore SYM_CODE after all.  That way we can
> get full coverage for ORC (including interrupts/exceptions), as well as
> some of the other validations like retpoline, uaccess, noinstr, etc.
> 
> Though it's still true that SYM_CODE doesn't have to follow the
> function-specific rules, e.g. frame pointers.

Ok; I suspect on the arm64 side we'll need to think a bit harder about
what that means for us. I guess that'll influence or interact with
whatever support we need specifically in objtool.

> So now objtool requires that it be able to traverse and understand *all*
> code, otherwise it will spit out "unreachable instruction" warnings.
> But since SYM_CODE isn't a normal callable function, objtool doesn't
> know to interpret it directly.  Therefore all SYM_CODE must be reachable
> by objtool in some other way:
> 
> - either indirectly, via a jump from a SYM_FUNC; or
> 
> - via an UNWIND_HINT
> 
> (And that's true for both ORC and frame pointers.)
> 
> If you look closely at arch/x86/entry/entry_64.S you should be able to
> see that's the case.

Assuming you mean the UNWIND_HINT_EMPTY at the start of each exception
entry point, I think I follow.

> > If that's right, it might be worth splitting this into two points, e.g.
> > 
> > | * Using metadata created by objtool, with such code annotated with
> > |   STACKFRAME_NON_STANDARD().
> > |
> > |
> > | * Using ELF symbol attributes, with such code annotated with
> > |   SYM_CODE_{START,END}, and not having a function type.
> > 
> > If that's wrong, I suspect there are latent issues here?
> 
> For ORC, UNWIND_HINT_EMPTY is used to annotate that some code is
> non-unwindable.  (Note I have plans to split that into UNWIND_HINT_ENTRY
> and UNWIND_HINT_UNDEFINED.)

Interesting; where would the UNDEFINED case be used?

> For frame pointers, the hints aren't used, other than by objtool to
> follow the code flow as described above.  But objtool doesn't produce
> any metadata for the FP unwinder.  Instead the FP unwinder makes such
> determinations about unwindability at runtime.

I suspect for arm64 with frame pointers we'll need a fair amount of
special casing for the entry code; I'll have a think offline.

Thanks,
Mark.


* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2021-01-13 22:25     ` Josh Poimboeuf
@ 2021-01-14 18:10       ` Mark Rutland
  2021-01-15  0:03         ` Josh Poimboeuf
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Rutland @ 2021-01-14 18:10 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Mark Brown, linux-kernel, Jiri Kosina, Joe Lawrence,
	Jonathan Corbet, Miroslav Benes, Petr Mladek, linux-doc,
	live-patching

On Wed, Jan 13, 2021 at 04:25:41PM -0600, Josh Poimboeuf wrote:
> On Wed, Jan 13, 2021 at 08:23:15PM +0000, Mark Brown wrote:
> > On Wed, Jan 13, 2021 at 01:33:13PM -0600, Josh Poimboeuf wrote:
> > 
> > > I think it's worth mentioning a little more about objtool.  There are a
> > > few passing mentions of objtool's generation of metadata (i.e. ORC), but
> > > objtool has another relevant purpose: stack validation.  That's
> > > particularly important when it comes to frame pointers.
> > 
> > > For some architectures like x86_64 and arm64 (but not powerpc/s390),
> > > it's far too easy for a human to write asm and/or inline asm which
> > > > violates frame pointer protocol, silently causing the violator's callee
> > > to get skipped in the unwind.  Such architectures need objtool
> > > implemented for CONFIG_STACK_VALIDATION.
> > 
> > This basically boils down to just adding a statement saying "you may
> > need to depend on objtool" I think?
> 
> Right, but maybe it would be a short paragraph or two.

I reckon that's a top-level section between requirements and
consideration along the lines of:

3. Compile-time analysis
========================

To ensure that kernel code can be correctly unwound in all cases,
architectures may need to verify that code has been compiled in a manner
expected by the unwinder. For example, an unwinder may expect that
functions manipulate the stack pointer in a limited way, or that all
functions use specific prologue and epilogue sequences. Architectures
with such requirements should verify the kernel compilation using
objtool.

In some cases, an unwinder may require metadata to correctly unwind.
Where necessary, this metadata should be generated at build time using
objtool.

... perhaps elaborating a little further on the latter?

Thanks,
Mark.


* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2021-01-14 17:49       ` Mark Rutland
@ 2021-01-14 20:03         ` Josh Poimboeuf
  0 siblings, 0 replies; 13+ messages in thread
From: Josh Poimboeuf @ 2021-01-14 20:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Mark Brown, linux-kernel, Jiri Kosina, Joe Lawrence,
	Jonathan Corbet, Miroslav Benes, Petr Mladek, linux-doc,
	live-patching

On Thu, Jan 14, 2021 at 05:49:32PM +0000, Mark Rutland wrote:
> On Thu, Jan 14, 2021 at 08:36:50AM -0600, Josh Poimboeuf wrote:
> > On Thu, Jan 14, 2021 at 11:54:18AM +0000, Mark Rutland wrote:
> > > On Wed, Jan 13, 2021 at 01:33:13PM -0600, Josh Poimboeuf wrote:
> > > > On Wed, Jan 13, 2021 at 04:57:43PM +0000, Mark Brown wrote:
> > > > > From: Mark Rutland <mark.rutland@arm.com>
> > > > > +There are several ways an architecture may identify kernel code which is deemed
> > > > > +unreliable to unwind from, e.g.
> > > > > +
> > > > > +* Using metadata created by objtool, with such code annotated with
> > > > > +  SYM_CODE_{START,END} or STACKFRAME_NON_STANDARD().
> > > > 
> > > > I'm not sure why SYM_CODE_{START,END} is mentioned here, but it doesn't
> > > > necessarily mean the code is unreliable, and objtool doesn't treat it as
> > > > such.  Its mention can probably be removed unless there was some other
> > > > point I'm missing.
> > >
> > > > Also, s/STACKFRAME/STACK_FRAME/
> 
> Given that (per the discussion below) STACK_FRAME_NON_STANDARD() also
> doesn't result in objtool producing any special metadata (and such
> code is still expected to be unwindable), I believe we can delete this
> bullet point outright?

With ORC, UNWIND_HINT_EMPTY can be used to mark missing ORC metadata,
which the unwinder translates as unreliable.  So that may be worth
mentioning.
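
As a rough illustration of that translation, a simplified lookup might look
like the sketch below. The types and names (hint_type, unwind_entry,
pc_is_reliable) are made up for the example and are not the kernel's ORC
structures; the real unwinder uses a sorted table and a richer entry format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for unwind entry types. */
enum hint_type {
	HINT_NONE,	/* no usable unwind data, e.g. code under UNWIND_HINT_EMPTY */
	HINT_CALL,	/* ordinary frame: caller's SP is SP + sp_offset */
};

struct unwind_entry {
	uintptr_t start;	/* first address covered by this entry */
	uintptr_t end;		/* one past the last address covered */
	enum hint_type type;
	int sp_offset;
};

/* Return the entry covering 'pc', or NULL if the table has no entry for it. */
static const struct unwind_entry *
lookup_entry(const struct unwind_entry *tbl, size_t n, uintptr_t pc)
{
	for (size_t i = 0; i < n; i++)
		if (pc >= tbl[i].start && pc < tbl[i].end)
			return &tbl[i];
	return NULL;
}

/* Return 1 if a frame at 'pc' can be unwound reliably, 0 otherwise. */
static int pc_is_reliable(const struct unwind_entry *tbl, size_t n, uintptr_t pc)
{
	const struct unwind_entry *e = lookup_entry(tbl, n, pc);

	/* Both "no entry at all" and "entry marked empty" mean unreliable. */
	return e && e->type != HINT_NONE;
}
```

The point of the sketch is only that absent metadata and explicitly empty
metadata are both treated as unreliable, rather than being guessed at.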

> > I realize your understanding is pretty much consistent with
> > tools/objtool/Documentation/stack-validation.txt:
> > 
> > 2. Conversely, each section of code which is *not* callable should *not*
> >    be annotated as an ELF function.  The ENDPROC macro shouldn't be used
> >    in this case.
> > 
> >    This rule is needed so that objtool can ignore non-callable code.
> >    Such code doesn't have to follow any of the other rules.
> > 
> > But this statement is no longer true:
> > 
> >   **This rule is needed so that objtool can ignore non-callable code.**
> > 
> > [ and it looks like the ENDPROC reference is also out of date ]
> 
> Ok -- looks like that needs an update!

Added to the TODO list!

> > Since that document was written, around the time ORC was written we
> > realized objtool shouldn't ignore SYM_CODE after all.  That way we can
> > get full coverage for ORC (including interrupts/exceptions), as well as
> > some of the other validations like retpoline, uaccess, noinstr, etc.
> > 
> > Though it's still true that SYM_CODE doesn't have to follow the
> > function-specific rules, e.g. frame pointers.
> 
> Ok; I suspect on the arm64 side we'll need to think a bit harder about
> what that means for us. I guess that'll influence or interact with
> whatever support we need specifically in objtool.
> 
> > So now objtool requires that it be able to traverse and understand *all*
> > code, otherwise it will spit out "unreachable instruction" warnings.
> > But since SYM_CODE isn't a normal callable function, objtool doesn't
> > know to interpret it directly.  Therefore all SYM_CODE must be reachable
> > by objtool in some other way:
> > 
> > - either indirectly, via a jump from a SYM_FUNC; or

This should say "via a jump from some code objtool already knows about:
either a SYM_FUNC or a SYM_CODE with UNWIND_HINTs".

> > 
> > - via an UNWIND_HINT
> > 
> > (And that's true for both ORC and frame pointers.)
> > 
> > If you look closely at arch/x86/entry/entry_64.S you should be able to
> > see that's the case.
> 
> Assuming you mean the UNWIND_HINT_EMPTY at the start of each exception
> entry point, I think I follow.

Also see for example common_interrupt_return(), which doesn't have an
UNWIND_HINT right away, but is still reachable from other code which
objtool already knows about via
the 'swapgs_restore_regs_and_return_to_usermode' label.
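
To make the reachability rule concrete, here is a toy model of it. This is
not how objtool is implemented; the struct block layout and the
count_unreachable() helper are hypothetical, and each block has at most one
jump target to keep the sketch short.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_TARGET (-1)

/* Hypothetical model of a code block as a checker might see it. */
struct block {
	bool is_func;		/* SYM_FUNC-like: a known entry point */
	bool has_hint;		/* SYM_CODE-like block starting with an unwind hint */
	int jump_target;	/* index of the block this one jumps to, or NO_TARGET */
};

/*
 * Mark every block reachable from a known starting point in 'seen'.
 * Returns the number of blocks left unreached.
 */
static size_t count_unreachable(const struct block *b, size_t n, bool *seen)
{
	size_t unreached = 0;
	bool changed = true;

	/* Functions and hinted code are known to the checker up front. */
	for (size_t i = 0; i < n; i++)
		seen[i] = b[i].is_func || b[i].has_hint;

	/* Propagate along jumps until nothing new is discovered. */
	while (changed) {
		changed = false;
		for (size_t i = 0; i < n; i++) {
			int t = b[i].jump_target;

			if (seen[i] && t != NO_TARGET && (size_t)t < n && !seen[t]) {
				seen[t] = true;
				changed = true;
			}
		}
	}

	for (size_t i = 0; i < n; i++)
		if (!seen[i])
			unreached++;
	return unreached;
}
```

A block with neither a hint nor an incoming jump from known code is what
would show up as an "unreachable instruction" warning in this model.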

> > > If that's right, it might be worth splitting this into two points, e.g.
> > > 
> > > | * Using metadata created by objtool, with such code annotated with
> > > |   STACKFRAME_NON_STANDARD().
> > > |
> > > |
> > > | * Using ELF symbol attributes, with such code annotated with
> > > |   SYM_CODE_{START,END}, and not having a function type.
> > > 
> > > If that's wrong, I suspect there are latent issues here?
> > 
> > For ORC, UNWIND_HINT_EMPTY is used to annotate that some code is
> > non-unwindable.  (Note I have plans to split that into UNWIND_HINT_ENTRY
> > and UNWIND_HINT_UNDEFINED.)
> 
> Interesting; where would the UNDEFINED case be used?

UNDEFINED would be some code which is either unreachable (like the
middle of a retpoline) or otherwise not annotated (like
STACK_FRAME_NON_STANDARD).

> > For frame pointers, the hints aren't used, other than by objtool to
> > follow the code flow as described above.  But objtool doesn't produce
> > any metadata for the FP unwinder.  Instead the FP unwinder makes such
> > determinations about unwindability at runtime.
> 
> I suspect for arm64 with frame pointers we'll need a fair amount of
> special casing for the entry code; I'll have a think offline.

I'd be happy to help with this.  It may end up easier for me to learn
your entry code than for you to learn the expectations of my tool ;-)

-- 
Josh



* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2021-01-14 18:10       ` Mark Rutland
@ 2021-01-15  0:03         ` Josh Poimboeuf
  0 siblings, 0 replies; 13+ messages in thread
From: Josh Poimboeuf @ 2021-01-15  0:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Mark Brown, linux-kernel, Jiri Kosina, Joe Lawrence,
	Jonathan Corbet, Miroslav Benes, Petr Mladek, linux-doc,
	live-patching

On Thu, Jan 14, 2021 at 06:10:13PM +0000, Mark Rutland wrote:
> On Wed, Jan 13, 2021 at 04:25:41PM -0600, Josh Poimboeuf wrote:
> > On Wed, Jan 13, 2021 at 08:23:15PM +0000, Mark Brown wrote:
> > > On Wed, Jan 13, 2021 at 01:33:13PM -0600, Josh Poimboeuf wrote:
> > > 
> > > > I think it's worth mentioning a little more about objtool.  There are a
> > > > few passing mentions of objtool's generation of metadata (i.e. ORC), but
> > > > objtool has another relevant purpose: stack validation.  That's
> > > > particularly important when it comes to frame pointers.
> > > 
> > > > For some architectures like x86_64 and arm64 (but not powerpc/s390),
> > > > it's far too easy for a human to write asm and/or inline asm which
> > > > > violates frame pointer protocol, silently causing the violator's callee
> > > > to get skipped in the unwind.  Such architectures need objtool
> > > > implemented for CONFIG_STACK_VALIDATION.
> > > 
> > > This basically boils down to just adding a statement saying "you may
> > > need to depend on objtool" I think?
> > 
> > Right, but maybe it would be a short paragraph or two.
> 
> I reckon that's a top-level section between requirements and
> consideration along the lines of:
> 
> 3. Compile-time analysis
> ========================
> 
> To ensure that kernel code can be correctly unwound in all cases,
> architectures may need to verify that code has been compiled in a manner
> expected by the unwinder. For example, an unwinder may expect that
> functions manipulate the stack pointer in a limited way, or that all
> functions use specific prologue and epilogue sequences. Architectures
> with such requirements should verify the kernel compilation using
> objtool.
> 
> In some cases, an unwinder may require metadata to correctly unwind.
> Where necessary, this metadata should be generated at build time using
> objtool.

Sounds good to me.

-- 
Josh



* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2020-10-23 15:35 Mark Rutland
  2020-10-27 11:16 ` Petr Mladek
@ 2020-10-29 10:04 ` Miroslav Benes
  1 sibling, 0 replies; 13+ messages in thread
From: Miroslav Benes @ 2020-10-29 10:04 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, Jiri Kosina, Joe Lawrence, Jonathan Corbet,
	Josh Poimboeuf, Mark Brown, Petr Mladek, linux-doc,
	live-patching

Hi,

On Fri, 23 Oct 2020, Mark Rutland wrote:

> Add documentation for reliable stacktrace. This is intended to describe
> the semantics and to be an aid for implementing architecture support for
> HAVE_RELIABLE_STACKTRACE.

thanks a lot for doing the work!

> Unwinding is a subtle area, and architectures vary greatly in both
> implementation and the set of concerns that affect them, so I've tried
> to avoid making this too specific to any given architecture. I've used
> examples from both x86_64 and arm64 to explain corner cases in more
> detail, but I've tried to keep the descriptions sufficient for those who
> are unfamiliar with the particular architecture.

Yes, I think it is a good approach. We can always add more details later,
but doing so now would probably cause more confusion for those unfamiliar.

> I've tried to give rationale for all the recommendations/requirements,
> since that makes it easier to spot nearby issues, or when a check
> happens to catch a few things at once. I believe what I have written is
> sound, but as some of this was reverse-engineered I may have missed
> things worth noting.
> 
> I've made a few assumptions about preferred behaviour, notably:
> 
> * If you can reliably unwind through exceptions, you should (as x86_64
>   does).

Yes, it does. I think (and Josh will correct me if I am wrong here) that
even at the beginning the intention was to improve the reliability of
unwinding in general. That is the case for both x86_64 and s390x. The
_reliable() interface only takes advantage of that. As you pointed out in
the document, unwinding through exceptions is not necessary. It can be
reported as unreliable and we can deal with that later. But it is always
better to do it if possible.

powerpc is an exception to the approach, because it implements its
_reliable() API from scratch.

> * It's fine to omit ftrace_return_to_handler and other return
>   trampolines so long as these are not subject to patching and the
>   original return address is reported. Most architectures do this for
>   ftrace_return_handler, but not other return trampolines.

Yes. Patching a trampoline is not something I can imagine, so that should 
not be a problem. But one never knows and we may run into a problem here 
easily. I don't remember if we even audited all the trampolines. And new 
ones are introduced all the time.

> * For cases where link register unreliability could result in duplicate
>   entries in the trace or an inverted trace, I've assumed this should be
>   treated as unreliable. This specific case shouldn't matter to
>   livepatching, but I assume that we want a reliable trace to have
>   the correct order.

Agreed.

Thanks
Miroslav


* Re: [PATCH] Documentation: livepatch: document reliable stacktrace
  2020-10-23 15:35 Mark Rutland
@ 2020-10-27 11:16 ` Petr Mladek
  2020-10-29 10:04 ` Miroslav Benes
  1 sibling, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2020-10-27 11:16 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, Jiri Kosina, Joe Lawrence, Jonathan Corbet,
	Josh Poimboeuf, Mark Brown, Miroslav Benes, linux-doc,
	live-patching

On Fri 2020-10-23 16:35:27, Mark Rutland wrote:
> Add documentation for reliable stacktrace. This is intended to describe
> the semantics and to be an aid for implementing architecture support for
> HAVE_RELIABLE_STACKTRACE.

First, thanks a lot for putting this document together.

I am not an expert on stack unwinders and am not sure whether some
details should be corrected or added. I believe that can be done
more effectively by others.

Anyway, the document is very readable and provides a lot of useful
information. I suggest only a small change in the style, see below.


> diff --git a/Documentation/livepatch/reliable-stacktrace.rst b/Documentation/livepatch/reliable-stacktrace.rst
> new file mode 100644
> index 0000000000000..d296c93f6f0e0
> --- /dev/null
> +++ b/Documentation/livepatch/reliable-stacktrace.rst
> +2. Requirements
> +===============
> +
> +Architectures must implement one of the reliable stacktrace functions.
> +Architectures using CONFIG_ARCH_STACKWALK should implement
> +'arch_stack_walk_reliable', and other architectures should implement
> +'save_stack_trace_tsk_reliable'.
> +
> +Principally, the reliable stacktrace function must ensure that either:
> +
> +* The trace includes all functions that the task may be returned to, and the
> +  return code is zero to indicate that the trace is reliable.
> +
> +* The return code is non-zero to indicate that the trace is not reliable.
> +
> +.. note::
> +   In some cases it is legitimate to omit specific functions from the trace,
> +   but all other functions must be reported. These cases are described in
> +   further detail below.
> +
> +Secondly, the reliable stacktrace function should be robust to cases where the
> +stack or other unwind state is corrupt or otherwise unreliable. The function
> +should attempt to detect such cases and return a non-zero error code, and
> +should not get stuck in an infinite loop or access memory in an unsafe way.
> +Specific cases are described in further detail below.

Please, use imperative style when something is required for the
reliability. For example, it means replacing all "should" with "must"
in the above paragraph.

I perfectly understand why you used "should". I use it heavily as
well. But we really must motivate people to handle all corner
cases here. ;-)
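
To illustrate what the stricter wording would demand, here is a userspace
sketch of the contract: return zero only for a complete trace, and fail
rather than loop or read out of bounds. The frame_record layout, FP_END
marker and walk_reliable() name are all hypothetical and do not correspond
to any architecture's real unwinder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record: link to the caller's record plus a return address. */
struct frame_record {
	size_t next_fp;		/* index of the caller's record in the stack array */
	uintptr_t ret_addr;	/* address the function will return to */
};

#define FP_END ((size_t)-1)	/* hypothetical marker for the expected final record */

/*
 * Walk 'stack' starting at record 'fp', storing return addresses in 'out'.
 * Returns 0 and sets *nr only if the walk reached FP_END; returns -1 if
 * anything looks wrong, in which case the caller must discard the trace.
 */
static int walk_reliable(const struct frame_record *stack, size_t stack_len,
			 size_t fp, uintptr_t *out, size_t out_len, size_t *nr)
{
	size_t n = 0;

	while (fp != FP_END) {
		if (fp >= stack_len)	/* record lies outside the stack */
			return -1;
		if (n >= out_len)	/* trace would be silently truncated */
			return -1;
		out[n++] = stack[fp].ret_addr;

		/* Records must move strictly towards the base; this bounds the loop. */
		if (stack[fp].next_fp != FP_END && stack[fp].next_fp <= fp)
			return -1;
		fp = stack[fp].next_fp;
	}

	*nr = n;
	return 0;
}
```

Note that a truncated trace is reported as a failure here, since a short
trace returned with a zero code would omit functions.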

Best Regards,
Petr


* [PATCH] Documentation: livepatch: document reliable stacktrace
@ 2020-10-23 15:35 Mark Rutland
  2020-10-27 11:16 ` Petr Mladek
  2020-10-29 10:04 ` Miroslav Benes
  0 siblings, 2 replies; 13+ messages in thread
From: Mark Rutland @ 2020-10-23 15:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Rutland, Jiri Kosina, Joe Lawrence, Jonathan Corbet,
	Josh Poimboeuf, Mark Brown, Miroslav Benes, Petr Mladek,
	linux-doc, live-patching

Add documentation for reliable stacktrace. This is intended to describe
the semantics and to be an aid for implementing architecture support for
HAVE_RELIABLE_STACKTRACE.

Unwinding is a subtle area, and architectures vary greatly in both
implementation and the set of concerns that affect them, so I've tried
to avoid making this too specific to any given architecture. I've used
examples from both x86_64 and arm64 to explain corner cases in more
detail, but I've tried to keep the descriptions sufficient for those who
are unfamiliar with the particular architecture.

I've tried to give rationale for all the recommendations/requirements,
since that makes it easier to spot nearby issues, or when a check
happens to catch a few things at once. I believe what I have written is
sound, but as some of this was reverse-engineered I may have missed
things worth noting.

I've made a few assumptions about preferred behaviour, notably:

* If you can reliably unwind through exceptions, you should (as x86_64
  does).

* It's fine to omit ftrace_return_to_handler and other return
  trampolines so long as these are not subject to patching and the
  original return address is reported. Most architectures do this for
  ftrace_return_to_handler, but not other return trampolines.

* For cases where link register unreliability could result in duplicate
  entries in the trace or an inverted trace, I've assumed this should be
  treated as unreliable. This specific case shouldn't matter to
  livepatching, but I assume that we want a reliable trace to have
  the correct order.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: linux-doc@vger.kernel.org
Cc: live-patching@vger.kernel.org
---
 Documentation/livepatch/index.rst               |   1 +
 Documentation/livepatch/reliable-stacktrace.rst | 303 ++++++++++++++++++++++++
 2 files changed, 304 insertions(+)
 create mode 100644 Documentation/livepatch/reliable-stacktrace.rst

diff --git a/Documentation/livepatch/index.rst b/Documentation/livepatch/index.rst
index 525944063be7a..43cce5fad705f 100644
--- a/Documentation/livepatch/index.rst
+++ b/Documentation/livepatch/index.rst
@@ -13,6 +13,7 @@ Kernel Livepatching
     module-elf-format
     shadow-vars
     system-state
+    reliable-stacktrace
 
 .. only::  subproject and html
 
diff --git a/Documentation/livepatch/reliable-stacktrace.rst b/Documentation/livepatch/reliable-stacktrace.rst
new file mode 100644
index 0000000000000..d296c93f6f0e0
--- /dev/null
+++ b/Documentation/livepatch/reliable-stacktrace.rst
@@ -0,0 +1,303 @@
+===================
+Reliable Stacktrace
+===================
+
+This document outlines basic information about reliable stacktracing.
+
+.. Table of Contents:
+
+    1. Introduction
+    2. Requirements
+    3. Considerations
+       3.1 Identifying successful termination
+       3.2 Identifying unwindable code
+       3.3 Unwinding across interrupts and exceptions
+       3.4 Rewriting of return addresses
+       3.5 Obscuring of return addresses
+       3.6 Link register unreliability
+
+1. Introduction
+===============
+
+The kernel livepatch consistency model relies on accurately identifying which
+functions may have live state and therefore may not be safe to patch. One way
+to identify which functions are live is to use a stacktrace.
+
+Existing stacktrace code may not always give an accurate picture of all
+functions with live state, and best-effort approaches which can be helpful for
+debugging are unsound for livepatching. Livepatching depends on architectures
+to provide a *reliable* stacktrace which ensures it never omits any live
+functions from a trace.
+
+
+2. Requirements
+===============
+
+Architectures must implement one of the reliable stacktrace functions.
+Architectures using CONFIG_ARCH_STACKWALK should implement
+'arch_stack_walk_reliable', and other architectures should implement
+'save_stack_trace_tsk_reliable'.
+
+Principally, the reliable stacktrace function must ensure that either:
+
+* The trace includes all functions that the task may be returned to, and the
+  return code is zero to indicate that the trace is reliable.
+
+* The return code is non-zero to indicate that the trace is not reliable.
+
+.. note::
+   In some cases it is legitimate to omit specific functions from the trace,
+   but all other functions must be reported. These cases are described in
+   further detail below.
+
+Secondly, the reliable stacktrace function should be robust to cases where the
+stack or other unwind state is corrupt or otherwise unreliable. The function
+should attempt to detect such cases and return a non-zero error code, and
+should not get stuck in an infinite loop or access memory in an unsafe way.
+Specific cases are described in further detail below.
+
+
+3. Considerations
+=================
+
+The unwinding process varies across architectures, their respective procedure
+call standards, and kernel configurations. This section describes common
+details that architectures should consider.
+
+3.1 Identifying successful termination
+--------------------------------------
+
+Unwinding may terminate early for a number of reasons, including:
+
+* Stack or frame pointer corruption.
+
+* Missing unwind support for an uncommon scenario, or a bug in the unwinder.
+
+* Dynamically generated code (e.g. eBPF) or foreign code (e.g. EFI runtime
+  services) not following the conventions expected by the unwinder.
+
+To ensure that this does not result in functions being omitted from the trace,
+even if not caught by other checks, it is strongly recommended that
+architectures verify that a stacktrace ends at an expected location, e.g.
+
+* Within a specific function that is an entry point to the kernel.
+
+* At a specific location on a stack expected for a kernel entry point.
+
+* On a specific stack expected for a kernel entry point (e.g. if the
+  architecture has separate task and IRQ stacks).
+
+3.2 Identifying unwindable code
+-------------------------------
+
+Unwinding typically relies on code following specific conventions (e.g.
+manipulating a frame pointer), but there can be code which may not follow these
+conventions and may require special handling in the unwinder, e.g.
+
+* Exception vectors and entry assembly.
+
+* Procedure Linkage Table (PLT) entries and veneer functions.
+
+* Trampoline assembly (e.g. ftrace, kprobes).
+
+* Dynamically generated code (e.g. eBPF, optprobe trampolines).
+
+* Foreign code (e.g. EFI runtime services).
+
+To ensure that such cases do not result in functions being omitted from a
+trace, it is strongly recommended that architectures positively identify code
+which is known to be reliable to unwind from, and reject unwinding from all
+other code.
+
+Kernel code including modules and eBPF can be distinguished from foreign code
+using '__kernel_text_address()'. Checking for this also helps to detect stack
+corruption.
+
+There are several ways an architecture may identify kernel code which is deemed
+unreliable to unwind from, e.g.
+
+* Using metadata created by objtool, with such code annotated with
+  SYM_CODE_{START,END} or STACKFRAME_NON_STANDARD().
+
+* Placing such code into special linker sections, and rejecting unwinding from
+  any code in these sections.
+
+* Identifying specific portions of code using bounds information.
+
+3.3 Unwinding across interrupts and exceptions
+----------------------------------------------
+
+At function call boundaries the stack and other unwind state is expected to be
+in a consistent state suitable for reliable unwinding, but this may not be the
+case part-way through a function. For example, during a function prologue or
+epilogue a frame pointer may be transiently invalid, or during the function
+body the return address may be held in an arbitrary general purpose register.
+For some architectures this may change at runtime as a result of dynamic
+instrumentation.
+
+If an interrupt or other exception is taken while the stack or other unwind
+state is in an inconsistent state, it may not be possible to reliably unwind,
+and it may not be possible to identify whether such unwinding will be reliable.
+See below for examples.
+
+Architectures which cannot identify when it is reliable to unwind such cases
+(or where it is never reliable) should reject unwinding across exception
+boundaries. Note that it may be reliable to unwind across certain exceptions
+(e.g. IRQ) but unreliable to unwind across other exceptions (e.g. NMI).
+
+Architectures which can identify when it is reliable to unwind such cases (or
+have no such cases) should attempt to unwind across exception boundaries, as
+doing so can prevent unnecessarily stalling livepatch consistency checks and
+permits livepatch transitions to complete more quickly.
+
+3.4 Rewriting of return addresses
+---------------------------------
+
+Some trampolines temporarily modify the return address of a function in order
+to intercept when that function returns with a return trampoline, e.g.
+
+* An ftrace trampoline may modify the return address so that function graph
+  tracing can intercept returns.
+
+* A kprobes (or optprobes) trampoline may modify the return address so that
+  kretprobes can intercept returns.
+
+When this happens, the original return address will not be in its usual
+location. For trampolines which are not subject to live patching, where an
+unwinder can reliably determine the original return address and no unwind state
+is altered by the trampoline, the unwinder may report the original return
+address in place of the trampoline and report this as reliable. Otherwise, an
+unwinder must report these cases as unreliable.
+
+Special care is required when identifying the original return address, as this
+information is not in a consistent location for the duration of the entry
+trampoline or return trampoline. For example, considering the x86_64
+'return_to_handler' return trampoline:
+
+.. code-block:: none
+
+   SYM_CODE_START(return_to_handler)
+           UNWIND_HINT_EMPTY
+           subq  $24, %rsp
+
+           /* Save the return values */
+           movq %rax, (%rsp)
+           movq %rdx, 8(%rsp)
+           movq %rbp, %rdi
+
+           call ftrace_return_to_handler
+
+           movq %rax, %rdi
+           movq 8(%rsp), %rdx
+           movq (%rsp), %rax
+           addq $24, %rsp
+           JMP_NOSPEC rdi
+   SYM_CODE_END(return_to_handler)
+
+While the traced function runs, its return address on the stack points to
+the start of return_to_handler, and the original return address is stored in
+the task's cur_ret_stack. During this time the unwinder can find the return
+address using ftrace_graph_ret_addr().
+
+When the traced function returns to return_to_handler, there is no longer a
+return address on the stack, though the original return address is still stored
+in the task's cur_ret_stack. Within ftrace_return_to_handler(), the original
+return address is removed from cur_ret_stack and is transiently moved
+arbitrarily by the compiler before being returned in rax. The return_to_handler
+trampoline moves this into rdi before jumping to it.
+
+Architectures might not always be able to unwind such sequences, such as when
+ftrace_return_to_handler() has removed the address from cur_ret_stack, and the
+location of the return address cannot be reliably determined.
+
+It is recommended that architectures unwind cases where return_to_handler has
+not yet been returned to, but architectures are not required to unwind from the
+middle of return_to_handler and can report this as unreliable. Architectures
+are not required to unwind from other trampolines which modify the return
+address.
+
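The recommendation above can be illustrated with a minimal userspace sketch. It is a toy model rather than kernel code: every toy_* name is hypothetical, the ret_stack array merely stands in for wherever the tracer saves original return addresses, and a real unwinder would be expected to use ftrace_graph_ret_addr() rather than open-coding the lookup. The sketch only shows the policy: translate a rewritten return address while the trampoline has not yet been returned to, and give up if the unwind starts inside the trampoline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none of them are kernel APIs. */
#define TOY_RET_STACK_DEPTH 8

struct toy_task {
	uintptr_t ret_stack[TOY_RET_STACK_DEPTH]; /* saved original return addresses */
	int cur_ret_stack;                        /* index of the newest entry, -1 if empty */
};

struct toy_trampoline {
	uintptr_t start; /* first byte of the return trampoline */
	uintptr_t end;   /* one past its last byte */
};

/*
 * Translate one return address read from the stack.
 *
 * pc is the address the frame is executing at, stack_ret is the return
 * address found in that frame, and *idx counts how many saved entries
 * this unwind has already consumed.
 *
 * Returns false when the result must be reported as unreliable.
 */
bool toy_recover_ret_addr(const struct toy_task *task,
			  const struct toy_trampoline *tramp,
			  uintptr_t pc, uintptr_t stack_ret,
			  int *idx, uintptr_t *out)
{
	assert(task && tramp && idx && out);

	/* In the middle of the trampoline the address has no stable home. */
	if (pc >= tramp->start && pc < tramp->end)
		return false;

	/* An ordinary return address needs no translation. */
	if (stack_ret != tramp->start) {
		*out = stack_ret;
		return true;
	}

	/* Rewritten return address: take the next saved entry, newest first. */
	int slot = task->cur_ret_stack - *idx;
	if (slot < 0 || slot >= TOY_RET_STACK_DEPTH)
		return false;

	*out = task->ret_stack[slot];
	(*idx)++;
	return true;
}
```

An unwinder built along these lines would call the helper once per frame with a single idx shared across the whole unwind, so that nested traced functions each consume their own saved entry.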
+3.5 Obscuring of return addresses
+---------------------------------
+
+Some trampolines do not rewrite the return address in order to intercept
+returns, but do transiently clobber the return address or other unwind state.
+
+For example, the x86_64 implementation of optprobes patches the probed function
+with a JMP instruction which targets the associated optprobe trampoline. When
+the probe is hit, the CPU will branch to the optprobe trampoline, and the
+address of the probed function is not held in any register or on the stack.
+
+Similarly, the arm64 implementation of DYNAMIC_FTRACE_WITH_REGS patches traced
+functions with the following:
+
+.. code-block:: none
+
+   MOV X9, X30
+   BL <trampoline>
+
+The MOV saves the link register (X30) into X9 to preserve the return address
+before the BL clobbers the link register and branches to the trampoline. At the
+start of the trampoline, the return address of the traced function is in X9
+rather than the link register as would usually be the case.
+
+Architectures should ensure that unwinders either reliably unwind such cases,
+or report the unwinding as unreliable.
+
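One conservative way to meet the requirement above is to refuse to vouch for any trace that was started from, or crossed an exception taken in, a region known to clobber unwind state. The following is a minimal sketch under that assumption. The toy_range type and both helpers are hypothetical and are not kernel interfaces, and how an architecture learns the address ranges of its trampolines and patch sites is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a half-open address range [start, end). */
struct toy_range {
	uintptr_t start;
	uintptr_t end;
};

/* True if pc lies inside any range known to obscure the return address. */
bool toy_pc_obscured(const struct toy_range *ranges, size_t nr, uintptr_t pc)
{
	assert(ranges || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (pc >= ranges[i].start && pc < ranges[i].end)
			return true;
	}
	return false;
}

/*
 * pcs holds every address at which execution was interrupted during the
 * unwind: the starting PC plus the PC of each exception boundary crossed.
 * The trace is only reported as reliable if none of them is obscured.
 */
bool toy_trace_reliable(const struct toy_range *ranges, size_t nr,
			const uintptr_t *pcs, size_t nr_pcs)
{
	assert(pcs || nr_pcs == 0);

	for (size_t i = 0; i < nr_pcs; i++) {
		if (toy_pc_obscured(ranges, nr, pcs[i]))
			return false;
	}
	return true;
}
```

Rejecting these cases is the simpler of the two options the text allows. An architecture that wants to unwind them reliably instead needs to know, for each address in such a region, where the return address currently lives.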
+3.6 Link register unreliability
+-------------------------------
+
+On some other architectures, 'call' instructions place the return address into a
+link register, and 'return' instructions consume the return address from the
+link register without modifying the register. On these architectures software
+must save the return address to the stack prior to making a function call. Over
+the duration of a function call, the return address may be held in the link
+register alone, on the stack alone, or in both locations.
+
+Unwinders typically assume the link register is always live, but this
+assumption can lead to unreliable stack traces. For example, consider the
+following arm64 assembly for a simple function:
+
+.. code-block:: none
+
+   function:
+           STP X29, X30, [SP, #-16]!
+           MOV X29, SP
+           BL <other_function>
+           LDP X29, X30, [SP], #16
+           RET
+
+At entry to the function, the link register (X30) points to the caller, and the
+frame pointer (X29) points to the caller's frame including the caller's return
+address. The first two instructions create a new stackframe and update the
+frame pointer, and at this point the link register and the frame pointer both
+describe this function's return address. A trace at this point may describe
+this function twice, and if the function return is being traced, the unwinder
+may consume two entries from the fgraph return stack rather than one entry.
+
+The BL invokes 'other_function' with the link register pointing to this
+function's LDP and the frame pointer pointing to this function's stackframe.
+When 'other_function' returns, the link register is left pointing at the LDP
+which follows the BL, and so a trace at this point could result in 'function'
+appearing twice in the backtrace.
+
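The duplicate entry described above can be reproduced with a small userspace model. This is a sketch only: the toy_* types are hypothetical simplifications of an arm64 frame record chain, and the addresses used with it are made up. The unwinder reports the PC, optionally the link register, and then every saved return address found by following the frame pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a frame record: the saved X29/X30 pair. */
struct toy_frame_record {
	const struct toy_frame_record *fp; /* caller's record, NULL at the outermost frame */
	uintptr_t lr;                      /* saved return address */
};

/* Hypothetical register snapshot at the point the trace is taken. */
struct toy_regs {
	uintptr_t pc;
	uintptr_t lr;
	const struct toy_frame_record *fp;
};

/*
 * Fill trace[] with up to max entries and return how many were written.
 * With trust_lr set, the link register is assumed to be live and is
 * reported in addition to the frame records.
 */
size_t toy_unwind(const struct toy_regs *regs, bool trust_lr,
		  uintptr_t *trace, size_t max)
{
	size_t n = 0;

	assert(regs && trace);

	if (n < max)
		trace[n++] = regs->pc;

	if (trust_lr && n < max)
		trace[n++] = regs->lr;

	for (const struct toy_frame_record *rec = regs->fp;
	     rec && n < max; rec = rec->fp)
		trace[n++] = rec->lr;

	return n;
}
```

Taking a snapshot just after 'other_function' returns, where both the PC and the link register hold the address of the LDP, the trust_lr variant reports 'function' twice while the frame-record-only variant does not. Ignoring the link register is not a general fix, though: at entry to 'function', before the STP has run, the return address is held in the link register alone and the frame-record-only variant would skip the caller.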
+Similarly, a function may deliberately clobber the LR, e.g.
+
+.. code-block:: none
+
+   caller:
+           STP X29, X30, [SP, #-16]!
+           MOV X29, SP
+           ADR LR, <callee>
+           BLR LR
+           LDP X29, X30, [SP], #16
+           RET
+
+The ADR places the address of 'callee' into the LR, before the BLR branches to
+this address. If a trace is made immediately after the ADR, 'callee' will
+appear to be the parent of 'caller', rather than the child.
+
+Due to cases such as the above, it may only be possible to reliably consume a
+link register value at a function call boundary. Architectures where this is
+the case must reject unwinding across exception boundaries unless they can
+reliably identify when the LR or stack value should be used (e.g. using
+metadata generated by objtool).
-- 
2.11.0

