Unwinding user-space programs in the kernel using SFrame format

* Unwinding user-space programs in the kernel using SFrame format
@ 2023-01-12 20:30 Indu Bhagat
  2023-01-24 21:58 ` Indu Bhagat
  2023-02-06 19:44 ` Unwinding user-space programs in the kernel using SFrame fo Steven Rostedt
  0 siblings, 2 replies; 4+ messages in thread
From: Indu Bhagat @ 2023-01-12 20:30 UTC
  To: linux-toolchains
  Cc: Jose E. Marchesi, Daan De Meyer, Kris Van Hees, Elena Zannoni

Hello,

This email is to initiate discussion/collaboration on adding a new 
user-space program unwinder in the kernel, an unwinder which uses the 
SFrame format.

What is SFrame format?
SFrame is the Simple Frame format.  It represents the minimal necessary 
information needed for backtracing - i.e. Canonical Frame Address (CFA), 
Frame Pointer (FP), and Return Address (RA).  SFrame unwind information 
is available in a section called .sframe, which is itself presented in a 
new segment of its own, PT_GNU_SFRAME.  SFrame format is supported for 
AMD64 and AARCH64 (be/le) ABIs only.
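
To make the CFA/FP/RA description concrete, here is a minimal sketch of
one unwind step in C.  It is not the on-disk SFrame layout and not the
libsframe API: struct sketch_fre, struct sketch_regs, struct
sketch_stack and all sketch_* functions are hypothetical names made up
for illustration.  It only assumes that, for a given PC, the unwinder
has already obtained three offsets (CFA relative to SP or FP, RA
relative to CFA, FP relative to CFA).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the unwind data for one PC range. */
struct sketch_fre {
    int     cfa_from_fp; /* nonzero: CFA base is FP; zero: CFA base is SP */
    int32_t cfa_offset;  /* CFA = base register + cfa_offset */
    int32_t ra_offset;   /* return address is stored at CFA + ra_offset */
    int     has_fp;      /* nonzero if the caller's FP was saved */
    int32_t fp_offset;   /* caller's FP is stored at CFA + fp_offset */
};

struct sketch_regs {
    uint64_t pc, sp, fp;
};

/* Reads one 64-bit word of the unwound task's memory; 0 on success. */
typedef int (*sketch_read_fn)(uint64_t addr, uint64_t *out, void *ctx);

/* Demo reader over a fake stack held in an array (illustration only). */
struct sketch_stack {
    uint64_t        base;   /* address of words[0] */
    const uint64_t *words;
    size_t          nwords;
};

static int sketch_read_stack(uint64_t addr, uint64_t *out, void *ctx)
{
    const struct sketch_stack *s = ctx;
    size_t idx;

    assert(s != NULL && out != NULL);
    if (addr < s->base || (addr - s->base) % 8 != 0)
        return -1;
    idx = (size_t)((addr - s->base) / 8);
    if (idx >= s->nwords)
        return -1;
    *out = s->words[idx];
    return 0;
}

/* Recover the caller's PC, SP and FP from the callee's registers. */
static int sketch_unwind_step(const struct sketch_fre *fre,
                              struct sketch_regs *regs,
                              sketch_read_fn read_word, void *ctx)
{
    uint64_t base = fre->cfa_from_fp ? regs->fp : regs->sp;
    uint64_t cfa = base + (uint64_t)(int64_t)fre->cfa_offset;
    uint64_t ra;
    uint64_t fp = regs->fp; /* unchanged if the function did not save FP */

    if (read_word(cfa + (uint64_t)(int64_t)fre->ra_offset, &ra, ctx))
        return -1;
    if (fre->has_fp &&
        read_word(cfa + (uint64_t)(int64_t)fre->fp_offset, &fp, ctx))
        return -1;

    regs->pc = ra;  /* caller resumes at the return address */
    regs->sp = cfa; /* the CFA is the caller's SP at the call site */
    regs->fp = fp;
    return 0;
}
```

A backtrace is then a loop that looks up the data for the current PC,
records the PC, and calls the step function until a read fails or no
data is found.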

How can I experiment with the SFrame format support?
Support for the SFrame format is available in binutils trunk.  The GNU
assembler, when passed the --gsframe command line option, generates the
.sframe section.  It uses the .cfi_* asm directives emitted by the
compiler to do so.  GNU ld merges the input .sframe sections as
necessary; no explicit command line option is needed.  There is support
in objdump/readelf as well: pass the --sframe option to dump the .sframe
section in textual format.

Where can I find details about the format?
More details are available in include/sframe.h in the binutils repo
(https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=include/sframe.h).
The SFrame spec is also present in the binutils trunk.  Some more
content should be available online in the form of GNU Cauldron and LPC
2022 presentations: this was talked about under the name "CTF Frame",
but has since been renamed to "SFrame".

Why is SFrame based unwinder useful?
Having an unwinder for user-space programs based on SFrame format can be 
useful:
   - Enabling -fno-omit-frame-pointer has performance implications and
other issues.
   - Compared to .eh_frame info, SFrame is a simpler format to decode
when generating backtraces.  An SFrame unwinder is, hence, small and
simple.  The code at
https://github.com/oracle/binutils-gdb/blob/oracle/sframe-unwinder/libsframe/sframe-backtrace.c
shows what an SFrame based unwinder can look like.  It uses libsframe
APIs like sframe_decode, sframe_find_fre, sframe_fre_get_fp_offset etc.
to generate backtraces.

There was some interest, at LPC 2022, in exploring an SFrame-based 
userspace unwinder for the kernel.  To get started on that, some 
discussion on following items will be great. (Please feel free to 
add/delete/correct any items; my knowledge about the kernel and its 
internals remains limited).

Userspace unwinder selection
----------------------------
IIUC, userspace unwinding is always frame-pointer based in the kernel. 
This is unlike kernel-space unwinding, where there is a set of unwinders
to choose from: say, for x86_64, UNWINDER_ORC / UNWINDER_FRAME_POINTER /
UNWINDER_GUESS. Additionally, for kernel stack unwinding, there is also 
a framework in place to plug-and-play these different unwinders.

For userspace stack unwinding, first, we may want to add new config 
options, such that:
    - USERSPACE_UNWINDER_SFRAME => This option enables the SFrame
unwinder as the first choice for unwinding user stack traces.  User
programs must be built with SFrame support.  If not, no SFrame section
will be present in the user program binary; in such a case, the
userspace unwinder falls back to frame pointer unwinding.
    - USERSPACE_UNWINDER_FRAME_POINTER => userspace unwinding does frame 
pointer based unwinding only. User programs must be built with frame 
pointer preservation build flags to ensure useful stack traces.
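
The selection rule described by the two options can be sketched in a
few lines of C.  This is only an illustration of the intended fallback,
not kernel code: SKETCH_USERSPACE_UNWINDER_SFRAME stands in for the
proposed config option, and sketch_pick_unwinder and its task_has_sframe
argument are hypothetical.

```c
#include <assert.h>

/* Stand-in for the proposed config option; 1 means "SFrame first". */
#ifndef SKETCH_USERSPACE_UNWINDER_SFRAME
#define SKETCH_USERSPACE_UNWINDER_SFRAME 1
#endif

enum sketch_unwinder {
    SKETCH_UNWIND_SFRAME,
    SKETCH_UNWIND_FRAME_POINTER
};

/*
 * Use SFrame only when it is both configured in and present in the
 * program being unwound; otherwise fall back to frame pointers.
 */
static enum sketch_unwinder sketch_pick_unwinder(int task_has_sframe)
{
    assert(task_has_sframe == 0 || task_has_sframe == 1);
    if (SKETCH_USERSPACE_UNWINDER_SFRAME && task_has_sframe)
        return SKETCH_UNWIND_SFRAME;
    return SKETCH_UNWIND_FRAME_POINTER;
}
```

With the option set, a program built without SFrame support still gets
a frame pointer based trace, which matches the fallback described for
USERSPACE_UNWINDER_SFRAME.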

Second, regarding "the framework" needed for non-frame-pointer-based 
unwinders, more thought is needed.

Interface of the userspace unwinder
-----------------------------------
* OPTION 1
This one might be overly simplified but is an option.  We add the 
following stub:

    ...
    /* Checks for SFrame sections if CONFIG_USERSPACE_UNWINDER_SFRAME
       is true. */
    if (check_sframe_state_p (current))
       /* current is implicit; stores callchain entries as it unwinds
          using .sframe sections. */
       sframe_callchain_user (entry, regs);
    ...

in the following target APIs in x86_64 and aarch64 to give the desired 
effect of "userspace unwinder selection"
   -- perf_callchain_user in arch/x86/events/core.c
   -- perf_callchain_user in arch/arm64/kernel/perf_callchain.c

where the functions look like:
   static inline bool check_sframe_state_p(struct task_struct *task);
   void sframe_callchain_user(struct perf_callchain_entry_ctx *entry,
                              struct pt_regs *regs);

Here, sframe_callchain_user () will, first, perform an operation similar
to dl_iterate_phdr, because we need the location of the SFrame sections
for unwinding.  This means that, for every sframe_callchain_user ()
call, we go over the memory mappings of the "current" task_struct and
find the locations of the .sframe sections of the program and its DSOs
from the pages that contain the ELF program headers.  Next, it decodes
these SFrame sections and unwinds.
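
For reference, the user-space analogue of that first step looks roughly
like the sketch below, which walks the loaded objects with
dl_iterate_phdr and reports any PT_GNU_SFRAME segment it finds.  It
only illustrates the lookup; the kernel would have to do the equivalent
by walking the task's mappings.  The sketch_* names are hypothetical,
and the fallback PT_GNU_SFRAME definition is there for system headers
that do not provide it yet.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Fallback for older headers; value as defined by binutils, to my
   knowledge. */
#ifndef PT_GNU_SFRAME
#define PT_GNU_SFRAME 0x6474e554
#endif

struct sketch_scan {
    int objects;     /* loaded objects visited */
    int with_sframe; /* objects that carry a PT_GNU_SFRAME segment */
};

/* Called by dl_iterate_phdr once per loaded object (program or DSO). */
static int sketch_phdr_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct sketch_scan *scan = data;
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name : "(no name)";

    (void)size;
    scan->objects++;
    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];

        if (ph->p_type != PT_GNU_SFRAME)
            continue;
        scan->with_sframe++;
        /* Runtime address = load bias + segment virtual address. */
        printf("%s: .sframe at %p, %lu bytes\n", name,
               (void *)(info->dlpi_addr + ph->p_vaddr),
               (unsigned long)ph->p_memsz);
        break;
    }
    return 0; /* zero means: keep iterating */
}

static struct sketch_scan sketch_scan_sframe(void)
{
    struct sketch_scan scan = { 0, 0 };

    dl_iterate_phdr(sketch_phdr_cb, &scan);
    return scan;
}
```

Calling sketch_scan_sframe () from a program assembled with --gsframe
should list its .sframe segment; objects built without SFrame support
are counted but print nothing.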

* OPTION 2
A possible optimization is to instead:

1. Cache some SFrame related state, "struct sframe_state *", in the
"struct task_struct" (guarded by CONFIG_USERSPACE_UNWINDER_SFRAME), and
2. Use an API like so: "void sframe_callchain_user(struct sframe_state
*state, struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)"

This state (struct sframe_state) is, simply put, data about the size and
address of the text and SFrame segments of the program and its DSOs.
Ideally this state can be set up at task setup time and needs to be
updated only if there is any change in the DSOs (added or removed) [1].
The size of the struct sframe_state itself is small here, as SFrame
sections can be decoded on-the-fly with no need for additional mallocs.

[1] PS: That this detection of "add/delete of DSOs in a user program" is
possible in some efficient way in the kernel remains an assumption; I
still need to figure things out.  Any input on this is appreciated.
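
As a rough illustration of what such cached state could hold, here is a
small C sketch: a fixed-size table of (text range, SFrame range) pairs
with add, remove and lookup-by-PC operations.  All names
(sketch_sframe_state, sketch_state_add, etc.) and the fixed capacity
are assumptions made for the example; a real kernel structure would
likely differ, e.g. in locking and sizing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_OBJS 8 /* arbitrary capacity for the sketch */

/* One entry per program or DSO: where its text and .sframe data live. */
struct sketch_sframe_obj {
    uint64_t text_start, text_size;
    uint64_t sframe_start, sframe_size;
};

struct sketch_sframe_state {
    size_t nobjs;
    struct sketch_sframe_obj objs[SKETCH_MAX_OBJS];
};

/* Record a newly mapped object; returns -1 if the table is full. */
static int sketch_state_add(struct sketch_sframe_state *st,
                            struct sketch_sframe_obj obj)
{
    assert(st != NULL);
    if (st->nobjs >= SKETCH_MAX_OBJS)
        return -1;
    st->objs[st->nobjs++] = obj;
    return 0;
}

/* Drop the object whose text starts at text_start (e.g. on unmap). */
static int sketch_state_remove(struct sketch_sframe_state *st,
                               uint64_t text_start)
{
    for (size_t i = 0; i < st->nobjs; i++) {
        if (st->objs[i].text_start != text_start)
            continue;
        st->nobjs--;
        st->objs[i] = st->objs[st->nobjs]; /* fill hole with last entry */
        return 0;
    }
    return -1;
}

/* Find the object whose text range contains pc, or NULL if none does. */
static const struct sketch_sframe_obj *
sketch_state_find(const struct sketch_sframe_state *st, uint64_t pc)
{
    for (size_t i = 0; i < st->nobjs; i++) {
        const struct sketch_sframe_obj *o = &st->objs[i];

        if (pc >= o->text_start && pc - o->text_start < o->text_size)
            return o;
    }
    return NULL;
}
```

With something like this in place, each unwind step only needs a lookup
by PC to locate the right SFrame section, instead of re-walking the
program headers on every call.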

Other framework
---------------
The kernel stack unwinders adhere to some interface allowing them to be 
used interchangeably.  The requirements of the userspace unwinder are a 
bit different though: not all user applications may be compiled with 
SFrame support, which means there needs to be a way we fall back on the 
frame-pointer based unwinder in the kernel for unwinding user programs.

This requirement, however, does not mean that some framework changes 
shouldn't be done now to make things work better.

Any feedback/ideas are appreciated.  I have also not yet been able to
evaluate what other impacts this could have on perf, if any.

Thanks
