From: Steven Rostedt <rostedt@goodmis.org>
To: Indu Bhagat <indu.bhagat@oracle.com>
Cc: linux-toolchains@vger.kernel.org, daandemeyer@meta.com,
andrii@kernel.org, kris.van.hees@oracle.com,
elena.zannoni@oracle.com, nick.alcock@oracle.com
Subject: Re: [POC 4/5] sframe: add an SFrame format stack tracer
Date: Mon, 1 May 2023 19:00:18 -0400
Message-ID: <20230501190018.24ae7704@gandalf.local.home>
In-Reply-To: <20230501200410.3973453-5-indu.bhagat@oracle.com>

On Mon, 1 May 2023 13:04:09 -0700
Indu Bhagat <indu.bhagat@oracle.com> wrote:
> This patch adds an SFrame format based stack tracer.
>
> The files iterate_phdr.c, iterate_phdr.h implement a dl_iterate_phdr()
> like functionality.
>
> The SFrame format based stack tracer is implemented in the
> sframe_unwind.c with architecture specific bits in the
> arch/arm64/include/asm/sframe_regs.h and
> arch/x86/include/asm/sframe_regs.h. Please note that the SFrame format
> is supported for x86_64 (AMD64 ABI) and aarch64 (AAPCS64 ABI) only at
> this time.
>
> The files sframe_state.[ch] implement the SFrame state management APIs.
>
> Some aspects of the implementation are "POC like". These will need to
> be addressed for the implementation to become more palatable:
> - dealing with only Elf64_Phdr (no Elf32_Phdr) at this time, and other
> TODOs in the iterate_phdr.c,
> - detecting whether a program did a dlopen/dlclose,
> - code stubs around user space memory access (.sframe section, ELF hdr
> etc.) by the kernel need careful review.
>
> There are more aspects than above; The intention of this patch set is to
> help drive the discussion on how to best incorporate an SFrame-based user
> space unwinder in the kernel.
>
> Signed-off-by: Indu Bhagat <indu.bhagat@oracle.com>
> ---
> arch/arm64/include/asm/sframe_regs.h | 37 +++
> arch/x86/include/asm/sframe_regs.h | 34 +++
> include/sframe/sframe_regs.h | 11 +
> include/sframe/sframe_unwind.h | 62 ++++
> lib/sframe/Makefile | 8 +-
> lib/sframe/iterate_phdr.c | 113 +++++++
> lib/sframe/iterate_phdr.h | 34 +++
> lib/sframe/sframe_state.c | 424 +++++++++++++++++++++++++++
> lib/sframe/sframe_state.h | 80 +++++
> lib/sframe/sframe_unwind.c | 208 +++++++++++++
> 10 files changed, 1010 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/include/asm/sframe_regs.h
> create mode 100644 arch/x86/include/asm/sframe_regs.h
> create mode 100644 include/sframe/sframe_regs.h
> create mode 100644 include/sframe/sframe_unwind.h
> create mode 100644 lib/sframe/iterate_phdr.c
> create mode 100644 lib/sframe/iterate_phdr.h
> create mode 100644 lib/sframe/sframe_state.c
> create mode 100644 lib/sframe/sframe_state.h
> create mode 100644 lib/sframe/sframe_unwind.c
>
> diff --git a/arch/arm64/include/asm/sframe_regs.h b/arch/arm64/include/asm/sframe_regs.h
> new file mode 100644
> index 000000000000..ae9ab9d5d3c1
> --- /dev/null
> +++ b/arch/arm64/include/asm/sframe_regs.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2023, Oracle and/or its affiliates.
> + */
> +
> +#ifndef ASM_ARM64_SFRAME_REGS_H
> +#define ASM_ARM64_SFRAME_REGS_H
> +
> +#define STACK_ACCESS_LEN 8
> +
> +static inline uint64_t
> +get_ptregs_ip(struct pt_regs *regs)
> +{
> + return regs->pc;
> +}
> +
> +static inline uint64_t
> +get_ptregs_sp(struct pt_regs *regs)
> +{
> + return regs->sp;
> +}
> +
> +static inline uint64_t
> +get_ptregs_fp(struct pt_regs *regs)
> +{
> +#define UNWIND_AARCH64_X29 29 /* 64-bit frame pointer. */
> + return (uint64_t)regs->regs[UNWIND_AARCH64_X29];
> +}
> +
> +static inline uint64_t
> +get_ptregs_ra(struct pt_regs *regs)
> +{
> +#define UNWIND_AARCH64_X30 30 /* 64-bit link pointer. */
> + return (uint64_t)regs->regs[UNWIND_AARCH64_X30];
> +}
> +
> +#endif /* ASM_ARM64_SFRAME_REGS_H */
> diff --git a/arch/x86/include/asm/sframe_regs.h b/arch/x86/include/asm/sframe_regs.h
> new file mode 100644
> index 000000000000..99f84955854a
> --- /dev/null
> +++ b/arch/x86/include/asm/sframe_regs.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2023, Oracle and/or its affiliates.
> + */
> +
> +#ifndef ASM_X86_SFRAME_REGS_H
> +#define ASM_X86_SFRAME_REGS_H
> +
> +#define STACK_ACCESS_LEN 8
> +
> +static inline uint64_t
> +get_ptregs_ip(struct pt_regs *regs)
> +{
> + return (uint64_t)regs->ip;
> +}
> +
> +static inline uint64_t
> +get_ptregs_sp(struct pt_regs *regs)
> +{
> + return (uint64_t)regs->sp;
> +}
> +
> +static inline uint64_t
> +get_ptregs_fp(struct pt_regs *regs)
> +{
> + return (uint64_t)regs->bp;
> +}
> +
> +static inline uint64_t
> +get_ptregs_ra(struct pt_regs *regs)
> +{
> + return 0; /* SFRAME_CFA_FIXED_RA_INVALID */
> +}
> +#endif /* ASM_X86_SFRAME_REGS_H */
> diff --git a/include/sframe/sframe_regs.h b/include/sframe/sframe_regs.h
> new file mode 100644
> index 000000000000..32b67f7a7c78
> --- /dev/null
> +++ b/include/sframe/sframe_regs.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2023, Oracle and/or its affiliates.
> + */
> +
> +#ifndef _SFRAME_REGS_H
> +#define _SFRAME_REGS_H
> +
> +#include <asm/sframe_regs.h>
> +
> +#endif /* _SFRAME_REGS_H */
> diff --git a/include/sframe/sframe_unwind.h b/include/sframe/sframe_unwind.h
> new file mode 100644
> index 000000000000..3e2c12816b60
> --- /dev/null
> +++ b/include/sframe/sframe_unwind.h

Also, these should probably go into include/linux, unless there are going
to be a lot more header files.

> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2023, Oracle and/or its affiliates.
> + */
> +
> +#ifndef _SFRAME_UNWIND_H
> +#define _SFRAME_UNWIND_H
> +
> +#include <linux/sched.h>
> +#include <linux/perf_event.h>
> +
> +#define PT_GNU_SFRAME 0x6474e554
> +
> +/*
> + * State used for SFrame-based stack tracing for a user space task.
> + */
> +struct user_unwind_state {
> + uint64_t pc, sp, fp, ra;

I know this is a POC, but please put each structure field on its own line.
Also, the fields should be tab delimited.

> + enum stack_type stype;
> + struct task_struct *task;
> + bool error;
> +};

Also swap the task and the stype, as placing the pointer to the task after
the enum will create a hole in the structure.

struct user_unwind_state {
	uint64_t		pc;
	uint64_t		sp;
	uint64_t		fp;
	uint64_t		ra;
	struct task_struct	*task;
	enum stack_type		stype;
	bool			error;
};

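To make the hole concrete, here is a minimal user-space sketch that compares
the two field orders. The struct names, the stand-in enum values and the two
helper functions are assumptions made for illustration; they are not part of
the patch. On a typical LP64 target such as x86_64, the original order should
leave a 4-byte hole between stype and task (56 bytes total), while the
reordered struct should come to 48 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel types; assumptions for this sketch only. */
struct task_struct;
enum stack_type { STACK_TYPE_UNKNOWN, STACK_TYPE_TASK };

/* Field order as posted: the enum sits in front of the pointer. */
struct unwind_state_orig {
	uint64_t		pc, sp, fp, ra;
	enum stack_type		stype;
	struct task_struct	*task;
	bool			error;
};

/* Suggested order: the pointer first, then the enum and the bool. */
struct unwind_state_reordered {
	uint64_t		pc;
	uint64_t		sp;
	uint64_t		fp;
	uint64_t		ra;
	struct task_struct	*task;
	enum stack_type		stype;
	bool			error;
};

/* In the reordered layout the pointer directly follows the four u64s. */
static_assert(offsetof(struct unwind_state_reordered, task) ==
	      4 * sizeof(uint64_t), "task should follow ra without padding");

/* Padding bytes between stype and task in the original layout. */
static inline size_t orig_hole_bytes(void)
{
	return offsetof(struct unwind_state_orig, task) -
	       offsetof(struct unwind_state_orig, stype) -
	       sizeof(enum stack_type);
}

/* Bytes saved per instance by the reordering (may be 0 on 32-bit). */
static inline size_t reorder_saved_bytes(void)
{
	return sizeof(struct unwind_state_orig) -
	       sizeof(struct unwind_state_reordered);
}
```

In the kernel tree, pahole on the built object gives the same information
without writing any code.
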
> +
> +/*
> + * APIs for an SFrame based stack tracer.
> + */
> +
> +void sframe_unwind_start(struct user_unwind_state *state,
> + struct task_struct *task, struct pt_regs *regs);
> +bool sframe_unwind_next_frame(struct user_unwind_state *state);
> +uint64_t sframe_unwind_get_return_address(struct user_unwind_state *state);
> +
> +static inline bool sframe_unwind_done(struct user_unwind_state *state)
> +{
> + return state->stype == STACK_TYPE_UNKNOWN;
> +}
> +
> +static inline bool sframe_unwind_error(struct user_unwind_state *state)
> +{
> + return state->error;
> +}
> +
> +/*
> + * APIs to manage the SFrame state per task for stack tracing.
> + */
> +
> +extern struct sframe_state *unwind_sframe_state_alloc(struct task_struct *task);
> +extern int unwind_sframe_state_update(struct task_struct *task);
> +extern void unwind_sframe_state_cleanup(struct task_struct *task);
> +
> +extern bool unwind_sframe_state_valid_p(struct sframe_state *sfstate);
> +extern bool unwind_sframe_state_ready_p(struct sframe_state *sfstate);
> +
> +/*
> + * Get the callchain using SFrame unwind info for the given task.
> + */
> +extern int
> +sframe_callchain_user(struct task_struct *task,
> + struct perf_callchain_entry_ctx *entry,
> + struct pt_regs *regs);

I plan on using this without any perf involvement, and I'd like to keep perf
separate from the sframe logic. As I mentioned in a previous email, I
expect sframe to have callbacks. So the callchain format should be defined
by sframe, and not reuse perf's.

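As a rough illustration of the callback idea, here is a user-space sketch in
which the walker knows nothing about its consumer. Every name in it
(sframe_frame_fn, sframe_walk_sketch, struct trace_buf, store_frame) is
hypothetical and not an existing kernel interface. The return addresses are
fed from an array so that the sketch is runnable; a real walker would obtain
them from something like sframe_unwind_next_frame().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame callback: return false to stop the walk. */
typedef bool (*sframe_frame_fn)(void *cookie, uint64_t pc);

/*
 * Hypothetical walker: hands each pc to the callback and returns the
 * number of frames the callback accepted. It has no knowledge of perf
 * or of any other consumer's callchain format.
 */
static inline size_t sframe_walk_sketch(const uint64_t *pcs, size_t nr,
					sframe_frame_fn fn, void *cookie)
{
	size_t i;

	assert(fn != NULL);

	for (i = 0; i < nr; i++) {
		if (!fn(cookie, pcs[i]))
			break;
	}
	return i;
}

/* Example consumer: a small fixed-size buffer owned by the caller. */
#define TRACE_BUF_MAX	4

struct trace_buf {
	uint64_t	ip[TRACE_BUF_MAX];
	size_t		nr;
};

/* Stores the pc, or stops the walk once the buffer is full. */
static inline bool store_frame(void *cookie, uint64_t pc)
{
	struct trace_buf *buf = cookie;

	if (buf->nr >= TRACE_BUF_MAX)
		return false;
	buf->ip[buf->nr++] = pc;
	return true;
}
```

With this shape, perf would be just one consumer that supplies a callback
filling its own perf_callchain_entry_ctx, and a tracer could supply a
different one.
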
> +
> +#endif /* _SFRAME_UNWIND_H */
> diff --git a/lib/sframe/Makefile b/lib/sframe/Makefile
> index 4e4291d9294f..5ee9e3e7ec93 100644
> --- a/lib/sframe/Makefile
> +++ b/lib/sframe/Makefile
> @@ -1,5 +1,11 @@
> # SPDX-License-Identifier: GPL-2.0
> ##################################
> -obj-$(CONFIG_USER_UNWINDER_SFRAME) += sframe_read.o \
> +obj-$(CONFIG_USER_UNWINDER_SFRAME) += iterate_phdr.o \
> + sframe_read.o \
> + sframe_state.o \
> + sframe_unwind.o
Ah, the backslash is fixed here.
>
> +CFLAGS_iterate_phdr.o += -I $(srctree)/lib/sframe/ -Wno-error=declaration-after-statement
> CFLAGS_sframe_read.o += -I $(srctree)/lib/sframe/
> +CFLAGS_sframe_state.o += -I $(srctree)/lib/sframe/
> +CFLAGS_sframe_unwind.o += -I $(srctree)/lib/sframe/
> diff --git a/lib/sframe/iterate_phdr.c b/lib/sframe/iterate_phdr.c
> new file mode 100644
> index 000000000000..c10d590ecc67
> --- /dev/null
> +++ b/lib/sframe/iterate_phdr.c
> @@ -0,0 +1,113 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2023, Oracle and/or its affiliates.
> + */
> +
> +#include <linux/elf.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm_types.h>
> +
> +#include "iterate_phdr.h"
> +
> +/*
> + * Iterate over the task's memory mappings and find the ELF headers.
> + *
> + * This is expected to be called from perf_callchain_user(), so user process
> + * context is expected.
My thought is that this will be called in the ptrace path (not the perf
path), so yes, it will be in user process context.
> + */
> +
> +int iterate_phdr(int (*callback)(struct phdr_info *info,
> + struct task_struct *task,
> + void *data),
> + struct task_struct *task, void *data)
> +{
> + struct mm_struct *mm;
> + struct vm_area_struct *vma_mt;
> + struct page *page;
> +
> + Elf64_Ehdr *ehdr;
> + struct phdr_info phinfo;
> +
> + int ret = 0, res = 0;
> + int err = 0;
> + bool first = true;
> +
> + memset(&phinfo, 0, sizeof(struct phdr_info));
> +
> + mm = task->mm;
> +
> + MA_STATE(mas, &mm->mm_mt, 0, 0);
> +

So this is the code I want to discuss at LSFMM :-) There will be people
there with more expertise in this area than I have.

Let me go and start building the infrastructure to encompass this.

-- Steve

> + mas_for_each(&mas, vma_mt, ULONG_MAX) {
> + /* ELF header has a fixed place in the file, starting at offset
> + * zero.
> + */
> + if (vma_mt->vm_pgoff)
> + continue;
> +
> + /* For the callback to infer if its the prog or DSO we are
> + * dealing with.
> + */
> + phinfo.pi_prog = first;
> + first = false;
> + /* FIXME TODO
> + * - This code assumes 64-bit ELF by using Elf64_Ehdr.
> + * - Detect the case when ELF program headers to be of
> + * size > 1 page.
> + */
> +
> + /* FIXME TODO KERNEL
> + * - get_user_pages_WHAT, which API.
> + * What flags ? Is this correct ?
> + */
> + ret = get_user_pages_remote(mm, vma_mt->vm_start, 1, FOLL_GET,
> + &page, &vma_mt, NULL);
> + if (ret <= 0)
> + continue;
> +
> + /* The first page must have the ELF header. */
> + ehdr = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
> + if (!ehdr)
> + goto put_page;
> +
> + /* Check for magic bytes to make sure this is ehdr. */
> + err = 0;
> + err |= ((ehdr->e_ident[EI_MAG0] != ELFMAG0)
> + || (ehdr->e_ident[EI_MAG1] != ELFMAG1)
> + || (ehdr->e_ident[EI_MAG2] != ELFMAG2)
> + || (ehdr->e_ident[EI_MAG3] != ELFMAG3));
> + if (err)
> + goto unmap;
> +
> + /*
> + * FIXME TODO handle the case when number of program headers is
> + * greater than or equal to PN_XNUM later.
> + */
> + if (ehdr->e_phnum == PN_XNUM)
> + goto unmap;
> + /*
> + * FIXME TODO handle the case when Elf phdrs span more than one
> + * page later ?
> + */
> + if ((sizeof(Elf64_Ehdr) + ehdr->e_phentsize * ehdr->e_phnum)
> + > PAGE_SIZE)
> + goto unmap;
> +
> + /* Save the location of program headers and the phnum. */
> + phinfo.pi_addr = vma_mt->vm_start;
> + phinfo.pi_phdr = (void *)ehdr + ehdr->e_phoff;
> + phinfo.pi_phnum = ehdr->e_phnum;
> +
> + res = callback(&phinfo, task, data);
> +unmap:
> + vunmap(ehdr);
> +put_page:
> + put_page(page);
> +
> + if (res < 0)
> + break;
> + }
> +
> + return res;
> +}
>