linux-toolchains.vger.kernel.org archive mirror
From: Steven Rostedt <rostedt@goodmis.org>
To: Indu Bhagat <indu.bhagat@oracle.com>
Cc: linux-toolchains@vger.kernel.org, daandemeyer@meta.com,
	andrii@kernel.org, kris.van.hees@oracle.com,
	elena.zannoni@oracle.com, nick.alcock@oracle.com
Subject: Re: [POC 4/5] sframe: add an SFrame format stack tracer
Date: Mon, 1 May 2023 19:00:18 -0400
Message-ID: <20230501190018.24ae7704@gandalf.local.home>
In-Reply-To: <20230501200410.3973453-5-indu.bhagat@oracle.com>

On Mon,  1 May 2023 13:04:09 -0700
Indu Bhagat <indu.bhagat@oracle.com> wrote:

> This patch adds an SFrame format based stack tracer.
> 
> The files iterate_phdr.c, iterate_phdr.h implement a dl_iterate_phdr()
> like functionality.
> 
> The SFrame format based stack tracer is implemented in the
> sframe_unwind.c with architecture specific bits in the
> arch/arm64/include/asm/sframe_regs.h and
> arch/x86/include/asm/sframe_regs.h.  Please note that the SFrame format
> is supported for x86_64 (AMD64 ABI) and aarch64 (AAPCS64 ABI) only at
> this time.
> 
> The files sframe_state.[ch] implement the SFrame state management APIs.
> 
> Some aspects of the implementation are "POC like". These will need to be
> addressed for the implementation to become more palatable:
> - dealing with only Elf64_Phdr (no Elf32_Phdr) at this time, and other
>   TODOs in the iterate_phdr.c,
> - detecting whether a program did a dlopen/dlclose,
> - code stubs around user space memory access (.sframe section, ELF hdr
>   etc.) by the kernel need careful review.
> 
> There are more aspects than above; The intention of this patch set is to
> help drive the discussion on how to best incorporate an SFrame-based user
> space unwinder in the kernel.
> 
> Signed-off-by: Indu Bhagat <indu.bhagat@oracle.com>
> ---
>  arch/arm64/include/asm/sframe_regs.h |  37 +++
>  arch/x86/include/asm/sframe_regs.h   |  34 +++
>  include/sframe/sframe_regs.h         |  11 +
>  include/sframe/sframe_unwind.h       |  62 ++++
>  lib/sframe/Makefile                  |   8 +-
>  lib/sframe/iterate_phdr.c            | 113 +++++++
>  lib/sframe/iterate_phdr.h            |  34 +++
>  lib/sframe/sframe_state.c            | 424 +++++++++++++++++++++++++++
>  lib/sframe/sframe_state.h            |  80 +++++
>  lib/sframe/sframe_unwind.c           | 208 +++++++++++++
>  10 files changed, 1010 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/include/asm/sframe_regs.h
>  create mode 100644 arch/x86/include/asm/sframe_regs.h
>  create mode 100644 include/sframe/sframe_regs.h
>  create mode 100644 include/sframe/sframe_unwind.h
>  create mode 100644 lib/sframe/iterate_phdr.c
>  create mode 100644 lib/sframe/iterate_phdr.h
>  create mode 100644 lib/sframe/sframe_state.c
>  create mode 100644 lib/sframe/sframe_state.h
>  create mode 100644 lib/sframe/sframe_unwind.c
> 
> diff --git a/arch/arm64/include/asm/sframe_regs.h b/arch/arm64/include/asm/sframe_regs.h
> new file mode 100644
> index 000000000000..ae9ab9d5d3c1
> --- /dev/null
> +++ b/arch/arm64/include/asm/sframe_regs.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2023, Oracle and/or its affiliates.
> + */
> +
> +#ifndef ASM_ARM64_SFRAME_REGS_H
> +#define ASM_ARM64_SFRAME_REGS_H
> +
> +#define STACK_ACCESS_LEN 8
> +
> +static inline uint64_t
> +get_ptregs_ip(struct pt_regs *regs)
> +{
> +	return regs->pc;
> +}
> +
> +static inline uint64_t
> +get_ptregs_sp(struct pt_regs *regs)
> +{
> +	return regs->sp;
> +}
> +
> +static inline uint64_t
> +get_ptregs_fp(struct pt_regs *regs)
> +{
> +#define UNWIND_AARCH64_X29              29      /* 64-bit frame pointer.  */
> +	return (uint64_t)regs->regs[UNWIND_AARCH64_X29];
> +}
> +
> +static inline uint64_t
> +get_ptregs_ra(struct pt_regs *regs)
> +{
> +#define UNWIND_AARCH64_X30              30      /* 64-bit link pointer.  */
> +	return (uint64_t)regs->regs[UNWIND_AARCH64_X30];
> +}
> +
> +#endif /* ASM_ARM64_SFRAME_REGS_H */
> diff --git a/arch/x86/include/asm/sframe_regs.h b/arch/x86/include/asm/sframe_regs.h
> new file mode 100644
> index 000000000000..99f84955854a
> --- /dev/null
> +++ b/arch/x86/include/asm/sframe_regs.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2023, Oracle and/or its affiliates.
> + */
> +
> +#ifndef ASM_X86_SFRAME_REGS_H
> +#define ASM_X86_SFRAME_REGS_H
> +
> +#define STACK_ACCESS_LEN 8
> +
> +static inline uint64_t
> +get_ptregs_ip(struct pt_regs *regs)
> +{
> +	return (uint64_t)regs->ip;
> +}
> +
> +static inline uint64_t
> +get_ptregs_sp(struct pt_regs *regs)
> +{
> +	return (uint64_t)regs->sp;
> +}
> +
> +static inline uint64_t
> +get_ptregs_fp(struct pt_regs *regs)
> +{
> +	return (uint64_t)regs->bp;
> +}
> +
> +static inline uint64_t
> +get_ptregs_ra(struct pt_regs *regs)
> +{
> +	return 0; /* SFRAME_CFA_FIXED_RA_INVALID */
> +}
> +#endif /* ASM_X86_SFRAME_REGS_H */
> diff --git a/include/sframe/sframe_regs.h b/include/sframe/sframe_regs.h
> new file mode 100644
> index 000000000000..32b67f7a7c78
> --- /dev/null
> +++ b/include/sframe/sframe_regs.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2023, Oracle and/or its affiliates.
> + */
> +
> +#ifndef _SFRAME_REGS_H
> +#define _SFRAME_REGS_H
> +
> +#include <asm/sframe_regs.h>
> +
> +#endif /* _SFRAME_REGS_H */
> diff --git a/include/sframe/sframe_unwind.h b/include/sframe/sframe_unwind.h
> new file mode 100644
> index 000000000000..3e2c12816b60
> --- /dev/null
> +++ b/include/sframe/sframe_unwind.h

Also, these should probably go into include/linux, unless there are going
to be a lot more header files.

> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2023, Oracle and/or its affiliates.
> + */
> +
> +#ifndef _SFRAME_UNWIND_H
> +#define _SFRAME_UNWIND_H
> +
> +#include <linux/sched.h>
> +#include <linux/perf_event.h>
> +
> +#define PT_GNU_SFRAME  0x6474e554
> +
> +/*
> + * State used for SFrame-based stack tracing for a user space task.
> + */
> +struct user_unwind_state {
> +	uint64_t pc, sp, fp, ra;

I know this is a POC, but please declare each structure field on its own
line. The field names should also be tab aligned.

> +	enum stack_type stype;
> +	struct task_struct *task;
> +	bool error;
> +};

Also swap task and stype. On 64-bit, the task pointer needs 8-byte
alignment, so placing it right after the smaller enum creates a hole in
the structure.

struct user_unwind_state {
	uint64_t		pc;
	uint64_t		sp;
	uint64_t		fp;
	uint64_t		ra;
	struct task_struct	*task;
	enum stack_type		stype;
	bool			error;
};
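
The effect of the reordering can be checked in user space. The following
is a minimal sketch, not kernel code: the enum values and the struct names
state_as_posted and state_reordered are stand-ins chosen for illustration,
and the helper functions are hypothetical. On a typical LP64 target with a
4-byte enum, the layout as posted takes 56 bytes and the reordered one 48.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel types; only their sizes matter here. */
struct task_struct;
enum stack_type { STACK_TYPE_UNKNOWN, STACK_TYPE_TASK };

/* Layout as posted: the enum sits in front of the pointer. */
struct state_as_posted {
	uint64_t		pc;
	uint64_t		sp;
	uint64_t		fp;
	uint64_t		ra;
	enum stack_type		stype;
	struct task_struct	*task;
	bool			error;
};

/* Suggested layout: pointer first, then the smaller members. */
struct state_reordered {
	uint64_t		pc;
	uint64_t		sp;
	uint64_t		fp;
	uint64_t		ra;
	struct task_struct	*task;
	enum stack_type		stype;
	bool			error;
};

/* Padding bytes the compiler inserted between stype and task. */
static inline size_t hole_before_task(void)
{
	return offsetof(struct state_as_posted, task) -
	       (offsetof(struct state_as_posted, stype) +
		sizeof(enum stack_type));
}

/* Bytes saved per instance by the reordering. */
static inline size_t bytes_saved(void)
{
	return sizeof(struct state_as_posted) -
	       sizeof(struct state_reordered);
}
```

Inside the kernel tree, running pahole on the built object shows the same
holes without writing any test code.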

> +
> +/*
> + * APIs for an SFrame based stack tracer.
> + */
> +
> +void sframe_unwind_start(struct user_unwind_state *state,
> +			 struct task_struct *task, struct pt_regs *regs);
> +bool sframe_unwind_next_frame(struct user_unwind_state *state);
> +uint64_t sframe_unwind_get_return_address(struct user_unwind_state *state);
> +
> +static inline bool sframe_unwind_done(struct user_unwind_state *state)
> +{
> +	return state->stype == STACK_TYPE_UNKNOWN;
> +}
> +
> +static inline bool sframe_unwind_error(struct user_unwind_state *state)
> +{
> +	return state->error;
> +}
> +
> +/*
> + * APIs to manage the SFrame state per task for stack tracing.
> + */
> +
> +extern struct sframe_state *unwind_sframe_state_alloc(struct task_struct *task);
> +extern int unwind_sframe_state_update(struct task_struct *task);
> +extern void unwind_sframe_state_cleanup(struct task_struct *task);
> +
> +extern bool unwind_sframe_state_valid_p(struct sframe_state *sfstate);
> +extern bool unwind_sframe_state_ready_p(struct sframe_state *sftate);
> +
> +/*
> + * Get the callchain using SFrame unwind info for the given task.
> + */
> +extern int
> +sframe_callchain_user(struct task_struct *task,
> +		      struct perf_callchain_entry_ctx *entry,
> +		      struct pt_regs *regs);


I plan on using this without any perf involvement, so I'd like to keep
perf separate from the sframe logic. As I mentioned in a previous email, I
expect sframe to have callbacks. So the callchain format should be defined
by sframe, and not reuse perf.
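
A minimal user-space sketch of what a callback-driven interface could look
like follows. Every name in it (unwind_consume_fn, mock_unwind_state,
walk_user_stack, pc_buf, store_pc) is hypothetical and not taken from the
patch, and the mock state simply replays a canned list of return addresses
in place of a real SFrame lookup. The point is only that the walker hands
each address to a consumer, so perf, ftrace or anything else can supply
its own storage format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer callback; return false to stop the walk. */
typedef bool (*unwind_consume_fn)(void *cookie, uint64_t pc);

/* Hypothetical stand-in for the unwind state: replays canned frames. */
struct mock_unwind_state {
	const uint64_t	*frames;
	size_t		nr;
	size_t		pos;
};

static bool mock_unwind_done(const struct mock_unwind_state *s)
{
	return s->pos >= s->nr;
}

static uint64_t mock_unwind_get_return_address(const struct mock_unwind_state *s)
{
	return s->frames[s->pos];
}

static void mock_unwind_next_frame(struct mock_unwind_state *s)
{
	s->pos++;
}

/*
 * Walk the stack and pass each return address to the consumer.
 * Returns the number of frames the consumer accepted.
 */
static size_t walk_user_stack(struct mock_unwind_state *s,
			      unwind_consume_fn fn, void *cookie)
{
	size_t n = 0;

	for (; !mock_unwind_done(s); mock_unwind_next_frame(s)) {
		if (!fn(cookie, mock_unwind_get_return_address(s)))
			break;
		n++;
	}
	return n;
}

/* Example consumer: collect addresses into a small fixed buffer. */
#define PC_BUF_MAX 4

struct pc_buf {
	uint64_t	pcs[PC_BUF_MAX];
	size_t		nr;
};

static bool store_pc(void *cookie, uint64_t pc)
{
	struct pc_buf *b = cookie;

	if (b->nr == PC_BUF_MAX)
		return false;	/* buffer full, stop the walk */
	b->pcs[b->nr++] = pc;
	return true;
}
```

A perf user would pass a consumer that fills its own
perf_callchain_entry_ctx, which keeps that type out of the sframe headers.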

> +
> +#endif /* _SFRAME_UNWIND_H */
> diff --git a/lib/sframe/Makefile b/lib/sframe/Makefile
> index 4e4291d9294f..5ee9e3e7ec93 100644
> --- a/lib/sframe/Makefile
> +++ b/lib/sframe/Makefile
> @@ -1,5 +1,11 @@
>  # SPDX-License-Identifier: GPL-2.0
>  ##################################
> -obj-$(CONFIG_USER_UNWINDER_SFRAME) += sframe_read.o \
> +obj-$(CONFIG_USER_UNWINDER_SFRAME) += iterate_phdr.o \
> +				      sframe_read.o \
> +				      sframe_state.o \
> +				      sframe_unwind.o

Ah, the backslash is fixed here.

>  
> +CFLAGS_iterate_phdr.o += -I $(srctree)/lib/sframe/ -Wno-error=declaration-after-statement
>  CFLAGS_sframe_read.o += -I $(srctree)/lib/sframe/
> +CFLAGS_sframe_state.o += -I $(srctree)/lib/sframe/
> +CFLAGS_sframe_unwind.o += -I $(srctree)/lib/sframe/
> diff --git a/lib/sframe/iterate_phdr.c b/lib/sframe/iterate_phdr.c
> new file mode 100644
> index 000000000000..c10d590ecc67
> --- /dev/null
> +++ b/lib/sframe/iterate_phdr.c
> @@ -0,0 +1,113 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2023, Oracle and/or its affiliates.
> + */
> +
> +#include <linux/elf.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm_types.h>
> +
> +#include "iterate_phdr.h"
> +
> +/*
> + * Iterate over the task's memory mappings and find the ELF headers.
> + *
> + * This is expected to be called from perf_callchain_user(), so user process
> + * context is expected.

My thought is that this will be called in the ptrace path (not the perf
path), so yes, it will be in user process context.

> + */
> +
> +int iterate_phdr(int (*callback)(struct phdr_info *info,
> +				 struct task_struct *task,
> +				 void *data),
> +		 struct task_struct *task, void *data)
> +{
> +	struct mm_struct *mm;
> +	struct vm_area_struct *vma_mt;
> +	struct page *page;
> +
> +	Elf64_Ehdr *ehdr;
> +	struct phdr_info phinfo;
> +
> +	int ret = 0, res = 0;
> +	int err = 0;
> +	bool first = true;
> +
> +	memset(&phinfo, 0, sizeof(struct phdr_info));
> +
> +	mm = task->mm;
> +
> +	MA_STATE(mas, &mm->mm_mt, 0, 0);
> +

So this is the code I want to discuss at LSFMM :-) There will be people
there who know this area much better than I do.

Let me go and start making the infrastructure to encompass this.

-- Steve


> +	mas_for_each(&mas, vma_mt, ULONG_MAX) {
> +		/* ELF header has a fixed place in the file, starting at offset
> +		 * zero.
> +		 */
> +		if (vma_mt->vm_pgoff)
> +			continue;
> +
> +		/* For the callback to infer if its the prog or DSO we are
> +		 * dealing with.
> +		 */
> +		phinfo.pi_prog = first;
> +		first = false;
> +		/* FIXME TODO
> +		 *  - This code assumes 64-bit ELF by using Elf64_Ehdr.
> +		 *  - Detect the case when ELF program headers to be of
> +		 * size > 1 page.
> +		 */
> +
> +		/* FIXME TODO KERNEL
> +		 *  - get_user_pages_WHAT, which API.
> +		 *  What flags ? Is this correct ?
> +		 */
> +		ret = get_user_pages_remote(mm, vma_mt->vm_start, 1, FOLL_GET,
> +					    &page, &vma_mt, NULL);
> +		if (ret <= 0)
> +			continue;
> +
> +		/* The first page must have the ELF header. */
> +		ehdr = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
> +		if (!ehdr)
> +			goto put_page;
> +
> +		/* Check for magic bytes to make sure this is ehdr. */
> +		err = 0;
> +		err |= ((ehdr->e_ident[EI_MAG0] != ELFMAG0)
> +			|| (ehdr->e_ident[EI_MAG1] != ELFMAG1)
> +			|| (ehdr->e_ident[EI_MAG2] != ELFMAG2)
> +			|| (ehdr->e_ident[EI_MAG3] != ELFMAG3));
> +		if (err)
> +			goto unmap;
> +
> +		/*
> +		 * FIXME TODO handle the case when number of program headers is
> +		 * greater than or equal to PN_XNUM later.
> +		 */
> +		if (ehdr->e_phnum == PN_XNUM)
> +			goto unmap;
> +		/*
> +		 * FIXME TODO handle the case when Elf phdrs span more than one
> +		 * page later ?
> +		 */
> +		if ((sizeof(Elf64_Ehdr) + ehdr->e_phentsize * ehdr->e_phnum)
> +		    > PAGE_SIZE)
> +			goto unmap;
> +
> +		/* Save the location of program headers and the phnum. */
> +		phinfo.pi_addr = vma_mt->vm_start;
> +		phinfo.pi_phdr = (void *)ehdr + ehdr->e_phoff;
> +		phinfo.pi_phnum = ehdr->e_phnum;
> +
> +		res = callback(&phinfo, task, data);
> +unmap:
> +		vunmap(ehdr);
> +put_page:
> +		put_page(page);
> +
> +		if (res < 0)
> +			break;
> +	}
> +
> +	return res;
> +}
>

