linux-toolchains.vger.kernel.org archive mirror
From: Indu Bhagat <indu.bhagat@oracle.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-toolchains@vger.kernel.org, daandemeyer@meta.com,
	andrii@kernel.org, kris.van.hees@oracle.com,
	elena.zannoni@oracle.com, nick.alcock@oracle.com
Subject: Re: [POC 4/5] sframe: add an SFrame format stack tracer
Date: Mon, 1 May 2023 23:16:38 -0700
Message-ID: <84bf4aee-e5a5-8e49-3826-ecb79eefc85b@oracle.com>
In-Reply-To: <20230501190018.24ae7704@gandalf.local.home>

On 5/1/23 16:00, Steven Rostedt wrote:
> On Mon,  1 May 2023 13:04:09 -0700
> Indu Bhagat <indu.bhagat@oracle.com> wrote:
> 
>> This patch adds an SFrame format based stack tracer.
>>
>> The files iterate_phdr.c and iterate_phdr.h implement
>> dl_iterate_phdr()-like functionality.
>>
>> The SFrame format based stack tracer is implemented in
>> sframe_unwind.c, with architecture-specific bits in
>> arch/arm64/include/asm/sframe_regs.h and
>> arch/x86/include/asm/sframe_regs.h.  Please note that the SFrame format
>> is supported for x86_64 (AMD64 ABI) and aarch64 (AAPCS64 ABI) only at
>> this time.
>>
>> The files sframe_state.[ch] implement the SFrame state management APIs.
>>
>> Some aspects of the implementation are "POC like". These will need to
>> be addressed for the implementation to become more palatable:
>> - dealing with only Elf64_Phdr (no Elf32_Phdr) at this time, and other
>>    TODOs in the iterate_phdr.c,
>> - detecting whether a program did a dlopen/dlclose,
>> - code stubs around user space memory access (.sframe section, ELF hdr
>>    etc.) by the kernel need careful review.
>>
>> There are more aspects than the above; the intention of this patch set is to
>> help drive the discussion on how to best incorporate an SFrame-based user
>> space unwinder in the kernel.
>>
>> Signed-off-by: Indu Bhagat <indu.bhagat@oracle.com>
>> ---
>>   arch/arm64/include/asm/sframe_regs.h |  37 +++
>>   arch/x86/include/asm/sframe_regs.h   |  34 +++
>>   include/sframe/sframe_regs.h         |  11 +
>>   include/sframe/sframe_unwind.h       |  62 ++++
>>   lib/sframe/Makefile                  |   8 +-
>>   lib/sframe/iterate_phdr.c            | 113 +++++++
>>   lib/sframe/iterate_phdr.h            |  34 +++
>>   lib/sframe/sframe_state.c            | 424 +++++++++++++++++++++++++++
>>   lib/sframe/sframe_state.h            |  80 +++++
>>   lib/sframe/sframe_unwind.c           | 208 +++++++++++++
>>   10 files changed, 1010 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/arm64/include/asm/sframe_regs.h
>>   create mode 100644 arch/x86/include/asm/sframe_regs.h
>>   create mode 100644 include/sframe/sframe_regs.h
>>   create mode 100644 include/sframe/sframe_unwind.h
>>   create mode 100644 lib/sframe/iterate_phdr.c
>>   create mode 100644 lib/sframe/iterate_phdr.h
>>   create mode 100644 lib/sframe/sframe_state.c
>>   create mode 100644 lib/sframe/sframe_state.h
>>   create mode 100644 lib/sframe/sframe_unwind.c
>>
>> diff --git a/arch/arm64/include/asm/sframe_regs.h b/arch/arm64/include/asm/sframe_regs.h
>> new file mode 100644
>> index 000000000000..ae9ab9d5d3c1
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/sframe_regs.h
>> @@ -0,0 +1,37 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +/*
>> + * Copyright (C) 2023, Oracle and/or its affiliates.
>> + */
>> +
>> +#ifndef ASM_ARM64_SFRAME_REGS_H
>> +#define ASM_ARM64_SFRAME_REGS_H
>> +
>> +#define STACK_ACCESS_LEN 8
>> +
>> +static inline uint64_t
>> +get_ptregs_ip(struct pt_regs *regs)
>> +{
>> +	return regs->pc;
>> +}
>> +
>> +static inline uint64_t
>> +get_ptregs_sp(struct pt_regs *regs)
>> +{
>> +	return regs->sp;
>> +}
>> +
>> +static inline uint64_t
>> +get_ptregs_fp(struct pt_regs *regs)
>> +{
>> +#define UNWIND_AARCH64_X29              29      /* 64-bit frame pointer.  */
>> +	return (uint64_t)regs->regs[UNWIND_AARCH64_X29];
>> +}
>> +
>> +static inline uint64_t
>> +get_ptregs_ra(struct pt_regs *regs)
>> +{
>> +#define UNWIND_AARCH64_X30              30      /* 64-bit link register.  */
>> +	return (uint64_t)regs->regs[UNWIND_AARCH64_X30];
>> +}
>> +
>> +#endif /* ASM_ARM64_SFRAME_REGS_H */
>> diff --git a/arch/x86/include/asm/sframe_regs.h b/arch/x86/include/asm/sframe_regs.h
>> new file mode 100644
>> index 000000000000..99f84955854a
>> --- /dev/null
>> +++ b/arch/x86/include/asm/sframe_regs.h
>> @@ -0,0 +1,34 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +/*
>> + * Copyright (C) 2023, Oracle and/or its affiliates.
>> + */
>> +
>> +#ifndef ASM_X86_SFRAME_REGS_H
>> +#define ASM_X86_SFRAME_REGS_H
>> +
>> +#define STACK_ACCESS_LEN 8
>> +
>> +static inline uint64_t
>> +get_ptregs_ip(struct pt_regs *regs)
>> +{
>> +	return (uint64_t)regs->ip;
>> +}
>> +
>> +static inline uint64_t
>> +get_ptregs_sp(struct pt_regs *regs)
>> +{
>> +	return (uint64_t)regs->sp;
>> +}
>> +
>> +static inline uint64_t
>> +get_ptregs_fp(struct pt_regs *regs)
>> +{
>> +	return (uint64_t)regs->bp;
>> +}
>> +
>> +static inline uint64_t
>> +get_ptregs_ra(struct pt_regs *regs)
>> +{
>> +	return 0; /* SFRAME_CFA_FIXED_RA_INVALID */
>> +}
>> +#endif /* ASM_X86_SFRAME_REGS_H */
>> diff --git a/include/sframe/sframe_regs.h b/include/sframe/sframe_regs.h
>> new file mode 100644
>> index 000000000000..32b67f7a7c78
>> --- /dev/null
>> +++ b/include/sframe/sframe_regs.h
>> @@ -0,0 +1,11 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +/*
>> + * Copyright (C) 2023, Oracle and/or its affiliates.
>> + */
>> +
>> +#ifndef _SFRAME_REGS_H
>> +#define _SFRAME_REGS_H
>> +
>> +#include <asm/sframe_regs.h>
>> +
>> +#endif /* _SFRAME_REGS_H */
>> diff --git a/include/sframe/sframe_unwind.h b/include/sframe/sframe_unwind.h
>> new file mode 100644
>> index 000000000000..3e2c12816b60
>> --- /dev/null
>> +++ b/include/sframe/sframe_unwind.h
> 
> Also, these should probably go into include/linux, unless there's going to
> be a lot more header files.
> 

I'd expect at most the current headers:
include/sframe/sframe_unwind.h
include/sframe/sframe_regs.h

And perhaps one more, for the callchain format and callbacks suggested below?

>> @@ -0,0 +1,62 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +/*
>> + * Copyright (C) 2023, Oracle and/or its affiliates.
>> + */
>> +
>> +#ifndef _SFRAME_UNWIND_H
>> +#define _SFRAME_UNWIND_H
>> +
>> +#include <linux/sched.h>
>> +#include <linux/perf_event.h>
>> +
>> +#define PT_GNU_SFRAME  0x6474e554
>> +
>> +/*
>> + * State used for SFrame-based stack tracing for a user space task.
>> + */
>> +struct user_unwind_state {
>> +	uint64_t pc, sp, fp, ra;
> 
> I know this is POC, but please make each structure field a separate item.
> Also, should be tab delimited.
> 

OK.

>> +	enum stack_type stype;
>> +	struct task_struct *task;
>> +	bool error;
>> +};
> 
> Also swap the task and the stype, as the pointer to the task will create a
> hole in the structure.
> 
> struct user_unwind_state {
> 	uint64_t		pc;
> 	uint64_t		sp;
> 	uint64_t		fp;
> 	uint64_t		ra;
> 	struct task_struct	*task;
> 	enum stack_type		stype;
> 	bool			error;
> };
> 

OK.
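
For reference, a minimal user-space sketch of the hole being discussed. The
two struct names, the stand-in enum values and posted_hole_size() are
hypothetical and exist only for this illustration; the field lists are taken
from the patch and from the suggested layout above. On a typical LP64 ABI
with a 4-byte enum, the posted order leaves 4 bytes of padding in front of
the task pointer, and the reordered struct comes out 8 bytes smaller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel types; only their sizes matter here. */
struct task_struct;
enum stack_type { STACK_TYPE_UNKNOWN, STACK_TYPE_TASK };

/* Field order as posted in the patch. */
struct unwind_state_posted {
	uint64_t		pc;
	uint64_t		sp;
	uint64_t		fp;
	uint64_t		ra;
	enum stack_type		stype;
	struct task_struct	*task;
	bool			error;
};

/* Field order suggested in review: pointer first, then the small fields. */
struct unwind_state_reordered {
	uint64_t		pc;
	uint64_t		sp;
	uint64_t		fp;
	uint64_t		ra;
	struct task_struct	*task;
	enum stack_type		stype;
	bool			error;
};

/* Bytes of padding the compiler inserts between stype and task. */
static inline size_t posted_hole_size(void)
{
	return offsetof(struct unwind_state_posted, task)
	       - offsetof(struct unwind_state_posted, stype)
	       - sizeof(enum stack_type);
}
```

On a built kernel the same information can be read from the debug info with
pahole, for example "pahole -C user_unwind_state vmlinux".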

>> +
>> +/*
>> + * APIs for an SFrame based stack tracer.
>> + */
>> +
>> +void sframe_unwind_start(struct user_unwind_state *state,
>> +			 struct task_struct *task, struct pt_regs *regs);
>> +bool sframe_unwind_next_frame(struct user_unwind_state *state);
>> +uint64_t sframe_unwind_get_return_address(struct user_unwind_state *state);
>> +
>> +static inline bool sframe_unwind_done(struct user_unwind_state *state)
>> +{
>> +	return state->stype == STACK_TYPE_UNKNOWN;
>> +}
>> +
>> +static inline bool sframe_unwind_error(struct user_unwind_state *state)
>> +{
>> +	return state->error;
>> +}
>> +
>> +/*
>> + * APIs to manage the SFrame state per task for stack tracing.
>> + */
>> +
>> +extern struct sframe_state *unwind_sframe_state_alloc(struct task_struct *task);
>> +extern int unwind_sframe_state_update(struct task_struct *task);
>> +extern void unwind_sframe_state_cleanup(struct task_struct *task);
>> +
>> +extern bool unwind_sframe_state_valid_p(struct sframe_state *sfstate);
>> +extern bool unwind_sframe_state_ready_p(struct sframe_state *sfstate);
>> +
>> +/*
>> + * Get the callchain using SFrame unwind info for the given task.
>> + */
>> +extern int
>> +sframe_callchain_user(struct task_struct *task,
>> +		      struct perf_callchain_entry_ctx *entry,
>> +		      struct pt_regs *regs);
> 
> 
> I plan on using this without any perf involvement, I'd like to keep perf
> separate from the sframe logic. As I mentioned in a previous email, I
> expect sframe to have callbacks. So the callchain format should be defined
> by sframe, and not reuse perf.
> 

I will think about this. Do you have a model of the expected
callbacks for me to explore?
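
As a starting point for that discussion, here is a rough user-space sketch of
one possible shape: the unwinder hands each return address to a
caller-supplied consume callback and stops when the callback declines. All
names here (sframe_consume_fn, mock_unwinder, mock_walk_user_stack, pc_buf)
are hypothetical and not part of this patch set; the style is loosely
modelled on the consume-callback convention of the kernel's
arch_stack_walk(). The "unwinder" is a mock that walks a precomputed array,
so only the interface is illustrated, not SFrame decoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: called once per unwound frame; return false to stop. */
typedef bool (*sframe_consume_fn)(void *cookie, uint64_t pc);

/* Hypothetical mock of the unwinder: a precomputed list of return addresses. */
struct mock_unwinder {
	const uint64_t *pcs;
	size_t nr;
};

/* Feed frames to the consumer; returns how many frames it accepted. */
static size_t mock_walk_user_stack(const struct mock_unwinder *uw,
				   sframe_consume_fn consume, void *cookie)
{
	size_t i;

	for (i = 0; i < uw->nr; i++) {
		if (!consume(cookie, uw->pcs[i]))
			break;
	}
	return i;
}

/* Example consumer: a fixed-capacity buffer owned by the caller. */
struct pc_buf {
	uint64_t pcs[4];
	size_t nr;
};

static bool pc_buf_consume(void *cookie, uint64_t pc)
{
	struct pc_buf *buf = cookie;

	if (buf->nr >= sizeof(buf->pcs) / sizeof(buf->pcs[0]))
		return false;	/* buffer full: ask the walker to stop */
	buf->pcs[buf->nr++] = pc;
	return true;
}
```

With this shape the callchain storage format belongs to the consumer, so perf,
ftrace or any other user could each supply its own callback without the
unwinder depending on struct perf_callchain_entry_ctx.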

>> +
>> +#endif /* _SFRAME_UNWIND_H */
>> diff --git a/lib/sframe/Makefile b/lib/sframe/Makefile
>> index 4e4291d9294f..5ee9e3e7ec93 100644
>> --- a/lib/sframe/Makefile
>> +++ b/lib/sframe/Makefile
>> @@ -1,5 +1,11 @@
>>   # SPDX-License-Identifier: GPL-2.0
>>   ##################################
>> -obj-$(CONFIG_USER_UNWINDER_SFRAME) += sframe_read.o \
>> +obj-$(CONFIG_USER_UNWINDER_SFRAME) += iterate_phdr.o \
>> +				      sframe_read.o \
>> +				      sframe_state.o \
>> +				      sframe_unwind.o
> 
> Ah, the backslash is fixed here.
> 

Yes, it was a rebase thing. It got missed when moving code between patches.

>>   
>> +CFLAGS_iterate_phdr.o += -I $(srctree)/lib/sframe/ -Wno-error=declaration-after-statement
>>   CFLAGS_sframe_read.o += -I $(srctree)/lib/sframe/
>> +CFLAGS_sframe_state.o += -I $(srctree)/lib/sframe/
>> +CFLAGS_sframe_unwind.o += -I $(srctree)/lib/sframe/
>> diff --git a/lib/sframe/iterate_phdr.c b/lib/sframe/iterate_phdr.c
>> new file mode 100644
>> index 000000000000..c10d590ecc67
>> --- /dev/null
>> +++ b/lib/sframe/iterate_phdr.c
>> @@ -0,0 +1,113 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * Copyright (C) 2023, Oracle and/or its affiliates.
>> + */
>> +
>> +#include <linux/elf.h>
>> +#include <linux/mm.h>
>> +#include <linux/vmalloc.h>
>> +#include <linux/mm_types.h>
>> +
>> +#include "iterate_phdr.h"
>> +
>> +/*
>> + * Iterate over the task's memory mappings and find the ELF headers.
>> + *
>> + * This is expected to be called from perf_callchain_user(), so user process
>> + * context is expected.
> 
> My thought is that this will be called in the ptrace path (not the perf
> path), so yes, it will be in user process context.
> 
>> + */
>> +
>> +int iterate_phdr(int (*callback)(struct phdr_info *info,
>> +				 struct task_struct *task,
>> +				 void *data),
>> +		 struct task_struct *task, void *data)
>> +{
>> +	struct mm_struct *mm;
>> +	struct vm_area_struct *vma_mt;
>> +	struct page *page;
>> +
>> +	Elf64_Ehdr *ehdr;
>> +	struct phdr_info phinfo;
>> +
>> +	int ret = 0, res = 0;
>> +	int err = 0;
>> +	bool first = true;
>> +
>> +	memset(&phinfo, 0, sizeof(struct phdr_info));
>> +
>> +	mm = task->mm;
>> +
>> +	MA_STATE(mas, &mm->mm_mt, 0, 0);
>> +
> 
> So this is the code I want to discuss at LSFMM :-) There will be people
> there who know this area far better than I do.
> 
> Let me go and start making the infrastructure to encompass this.
> 
> -- Steve
> 
> 
>> +	mas_for_each(&mas, vma_mt, ULONG_MAX) {
>> +		/* ELF header has a fixed place in the file, starting at offset
>> +		 * zero.
>> +		 */
>> +		if (vma_mt->vm_pgoff)
>> +			continue;
>> +
>> +		/* For the callback to infer whether it's the prog or a DSO we
>> +		 * are dealing with.
>> +		 */
>> +		phinfo.pi_prog = first;
>> +		first = false;
>> +		/* FIXME TODO
>> +		 *  - This code assumes 64-bit ELF by using Elf64_Ehdr.
>> +		 *  - Detect the case where the ELF program headers span
>> +		 * more than one page.
>> +		 */
>> +
>> +		/* FIXME TODO KERNEL
>> +		 *  - get_user_pages_WHAT, which API.
>> +		 *  What flags ? Is this correct ?
>> +		 */
>> +		ret = get_user_pages_remote(mm, vma_mt->vm_start, 1, FOLL_GET,
>> +					    &page, &vma_mt, NULL);
>> +		if (ret <= 0)
>> +			continue;
>> +
>> +		/* The first page must have the ELF header. */
>> +		ehdr = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
>> +		if (!ehdr)
>> +			goto put_page;
>> +
>> +		/* Check for magic bytes to make sure this is ehdr. */
>> +		err = 0;
>> +		err |= ((ehdr->e_ident[EI_MAG0] != ELFMAG0)
>> +			|| (ehdr->e_ident[EI_MAG1] != ELFMAG1)
>> +			|| (ehdr->e_ident[EI_MAG2] != ELFMAG2)
>> +			|| (ehdr->e_ident[EI_MAG3] != ELFMAG3));
>> +		if (err)
>> +			goto unmap;
>> +
>> +		/*
>> +		 * FIXME TODO handle the case when number of program headers is
>> +		 * greater than or equal to PN_XNUM later.
>> +		 */
>> +		if (ehdr->e_phnum == PN_XNUM)
>> +			goto unmap;
>> +		/*
>> +		 * FIXME TODO handle the case when Elf phdrs span more than one
>> +		 * page later ?
>> +		 */
>> +		if ((sizeof(Elf64_Ehdr) + ehdr->e_phentsize * ehdr->e_phnum)
>> +		    > PAGE_SIZE)
>> +			goto unmap;
>> +
>> +		/* Save the location of program headers and the phnum. */
>> +		phinfo.pi_addr = vma_mt->vm_start;
>> +		phinfo.pi_phdr = (void *)ehdr + ehdr->e_phoff;
>> +		phinfo.pi_phnum = ehdr->e_phnum;
>> +
>> +		res = callback(&phinfo, task, data);
>> +unmap:
>> +		vunmap(ehdr);
>> +put_page:
>> +		put_page(page);
>> +
>> +		if (res < 0)
>> +			break;
>> +	}
>> +
>> +	return res;
>> +}
>>
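
To make the header checks in iterate_phdr() easier to review in isolation,
here is a user-space sketch of the same validation run against a plain
buffer standing in for the mapped first page. The function name
elf64_phdrs_in_buf() and its parameters are hypothetical. Note two
deliberate differences from the patch, both assumptions on my part rather
than agreed changes: the bounds check starts from e_phoff instead of
sizeof(Elf64_Ehdr), and EI_CLASS and e_phentsize are checked as well.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Check that buf (len bytes) starts with a 64-bit ELF header whose program
 * header table lies entirely inside the buffer. On success, store the
 * table's byte offset and entry count and return true.
 */
static bool elf64_phdrs_in_buf(const void *buf, size_t len,
			       size_t *phoff, uint16_t *phnum)
{
	Elf64_Ehdr ehdr;
	uint64_t table_size;

	if (len < sizeof(ehdr))
		return false;
	/* Copy out so the buffer's alignment does not matter. */
	memcpy(&ehdr, buf, sizeof(ehdr));

	/* Magic bytes: 0x7f 'E' 'L' 'F'. */
	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
		return false;
	if (ehdr.e_ident[EI_CLASS] != ELFCLASS64)
		return false;

	/* PN_XNUM means the real count is stored elsewhere; not handled. */
	if (ehdr.e_phnum == PN_XNUM)
		return false;
	if (ehdr.e_phentsize != sizeof(Elf64_Phdr))
		return false;

	/* The whole table must fit between e_phoff and the end of buf. */
	table_size = (uint64_t)ehdr.e_phentsize * ehdr.e_phnum;
	if (ehdr.e_phoff > len || table_size > len - ehdr.e_phoff)
		return false;

	*phoff = (size_t)ehdr.e_phoff;
	*phnum = ehdr.e_phnum;
	return true;
}
```

A caller would pass the mapped page and PAGE_SIZE as buf and len; a table that
starts near the end of the page is rejected rather than read past the
mapping.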

