linux-toolchains.vger.kernel.org archive mirror
From: Josh Poimboeuf <jpoimboe@kernel.org>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	Indu Bhagat <indu.bhagat@oracle.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-perf-users@vger.kernel.org, Mark Brown <broonie@kernel.org>,
	linux-toolchains@vger.kernel.org
Subject: Re: [PATCH RFC 04/10] perf: Introduce deferred user callchains
Date: Sat, 11 Nov 2023 10:49:08 -0800	[thread overview]
Message-ID: <20231111184908.ym4l6cwzwnkl7e6m@treble> (raw)
In-Reply-To: <CAM9d7chZcqR8WCEYtjpP4KzUOeNdJ=kSvae0UrjsO8OgsepjDw@mail.gmail.com>

On Fri, Nov 10, 2023 at 10:57:58PM -0800, Namhyung Kim wrote:
> On Wed, Nov 8, 2023 at 4:43 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > +++ b/kernel/bpf/stackmap.c
> > @@ -294,8 +294,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> >         if (max_depth > sysctl_perf_event_max_stack)
> >                 max_depth = sysctl_perf_event_max_stack;
> >
> > -       trace = get_perf_callchain(regs, kernel, user, max_depth, false);
> > -
> > +       trace = get_perf_callchain(regs, kernel, user, max_depth, false, false);
> >         if (unlikely(!trace))
> >                 /* couldn't fetch the stack trace */
> >                 return -EFAULT;
> > @@ -420,7 +419,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> >                 trace = get_callchain_entry_for_task(task, max_depth);
> >         else
> >                 trace = get_perf_callchain(regs, kernel, user, max_depth,
> > -                                          false);
> > +                                          false, false);
> 
> Hmm.. BPF is not supported.  Any plans?

I'm not sure whether this feature makes sense for BPF.  If the BPF
program runs in atomic context then it would have to defer the unwind
until right before exiting to user space.  But then the program is no
longer running.

> > +++ b/kernel/events/callchain.c
> > @@ -178,7 +178,7 @@ put_callchain_entry(int rctx)
> >
> >  struct perf_callchain_entry *
> >  get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
> > -                  u32 max_stack, bool add_mark)
> > +                  u32 max_stack, bool add_mark, bool defer_user)
> >  {
> >         struct perf_callchain_entry *entry;
> >         struct perf_callchain_entry_ctx ctx;
> > @@ -207,6 +207,11 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
> >                         regs = task_pt_regs(current);
> >                 }
> >
> > +               if (defer_user) {
> > +                       perf_callchain_store_context(&ctx, PERF_CONTEXT_USER_DEFERRED);
> > +                       goto exit_put;
> 
> Hmm.. ok.  Then now perf tools need to stitch the call stacks
> in two (or more?) samples.

Yes.  And it could be more than two samples if there were multiple NMIs
before returning to user space.  In that case there would be a single
user sample which needs to be stitched to each of the kernel samples.

> > +               }
> > +
> >                 if (add_mark)
> >                         perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 5e41a3b70bcd..290e06b0071c 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -6751,6 +6751,12 @@ static void perf_pending_irq(struct irq_work *entry)
> >         struct perf_event *event = container_of(entry, struct perf_event, pending_irq);
> >         int rctx;
> >
> > +       if (!is_software_event(event)) {
> > +               if (event->pending_unwind)
> > +                       task_work_add(current, &event->pending_task, TWA_RESUME);
> 
> I'm not familiar with this code.  Is it possible to the current task
> is preempted before returning to user space (and run the
> perf_pending_task_unwind) ?

Yes, the task work (perf_pending_task_unwind) is preemptible.

> > +static void perf_pending_task_unwind(struct perf_event *event)
> > +{
> > +       struct pt_regs *regs = task_pt_regs(current);
> > +       struct perf_output_handle handle;
> > +       struct perf_event_header header;
> > +       struct perf_sample_data data;
> > +       struct perf_callchain_entry *callchain;
> > +
> > +       callchain = kmalloc(sizeof(struct perf_callchain_entry) +
> > +                           (sizeof(__u64) * event->attr.sample_max_stack) +
> > +                           (sizeof(__u64) * 1) /* one context */,
> > +                           GFP_KERNEL);
> 
> Any chance it can reuse get_perf_callchain() instead of
> allocating the callchains every time?

I don't think so, because if it gets preempted, the new task might also
need to do an unwind.  But there's only one task-context callchain per
CPU.

The allocation is less than ideal.  I could just allocate it on the
stack and keep it bounded to 128 entries or so.

> > +       if (!callchain)
> > +               return;
> > +
> > +       callchain->nr = 0;
> > +       data.callchain = callchain;
> > +
> > +       perf_sample_data_init(&data, 0, event->hw.last_period);
> 
> It would double count the period for a sample.
> I guess we want to set the period to 0.

Ok.

> > +
> > +       data.deferred = true;
> > +
> > +       perf_prepare_sample(&data, event, regs);
> 
> I don't think this would work properly.  Not to mention it duplicates
> sample data unnecessarily, some data needs special handling to
> avoid inefficient (full) data copy like for (user) stack, regs and aux.

Yeah.  I tried sending only the sample ID and callchain, but perf tool
didn't appreciate that ;-) So for the RFC I gave up and did this.

> Anyway I'm not sure it can support these additional samples for
> deferred callchains without breaking the existing perf tools.
> Anyway it doesn't know PERF_CONTEXT_USER_DEFERRED at least.
> I think this should be controlled by a new feature bit in the
> perf_event_attr.
> 
> Then we might add a breaking change to have a special
> sample record for the deferred callchains and sample ID only.

Sounds like a good idea.  I'll need to study the code to figure out how
to do that on the perf tool side.  Or would you care to write a patch?

> > -struct perf_callchain_entry *
> > -perf_callchain(struct perf_event *event, struct pt_regs *regs)
> > +void perf_callchain(struct perf_sample_data *data, struct perf_event *event,
> > +                   struct pt_regs *regs)
> >  {
> >         bool kernel = !event->attr.exclude_callchain_kernel;
> >         bool user   = !event->attr.exclude_callchain_user;
> >         const u32 max_stack = event->attr.sample_max_stack;
> > -       struct perf_callchain_entry *callchain;
> > +       bool defer_user = IS_ENABLED(CONFIG_HAVE_PERF_CALLCHAIN_DEFERRED);
> 
> Is it always enabled depending on the kernel config?
> It could be controlled by event->attr.something..

Possibly, depending on what y'all think.  Also, fp vs sframe could be an
attr (though sframe implies deferred).

Thanks for the review!

-- 
Josh


Thread overview: 30+ messages
2023-11-09  0:41 [PATCH RFC 00/10] perf: user space sframe unwinding Josh Poimboeuf
2023-11-09  0:41 ` [PATCH RFC 01/10] perf: Remove get_perf_callchain() 'init_nr' argument Josh Poimboeuf
2023-11-11  6:09   ` Namhyung Kim
2023-11-09  0:41 ` [PATCH RFC 02/10] perf: Remove get_perf_callchain() 'crosstask' argument Josh Poimboeuf
2023-11-11  6:11   ` Namhyung Kim
2023-11-11 20:53     ` Jordan Rome
2023-11-09  0:41 ` [PATCH RFC 03/10] perf: Simplify get_perf_callchain() user logic Josh Poimboeuf
2023-11-11  6:11   ` Namhyung Kim
2023-11-09  0:41 ` [PATCH RFC 04/10] perf: Introduce deferred user callchains Josh Poimboeuf
2023-11-11  6:57   ` Namhyung Kim
2023-11-11 18:49     ` Josh Poimboeuf [this message]
2023-11-11 18:54       ` Josh Poimboeuf
2023-11-13 16:56       ` Namhyung Kim
2023-11-13 17:21         ` Peter Zijlstra
2023-11-13 17:48           ` Namhyung Kim
2023-11-13 18:49             ` Peter Zijlstra
2023-11-13 19:16               ` Namhyung Kim
2023-11-15 16:13         ` Namhyung Kim
2023-11-20 14:03           ` Peter Zijlstra
2023-11-09  0:41 ` [PATCH RFC 05/10] perf/x86: Add HAVE_PERF_CALLCHAIN_DEFERRED Josh Poimboeuf
2023-11-09  0:41 ` [PATCH RFC 06/10] unwind: Introduce generic user space unwinding interfaces Josh Poimboeuf
2023-11-09  0:41 ` [PATCH RFC 07/10] unwind/x86: Add HAVE_USER_UNWIND Josh Poimboeuf
2023-11-09  0:41 ` [PATCH RFC 08/10] perf/x86: Use user_unwind interface Josh Poimboeuf
2023-11-09  0:41 ` [PATCH RFC 09/10] unwind: Introduce SFrame user space unwinding Josh Poimboeuf
2023-11-09 19:31   ` Indu Bhagat
2023-11-09 19:37     ` Josh Poimboeuf
2023-11-09 19:49       ` Steven Rostedt
2023-11-09 19:53         ` Josh Poimboeuf
2023-11-09  0:41 ` [PATCH RFC 10/10] unwind/x86/64: Add HAVE_USER_UNWIND_SFRAME Josh Poimboeuf
2023-11-09  0:45 ` [PATCH RFC 00/10] perf: user space sframe unwinding Josh Poimboeuf
