All of lore.kernel.org
 help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "open list:BPF (Safe dynamic programs and tools)" 
	<bpf@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"acme@kernel.org" <acme@kernel.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	Kernel Team <Kernel-team@fb.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	"Like Xu" <like.xu@linux.intel.com>,
	Alexey Budankov <alexey.budankov@linux.intel.com>
Subject: Re: [RFC] bpf: lbr: enable reading LBR from tracing bpf programs
Date: Thu, 19 Aug 2021 16:46:20 +0000	[thread overview]
Message-ID: <AB509D87-67C6-4B7F-AEFB-2324845C310C@fb.com> (raw)
In-Reply-To: <YR5HJkPyaM3TWkkl@hirez.programming.kicks-ass.net>

Hi Peter,

Thanks for these helpful information and insights!

> On Aug 19, 2021, at 4:57 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Aug 18, 2021 at 04:46:32PM +0000, Song Liu wrote:
> 
>>> Urgghhh.. I so really hate BPF specials like this.
>> 
>> I don't really like this design either. But it does show that LBR can be
>> very useful in non-PMI scenario. 
>> 
>>> Also, the PMI race
>>> you describe is because you're doing abysmal layer violations. If you'd
>>> have used perf_pmu_disable() that wouldn't have been a problem.
>> 
>> Do you mean instead of disable/enable lbr, we disable/enable the whole 
>> pmu? 
> 
> Yep, that way you're serialized against PMIs. It's what all of the perf
> core does.
> 
>>> I'd much rather see a generic 'fake/inject' PMI facility, something that
>>> works across the board and isn't tied to x86/intel.
>> 
>> How would that work? Do we have a function to trigger PMI from software, 
>> and then gather the LBR data after the PMI? This does sound like a much
>> cleaner solution. Where can I find code examples that fake/inject PMI?
> 
> We don't yet have anything like it; but it would look a little like:
> 
> void perf_inject_event(struct perf_event *event, struct pt_regs *regs)
> {
> 	struct perf_sample_data data;
> 	struct pmu *pmu = event->pmu;
> 	unsigned long flags;
> 
> 	local_irq_save(flags);
> 	perf_pmu_disable(pmu);
> 
> 	perf_sample_data_init(&data, 0, 0);
> 	/*
> 	 * XXX or a variant with more _ that starts at the overflow
> 	 * handler...
> 	 */
> 	__perf_event_overflow(event, 0, &data, regs);
> 
> 	perf_pmu_enable(pmu);
> 	local_irq_restore(flags);
> }
> 
> But please consider carefully, I haven't...

Hmm... This is a little weird to me. 
IIUC, we need to call perf_inject_event() after the software event, say
a kretprobe, triggers. So it gonna look like:

  1. kretprobe trigger;
  2. handler calls perf_inject_event();
  3. PMI kicks in, and saves LBR;
  4. after the PMI, consumer of LBR uses the saved data;

However, given perf_inject_event() disables PMU, we can just save the LBR
right there? And it should be a lot easier? Something like:

  1. kretprobe triggers;
  2. handler calls perf_snapshot_lbr();
     2.1 perf_pmu_disable(pmu);
     2.2 saves LBR 
     2.3 perf_pmu_enable(pmu);
  3. consumer of LBR uses the saved data;

What is the downside of this approach? 

> 
>> There is another limitation right now: we need to enable LBR with a 
>> hardware perf event (cycles, etc.). However, unless we use the event for 
>> something else, it wastes a hardware counter. So I was thinking to allow
>> software event, i.e. dummy event, to enable LBR. Does this idea sound 
>> sane to you?
> 
> We have a VLBR dummy event, but I'm not sure it does exactly as you
> want. However, we should also consider Power, which also has the branch
> stack feature.

VLBR event does look similar to the use case we have. I will take a closer
look. Thanks for the pointer!

> 
> You can't really make a software event with LBR on, because then it
> wouldn't be a software event anymore. You'll need some hybrid like
> thing, which will be yuck and I suspect it needs arch support one way or
> the other :/

Yeah, I guess it could be a "LBR only hardware event". All it needs to do 
is to keep LBR enabled (lbr_users++). I will try to keep the arch support
clean. 

Thanks,
Song


  reply	other threads:[~2021-08-19 16:46 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-18  1:29 [RFC] bpf: lbr: enable reading LBR from tracing bpf programs Song Liu
2021-08-18  3:06 ` kernel test robot
2021-08-18  8:43 ` kernel test robot
2021-08-18  8:43 ` [RFC PATCH] bpf: lbr: __pcpu_scope_bpf_lbr_entries can be static kernel test robot
2021-08-18  9:15 ` [RFC] bpf: lbr: enable reading LBR from tracing bpf programs Peter Zijlstra
2021-08-18 16:46   ` Song Liu
2021-08-19 11:57     ` Peter Zijlstra
2021-08-19 16:46       ` Song Liu [this message]
2021-08-19 18:06         ` Peter Zijlstra
2021-08-19 18:22           ` Song Liu
2021-08-19 18:27             ` Peter Zijlstra
2021-08-19 18:45               ` Song Liu
2021-08-20  7:33               ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AB509D87-67C6-4B7F-AEFB-2324845C310C@fb.com \
    --to=songliubraving@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=acme@kernel.org \
    --cc=alexey.budankov@linux.intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=like.xu@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.