From: Song Liu <songliubraving@fb.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "open list:BPF (Safe dynamic programs and tools)"
<bpf@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"acme@kernel.org" <acme@kernel.org>,
"mingo@redhat.com" <mingo@redhat.com>,
Kernel Team <Kernel-team@fb.com>,
Kan Liang <kan.liang@linux.intel.com>,
"Like Xu" <like.xu@linux.intel.com>,
Alexey Budankov <alexey.budankov@linux.intel.com>
Subject: Re: [RFC] bpf: lbr: enable reading LBR from tracing bpf programs
Date: Wed, 18 Aug 2021 16:46:32 +0000
Message-ID: <962EDD5A-1B35-4C7F-A0A1-3EBC32EE63AB@fb.com>
In-Reply-To: <YRzPwClswwxHXVHe@hirez.programming.kicks-ass.net>
Hi Peter,

Thanks for your quick response!

> On Aug 18, 2021, at 2:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Aug 17, 2021 at 06:29:37PM -0700, Song Liu wrote:
>> The typical way to access LBR is via hardware perf_event. For CPUs with
>> FREEZE_LBRS_ON_PMI support, PMI could capture reliable LBR. On the other
>> hand, LBR could also be useful in non-PMI scenario. For example, in
>> kretprobe or bpf fexit program, LBR could provide a lot of information
>> on what happened with the function.
>>
>> In this RFC, we try to enable LBR for BPF program. This works like:
>> 1. Create a hardware perf_event with PERF_SAMPLE_BRANCH_* on each CPU;
>> 2. Call a new bpf helper (bpf_get_branch_trace) from the BPF program;
>> 3. Before calling this bpf program, the kernel stops LBR on the local
>> CPU, makes a copy of LBR, and resumes LBR;
>> 4. In the bpf program, the helper accesses the copy from #3.
>>
>> Please see tools/testing/selftests/bpf/[progs|prog_tests]/get_call_trace.c
>> for a detailed example. Note that this process is far from ideal, but
>> it allows quick prototyping of this feature.
>>
>> AFAICT, the biggest challenge here is that we are now sharing LBR in PMI
>> and out of PMI, which could trigger some interesting race conditions.
>> However, if we allow some level of missed/corrupted samples, this should
>> still be very useful.
>>
>> Please share your thoughts and comments on this. Thanks in advance!
>
>> +int bpf_branch_record_read(void)
>> +{
>> + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +
>> + intel_pmu_lbr_disable_all();
>> + intel_pmu_lbr_read();
>> + memcpy(this_cpu_ptr(&bpf_lbr_entries), cpuc->lbr_entries,
>> + sizeof(struct perf_branch_entry) * x86_pmu.lbr_nr);
>> + *this_cpu_ptr(&bpf_lbr_cnt) = x86_pmu.lbr_nr;
>> + intel_pmu_lbr_enable_all(false);
>> + return 0;
>> +}
>
> Urgghhh.. I so really hate BPF specials like this.
I don't really like this design either. But it does show that LBR can be
very useful in non-PMI scenarios.
> Also, the PMI race
> you describe is because you're doing abysmal layer violations. If you'd
> have used perf_pmu_disable() that wouldn't have been a problem.
Do you mean that instead of disabling/enabling LBR, we disable/enable
the whole PMU?
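To check my understanding of the difference, here is a small userspace
model of the stop/copy/resume sequence done through a nesting disable
count, which is how I understand perf_pmu_disable()/perf_pmu_enable() to
behave. This is only a sketch under that assumption: struct sim_pmu, the
sim_* functions and SIM_LBR_NR are made up for illustration and are not
kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_LBR_NR 8	/* hypothetical LBR depth for this model */

struct sim_branch {
	unsigned long from, to;
};

struct sim_pmu {
	int disable_count;	/* nesting depth of disable calls */
	int lbr_running;	/* 1 while the "hardware" records branches */
	size_t nr;		/* number of valid entries in lbr[] */
	struct sim_branch lbr[SIM_LBR_NR];
};

/* The outermost disable stops recording; nested calls only bump the count. */
static void sim_pmu_disable(struct sim_pmu *pmu)
{
	if (pmu->disable_count++ == 0)
		pmu->lbr_running = 0;
}

/* Only the call that drops the count back to zero restarts recording. */
static void sim_pmu_enable(struct sim_pmu *pmu)
{
	assert(pmu->disable_count > 0);
	if (--pmu->disable_count == 0)
		pmu->lbr_running = 1;
}

/* "Hardware" side: a taken branch is logged only while LBR is running. */
static void sim_branch_taken(struct sim_pmu *pmu, unsigned long from,
			     unsigned long to)
{
	if (!pmu->lbr_running)
		return;
	if (pmu->nr == SIM_LBR_NR) {
		/* Buffer full: drop the oldest entry. */
		memmove(&pmu->lbr[0], &pmu->lbr[1],
			(SIM_LBR_NR - 1) * sizeof(pmu->lbr[0]));
		pmu->nr--;
	}
	pmu->lbr[pmu->nr].from = from;
	pmu->lbr[pmu->nr].to = to;
	pmu->nr++;
}

/* Stop, copy, resume. Returns the number of entries copied into dst. */
static size_t sim_snapshot(struct sim_pmu *pmu, struct sim_branch *dst,
			   size_t dst_nr)
{
	size_t n;

	sim_pmu_disable(pmu);
	n = pmu->nr < dst_nr ? pmu->nr : dst_nr;
	memcpy(dst, pmu->lbr, n * sizeof(*dst));
	sim_pmu_enable(pmu);
	return n;
}
```

In this model, if sim_snapshot() runs while an outer caller (think of a
PMI handler) already holds the PMU disabled, the final sim_pmu_enable()
only drops the count and recording stays off until the outer caller is
done. An unconditional LBR enable, as in the RFC code, would restart
recording underneath that caller.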
>
> I'd much rather see a generic 'fake/inject' PMI facility, something that
> works across the board and isn't tied to x86/intel.
How would that work? Do we have a function to trigger PMI from software,
and then gather the LBR data after the PMI? This does sound like a much
cleaner solution. Where can I find code examples that fake/inject PMI?

There is another limitation right now: we need to enable LBR with a
hardware perf event (cycles, etc.). However, unless we use the event for
something else, it wastes a hardware counter. So I was thinking of
allowing a software event, i.e. a dummy event, to enable LBR. Does this
idea sound sane to you?
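
To make the dummy-event idea concrete, here is a rough userspace sketch
of the two perf_event_attr setups. The hardware variant is roughly what
step 1 of the RFC needs today; the dummy variant is the hypothetical
part. The helper names lbr_hw_attr() and lbr_dummy_attr() are made up
for this mail, and the branch filter is just one plausible choice. As
far as I can tell the kernel currently refuses branch sampling on
software events, so passing the second attr to perf_event_open() would
only work after a kernel change.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>

/* Today: a hardware cycles event that also asks for branch records. */
static void lbr_hw_attr(struct perf_event_attr *attr)
{
	assert(attr);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_type = PERF_SAMPLE_BRANCH_STACK;
	/* Assumed filter: any branch type, in both kernel and user mode. */
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL |
				   PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_ANY;
}

/*
 * Proposed (hypothetical): the same branch sampling request attached to
 * a software dummy event, so that no hardware counter is consumed.
 */
static void lbr_dummy_attr(struct perf_event_attr *attr)
{
	lbr_hw_attr(attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_DUMMY;
}
```

Either attr would then be opened once per CPU with the perf_event_open()
syscall (pid = -1, cpu = N), as in step 1 of the RFC description.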
Thanks,
Song
Thread overview: 10+ messages
2021-08-18 1:29 [RFC] bpf: lbr: enable reading LBR from tracing bpf programs Song Liu
2021-08-18 9:15 ` Peter Zijlstra
2021-08-18 16:46 ` Song Liu [this message]
2021-08-19 11:57 ` Peter Zijlstra
2021-08-19 16:46 ` Song Liu
2021-08-19 18:06 ` Peter Zijlstra
2021-08-19 18:22 ` Song Liu
2021-08-19 18:27 ` Peter Zijlstra
2021-08-19 18:45 ` Song Liu
2021-08-20 7:33 ` Song Liu