linux-kernel.vger.kernel.org archive mirror
From: Song Liu <songliubraving@fb.com>
To: Namhyung Kim <namhyung@kernel.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	"Arnaldo Carvalho de Melo" <acme@redhat.com>,
	Jiri Olsa <jolsa@kernel.org>
Subject: Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF
Date: Sat, 13 Mar 2021 19:37:44 +0000
Message-ID: <E5F52446-8902-4A4B-AF5E-90EBAB7F7A07@fb.com>
In-Reply-To: <CAM9d7cg+HD3-vLXX_rUSg1kWSZ3MGeyrQwdJoa5CgbZjeD2+GA@mail.gmail.com>



> On Mar 12, 2021, at 6:47 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> On Sat, Mar 13, 2021 at 12:38 AM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Mar 12, 2021, at 12:36 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>>> 
>>> Hi,
>>> 
>>> On Fri, Mar 12, 2021 at 11:03 AM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> perf uses performance monitoring counters (PMCs) to monitor system
>>>> performance. The PMCs are limited hardware resources. For example,
>>>> Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
>>>> 
>>>> Modern data center systems use these PMCs in many different ways:
>>>> system level monitoring, (maybe nested) container level monitoring, per
>>>> process monitoring, profiling (in sample mode), etc. In some cases,
>>>> there are more active perf_events than available hardware PMCs. To allow
>>>> all perf_events to have a chance to run, it is necessary to do expensive
>>>> time multiplexing of events.
>>>> 
>>>> On the other hand, many monitoring tools count the common metrics (cycles,
>>>> instructions). It is a waste to have multiple tools create multiple
>>>> perf_events of "cycles" and occupy multiple PMCs.
>>>> 
>>>> bperf tries to reduce such waste by allowing multiple perf_events of
>>>> "cycles" or "instructions" (at different scopes) to share PMUs. Instead
>>>> of having each perf-stat session read its own perf_events, bperf uses
>>>> BPF programs to read the perf_events and aggregate the readings into BPF
>>>> maps. The perf-stat session(s) then read the values from these BPF maps.
>>>> 
>>>> Please refer to the comment before the definition of bperf_ops for the
>>>> description of bperf architecture.
>>> 
>>> Interesting!  Actually I thought about something similar before,
>>> but my BPF knowledge is outdated.  So I need to catch up, but I
>>> haven't found the time for it so far. ;-)
>>> 
>>>> 
>>>> bperf is off by default. To enable it, pass the --use-bpf option to
>>>> perf-stat. bperf uses a BPF hashmap to share information about the BPF
>>>> programs and maps used by bperf. This map is pinned to bpffs. The default
>>>> path is /sys/fs/bpf/bperf_attr_map. The user can change the path with the
>>>> --attr-map option.
>>>> 
>>>> ---
>>>> Known limitations:
>>>> 1. Do not support per cgroup events;
>>>> 2. Do not support monitoring of BPF program (perf-stat -b);
>>>> 3. Do not support event groups.
>>> 
>>> In my case, per cgroup event counting is very important.
>>> And I'd like to do that with lots of cpus and cgroups.
>> 
>> We can easily extend this approach to support cgroup events. I didn't
>> implement it, to keep the first version simple.
> 
> OK.
> 
>> 
>>> So I'm working on an in-kernel solution (without BPF),
>>> I hope to share it soon.
>> 
>> This is interesting! I cannot wait to see what it looks like. I spent
>> quite some time trying to enable in-kernel sharing (not just cgroup
>> events), but finally decided to try the BPF approach.
> 
> Well I found it hard to support generic event sharing that works
> for all use cases.  So I'm focusing on the per cgroup case only.
> 
>> 
>>> 
>>> And for event groups, it seems the current implementation
>>> cannot handle more than one event (not even in a group).
>>> That could be a serious limitation.
>> 
>> It supports multiple events. Multiple events are independent, i.e.,
>> "cycles" and "instructions" would use two independent leader programs.
> 
> OK, then do you need multiple bperf_attr_maps?  Does it work
> for an arbitrary number of events?

The bperf_attr_map (or perf_attr_map) is shared among different events.
It is a hash map with perf_event_attr as the key. Currently, I hard-coded
its size to 16 entries. We can introduce more flexible management of this
map.
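
To illustrate the semantics described above, here is a minimal userspace
sketch in plain C. It is not the code from the patch and does not use the
BPF map API. The names demo_attr, demo_entry, demo_attr_map and the fields
link_id/diff_map_id are hypothetical, demo_attr is a trimmed-down stand-in
for perf_event_attr, and a linear scan stands in for real hashing. It only
shows two points: keys are compared byte-for-byte, so every event with the
same attr maps to one shared entry, and once all 16 slots are taken a new
event cannot be added.

```c
#include <stdint.h>
#include <string.h>

#define ATTR_MAP_MAX_ENTRIES 16	/* the hard-coded size mentioned above */

/* Hypothetical, trimmed-down stand-in for struct perf_event_attr. */
struct demo_attr {
	uint32_t type;		/* event class, e.g. hardware event */
	uint32_t pad;		/* explicit padding, must be zero */
	uint64_t config;	/* which event, e.g. cycles vs. instructions */
};

/* Hypothetical value: what a session needs to find the shared state. */
struct demo_entry {
	uint32_t link_id;
	uint32_t diff_map_id;
};

struct demo_attr_map {
	struct demo_attr keys[ATTR_MAP_MAX_ENTRIES];
	struct demo_entry vals[ATTR_MAP_MAX_ENTRIES];
	int used;
};

/* Find the entry whose key bytes match exactly, or return NULL. */
static struct demo_entry *demo_lookup(struct demo_attr_map *m,
				      const struct demo_attr *key)
{
	for (int i = 0; i < m->used; i++)
		if (memcmp(&m->keys[i], key, sizeof(*key)) == 0)
			return &m->vals[i];
	return NULL;
}

/*
 * Return the existing entry for this attr (the sharing case), or add a
 * new one. Returns NULL when all slots are already in use.
 */
static struct demo_entry *demo_lookup_or_insert(struct demo_attr_map *m,
						const struct demo_attr *key,
						struct demo_entry val)
{
	struct demo_entry *e = demo_lookup(m, key);

	if (e)
		return e;
	if (m->used == ATTR_MAP_MAX_ENTRIES)
		return NULL;
	m->keys[m->used] = *key;
	m->vals[m->used] = val;
	return &m->vals[m->used++];
}
```

In this model, two sessions that both ask for "cycles" get back the same
entry, while "cycles" and "instructions" occupy two separate slots, and a
17th distinct attr is rejected.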

Thanks,
Song


