Re: [PATCH v2 bpf-next 1/3] libbpf: add perf buffer API

From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andriin@fb.com>, Alexei Starovoitov <ast@fb.com>,
	bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 1/3] libbpf: add perf buffer API
Date: Fri, 28 Jun 2019 22:56:24 -0700	[thread overview]
Message-ID: <CAEf4BzZ9qjbRJBQsX7WHJ4YXwJ_1=vw5BocTiX=ToEqxOYnMQA@mail.gmail.com> (raw)
In-Reply-To: <CAEf4Bzb4U73jb80eCv+JoFGFd2ACXK4j6d=ZeVOoRH1d0f-dPg@mail.gmail.com>

On Thu, Jun 27, 2019 at 2:45 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Jun 27, 2019 at 2:04 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 06/26/2019 08:12 AM, Andrii Nakryiko wrote:
> > > BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program
> > > to user space for additional processing. libbpf already has very low-level API
> > > to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to
> > > use and requires a lot of code to set everything up. This patch adds
> > > perf_buffer abstraction on top of it, abstracting setting up and polling
> > > per-CPU logic into simple and convenient API, similar to what BCC provides.
> > >
> > > perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF
> > > map entries. It accepts two user-provided callbacks: one for handling raw
> > > samples and one for get notifications of lost samples due to buffer overflow.
> > >
> > > perf_buffer__poll() is used to fetch ring buffer data across all CPUs,
> > > utilizing epoll instance.
> > >
> > > perf_buffer__free() does corresponding clean up and unsets FDs from BPF map.
> > >
> > > All APIs are not thread-safe. User should ensure proper locking/coordination if
> > > used in multi-threaded set up.
> > >
> > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >
> > Aside from current feedback, this series generally looks great! Two small things:
> >
> > > diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> > > index 2382fbda4cbb..10f48103110a 100644
> > > --- a/tools/lib/bpf/libbpf.map
> > > +++ b/tools/lib/bpf/libbpf.map
> > > @@ -170,13 +170,16 @@ LIBBPF_0.0.4 {
> > >               btf_dump__dump_type;
> > >               btf_dump__free;
> > >               btf_dump__new;
> > > -             btf__parse_elf;
> > >               bpf_object__load_xattr;
> > >               bpf_program__attach_kprobe;
> > >               bpf_program__attach_perf_event;
> > >               bpf_program__attach_raw_tracepoint;
> > >               bpf_program__attach_tracepoint;
> > >               bpf_program__attach_uprobe;
> > > +             btf__parse_elf;
> > >               libbpf_num_possible_cpus;
> > >               libbpf_perf_event_disable_and_close;
> > > +             perf_buffer__free;
> > > +             perf_buffer__new;
> > > +             perf_buffer__poll;
> >
> > We should prefix with libbpf_* given it's not strictly BPF-only and rather
> > helper function.
>
> Well, perf_buffer is an object similar to `struct btf`, `struct
> bpf_program`, etc. So it seems appropriate to follow this
> "<object>__<method>" convention. Also, `struct libbpf_perf_buffer` and
> `libbpf_perf_buffer__new` looks verbose and pretty ugly, IMO.
>
> >
> > Also, we should convert bpftool (tools/bpf/bpftool/map_perf_ring.c) to make
> > use of these new helpers instead of open-coding there.
>
> Yep, absolutely, will do that as well, thanks for pointing me there!

This turned out to require much bigger changes, as bpftool needed way
more low-level control over structure of events and attachment
policies (custom cpu index and map key). So I ended up having two
APIs:
1. simple common-case perf_buffer__new with 2 callbacks, that attaches
to all CPUs (up to max_elements of map)
2. perf_buffer__new_raw, that allows to provide custom
perf_event_attr, callback that accepts pointer to raw perf event and
allows to specify any set of cpu/map keys.

bpftool uses the latter one. Please see v3.

>
> >
> > Thanks,
> > Daniel