netdev.vger.kernel.org archive mirror
From: Stanislav Fomichev <sdf@fomichev.me>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Andrii Nakryiko <andriin@fb.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
Date: Mon, 3 Jun 2019 18:02:54 -0700
Message-ID: <20190604010254.GB14556@mini-arch>
In-Reply-To: <CAEf4BzbRXAZMXY3kG9HuRC93j5XhyA3EbWxkLrrZsG7K4abdBg@mail.gmail.com>

On 06/03, Andrii Nakryiko wrote:
> On Mon, Jun 3, 2019 at 9:32 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 05/31, Andrii Nakryiko wrote:
> > > On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > >
> > > > On 05/31, Andrii Nakryiko wrote:
> > > > > This patch adds support for a new way to define BPF maps. It relies on
> > > > > BTF to describe mandatory and optional attributes of a map, as well as
> > > > > captures type information of key and value naturally. This eliminates
> > > > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > > > always in sync with the key/value type.
> > > > My 2c: this is too magical and relies on me knowing the expected fields.
> > > > (also, the compiler won't be able to help with the misspellings).
> > >
> > > I don't think it's really worse than the current bpf_map_def approach. In
> > > a typical scenario, there are only two fields you need to remember: type
> > > and max_entries (notice, they are called exactly the same as in
> > > bpf_map_def, so this knowledge is transferable). Then you'll have
> > > key/value, with which you describe both the type (using the field's
> > > type) and the size (calculated from the type).
> > >
> > > I can relate a bit to the point that with bpf_map_def you can find the
> > > definition and see all possible fields, but one can also find a lot of
> > > examples of new map definitions as well.
> > >
> > > One big advantage of this scheme, though, is that you get that type
> > > association automagically without using the BPF_ANNOTATE_KV_PAIR hack,
> > > with no chance of a mismatch, etc. This means less duplication (no
> > > need to write sizeof(struct my_struct) in the definition and then pass
> > > struct my_struct as an arg to that macro), and there is no need to go
> > > and ping people to add those annotations to improve introspection of BPF maps.
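To make the duplication concrete, here is a minimal userspace sketch of the legacy pattern (SEC() and the annotation's section attribute are left out so it builds as ordinary C; struct my_struct, my_map and the helper are illustrative names). The sizes live in bpf_map_def, the types live in the annotation, and only a hand-written check keeps them in sync:

```c
#include <stddef.h>

/* Same five-field layout as libbpf's legacy struct bpf_map_def. */
struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

struct my_struct { int a; long b; };

/* Legacy style, step 1: sizes written out by hand (SEC("maps") omitted). */
struct bpf_map_def my_map = {
	.type = 1,	/* placeholder; real code uses BPF_MAP_TYPE_HASH */
	.key_size = sizeof(int),
	.value_size = sizeof(struct my_struct),
	.max_entries = 64,
};

/* Step 2: the same types repeated for BTF, roughly the type that
 * BPF_ANNOTATE_KV_PAIR(my_map, int, struct my_struct) declares. */
struct ____btf_map_my_map {
	int key;
	struct my_struct value;
};

/* Nothing ties steps 1 and 2 together, so this check has to be written
 * by hand; a BTF-defined map derives both from one declaration. */
static int kv_annotation_matches(void)
{
	struct ____btf_map_my_map *kv = NULL;	/* only used inside sizeof */

	return my_map.key_size == sizeof(kv->key) &&
	       my_map.value_size == sizeof(kv->value);
}
```

A BTF-defined map has no second place to get wrong: the key/value types are declared once and the sizes follow from them.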
> > Don't get me wrong, it looks good and there are advantages compared to
> > the existing way. But, again, it feels a bit too magical to me. We should
> > somehow make it less magical (see below).
> >
> > > > I don't know how others feel about it, but I'd be much more comfortable
> > > > with a simpler TLV-like approach. Have a new section where the format
> > > > is |4-byte size|struct bpf_map_def_extendable|. That would essentially
> > > > allow us to extend it the way we do with syscall args.
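A minimal sketch of how a loader might read one |4-byte size|struct| record, assuming host byte order, a hypothetical five-field struct bpf_map_def_extendable, and the rule the kernel applies to bpf(2) attributes: a shorter (older) record is zero-extended, and a longer (newer) one is accepted only if the bytes this loader doesn't know about are zero.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical extendable definition; fields are only ever appended. */
struct bpf_map_def_extendable {
	uint32_t type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;
};

/* Parses one |4-byte size|struct| record from buf[0..len).
 * Returns the number of bytes consumed, or -EINVAL. */
static long parse_map_def_record(const unsigned char *buf, size_t len,
				 struct bpf_map_def_extendable *def)
{
	uint32_t sz;
	size_t i;

	if (len < sizeof(sz))
		return -EINVAL;
	memcpy(&sz, buf, sizeof(sz));	/* host byte order assumed */
	if (sz > len - sizeof(sz))
		return -EINVAL;		/* truncated record */
	buf += sizeof(sz);

	memset(def, 0, sizeof(*def));	/* missing (older) fields read as 0 */
	if (sz <= sizeof(*def)) {
		memcpy(def, buf, sz);
	} else {
		/* Newer producer: unknown trailing bytes must all be zero. */
		for (i = sizeof(*def); i < sz; i++)
			if (buf[i])
				return -EINVAL;
		memcpy(def, buf, sizeof(*def));
	}
	return (long)(sizeof(sz) + sz);
}
```

The size prefix is what lets old and new producers and consumers coexist, the same way the size argument of bpf(2) does.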
> > >
> > > It would help with extensibility, sure, though even the current
> > > bpf_map_def approach can, sort of, be extended already. But it won't
> > > solve the problem of capturing BTF types for key/value (see
> > > above). Also, you'd need another macro to lay everything out properly.
> > I didn't know that we look into the list of exported symbols to estimate
> > the number of maps and then use it to derive struct bpf_map_def size.
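For reference, the size derivation works roughly like this in libbpf: the legacy "maps" section is treated as an array of equally sized definitions, so the per-map size is the section size divided by the number of map symbols, and a section that doesn't divide evenly is rejected. A simplified sketch (the function name is made up):

```c
#include <stddef.h>

/* Infers the size of one map definition from the size of the "maps"
 * section and the number of symbols found in it.
 * Returns 0 when the size can't be determined unambiguously. */
static size_t map_def_size(size_t sec_size, size_t nr_map_syms)
{
	if (nr_map_syms == 0 || sec_size == 0 || sec_size % nr_map_syms)
		return 0;
	return sec_size / nr_map_syms;
}
```

For example, two maps in a 48-byte section give 24-byte definitions, 4 bytes more than the 20-byte struct bpf_map_def; that surplus is where an extended bpf_map_def would carry its new fields.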
> >
> > In that case, maybe we can keep extending struct bpf_map_def
> > and support BTF mode as a better alternative? bpf_map_def could be
> > used as a reference for which fields exist, people can still use it
> > (with BPF_ANNOTATE_KV_PAIR if needed), but they can also use the
> > new BTF mode if they find that it works better for them?
> >
> > Because the biggest issue for me with the BTF mode is the question
> > of where to look for the supported fields (and misspellings). People
> > on this mailing list can probably figure it out, but people who don't
> > work full time on bpf might find it hard. Having 'struct bpf_map_def'
> > as a reference (or a good supported piece of documentation) might help
> 
> So yeah, it's more about documentation and examples, it seems, rather
> than having a C struct in code, right? Today, if I need to add a new
> map, I copy/paste from an example or existing code, or look up
Well, you know where to copy paste from ;-)

> documentation. You'll be able to do the same with the new way (just grep
> for \.maps).
Yes, it's mostly about discoverability. Either documentation or
the real underlying structure could help with that.

> > with that.
> >
> > What do you think? The only issue is that we now have two formats
> > to support :-/
> 
> We'll have to support existing bpf_map_def for backwards compatibility
> (and see my reply to Jakub: you can simply reuse struct
> bpf_map_def today with the BTF approach, just put it into the .maps section),
> but I'd love to avoid having to support new features in two
> different ways, so if we go with BTF, I'd restrict new features to BTF
> only, moving forward.
But what's wrong with trying to extend bpf_map_def for a while? It looks like
we have everything in place to do that. I understand your desire
to deprecate everything and move on, but when was BTF support added to
LLVM? 8.0.0? 8.0.1? Six months ago? Is there a major distro with the
latest llvm+btf? Do we want to lock everyone out of new libbpf features?
(Consider that a lot of people run on the LTS kernels).

What's wrong with having BTF be just syntactic sugar on top of
bpf_map_def? One major use case is supporting iproute2 features,
but some of those features can go into bpf_map_def as well and
be used by non-BTF enabled users.
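The "syntactic sugar" view can be sketched as a lowering step: everything a bpf_map_def needs is recoverable from the BTF-style definition, with key/value sizes taken from the pointee types. This is a userspace-only sketch; SEC(".maps") is omitted, the type value is a placeholder, and LOWER_TO_MAP_DEF is a hypothetical macro, not a libbpf API.

```c
#include <stddef.h>

/* Same five-field layout as libbpf's legacy struct bpf_map_def. */
struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

struct my_value { int x, y, z; };

/* BTF-style definition from the patch description (SEC(".maps") omitted). */
struct {
	int type;
	int max_entries;
	int *key;
	struct my_value *value;
} btf_map = {
	.type = 2,	/* placeholder for BPF_MAP_TYPE_ARRAY */
	.max_entries = 16,
};

/* Hypothetical lowering: sizes come from the pointee types via sizeof,
 * which doesn't evaluate (or dereference) its operand. Fields not
 * mentioned, such as map_flags, default to 0. */
#define LOWER_TO_MAP_DEF(m) ((struct bpf_map_def){	\
	.type = (m).type,				\
	.key_size = sizeof(*(m).key),			\
	.value_size = sizeof(*(m).value),		\
	.max_entries = (m).max_entries,			\
})
```

With the patch's ARRAY example this yields key_size = sizeof(int) and value_size = sizeof(struct my_value), i.e. 4 and 12 bytes on the usual ABIs, with map_flags defaulting to 0.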

One other point to consider here might be the pure Go libbpf that Lorenz is
maintaining. Having a simple underlying bpf_map_def that we can all agree
on might be beneficial.

> > > > Also, (un)related: we don't currently use BTF internally, so if
> > > > you convert all tests, we'd be unable to run them :-(
> > >
> > > Not exactly sure what you mean "you'd be unable to run them". Do you
> > > mean that you use old Clang that doesn't emit BTF? If that's what you
> > > are saying, a lot of tests already rely on latest Clang, so those
> > > tests already don't work for you, probably. I'll leave it up to Daniel
> > > and Alexei to decide if we want to convert selftests right now or not.
> > > I did it mostly to prove that we can handle all existing cases (and
> > > found a few gotchas and bugs along the way, both in my implementation
> > > and in the kernel; fixes coming soon).
> > Yes, I mean that we don't always use the latest features of clang,
> > so having the existing tests in the old form (at least for a while)
> > would be appreciated. Good candidates to showcase the new format would
> > be features that explicitly require BTF, stuff like spinlocks.
> 
> I totally understand the concern, but I'll still defer to maintainers to
> make a call as to when to do the conversion.
Sure, totally up to you and the maintainers. Just raising my voice,
so you'd at least consider not converting everything.

> > > > > Relying on BTF, this approach allows for both forward and backward
> > > > > compatibility w.r.t. extending supported map definition features. Old
> > > > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > > > implementations will parse and recognize new optional attributes.
> > > > I also don't know how to feel about old libbpf ignoring some attributes.
> > > > In the kernel we require that the unknown fields are zeroed.
> > > > We probably need to do something like that here? What do you think
> > > > would be a good example of an optional attribute?
> > >
> > > Ignoring is required for forward compatibility, where old libbpf will
> > > be used to load newer user BPF programs. We can decide not to do it;
> > > in that case it's just a question of erroring out on the first unknown
> > > field. This RFC was posted exactly to discuss all these issues with
> > > the wider community, as there is no single true way to do this.
> > >
> > > As for examples of when it can be used: it's any feature that can be
> > > considered optional or a hint, so if old libbpf doesn't support it, it's
> > > still not the end of the world (and we can live with that, or can
> > > correct it using direct libbpf API calls).
> > In general, doing what we do right now with bpf_map_def (returning an error
> > for non-zero unknown options) seems like the safest option. We should
> > probably do the same with the unknown BTF fields (return an error
> > for non-zero value).
> 
> Yeah, as I replied to Jakub, libbpf already has strict/non-strict
> mode, we should probably do the same. The only potential difference is
> that there is no need to check for zeros and stuff: just don't define
> a field. And using an extra flag, we can allow more relaxed semantics
> (just a debug/info/warn message on unknown fields). This is what
> __bpf_object__open_xattr does today with MAPS_RELAX_COMPAT flag.
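The policies discussed here (strict, error only on non-zero unknown fields, and relaxed warn-and-ignore) can be compared side by side in a small sketch. The field representation, the list of known names and the enum are hypothetical; a real loader would walk the BTF members of the map struct instead.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical view of one member of a BTF-defined map struct. */
struct map_field {
	const char *name;
	unsigned long long val;
};

enum unknown_policy {
	UNKNOWN_STRICT,		/* any unknown field is an error */
	UNKNOWN_NONZERO,	/* error only if the unknown field is non-zero */
	UNKNOWN_RELAXED,	/* warn and ignore, in the spirit of MAPS_RELAX_COMPAT */
};

static const char *const known_fields[] = {
	"type", "max_entries", "map_flags", "key", "value",
	"key_size", "value_size",
};

static int is_known(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(known_fields) / sizeof(known_fields[0]); i++)
		if (strcmp(name, known_fields[i]) == 0)
			return 1;
	return 0;
}

/* Returns 0 if the definition is acceptable under policy p, else -EINVAL. */
static int check_fields(const struct map_field *f, size_t n,
			enum unknown_policy p)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (is_known(f[i].name))
			continue;
		if (p == UNKNOWN_STRICT || (p == UNKNOWN_NONZERO && f[i].val))
			return -EINVAL;
		fprintf(stderr, "warning: ignoring unknown map field '%s'\n",
			f[i].name);
	}
	return 0;
}
```

A misspelled max_entires is rejected in the strict and non-zero modes but only warned about in relaxed mode, which is the trade-off between catching typos and forward compatibility.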
> 
> >
> > For a general BTF case, we can have some predefined policy: if, for example,
> > the field name starts with an underscore, it's optional and doesn't require
> > the non-zero check (or the name ends with '_opt', or some other clear policy).
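A sketch of such a naming rule, assuming the two conventions mentioned (a leading underscore or an "_opt" suffix); the helper name is hypothetical. A loader would consult it before applying the unknown-field check, and skip that check for optional fields.

```c
#include <string.h>

/* Hypothetical policy: a field is optional if its name starts with '_'
 * or ends in "_opt"; an old loader may then ignore it even if non-zero. */
static int is_optional_field(const char *name)
{
	const char *suffix = "_opt";
	size_t len = strlen(name);
	size_t slen = strlen(suffix);

	if (len > 0 && name[0] == '_')
		return 1;
	return len > slen && strcmp(name + len - slen, suffix) == 0;
}
```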
> >
> > > > > The outline of the new map definition (short, BTF-defined maps) is as follows:
> > > > > 1. All the maps should be defined in .maps ELF section. It's possible to
> > > > >    have both "legacy" map definitions in `maps` sections and BTF-defined
> > > > >    maps in .maps sections. Everything will still work transparently.
> > > > > 2. The map declaration and initialization are done through
> > > > >    a global/static variable of a struct type with a few mandatory and
> > > > >    some extra optional fields:
> > > > >    - the type field is mandatory and specifies the type of BPF map;
> > > > >    - key/value fields capture key/value type/size information and are
> > > > >      mandatory unless key_size/value_size are given instead (see 4.);
> > > > >    - max_entries is optional; if max_entries is not specified or
> > > > >      initialized, it has to be provided at runtime through the libbpf API
> > > > >      before loading bpf_object;
> > > > >    - map_flags is optional and, if not defined, is assumed to be 0.
> > > > > 3. Key/value fields should be **a pointer** to a type describing
> > > > >    key/value. The pointee type is assumed (and will be recorded as such
> > > > >    and used for size determination) to be a type describing key/value of
> > > > >    the map. This avoids allocating excessive amounts of space in the
> > > > >    corresponding ELF sections for large keys/values.
> > > > > 4. As some maps disallow having BTF type ID associated with key/value,
> > > > >    it's possible to specify key/value size explicitly without
> > > > >    associating BTF type ID with it. Use key_size and value_size fields
> > > > >    to do that (see example below).
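Point 3 is easy to check in plain C: a pointer member records the pointee type, and sizeof(*ptr) still yields the full key/value size without the definition object growing. A userspace sketch with illustrative types:

```c
#include <stddef.h>

/* A deliberately large value type. */
struct big_value { char data[4096]; };

/* Embedding key/value by value: the object itself, and so the ELF
 * section it lives in, grows with the value size. */
struct embedded_def {
	int type;
	int max_entries;
	int key;
	struct big_value value;
};

/* BTF-defined style: pointers only record the type. */
struct pointer_def {
	int type;
	int max_entries;
	int *key;
	struct big_value *value;
};

/* The size is still recoverable from the pointee type. */
static size_t pointee_value_size(void)
{
	struct pointer_def *d = NULL;	/* sizeof doesn't evaluate its operand */

	return sizeof(*d->value);
}
```

On a typical 64-bit target the pointer-style definition is 24 bytes regardless of value size, while the embedded one is over 4 KiB.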
> > > > >
> > > > > Here's an example of a simple ARRAY map definition:
> > > > >
> > > > > struct my_value { int x, y, z; };
> > > > >
> > > > > struct {
> > > > >       int type;
> > > > >       int max_entries;
> > > > >       int *key;
> > > > >       struct my_value *value;
> > > > > } btf_map SEC(".maps") = {
> > > > >       .type = BPF_MAP_TYPE_ARRAY,
> > > > >       .max_entries = 16,
> > > > > };
> > > > >
> > > > > This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
> > > > > be of type int and thus key size will be 4 bytes. The value is struct
> > > > > my_value of size 12 bytes. This map can be used from C code exactly the
> > > > > same as with existing maps defined through struct bpf_map_def.
> > > > >
> > > > > Here's an example of STACKMAP definition (which currently disallows BTF type
> > > > > IDs for key/value):
> > > > >
> > > > > struct {
> > > > >       __u32 type;
> > > > >       __u32 max_entries;
> > > > >       __u32 map_flags;
> > > > >       __u32 key_size;
> > > > >       __u32 value_size;
> > > > > } stackmap SEC(".maps") = {
> > > > >       .type = BPF_MAP_TYPE_STACK_TRACE,
> > > > >       .max_entries = 128,
> > > > >       .map_flags = BPF_F_STACK_BUILD_ID,
> > > > >       .key_size = sizeof(__u32),
> > > > >       .value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
> > > > > };
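The value_size arithmetic in the STACKMAP example can be checked in userspace. The constants and struct below are copied from the uapi headers (PERF_MAX_STACK_DEPTH from linux/perf_event.h, struct bpf_stack_build_id from linux/bpf.h) so the sketch builds without them; double-check them against your kernel headers.

```c
#include <stdint.h>

/* Local copies of the uapi values used by the example. */
#define PERF_MAX_STACK_DEPTH 127
#define BPF_BUILD_ID_SIZE 20

struct bpf_stack_build_id {
	int32_t status;
	unsigned char build_id[BPF_BUILD_ID_SIZE];
	union {
		uint64_t offset;
		uint64_t ip;
	};
};

/* Sizes spelled out by hand, as in the example: this map type doesn't
 * accept BTF type IDs for key/value. */
static const uint32_t stackmap_key_size = sizeof(uint32_t);
static const uint32_t stackmap_value_size =
	PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id);
```

That comes to 127 entries of 32 bytes (4 + 20 bytes, then the 8-byte union at offset 24), i.e. a 4064-byte value.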
> > > > >
> > > > > This approach naturally extends to support map-in-map, by making the value
> > > > > field another struct that describes the inner map. This feature is not
> > > > > implemented yet. It's also possible to incrementally add features like pinning
> > > > > with full backward and forward compatibility.
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > > ---
> > > > >  tools/lib/bpf/btf.h    |   1 +
> > > > >  tools/lib/bpf/libbpf.c | 333 +++++++++++++++++++++++++++++++++++++++--
> > > > >  2 files changed, 325 insertions(+), 9 deletions(-)


Thread overview: 40+ messages
2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 1/8] libbpf: add common min/max macro to libbpf_internal.h Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 2/8] libbpf: extract BTF loading and simplify ELF parsing logic Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 3/8] libbpf: refactor map initialization Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 4/8] libbpf: identify maps by section index in addition to offset Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 5/8] libbpf: split initialization and loading of BTF Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
2019-05-31 21:28   ` Stanislav Fomichev
2019-05-31 22:58     ` Andrii Nakryiko
2019-06-03  0:33       ` Jakub Kicinski
2019-06-03 21:54         ` Andrii Nakryiko
2019-06-03 23:34           ` Jakub Kicinski
2019-06-03 16:32       ` Stanislav Fomichev
2019-06-03 22:03         ` Andrii Nakryiko
2019-06-04  1:02           ` Stanislav Fomichev [this message]
2019-06-04  1:07             ` Alexei Starovoitov
2019-06-04  4:29               ` Stanislav Fomichev
2019-06-04 13:45                 ` Stanislav Fomichev
2019-06-04 17:31                   ` Andrii Nakryiko
2019-06-04 21:07                     ` Stanislav Fomichev
2019-06-04 21:22                       ` Andrii Nakryiko
2019-06-06 21:09                     ` Daniel Borkmann
2019-06-06 23:02                       ` Andrii Nakryiko
2019-06-06 23:27                         ` Alexei Starovoitov
2019-06-07  0:10                           ` Jakub Kicinski
2019-06-07  0:27                             ` Alexei Starovoitov
2019-06-07  1:02                               ` Jakub Kicinski
2019-06-10  1:17                                 ` explicit maps. Was: " Alexei Starovoitov
2019-06-10 21:15                                   ` Jakub Kicinski
2019-06-10 23:48                                   ` Andrii Nakryiko
2019-06-03 22:34   ` Andrii Nakryiko
2019-06-06 16:42   ` Lorenz Bauer
2019-06-06 22:34     ` Andrii Nakryiko
2019-06-17  9:07       ` Lorenz Bauer
2019-06-17 20:59         ` Andrii Nakryiko
2019-06-20  9:27           ` Lorenz Bauer
2019-06-21  4:05             ` Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 7/8] selftests/bpf: add test for BTF-defined maps Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 8/8] selftests/bpf: switch tests to BTF-defined map definitions Andrii Nakryiko
2019-06-11  4:34 [RFC PATCH bpf-next 0/8] BTF-defined BPF " Andrii Nakryiko
2019-06-11  4:35 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
