bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Stanislav Fomichev <sdf@fomichev.me>,
	Andrii Nakryiko <andriin@fb.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
Date: Mon, 3 Jun 2019 14:54:53 -0700	[thread overview]
Message-ID: <CAEf4BzZ9vYDbdeZyhj_CQpqRUE_BLO9dbXRvnBNaikoO9OpVsw@mail.gmail.com> (raw)
In-Reply-To: <20190602173334.18e68d66@cakuba.netronome.com>

On Sun, Jun 2, 2019 at 5:33 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Fri, 31 May 2019 15:58:41 -0700, Andrii Nakryiko wrote:
> > On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > On 05/31, Andrii Nakryiko wrote:
> > > > This patch adds support for a new way to define BPF maps. It relies on
> > > > BTF to describe mandatory and optional attributes of a map, as well as
> > > > captures type information of key and value naturally. This eliminates
> > > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > > always in sync with the key/value type.
> > > My 2c: this is too magical and relies on me knowing the expected fields.
> > > (also, the compiler won't be able to help with the misspellings).
>
> I have mixed feelings, too.  Especially the key and value fields are
> very non-idiomatic for C :(  They never hold any value or data, while
> the other fields do.  That feels so awkward.  I'm no compiler expert,
> but even something like:
>
> struct map_def {
>         void *key_type_ref;
> } mamap = {
>         .key_type_ref = &(struct key_xyz){},
> };
>
> Would feel like less of a hack to me, and then map_def doesn't have to
> be different for every map.  But yea, IDK if it's easy to (a) resolve
> the type of what key_type points to, or (b) how to do this for scalar
> types.

The syntax for scalar would be &(int){0}, that compiles.

But there are a bunch of things that make it infeasible. So let's take
an example and see what's happening:

/* huge struct */
struct custom {int a; int b; int c; int d[1000000];};

struct {
        void *key;
        void *value;
} new_map = {
        .key = &(int){0},
        .value = &(struct custom){},
};

If we dump BTF, here's what we get:

$ bpftool btf dump file tail_call_test.o
[1] FUNC_PROTO '(anon)' ret_type_id=2 vlen=0
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] FUNC 'main' type_id=1
[4] VAR '.compoundliteral' type_id=0, linkage=static
[5] VAR '.compoundliteral.1' type_id=0, linkage=static
[6] STRUCT '(anon)' size=24 vlen=3
        'type' type_id=2 bits_offset=0
        'key' type_id=7 bits_offset=64
        'value' type_id=7 bits_offset=128
[7] PTR '(anon)' type_id=0
[8] VAR 'new_map' type_id=6, linkage=global-alloc
[9] DATASEC '.bss' size=0 vlen=2
        type_id=4 offset=0 size=4
        type_id=5 offset=4 size=4000012
[10] DATASEC '.maps' size=0 vlen=1
        type_id=8 offset=0 size=24

So notice how we get two .bss entries, one for 4 bytes (for key, var
'compoundliteral') and another for 4MB (for huge struct, var
'.compoundliteral.1'). So while this won't increase the size of ELF,
it will force a huge .bss (and corresponding global data map) to be
created, which is no good.

Also, notice how there is no type information associated with [4] and
[5] vars, they are just of type void. There is no type information
about struct custom at all, though it might be (?) possible to fix it
by modifying compiler to preserve more type information.

So while the second one is a technical hurdle, which we might overcome
(not sure, actually), the issue with big .BSS is a showstopper for
some applications.

To eliminate .BSS issue, we'd need something like this to capture type
information:

struct {
        void *key;
        void *value;
} new_map = {
        .key = (int)0,
        .value = (struct custom *)0,
};

But that doesn't capture any type information for those type casts at
all, so more compiler work (if at all possible).

Which is why I think capturing type information using a standard
non-convoluted C way/syntax using a field declaration is the most
reliable, simple, and clean way. You do intialize key/value, it's just
a NULL pointer to corresponding type:

struct {
        int type;
        int *key;
        struct custom *value;
} new_map __attribute__((section(".maps"), used)) = {
        .type = 2,
        .key = (int)0,
        .value = (struct custom *)NULL,
};


Notice, btw, that this approach doesn't prevent you to re-use struct
definitions for multiple maps, if they have identical key/value types
or if you are not capturing type information at all.

struct my_typical_map {
        int type;
        int max_entries;
        u64 *key;
        struct custom *value;
};

struct my_typical_map map1 SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 10,
};

struct my_typical_map map2 SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 20,
};

Or, you can just re-use struct bpf_map_def today like this (but you
won't have type info for key/value, of course):

struct bpf_map_def my_map_without_type_info SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 100,
        .key_size = sizeof(u64),
        .value_size = sizeof(struct custom),
};

This approach gives you as much flexibility as possible, you only will
have to have different definition struct, if you have different
key/value type (in C++ that would be solved by templates, but alas we
are in C land).


>
> > I don't think it's really worse than current bpf_map_def approach. In
> > typical scenario, there are only two fields you need to remember: type
> > and max_entries (notice, they are called exactly the same as in
> > bpf_map_def, so this knowledge is transferrable). Then you'll have
> > key/value, using which you are describing both type (using field's
> > type) and size (calculated from the type).
> >
> > I can relate a bit to that with bpf_map_def you can find definition
> > and see all possible fields, but one can also find a lot of examples
> > for new map definitions as well.
> >
> > One big advantage of this scheme, though, is that you get that type
> > association automagically without using BPF_ANNOTATE_KV_PAIR hack,
> > with no chance of having a mismatch, etc. This is less duplication (no
> > need to do sizeof(struct my_struct) and struct my_struct as an arg to
> > that macro) and there is no need to go and ping people to add those
> > annotations to improve introspection of BPF maps.
>
> > > > Relying on BTF, this approach allows for both forward and backward
> > > > compatibility w.r.t. extending supported map definition features. Old
> > > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > > implementations will parse and recognize new optional attributes.
> > > I also don't know how to feel about old libbpf ignoring some attributes.
> > > In the kernel we require that the unknown fields are zeroed.
> > > We probably need to do something like that here? What do you think
> > > would be a good example of an optional attribute?
> >
> > Ignoring is required for forward-compatibility, where old libbpf will
> > be used to load newer user BPF programs. We can decided not to do it,
> > in that case it's just a question of erroring out on first unknown
> > field. This RFC was posted exactly to discuss all these issues with
> > more general community, as there is no single true way to do this.
> >
> > As for examples of when it can be used. It's any feature that can be
> > considered optional or a hint, so if old libbpf doesn't do that, it's
> > still not the end of the world (and we can live with that, or can
> > correct using direct libbpf API calls).
>
> On forward compatibility my 0.02c would be - if we want to go there
> and silently ignore fields it'd be good to have some form of "hard
> required" bit.  For TLVs ABIs it can be a "you have to understand
> this one" bit, for libbpf perhaps we could add a "min libbpf version
> required" section?  That kind of ties us ELF formats to libbpf
> specifics (the libbpf version presumably would imply support for
> features), but I think we want to go there, anyway.

I think we can go with strict/non-strict mode, which we already
support in libbpf with MAPS_RELAX_COMPAT flag (see
__bpf_object__open_xattr), would that work?

  reply	other threads:[~2019-06-03 21:55 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 1/8] libbpf: add common min/max macro to libbpf_internal.h Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 2/8] libbpf: extract BTF loading and simplify ELF parsing logic Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 3/8] libbpf: refactor map initialization Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 4/8] libbpf: identify maps by section index in addition to offset Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 5/8] libbpf: split initialization and loading of BTF Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
2019-05-31 21:28   ` Stanislav Fomichev
2019-05-31 22:58     ` Andrii Nakryiko
2019-06-03  0:33       ` Jakub Kicinski
2019-06-03 21:54         ` Andrii Nakryiko [this message]
2019-06-03 23:34           ` Jakub Kicinski
2019-06-03 16:32       ` Stanislav Fomichev
2019-06-03 22:03         ` Andrii Nakryiko
2019-06-04  1:02           ` Stanislav Fomichev
2019-06-04  1:07             ` Alexei Starovoitov
2019-06-04  4:29               ` Stanislav Fomichev
2019-06-04 13:45                 ` Stanislav Fomichev
2019-06-04 17:31                   ` Andrii Nakryiko
2019-06-04 21:07                     ` Stanislav Fomichev
2019-06-04 21:22                       ` Andrii Nakryiko
2019-06-06 21:09                     ` Daniel Borkmann
2019-06-06 23:02                       ` Andrii Nakryiko
2019-06-06 23:27                         ` Alexei Starovoitov
2019-06-07  0:10                           ` Jakub Kicinski
2019-06-07  0:27                             ` Alexei Starovoitov
2019-06-07  1:02                               ` Jakub Kicinski
2019-06-10  1:17                                 ` explicit maps. Was: " Alexei Starovoitov
2019-06-10 21:15                                   ` Jakub Kicinski
2019-06-10 23:48                                   ` Andrii Nakryiko
2019-06-03 22:34   ` Andrii Nakryiko
2019-06-06 16:42   ` Lorenz Bauer
2019-06-06 22:34     ` Andrii Nakryiko
2019-06-17  9:07       ` Lorenz Bauer
2019-06-17 20:59         ` Andrii Nakryiko
2019-06-20  9:27           ` Lorenz Bauer
2019-06-21  4:05             ` Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 7/8] selftests/bpf: add test for BTF-defined maps Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 8/8] selftests/bpf: switch tests to BTF-defined map definitions Andrii Nakryiko
2019-06-11  4:34 [RFC PATCH bpf-next 0/8] BTF-defined BPF " Andrii Nakryiko
2019-06-11  4:35 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEf4BzZ9vYDbdeZyhj_CQpqRUE_BLO9dbXRvnBNaikoO9OpVsw@mail.gmail.com \
    --to=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jakub.kicinski@netronome.com \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@fomichev.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).