netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lorenz Bauer <lmb@cloudflare.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andriin@fb.com>, Alexei Starovoitov <ast@fb.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Kernel Team <kernel-team@fb.com>,
	Jakub Kicinski <jakub.kicinski@netronome.com>,
	Joe Stringer <joe@wand.net.nz>
Subject: Re: [PATCH v2 bpf-next 00/11] BTF-defined BPF map definitions
Date: Thu, 20 Jun 2019 15:49:06 +0100	[thread overview]
Message-ID: <CACAyw9-L0qx8d9O66SaYhJGjsyKo_6iozqLAQHEVa1AW-U=2Tg@mail.gmail.com> (raw)
In-Reply-To: <CAEf4Bzae1CPDkhPrESa2ZmiOH8Mqf0KA_4ty9z=xnYn=q7Frhw@mail.gmail.com>

On Tue, 18 Jun 2019 at 22:37, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> > I would just drop the object-scope pinning. We avoided using it and I'm not
> > aware if anyone else make use. It also has the ugly side-effect that this
> > relies on AF_ALG which e.g. on some cloud provider shipped kernels is disabled.
> > The pinning attribute should be part of the standard set of map attributes for
> > libbpf though as it's generally useful for networking applications.
>
> Sounds good. I'll do some more surveying of use cases inside FB to see
> if anyone needs object-scope pinning, just to be sure we are not
> short-cutting anyone.

I'm also curious what the use cases for declarative pinning are. From my
limited POV it doesn't seem that useful? There are a couple of factors:

* Systemd mounts the default location only accessible to root, so I have to
  used my own bpffs mount.
* Since I don't want to hard code that, I put it in a config file.
* After loading the ELF we pin maps from the daemon managing the XDP.

How do other people work around this? Hard coding it in the ELF seems
suboptimal.

> > And the loader should figure this out and combine everything in the background.
> > Otherwise above 'struct inner_map_t value' would be mixing convention of using
> > pointer vs non-pointer which may be even more confusing.
>
> There are two reasons I didn't want to go with that approach:
>
> 1. This syntax makes my_inner_map usable as a stand-alone map, while
> it's purpose is to serve as a inner map prototype. While technically
> it is ok to use my_inner_map as real map, it's kind of confusing and
> feels unclean.

I agree, avoiding this problem is good.

> So we came up with a way to "encode" integer constants as part of BTF
> type information, so that *all* declarative information is part of BTF
> type, w/o the need to compile-time initialization. We tried to go the
> other way (what Jakub was pushing for), but we couldn't figure out
> anything that would work w/o more compiler hacks. So here's the
> updated proposal:
>
> #define __int(name, val) int (*name)[val]

Consider my mind blown: https://cdecl.org/?q=int+%28*foo%29%5B10%5D

> #define __type(name, val) val (*foo)

Maybe it's enough to just hide the pointer-ness?

  #define __member(name) (*name)
  struct my_value __member(value);

> struct my_inner_map {
>         __int(type, BPF_MAP_TYPE_ARRAY);
>         __int(max_entries, 1000);
>         __type(key, int);
>         __type(value, struct my_value);

What if this did

  __type(value, struct my_value)[1000];
  struct my_value __member(value)[1000]; // alternative

instead, and skipped max_entries?

> static struct {
>         __int(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
>         __int(max_entries, 1000);
>         __type(key, int);
>         __type(value, struct my_inner_map);
>         struct my_inner_map *values[];
> } my_initialized_outer_map SEC(".maps") = {
>         .values = {
>                 &imap1,
>                 [500] = &imap2,
>         },
> };
>
> Here struct my_inner_map is complete definition of array map w/ 1000
> elements w/ all the type info for k/v. That struct is used as a
> template for my_outer_map map-in-map. my_initialized_outer_map is the
> case of pre-initialization of array-of-maps w/ instances of existing
> maps imap1 and imap2.

For my_initialized_outer_map, which section does .values end up in the
generated ELF? How much space is going to be allocated? 501 * 4 bytes?

> The idea is that we encode integer fields as array dimensions + use
> pointer to an array to save space. Given that syntax in plain C is a
> bit ugly and hard to remember, we hide that behind __int macro. Then
> in line with __int, we also have __type macro, that hides that hateful
> pointer for key/value types. This allows map definition to be
> self-describing w/o having to look at initialized ELF data section at
> all, except for special cases of explicitly initializing map-in-map or
> prog_array.
>
> What do you think?

I think this is an interesting approach. One thing I'm not sure of is handling
these types from C. For example:

  sizeof(my_outer_map.value)

This compiles, but doesn't produce the intended result. Correct would be:

  sizeof(my_outer_map.value[0])

At that point you have to understand that value is a pointer so all of
our efforts
are for naught. I suspect there is other weirdness like this, but I need to play
with it a little bit more.

> Yeah I can definitely see some confusion here. But it seems like this
> is more of a semantics of map sharing, and maybe it should be some
> extra option for when we have automatic support for extern (shared)
> maps. E.g., something like
>
> __int(sharing, SHARE_STRATEGY_MERGE) vs __int(sharing, SHARE_STRATEGY_OVERWRITE)
>
> Haven't though through exact syntax, naming, semantics, but it seems
> doable to support both, depending on desired behavior.
>
> Maybe we should also unify this w/ pinning? E.g., there are many
> sensible ways to handle already existing pinned map:
>
> 1. Reject program (e.g., if BPF application is the source of truth for that map)
> 2. Use pinned as is (e.g., if BPF application wants to consume data
> from source of truth app)
> 3. Merge (what you described above)
> 4. Replace/reset - not sure if useful/desirable.

From my experience, trying to support many use cases in a purely declarative
fashion ends up creating many edge cases, and quirky behaviour that is hard to
fix later on. It's a bit like merging dictionaries in $LANGUAGE,
which starts out simple and then gets complicated because sometimes you
want to override a key, but lists should be concatenated, except in
that one case...

I wonder: are there many use cases where writing some glue code isn't
possible? With libbpf getting more mature APIs that should become easier and
easier. We could probably support existing iproute2 features that way as well.

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

  reply	other threads:[~2019-06-20 14:49 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-17 19:26 [PATCH v2 bpf-next 00/11] BTF-defined BPF map definitions Andrii Nakryiko
2019-06-17 19:26 ` [PATCH v2 bpf-next 01/11] libbpf: add common min/max macro to libbpf_internal.h Andrii Nakryiko
2019-06-17 19:26 ` [PATCH v2 bpf-next 02/11] libbpf: extract BTF loading logic Andrii Nakryiko
2019-06-17 19:40   ` Song Liu
2019-06-17 19:26 ` [PATCH v2 bpf-next 03/11] libbpf: streamline ELF parsing error-handling Andrii Nakryiko
2019-06-17 19:46   ` Song Liu
2019-06-17 19:26 ` [PATCH v2 bpf-next 04/11] libbpf: refactor map initialization Andrii Nakryiko
2019-06-17 19:39   ` Song Liu
2019-06-26 14:48   ` Matt Hart
2019-06-26 18:28     ` Andrii Nakryiko
2019-06-27 15:11       ` Matt Hart
2019-06-17 19:26 ` [PATCH v2 bpf-next 05/11] libbpf: identify maps by section index in addition to offset Andrii Nakryiko
2019-06-17 19:26 ` [PATCH v2 bpf-next 06/11] libbpf: split initialization and loading of BTF Andrii Nakryiko
2019-06-17 19:26 ` [PATCH v2 bpf-next 07/11] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
2019-06-17 19:43   ` Song Liu
2019-06-17 19:26 ` [PATCH v2 bpf-next 08/11] selftests/bpf: add test for BTF-defined maps Andrii Nakryiko
2019-06-17 19:26 ` [PATCH v2 bpf-next 09/11] selftests/bpf: switch BPF_ANNOTATE_KV_PAIR tests to " Andrii Nakryiko
2019-06-17 21:41   ` Song Liu
2019-06-17 22:20     ` Daniel Borkmann
2019-06-17 19:26 ` [PATCH v2 bpf-next 10/11] selftests/bpf: convert tests w/ custom values " Andrii Nakryiko
2019-06-17 19:27 ` [PATCH v2 bpf-next 11/11] selftests/bpf: convert remaining selftests " Andrii Nakryiko
2019-06-17 21:17 ` [PATCH v2 bpf-next 00/11] BTF-defined BPF map definitions Daniel Borkmann
2019-06-17 22:17   ` Daniel Borkmann
2019-06-18 21:37   ` Andrii Nakryiko
2019-06-20 14:49     ` Lorenz Bauer [this message]
2019-06-21  4:19       ` Andrii Nakryiko
2019-06-21 10:29         ` Lorenz Bauer
2019-06-21 17:56           ` Andrii Nakryiko
2019-06-25 18:14             ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACAyw9-L0qx8d9O66SaYhJGjsyKo_6iozqLAQHEVa1AW-U=2Tg@mail.gmail.com' \
    --to=lmb@cloudflare.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jakub.kicinski@netronome.com \
    --cc=joe@wand.net.nz \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).