bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alan Maguire <alan.maguire@oracle.com>,
	Alexei Starovoitov <ast@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	 Arnaldo Carvalho de Melo <acme@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>,  Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	 Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,
	 Quentin Monnet <quentin@isovalent.com>,
	Mykola Lysenko <mykolal@fb.com>, bpf <bpf@vger.kernel.org>
Subject: Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
Date: Tue, 6 Jun 2023 09:50:42 -0700	[thread overview]
Message-ID: <CAEf4BzYG2FFcM_0mkiARzKnYinQYHpWE8ct35Z==-Fsefv9oQw@mail.gmail.com> (raw)
In-Reply-To: <CAADnVQJAmYgR91WKJ_Jif6c3ja=OAmkMXoUO9sTnmp-xmnbVJQ@mail.gmail.com>

On Mon, Jun 5, 2023 at 7:46 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jun 5, 2023 at 3:38 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jun 5, 2023 at 9:15 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Jun 2, 2023 at 1:34 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > >
> > > > > > > > >> +
> > > > > > > > >> +struct btf_kind_meta {
> > > > > > > > >> +    __u32 name_off;         /* kind name string offset */
> > > > > >
> > > > > > I'm not sure why we'd need to record this for every KIND? The tool
> > > > > > that doesn't know about this new kind can't do much about it anyways,
> > > > > > so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
> > > > > > doesn't make much difference?
> > > > >
> > > > > The name is certainly more meaningful than 123.
> > > > > bpftool output is consumed by humans who will be able to tell the difference.
> > > > > I'd keep the name here.
> > > >
> > > > Ok, it's fine. When I was originally proposing this compact metadata,
> > > > I was trying to make it as minimal as possible, so adding 80 bytes
> > > > just for string offset fields (plus a bunch of strings) felt like an
> > > > unnecessary overhead. But it's not a big deal.
> > >
> > > Exactly. It's just 4 * num_kinds bytes in and ~20 * num_kinds for
> > > names, but it implements 'self description'.
> > > Otherwise the names become an external knowledge and BTF is not self described.
> > >
> > >
> > > > >
> > > > > > > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
> > > > > >
> > > > > > Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
> > > > > > existing kernels (and potentially many tools that parse BTF). Given we
> > > > > > can actually extend BTF in backwards compatible way by just adding an
> > > > > > optional two fields to btf_header + extra bytes for metadata sections,
> > > > > > why making our lives harder by bumping this version?
> > > > >
> > > > > I fail to see how bumping the version makes it harder.
> > > > > libbpf needs to sanitize meta* fields in the struct btf_header on
> > > > > older kernels anway. At the same time sanitizing the version from 2 to
> > > > > 1
> > > > > in the same header is one extra line of code in libbpf.
> > > > > What am I missing?
> > > >
> > > > So I checked libbpf code, and libbpf doesn't really check the version
> > > > field. So for the most part this BTF_VERSION bump wouldn't matter for
> > > > any tool that's based on libbpf's struct btf API. But if libbpf did
> > > > check version (as it probably should have), then by upgrading to newer
> > > > Clang that would emit BTF with this metadata (but no new fancy
> > > > BTF_KIND_KERNEL_FUNC or anything like that), we automatically make
> > > > such BTF incompatible with all those tools.
> > > >
> > > > Kernel is a bit different because it's extremely strict about BTF. I'm
> > > > more worried about tools like bpftool (but we don't check BTF_VERSION
> > > > there due to libbpf), llvm-objdump (when it supports BTF), etc.
> > > >
> > > > On the other hand, what do we gain by bumping this BTF_VERSION?
> > >
> > > The version bump will be an indication that
> > > v2 of BTF has enough info in the format for any tool/kernel to consume it.
> > > With v2 we should make BTF_KIND_META description mandatory.
> > > If we keep it as v1 then the presence of BTF_KIND_META would be
> > > an indication of 'self described' format.
> > > Which is also ok-ish, but seems less clean.
> > > zero vs not-zero of meta_off in btf_header is pretty much v1 vs v2.
> > >
> >
> > We had a long offline discussion w/ Alexei about this whole
> > self-describing BTF, and I will try to summarize it here a bit,
> > because I think we both think about "self-describing" differently, and
> > as a result few different aspects are conflated with each other (there
> > are at least 3(!) different things here).
>
> Thanks for summarizing. All correct.
>
> > From my perspective, this self-describing BTF metadata was purely
> > designed to allow tools without latest BTF knowledge to be able to
> > skip over unknown BTF_KIND_xxx, at most being able to tell whether
> > it's critical for understanding BTF (that's the OPTIONAL flag) or not.
> > I.e., with older bpftool (but the one that knows about btf_metadata,
> > of course), it would still be possible to `bpftool btf dump file
> > <file-with-newer-btf-kinds>` just fine, except for new KINDS (which
> > would be just emitted as "unknown BTF_KIND_XXX, skipping...".
> >
> > I think this problem is solved with this fixed + per-vlen sz and those
> > few extra flags.
>
> I'm fine with this approach as long as we don't fool ourselves that
> we're doing a "self described" format.
> We have a "size" field in btf_header. With this btf_metadata extension
> we're effectively adding "size" fields for each btf kind and its vlen part.
> So what Alan proposed:
> +struct btf_kind_meta {
> +       __u16 flags;            /* see BTF_KIND_META_* values above */
> +       __u8 info_sz;           /* size of singular element after btf_type */
> +       __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
> +};
>
> _without_ name_off it makes the most sense.

Yep. I didn't see much need for name_off as well.

>
> As soon as we're trying to add 'name_off' to the kind we're falling into
> the trap of thinking that we're adding "self described" format and
> btf_kind_meta needs to actually describe it for printing (with
> real name and not just integer id) and further trying to describe
> semantics of unknown kind with another flag that Andrii's proposed:
> "Another flag I was thinking about was a flag whether struct btf_type's
> type/size field is a type or a size (or something else)."
>
> imo name_off and that other flag in addition to optional_or_not flag
> are carrying the concept too far.
>
> We should just say upfront that this "struct btf_kind_meta" is to be
> able to extend BTF easier and nothing else.

Yep, that was precisely (at least my) intent. It's great that we are
converging on this.


> "old" bpftool will be able to skip unknown kinds, but dedup
> probably won't be able to skip much anyway.

Agreed, I don't think we can ever make BTF dedup work reliably with
KINDs it doesn't understand. I wouldn't even try. I'd also say that
kernel should keep being strict about this (even if we add
"is-it-optional" field, kernel can't trust it). Libbpf and other
libraries will have to keep sanitizing BTF anyways.

I'm also ok with dropping "optional_or_not" flag. Same for
"does-it-point-to-type" flag. I can see some use for the latter (e.g.,
still being able to calculate sizes and stuff), but it's nothing super
critical, IMO.

>
> I'd also call it "struct btf_kind_description|layout|sizes"

I like btf_kind_layout, it's short and to the point.

> to narrow the scope.
> This BTF extension is not going to describe semantics of unknown kinds.
> Instead of "best effort" attempts with flags like "what type/size means"
> let's not even go there.
>

Ack.

> If we go this simple route I'm fine with hard coded crc and base_crc
> fields. They probably should go to btf_header though.

Yep, on btf_header fields. But I'd not hardcode "crc" name. If we are
doing them as strings (which I think we should instead of dooming them
to 32-bit integer crc32 value only), then can we just say generically
that it's either "id" or "checksum"?

But I guess crc32 would be fine in practice as well. So not something
I strongly feel about.

> We don't need "struct btf_metadata" as well.
> It's making things sound beyond what it actually is.
> btf_header can point to an array of struct btf_kind_description.
> As simple as it can get.

Agreed. Still, it's a third section, and we should at least have a
count of those btf_kind_layout items somewhere.

> No need for json like format and key/value things either.
> We're not creating a self described BTF format.
> We're just adding a few size fields.

Ack, great.

> The kernel/libbpf/dedup still needs to known semantics of future kinds
> to be able to print/operate on them.

Yes.

  parent reply	other threads:[~2023-06-06 16:50 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI Alan Maguire
2023-06-01  3:53   ` Alexei Starovoitov
2023-06-01 10:36     ` Alan Maguire
2023-06-01 16:53       ` Alexei Starovoitov
2023-06-02 16:32         ` Andrii Nakryiko
2023-06-02 16:34           ` Andrii Nakryiko
2023-06-02 18:11           ` Alexei Starovoitov
2023-06-02 20:33             ` Andrii Nakryiko
2023-06-05 16:14               ` Alexei Starovoitov
2023-06-05 22:38                 ` Andrii Nakryiko
2023-06-06  2:46                   ` Alexei Starovoitov
2023-06-06 11:30                     ` Toke Høiland-Jørgensen
2023-06-07 11:55                       ` Eduard Zingerman
2023-06-07 15:29                         ` Yonghong Song
2023-06-07 16:14                           ` Eduard Zingerman
2023-06-07 21:47                             ` Andrii Nakryiko
2023-06-07 22:05                               ` Eduard Zingerman
2023-06-07 22:34                                 ` Andrii Nakryiko
2023-06-06 16:50                     ` Andrii Nakryiko [this message]
2023-06-07  1:16                       ` Alexei Starovoitov
2023-06-07 21:43                         ` Andrii Nakryiko
2023-05-31 20:19 ` [RFC bpf-next 2/8] libbpf: support handling of metadata section in BTF Alan Maguire
2023-06-05 11:01   ` Jiri Olsa
2023-06-05 21:40     ` Andrii Nakryiko
2023-05-31 20:19 ` [RFC bpf-next 3/8] libbpf: use metadata to compute an unknown kind size Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 4/8] btf: support kernel parsing of BTF with metadata, use it to parse BTF with unknown kinds Alan Maguire
2023-06-07 19:51   ` Eduard Zingerman
2023-05-31 20:19 ` [RFC bpf-next 5/8] libbpf: add metadata encoding support Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 6/8] btf: generate metadata for vmlinux/module BTF Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 7/8] bpftool: add BTF dump "format meta" to dump header/metadata Alan Maguire
2023-06-01 16:33   ` Quentin Monnet
2023-06-02 16:57   ` Andrii Nakryiko
2023-05-31 20:19 ` [RFC bpf-next 8/8] selftests/bpf: test kind encoding/decoding Alan Maguire
2023-05-31 20:19 ` [RFC dwarves] dwarves: encode BTF metadata if --btf_gen_meta is set Alan Maguire

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4BzYG2FFcM_0mkiARzKnYinQYHpWE8ct35Z==-Fsefv9oQw@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=acme@kernel.org \
    --cc=alan.maguire@oracle.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mykolal@fb.com \
    --cc=quentin@isovalent.com \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).