bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@kernel.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alan Maguire <alan.maguire@oracle.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Quentin Monnet <quentin@isovalent.com>,
	Mykola Lysenko <mykolal@fb.com>, bpf <bpf@vger.kernel.org>
Subject: Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
Date: Tue, 06 Jun 2023 13:30:11 +0200	[thread overview]
Message-ID: <878rcw3k1o.fsf@toke.dk> (raw)
In-Reply-To: <CAADnVQJAmYgR91WKJ_Jif6c3ja=OAmkMXoUO9sTnmp-xmnbVJQ@mail.gmail.com>

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Jun 5, 2023 at 3:38 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> On Mon, Jun 5, 2023 at 9:15 AM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> >
>> > On Fri, Jun 2, 2023 at 1:34 PM Andrii Nakryiko
>> > <andrii.nakryiko@gmail.com> wrote:
>> > >
>> > > >
>> > > > > > > >> +
>> > > > > > > >> +struct btf_kind_meta {
>> > > > > > > >> +    __u32 name_off;         /* kind name string offset */
>> > > > >
>> > > > > I'm not sure why we'd need to record this for every KIND? The tool
>> > > > > that doesn't know about this new kind can't do much about it anyways,
>> > > > > so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
>> > > > > doesn't make much difference?
>> > > >
>> > > > The name is certainly more meaningful than 123.
>> > > > bpftool output is consumed by humans who will be able to tell the difference.
>> > > > I'd keep the name here.
>> > >
>> > > Ok, it's fine. When I was originally proposing this compact metadata,
>> > > I was trying to make it as minimal as possible, so adding 80 bytes
>> > > just for string offset fields (plus a bunch of strings) felt like an
>> > > unnecessary overhead. But it's not a big deal.
>> >
>> > Exactly. It's just 4 * num_kinds bytes in and ~20 * num_kinds for
>> > names, but it implements 'self description'.
>> > Otherwise the names become an external knowledge and BTF is not self described.
>> >
>> >
>> > > >
>> > > > > > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
>> > > > >
>> > > > > Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
>> > > > > existing kernels (and potentially many tools that parse BTF). Given we
>> > > > > can actually extend BTF in backwards compatible way by just adding an
>> > > > > optional two fields to btf_header + extra bytes for metadata sections,
>> > > > > why making our lives harder by bumping this version?
>> > > >
>> > > > I fail to see how bumping the version makes it harder.
>> > > > libbpf needs to sanitize meta* fields in the struct btf_header on
>> > > > older kernels anway. At the same time sanitizing the version from 2 to
>> > > > 1
>> > > > in the same header is one extra line of code in libbpf.
>> > > > What am I missing?
>> > >
>> > > So I checked libbpf code, and libbpf doesn't really check the version
>> > > field. So for the most part this BTF_VERSION bump wouldn't matter for
>> > > any tool that's based on libbpf's struct btf API. But if libbpf did
>> > > check version (as it probably should have), then by upgrading to newer
>> > > Clang that would emit BTF with this metadata (but no new fancy
>> > > BTF_KIND_KERNEL_FUNC or anything like that), we automatically make
>> > > such BTF incompatible with all those tools.
>> > >
>> > > Kernel is a bit different because it's extremely strict about BTF. I'm
>> > > more worried about tools like bpftool (but we don't check BTF_VERSION
>> > > there due to libbpf), llvm-objdump (when it supports BTF), etc.
>> > >
>> > > On the other hand, what do we gain by bumping this BTF_VERSION?
>> >
>> > The version bump will be an indication that
>> > v2 of BTF has enough info in the format for any tool/kernel to consume it.
>> > With v2 we should make BTF_KIND_META description mandatory.
>> > If we keep it as v1 then the presence of BTF_KIND_META would be
>> > an indication of 'self described' format.
>> > Which is also ok-ish, but seems less clean.
>> > zero vs not-zero of meta_off in btf_header is pretty much v1 vs v2.
>> >
>>
>> We had a long offline discussion w/ Alexei about this whole
>> self-describing BTF, and I will try to summarize it here a bit,
>> because I think we both think about "self-describing" differently, and
>> as a result few different aspects are conflated with each other (there
>> are at least 3(!) different things here).
>
> Thanks for summarizing. All correct.
>
>> From my perspective, this self-describing BTF metadata was purely
>> designed to allow tools without latest BTF knowledge to be able to
>> skip over unknown BTF_KIND_xxx, at most being able to tell whether
>> it's critical for understanding BTF (that's the OPTIONAL flag) or not.
>> I.e., with older bpftool (but the one that knows about btf_metadata,
>> of course), it would still be possible to `bpftool btf dump file
>> <file-with-newer-btf-kinds>` just fine, except for new KINDS (which
>> would be just emitted as "unknown BTF_KIND_XXX, skipping...".
>>
>> I think this problem is solved with this fixed + per-vlen sz and those
>> few extra flags.
>
> I'm fine with this approach as long as we don't fool ourselves that
> we're doing a "self described" format.
> We have a "size" field in btf_header. With this btf_metadata extension
> we're effectively adding "size" fields for each btf kind and its vlen part.
> So what Alan proposed:
> +struct btf_kind_meta {
> +       __u16 flags;            /* see BTF_KIND_META_* values above */
> +       __u8 info_sz;           /* size of singular element after btf_type */
> +       __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
> +};
>
> _without_ name_off it makes the most sense.
>
> As soon as we're trying to add 'name_off' to the kind we're falling into
> the trap of thinking that we're adding "self described" format and
> btf_kind_meta needs to actually describe it for printing (with
> real name and not just integer id) and further trying to describe
> semantics of unknown kind with another flag that Andrii's proposed:
> "Another flag I was thinking about was a flag whether struct btf_type's
> type/size field is a type or a size (or something else)."
>
> imo name_off and that other flag in addition to optional_or_not flag
> are carrying the concept too far.
>
> We should just say upfront that this "struct btf_kind_meta" is to be
> able to extend BTF easier and nothing else.
> "old" bpftool will be able to skip unknown kinds, but dedup
> probably won't be able to skip much anyway.
>
> I'd also call it "struct btf_kind_description|layout|sizes"
> to narrow the scope.
> This BTF extension is not going to describe semantics of unknown kinds.
> Instead of "best effort" attempts with flags like "what type/size means"
> let's not even go there.
>
> If we go this simple route I'm fine with hard coded crc and base_crc
> fields. They probably should go to btf_header though.
> We don't need "struct btf_metadata" as well.
> It's making things sound beyond what it actually is.
> btf_header can point to an array of struct btf_kind_description.
> As simple as it can get.
> No need for json like format and key/value things either.
> We're not creating a self described BTF format.
> We're just adding a few size fields.
> The kernel/libbpf/dedup still needs to known semantics of future kinds
> to be able to print/operate on them.

I've only been following this discussion on the sidelines, but FWIW I
agree that it is futile to try to describe semantics of fields inside
the format. Anything that needs to do transformations on the whole of
the BTF is going to have to understand the semantics anyway. And a
pretty-printer can just skip over the fields it doesn't understand and
emit a "unknown type XXX" message when doing so.

I'll also add that I am thrilled with the effort to make sure new BTF
kinds always embed their length so parsers can skip over them; the fact
that the older ones don't is, IMO, one of the biggest flaws of the BTF
format, and I'm thrilled to see it fixed! The "type-length-value with a
'required' flag" is also a pretty standard way to do this in, e.g.,
network protocols.

As for bumping the version number, I don't think it's a good idea to
deliberately break compatibility this way unless it's absolutely
necessary. With "absolutely necessary" meaning "things will break in
subtle ways in any case, so it's better to make the breakage obvious".
But it libbpf is not checking the version field anyway, that becomes
kind of a moot point, as bumping it doesn't really gain us anything,
then...

-Toke

  reply	other threads:[~2023-06-06 11:30 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI Alan Maguire
2023-06-01  3:53   ` Alexei Starovoitov
2023-06-01 10:36     ` Alan Maguire
2023-06-01 16:53       ` Alexei Starovoitov
2023-06-02 16:32         ` Andrii Nakryiko
2023-06-02 16:34           ` Andrii Nakryiko
2023-06-02 18:11           ` Alexei Starovoitov
2023-06-02 20:33             ` Andrii Nakryiko
2023-06-05 16:14               ` Alexei Starovoitov
2023-06-05 22:38                 ` Andrii Nakryiko
2023-06-06  2:46                   ` Alexei Starovoitov
2023-06-06 11:30                     ` Toke Høiland-Jørgensen [this message]
2023-06-07 11:55                       ` Eduard Zingerman
2023-06-07 15:29                         ` Yonghong Song
2023-06-07 16:14                           ` Eduard Zingerman
2023-06-07 21:47                             ` Andrii Nakryiko
2023-06-07 22:05                               ` Eduard Zingerman
2023-06-07 22:34                                 ` Andrii Nakryiko
2023-06-06 16:50                     ` Andrii Nakryiko
2023-06-07  1:16                       ` Alexei Starovoitov
2023-06-07 21:43                         ` Andrii Nakryiko
2023-05-31 20:19 ` [RFC bpf-next 2/8] libbpf: support handling of metadata section in BTF Alan Maguire
2023-06-05 11:01   ` Jiri Olsa
2023-06-05 21:40     ` Andrii Nakryiko
2023-05-31 20:19 ` [RFC bpf-next 3/8] libbpf: use metadata to compute an unknown kind size Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 4/8] btf: support kernel parsing of BTF with metadata, use it to parse BTF with unknown kinds Alan Maguire
2023-06-07 19:51   ` Eduard Zingerman
2023-05-31 20:19 ` [RFC bpf-next 5/8] libbpf: add metadata encoding support Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 6/8] btf: generate metadata for vmlinux/module BTF Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 7/8] bpftool: add BTF dump "format meta" to dump header/metadata Alan Maguire
2023-06-01 16:33   ` Quentin Monnet
2023-06-02 16:57   ` Andrii Nakryiko
2023-05-31 20:19 ` [RFC bpf-next 8/8] selftests/bpf: test kind encoding/decoding Alan Maguire
2023-05-31 20:19 ` [RFC dwarves] dwarves: encode BTF metadata if --btf_gen_meta is set Alan Maguire

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878rcw3k1o.fsf@toke.dk \
    --to=toke@kernel.org \
    --cc=acme@kernel.org \
    --cc=alan.maguire@oracle.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mykolal@fb.com \
    --cc=quentin@isovalent.com \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).