Re: [PATCH v2 bpf-next 09/13] libbpf: split BTF relocation

From: Alan Maguire <alan.maguire@oracle.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: andrii@kernel.org, ast@kernel.org, jolsa@kernel.org,
	acme@redhat.com, quentin@isovalent.com, eddyz87@gmail.com,
	mykolal@fb.com, daniel@iogearbox.net, martin.lau@linux.dev,
	song@kernel.org, yonghong.song@linux.dev,
	john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
	haoluo@google.com, houtao1@huawei.com, bpf@vger.kernel.org,
	masahiroy@kernel.org, mcgrof@kernel.org, nathan@kernel.org
Subject: Re: [PATCH v2 bpf-next 09/13] libbpf: split BTF relocation
Date: Tue, 30 Apr 2024 17:56:29 +0100	[thread overview]
Message-ID: <8483cbf7-6729-471c-8aa8-f88c9e306fe5@oracle.com> (raw)
In-Reply-To: <CAEf4BzYr8ONqLuH0q+FFJijx3ADrqn464pf8E4A3s+uJ03cyVQ@mail.gmail.com>

On 30/04/2024 01:14, Andrii Nakryiko wrote:
> On Wed, Apr 24, 2024 at 8:49 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> Map distilled base BTF type ids referenced in split BTF and their
>> references to the base BTF passed in, and if the mapping succeeds,
>> reparent the split BTF to the base BTF.
>>
>> Relocation rules are
>>
>> - base types must match exactly
>> - enum[64] types should match all value name/value pairs, but the
>>   to-be-relocated enum[64] can also define additional name/value pairs
>> - an enum64 can match an enum and vice versa provided the values match
>>   as described above
>> - named fwds match to the correspondingly-named struct/union/enum/enum64
>> - structs with no members match to the correspondingly-named struct/union
>>   provided their sizes match
>> - anon struct/unions must have field names/offsets specified in base
>>   reference BTF matched by those in base BTF we are matching with
>>
>> Relocation can not recurse, since it will be used in-kernel also and
>> we do not want to blow up the kernel stack when carrying out type
>> compatibility checks.  Hence we use a stack for reference type
>> relocation rather then recursive function calls.  The approach however
>> is the same; we use a depth-first search to match the referents
>> associated with reference types, and work back from there to match
>> the reference type itself.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>>  tools/lib/bpf/Build             |   2 +-
>>  tools/lib/bpf/btf.c             |  58 +++
>>  tools/lib/bpf/btf.h             |   8 +
>>  tools/lib/bpf/btf_relocate.c    | 601 ++++++++++++++++++++++++++++++++
>>  tools/lib/bpf/libbpf.map        |   1 +
>>  tools/lib/bpf/libbpf_internal.h |   2 +
>>  6 files changed, 671 insertions(+), 1 deletion(-)
>>  create mode 100644 tools/lib/bpf/btf_relocate.c
>>
>> diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
>> index b6619199a706..336da6844d42 100644
>> --- a/tools/lib/bpf/Build
>> +++ b/tools/lib/bpf/Build
>> @@ -1,4 +1,4 @@
>>  libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
>>             netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
>>             btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
>> -           usdt.o zip.o elf.o features.o
>> +           usdt.o zip.o elf.o features.o btf_relocate.o
>> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
>> index 9036c1dc45d0..f00a84fea9b5 100644
>> --- a/tools/lib/bpf/btf.c
>> +++ b/tools/lib/bpf/btf.c
>> @@ -5541,3 +5541,61 @@ int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf,
>>         errno = -ret;
>>         return ret;
>>  }
>> +
>> +struct btf_rewrite_strs {
>> +       struct btf *btf;
>> +       const struct btf *old_base_btf;
>> +       int str_start;
>> +       int str_diff;
>> +};
>> +
>> +static int btf_rewrite_strs(__u32 *str_off, void *ctx)
>> +{
>> +       struct btf_rewrite_strs *r = ctx;
>> +       const char *s;
>> +       int off;
>> +
>> +       if (!*str_off)
>> +               return 0;
>> +       if (*str_off >= r->str_start) {
>> +               *str_off += r->str_diff;
>> +       } else {
>> +               s = btf__str_by_offset(r->old_base_btf, *str_off);
>> +               if (!s)
>> +                       return -ENOENT;
>> +               off = btf__add_str(r->btf, s);
>> +               if (off < 0)
>> +                       return off;
>> +               *str_off = off;
>> +       }
>> +       return 0;
>> +}
>> +
>> +int btf_set_base_btf(struct btf *btf, struct btf *base_btf)
>> +{
>> +       struct btf_rewrite_strs r = {};
>> +       struct btf_type *t;
>> +       int i, err;
>> +
>> +       r.old_base_btf = btf__base_btf(btf);
>> +       if (!r.old_base_btf)
>> +               return -EINVAL;
>> +       r.btf = btf;
>> +       r.str_start = r.old_base_btf->hdr->str_len;
>> +       r.str_diff = base_btf->hdr->str_len - r.old_base_btf->hdr->str_len;
>> +       btf->base_btf = base_btf;
>> +       btf->start_id = btf__type_cnt(base_btf);
>> +       btf->start_str_off = base_btf->hdr->str_len;
>> +       for (i = 0; i < btf->nr_types; i++) {
>> +               t = (struct btf_type *)btf__type_by_id(btf, i + btf->start_id);
> 
> btf_type_by_id()
> 
>> +               err = btf_type_visit_str_offs(t, btf_rewrite_strs, &r);
>> +               if (err)
>> +                       break;
>> +       }
>> +       return err;
>> +}
>> +
>> +int btf__relocate(struct btf *btf, const struct btf *base_btf)
>> +{
>> +       return btf_relocate(btf, base_btf, NULL);
>> +}
> 
> [...]
> 
>> +               /* either names must match or both be anon. */
>> +               if (t->name_off && nt->name_off) {
>> +                       if (strcmp(btf__name_by_offset(r->btf, t->name_off),
>> +                                  btf__name_by_offset(r->base_btf, nt->name_off)))
>> +                               continue;
>> +               } else if (t->name_off != nt->name_off) {
>> +                       continue;
>> +               }
> 
> btf__name_by_offset(0) return "", so you don't need this if/else
> guard, just do strcmp()
> 
>> +               *tp = nt;
>> +               *id = i;
>> +               return 0;
>> +       }
>> +       return -ENOENT;
>> +}
>> +
>> +static int btf_relocate_int(struct btf_relocate *r, const char *name,
>> +                            const struct btf_type *t, const struct btf_type *bt)
>> +{
>> +       __u8 encoding, bencoding, bits, bbits;
>> +
>> +       if (t->size != bt->size) {
>> +               pr_warn("INT types '%s' disagree on size; distilled base BTF says %d; base BTF says %d\n",
>> +                       name, t->size, bt->size);
>> +               return -EINVAL;
>> +       }
>> +       encoding = btf_int_encoding(t);
>> +       bencoding = btf_int_encoding(bt);
>> +       if (encoding != bencoding) {
>> +               pr_warn("INT types '%s' disagree on encoding; distilled base BTF says '(%s/%s/%s); base BTF says '(%s/%s/%s)'\n",
>> +                       name,
>> +                       encoding & BTF_INT_SIGNED ? "signed" : "unsigned",
>> +                       encoding & BTF_INT_CHAR ? "char" : "nonchar",
>> +                       encoding & BTF_INT_BOOL ? "bool" : "nonbool",
>> +                       bencoding & BTF_INT_SIGNED ? "signed" : "unsigned",
>> +                       bencoding & BTF_INT_CHAR ? "char" : "nonchar",
>> +                       bencoding & BTF_INT_BOOL ? "bool" : "nonbool");
>> +               return -EINVAL;
>> +       }
>> +       bits = btf_int_bits(t);
>> +       bbits = btf_int_bits(bt);
>> +       if (bits != bbits) {
> 
> nit: this b* prefix is a bit ugly, maybe use enc/base_enc and bits/base_bits?
> 
>> +               pr_warn("INT types '%s' disagree on bit size; distilled base BTF says %d; base BTF says %d\n",
>> +                       name, bits, bbits);
>> +               return -EINVAL;
>> +       }
>> +       return 0;
>> +}
>> +
>> +static int btf_relocate_float(struct btf_relocate *r, const char *name,
>> +                              const struct btf_type *t, const struct btf_type *bt)
>> +{
>> +
>> +       if (t->size != bt->size) {
>> +               pr_warn("float types '%s' disagree on size; distilled base BTF says %d; base BTF says %d\n",
>> +                       name, t->size, bt->size);
>> +               return -EINVAL;
>> +       }
>> +       return 0;
>> +}
>> +
>> +/* ensure each enum[64] value in type t has equivalent in base BTF and that
>> + * values match; we must support matching enum64 to enum and vice versa
>> + * as well as enum to enum and enum64 to enum64.
>> + */
>> +static int btf_relocate_enum(struct btf_relocate *r, const char *name,
>> +                             const struct btf_type *t, const struct btf_type *bt)
>> +{
>> +       struct btf_enum *v = btf_enum(t);
>> +       struct btf_enum *bv = btf_enum(bt);
>> +       struct btf_enum64 *v64 = btf_enum64(t);
>> +       struct btf_enum64 *bv64 = btf_enum64(bt);
>> +       bool found, match, bisenum, isenum;
> 
> is_enum? bisenum is a bit too much to read without underscores (and
> I'd still use base_ prefix)
> 
>> +       const char *vname, *bvname;
>> +       __u32 name_off, bname_off;
>> +       __u64 val = 0, bval = 0;
>> +       int i, j;
>> +
> 
> [...]
> 
>> +               if (!match) {
>> +                       if (t->name_off)
>> +                               pr_warn("ENUM[64] types '%s' disagree on enum value '%s'; distilled base BTF specifies value %lld; base BTF specifies value %lld\n",
>> +                                       name, vname, val, bval);
>> +                       return -EINVAL;
>> +               }
> 
> What's the motivation to check enum values if we don't really do any
> check like this for struct/union? It feels like just checking that
> enum names and byte sizes match would be adequate, no?
> 
> I have similar feelings about INT checks, we assume the kernel module
> was built against valid base BTF in the first place, so as long as
> general memory layout matches, it should be OK to relocate. So I'd
> stick to NAME + size checks.
> 
> If the kernel module was built with an enum definition that's not
> compatible with the base kernel, it's a bigger problem than BTF. Just
> like what we discussed with STRUCT/UNION.
> 
>> +       }
>> +       return 0;
>> +}
>> +
>> +/* relocate base types (int, float, enum, enum64 and fwd) */
>> +static int btf_relocate_base_type(struct btf_relocate *r, __u32 id)
>> +{
>> +       const struct btf_type *t = btf_type_by_id(r->dist_base_btf, id);
>> +       const char *name = btf__name_by_offset(r->dist_base_btf, t->name_off);
>> +       const struct btf_type *bt = NULL;
>> +       __u32 base_id = 0;
>> +       int err = 0;
>> +
>> +       switch (btf_kind(t)) {
>> +       case BTF_KIND_INT:
>> +       case BTF_KIND_ENUM:
>> +       case BTF_KIND_FLOAT:
>> +       case BTF_KIND_ENUM64:
>> +       case BTF_KIND_FWD:
>> +               break;
>> +       default:
>> +               return 0;
> 
> why this is not an error?
> 
>> +       }
>> +
>> +       if (r->map[id] <= BTF_MAX_NR_TYPES)
>> +               return 0;
>> +
>> +       while ((err = btf_relocate_find_next(r, t, &base_id, &bt)) != -ENOENT) {
>> +               bt = btf_type_by_id(r->base_btf, base_id);
>> +               switch (btf_kind(t)) {
>> +               case BTF_KIND_INT:
>> +                       err = btf_relocate_int(r, name, t, bt);
>> +                       break;
>> +               case BTF_KIND_ENUM:
>> +               case BTF_KIND_ENUM64:
>> +                       err = btf_relocate_enum(r, name, t, bt);
>> +                       break;
>> +               case BTF_KIND_FLOAT:
>> +                       err = btf_relocate_float(r, name, t, bt);
>> +                       break;
>> +               case BTF_KIND_FWD:
>> +                       err = 0;
>> +                       break;
>> +               default:
>> +                       return 0;
>> +               }
>> +               if (!err) {
>> +                       r->map[id] = base_id;
>> +                       return 0;
>> +               }
>> +       }
> 
> I'm apprehensive of this linear search (many times) over vmlinux BTF,
> it feels slow and sloppy, tbh
> 
> What if we mandate that distilled base BTF should be sorted by (kind,
> name) by pahole/libbpf (which is simple enough to do), and then we can
> do a single linear pass over vmlinux BTF + quick binary search over
> distilled base BTF, marking (on the side) which base distilled BTF
> type was processed. Then keep a pointer of processed distilled base
> BTF types, and if at the end it doesn't match base distilled BTF
> number of types, we couldn't relocate some of base types.
> 

Hmm, so are you saying something like

	foreach vmlinux type
		binary search for an equivalent distilled base type, and record the
mapping

? Would be great to just have to iterate once alright.

> Simple and fast, WDYT? Or if we don't want to make pahole/libbpf sort,
> we can build *distilled base* index cheaply in memory, and do
> effectively the same (that's perhaps a bit more robust, but I think we
> can just say that distilled base has to be sorted).
>

Sorting BTF is something that's come up a lot. We should probably do it;
more below..

> For STRUCT/UNION we'd need to do two searches, once for FWD+name and
> if not found (embedded struct/union case) STRUCT/UNION+name, but
> that's still fast with two binary searches.
> 
> A lot of the code below would go away (once we keep only named types
> in distilled base), so I didn't spend much time reviewing it, sorry.
>

The only concern I'd have is that the kernel would I suppose need to be
skeptical of getting sorted data (in distilled base or elsewhere), so
we'd probably need to validate sort order I guess. We could share some
of the mechanics of sorting in btf_common.c to do that specifically for
.BTF.base, but thinking about it, as part of general BTF validation we
could mark BTF as sorted or not. What would be nice about this is that
once we knew BTF was sorted,  we could speed up btf_find_by_name_kind()
by using binary search on the sorted BTF.

>> +       return err;
>> +}
>> +
> 
> [...]