Re: [PATCH bpf-next 01/12] bpf: Add btf enum64 support

From: Yonghong Song <yhs@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 01/12] bpf: Add btf enum64 support
Date: Tue, 10 May 2022 15:06:38 -0700	[thread overview]
Message-ID: <be27f832-c803-1ab0-2180-74bf7177ca41@fb.com> (raw)
In-Reply-To: <CAEf4BzbqQDVsiaY1u5G6QAu_3Zq8Vn19qBkzuzVYX0T_e3OLSw@mail.gmail.com>

On 5/9/22 3:29 PM, Andrii Nakryiko wrote:
> On Sun, May 1, 2022 at 12:00 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> Currently, BTF only supports upto 32bit enum value with BTF_KIND_ENUM.
>> But in kernel, some enum indeed has 64bit values, e.g.,
>> in uapi bpf.h, we have
>>    enum {
>>          BPF_F_INDEX_MASK                = 0xffffffffULL,
>>          BPF_F_CURRENT_CPU               = BPF_F_INDEX_MASK,
>>          BPF_F_CTXLEN_MASK               = (0xfffffULL << 32),
>>    };
>> In this case, BTF_KIND_ENUM will encode the value of BPF_F_CTXLEN_MASK
>> as 0, which certainly is incorrect.
>>
>> This patch added a new btf kind, BTF_KIND_ENUM64, which permits
>> 64bit value to cover the above use case. The BTF_KIND_ENUM64 has
>> the following three bytes followed by the common type:
> 
> you probably meant three fields, not bytes

correct.

> 
>>    struct bpf_enum64 {
>>      __u32 nume_off;
>>      __u32 hi32;
>>      __u32 lo32;
> 
> I'd like to nitpick on name here, as hi/lo of what? Maybe val_hi32 and
> val_lo32? Can we also reverse the order here? For x86 you'll be able
> to use &lo32 to get value directly if you really want, without a local
> copy. It also just logically seems better to have something low first,
> then high next.

I can go with val_hi32, val_lo32 and put val_lo32 before val_hi32.
I don't have any preference for the ordering of these two fields.

> 
> 
>>    };
>> Currently, btf type section has an alignment of 4 as all element types
>> are u32. Representing the value with __u64 will introduce a pad
>> for bpf_enum64 and may also introduce misalignment for the 64bit value.
>> Hence, two members of hi32 and lo32 are chosen to avoid these issues.
>>
>> The kflag is also introduced for BTF_KIND_ENUM and BTF_KIND_ENUM64
>> to indicate whether the value is signed or unsigned. The kflag intends
>> to provide consistent output of BTF C fortmat with the original
>> source code. For example, the original BTF_KIND_ENUM bit value is 0xffffffff.
>> The format C has two choices, print out 0xffffffff or -1 and current libbpf
>> prints out as unsigned value. But if the signedness is preserved in btf,
>> the value can be printed the same as the original source code.
>>
>> The new BTF_KIND_ENUM64 is intended to support the enum value represented as
>> 64bit value. But it can represent all BTF_KIND_ENUM values as well.
>> The value size of BTF_KIND_ENUM64 is encoded to 8 to represent its intent.
>> The compiler ([1]) and pahole will generate BTF_KIND_ENUM64 only if the value has
>> to be represented with 64 bits.
>>
>>    [1] https://reviews.llvm.org/D124641
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   include/linux/btf.h            |  18 ++++-
>>   include/uapi/linux/btf.h       |  17 ++++-
>>   kernel/bpf/btf.c               | 132 ++++++++++++++++++++++++++++++---
>>   tools/include/uapi/linux/btf.h |  17 ++++-
>>   4 files changed, 168 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>> index 2611cea2c2b6..280c33c9414a 100644
>> --- a/include/linux/btf.h
>> +++ b/include/linux/btf.h
>> @@ -174,7 +174,8 @@ static inline bool btf_type_is_small_int(const struct btf_type *t)
>>
>>   static inline bool btf_type_is_enum(const struct btf_type *t)
>>   {
>> -       return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM;
>> +       return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM ||
>> +              BTF_INFO_KIND(t->info) == BTF_KIND_ENUM64;
>>   }
> 
> a bit hesitant about this change, there is no helper to check for ENUM
> vs ENUM64. Inside the kernel this change seems to be correct as we
> don't care, but for libbpf I'd probably keep btf_is_enum() unchanged
> (you can't really work with them in the same generic way in
> user-space, as their memory layout is very different, so it's better
> not to generalize them unnecessarily)

Let me introduce a new helper called
btf_type_is_any_enum(...) to check both
BTF_KIND_ENUM or BTF_KIND_ENUM64.

> 
>>
>>   static inline bool str_is_empty(const char *s)
>> @@ -192,6 +193,16 @@ static inline bool btf_is_enum(const struct btf_type *t)
>>          return btf_kind(t) == BTF_KIND_ENUM;
>>   }
>>
>> +static inline bool btf_is_enum64(const struct btf_type *t)
>> +{
>> +       return btf_kind(t) == BTF_KIND_ENUM64;
>> +}
>> +
>> +static inline u64 btf_enum64_value(const struct btf_enum64 *e)
>> +{
>> +       return (u64)e->hi32 << 32 | e->lo32;
> 
> this might be correct but () around bit shift would make it more obvious

I can do this.

> 
>> +}
>> +
>>   static inline bool btf_is_composite(const struct btf_type *t)
>>   {
>>          u16 kind = btf_kind(t);
>> @@ -332,6 +343,11 @@ static inline struct btf_enum *btf_enum(const struct btf_type *t)
>>          return (struct btf_enum *)(t + 1);
>>   }
>>
>> +static inline struct btf_enum64 *btf_enum64(const struct btf_type *t)
>> +{
>> +       return (struct btf_enum64 *)(t + 1);
>> +}
>> +
>>   static inline const struct btf_var_secinfo *btf_type_var_secinfo(
>>                  const struct btf_type *t)
>>   {
>> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
>> index a9162a6c0284..2aac226a27b2 100644
>> --- a/include/uapi/linux/btf.h
>> +++ b/include/uapi/linux/btf.h
>> @@ -36,10 +36,10 @@ struct btf_type {
>>           * bits 24-28: kind (e.g. int, ptr, array...etc)
>>           * bits 29-30: unused
>>           * bit     31: kind_flag, currently used by
>> -        *             struct, union and fwd
>> +        *             struct, union, enum, fwd and enum64
> 
> see comment below on kflag for enum itself, but reading this I
> realized that we don't really describe the meaning of kind_flag for
> different kinds. Should it be done here?

We have detailed description in Documentation/bpf/btf.rst.
Hopefully it will be enough if people wants to understand
what kflag means for each kind.

> 
>>           */
>>          __u32 info;
>> -       /* "size" is used by INT, ENUM, STRUCT, UNION and DATASEC.
>> +       /* "size" is used by INT, ENUM, STRUCT, UNION, DATASEC and ENUM64.
>>           * "size" tells the size of the type it is describing.
>>           *
>>           * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
>> @@ -63,7 +63,7 @@ enum {
>>          BTF_KIND_ARRAY          = 3,    /* Array        */
>>          BTF_KIND_STRUCT         = 4,    /* Struct       */
>>          BTF_KIND_UNION          = 5,    /* Union        */
>> -       BTF_KIND_ENUM           = 6,    /* Enumeration  */
>> +       BTF_KIND_ENUM           = 6,    /* Enumeration for int/unsigned int values */
> 
> nit: "Enumeration for up to 32-bit values" ?

This should work.

> 
>>          BTF_KIND_FWD            = 7,    /* Forward      */
>>          BTF_KIND_TYPEDEF        = 8,    /* Typedef      */
>>          BTF_KIND_VOLATILE       = 9,    /* Volatile     */
>> @@ -76,6 +76,7 @@ enum {
>>          BTF_KIND_FLOAT          = 16,   /* Floating point       */
>>          BTF_KIND_DECL_TAG       = 17,   /* Decl Tag */
>>          BTF_KIND_TYPE_TAG       = 18,   /* Type Tag */
>> +       BTF_KIND_ENUM64         = 19,   /* Enumeration for long/unsigned long values */
> 
> and then "for 64-bit values" (or maybe up to 64 bit values, but in
> practice we won't do that, right?)

We can do "up to 64-bit values". In practice, from llvm and pahole,
we will only encode 64-bit values in ENUM64.

> 
>>
>>          NR_BTF_KINDS,
>>          BTF_KIND_MAX            = NR_BTF_KINDS - 1,
>> @@ -186,4 +187,14 @@ struct btf_decl_tag {
>>          __s32   component_idx;
>>   };
>>
> 
> [...]
> 
>> @@ -3716,7 +3721,8 @@ static s32 btf_enum_check_meta(struct btf_verifier_env *env,
>>
>>                  if (env->log.level == BPF_LOG_KERNEL)
>>                          continue;
>> -               btf_verifier_log(env, "\t%s val=%d\n",
>> +               fmt_str = btf_type_kflag(t) ? "\t%s val=%u\n" : "\t%s val=%d\n";
>> +               btf_verifier_log(env, fmt_str,
>>                                   __btf_name_by_offset(btf, enums[i].name_off),
>>                                   enums[i].val);
>>          }
>> @@ -3757,7 +3763,10 @@ static void btf_enum_show(const struct btf *btf, const struct btf_type *t,
>>                  return;
>>          }
>>
>> -       btf_show_type_value(show, "%d", v);
>> +       if (btf_type_kflag(t))
> 
> libbpf's assumption right now and most common case is unsigned enum,
> so it seems more desirable to have kflag == 0 mean unsigned, with
> kflag == 1 being signed. It will make most existing enum definitions
> not change but also make it easy for libbpf's btf_dumper to use kflag
> if it's set, but otherwise have the same unsigned printing whether
> enum is really unsigned or Clang is too old to emit the kflag for
> enum. WDYT?

Right, libbpf assumption is unsigned enum and the kernel prints as 
signed. I agree that default unsigned should cover more cases.
Will change that in the next revision.

> 
>> +               btf_show_type_value(show, "%u", v);
>> +       else
>> +               btf_show_type_value(show, "%d", v);
> 
> you didn't got with ternary operator for fmt string selector like in
> previous case? I have mild preference for keeping it consistent (and
> keeping btf_type_kflag(t) ? "fmt1" : "fmt2" inline)

The reason I didn't do it is the line is a little long.
But I can do it.

> 
>>          btf_show_end_type(show);
>>   }
>>
>> @@ -3770,6 +3779,109 @@ static struct btf_kind_operations enum_ops = {
>>          .show = btf_enum_show,
>>   };
>>
>> +static s32 btf_enum64_check_meta(struct btf_verifier_env *env,
>> +                                const struct btf_type *t,
>> +                                u32 meta_left)
>> +{
>> +       const struct btf_enum64 *enums = btf_type_enum64(t);
>> +       struct btf *btf = env->btf;
>> +       const char *fmt_str;
>> +       u16 i, nr_enums;
>> +       u32 meta_needed;
>> +
>> +       nr_enums = btf_type_vlen(t);
>> +       meta_needed = nr_enums * sizeof(*enums);
>> +
>> +       if (meta_left < meta_needed) {
>> +               btf_verifier_log_basic(env, t,
>> +                                      "meta_left:%u meta_needed:%u",
>> +                                      meta_left, meta_needed);
>> +               return -EINVAL;
>> +       }
>> +
>> +       if (t->size != 8) {
> 
> technically there is nothing wrong with using enum64 for smaller
> sizes, right? Any particular reason to prevent this? We can just
> define that 64-bit value is sign-extended if enum is signed and has
> size < 8?

My original idea is to support 64-bit enum only for ENUM64 kind.
But it is certainly possible to encode 32-bit enums as well for
ENUM64. So I will remove this restriction.

The dwarf only generates sizes 4 (for up-to 32 bit values)
and 8 (for 64 bit values). But BTF_KIND_ENUM supports 1/2/4/8
sizes, so BTF_KIND_ENUM64 will also support 1/2/4/8 sizes.

> 
>> +               btf_verifier_log_type(env, t, "Unexpected size");
>> +               return -EINVAL;
>> +       }
>> +
>> +       /* enum type either no name or a valid one */
>> +       if (t->name_off &&
>> +           !btf_name_valid_identifier(env->btf, t->name_off)) {
>> +               btf_verifier_log_type(env, t, "Invalid name");
>> +               return -EINVAL;
>> +       }
>> +
> 
> [...]