All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate
@ 2023-05-31 20:19 Alan Maguire
  2023-05-31 20:19 ` [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI Alan Maguire
                   ` (8 more replies)
  0 siblings, 9 replies; 35+ messages in thread
From: Alan Maguire @ 2023-05-31 20:19 UTC (permalink / raw)
  To: ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf, Alan Maguire

BTF kind metadata provides information to parse BTF kinds.
By separating parsing BTF from using all the information
it provides, we allow BTF to encode new features even if
they cannot be used.  This is helpful in particular for
cases where newer tools for BTF generation run on an
older kernel; BTF kinds may be present that the kernel
cannot yet use, but at least it can parse the BTF
provided.  Meanwhile userspace tools with newer libbpf
may be able to use the newer information.

The intent is to support encoding of kind metadata
optionally so that tools like pahole can add this
information.  So for each kind we record

- a kind name string
- kind-related flags
- length of singular element following struct btf_type
- length of each of the btf_vlen() elements following

In addition we make space in the metadata for
CRC32s computed over the BTF along with a CRC for
the base BTF; this allows split BTF to identify
a mismatch explicitly.

The ideas here were discussed at [1]. One additional
change is here that I believe will help BTF debugging;
support for adding a description string to BTF
metadata.  It can be used by pahole to describe
the version used, options etc.

Future work can take more advantage of these features
such as

- using base CRC to identify base/module BTF mismatch
  explicitly
- using absence of a base BTF CRC as evidence that
  BTF is standalone

...and new BTF kind addition should present less
trouble, provided the kinds are optional.  BTF
parsing _will_ still fail if a non-optional
kind is encountered, as the assumption is that
such kinds are needed.  To take a few examples,
the tag kinds are optional, however enum64
is required, so BTF containing an enum64
(that did not know about enum64) would be
rejected.  This makes sense as if for example
a struct contained an enum64 we would not
be able to fully represent that struct unless
we knew about enum64s.

Patch 1 is the UAPI changes, patches 2-3 provide
libbpf support for handling and using metadata.
Patch 4 adds kernel support for handling and using
metadata.  Patch 5 adds libbpf support to add
metadata.  Patch 6 adds BTF encoding flag
--btf_gen_meta for kernel/module BTF encoding.
Patch 7 adds bpftool support to dump header
and metadata info.  Patch 8 is a selftest
that validates metadata encoding and tests
that an unknown (optional) kind can be skipped
over without BTF failure.

Finally patch 9 is a patch for dwarves
to use the libbpf APIs to encode metadata.
dwarves has to be built with the updated
libbpf for this to work.

Changes tested on x86_64, aarch64.  BTF metadata
validation likely needs some tidying up but
wanted to get it out ASAP so probably still
rough around the edges.

[1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/

Alan Maguire (8):
  btf: add kind metadata encoding to UAPI
  libbpf: support handling of metadata section in BTF
  libbpf: use metadata to compute an unknown kind size
  btf: support kernel parsing of BTF with metadata, use it to parse BTF
    with unknown kinds
  libbpf: add metadata encoding support
  btf: generate metadata for vmlinux/module BTF
  bpftool: add BTF dump "format meta" to dump header/metadata
  selftests/bpf: test kind encoding/decoding

 include/uapi/linux/btf.h                      |  29 ++
 kernel/bpf/btf.c                              | 102 +++++-
 scripts/pahole-flags.sh                       |   2 +-
 tools/bpf/bpftool/btf.c                       |  46 +++
 tools/include/uapi/linux/btf.h                |  29 ++
 tools/lib/bpf/btf.c                           | 297 +++++++++++++++---
 tools/lib/bpf/btf.h                           |  11 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/btf_kind.c       | 138 ++++++++
 9 files changed, 595 insertions(+), 60 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_kind.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
@ 2023-05-31 20:19 ` Alan Maguire
  2023-06-01  3:53   ` Alexei Starovoitov
  2023-05-31 20:19 ` [RFC bpf-next 2/8] libbpf: support handling of metadata section in BTF Alan Maguire
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2023-05-31 20:19 UTC (permalink / raw)
  To: ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf, Alan Maguire

BTF kind metadata provides information to parse BTF kinds.
By separating parsing BTF from using all the information
it provides, we allow BTF to encode new features even if
they cannot be used.  This is helpful in particular for
cases where newer tools for BTF generation run on an
older kernel; BTF kinds may be present that the kernel
cannot yet use, but at least it can parse the BTF
provided.  Meanwhile userspace tools with newer libbpf
may be able to use the newer information.

The intent is to support encoding of kind metadata
optionally so that tools like pahole can add this
information.  So for each kind we record

- a kind name string
- kind-related flags
- length of singular element following struct btf_type
- length of each of the btf_vlen() elements following

In addition we make space in the metadata for
CRC32s computed over the BTF along with a CRC for
the base BTF; this allows split BTF to identify
a mismatch explicitly.  Finally we provide an
offset for an optional description string.

The ideas here were discussed at [1] hence

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>

[1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
---
 include/uapi/linux/btf.h       | 29 +++++++++++++++++++++++++++++
 tools/include/uapi/linux/btf.h | 29 +++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
index ec1798b6d3ff..94c1f4518249 100644
--- a/include/uapi/linux/btf.h
+++ b/include/uapi/linux/btf.h
@@ -8,6 +8,34 @@
 #define BTF_MAGIC	0xeB9F
 #define BTF_VERSION	1
 
+/* is this information required? If so it cannot be sanitized safely. */
+#define BTF_KIND_META_OPTIONAL		(1 << 0)
+
+struct btf_kind_meta {
+	__u32 name_off;		/* kind name string offset */
+	__u16 flags;		/* see BTF_KIND_META_* values above */
+	__u8 info_sz;		/* size of singular element after btf_type */
+	__u8 elem_sz;		/* size of each of btf_vlen(t) elements */
+};
+
+/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
+#define BTF_META_CRC_SET		(1 << 0)
+#define BTF_META_BASE_CRC_SET		(1 << 1)
+
+struct btf_metadata {
+	__u8	kind_meta_cnt;		/* number of struct btf_kind_meta */
+	__u32	flags;
+	__u32	description_off;	/* optional description string */
+	__u32	crc;			/* crc32 of BTF */
+	__u32	base_crc;		/* crc32 of base BTF */
+	struct btf_kind_meta kind_meta[];
+};
+
+struct btf_meta_header {
+	__u32	meta_off;	/* offset of metadata section */
+	__u32	meta_len;	/* length of metadata section */
+};
+
 struct btf_header {
 	__u16	magic;
 	__u8	version;
@@ -19,6 +47,7 @@ struct btf_header {
 	__u32	type_len;	/* length of type section	*/
 	__u32	str_off;	/* offset of string section	*/
 	__u32	str_len;	/* length of string section	*/
+	struct btf_meta_header meta_header;
 };
 
 /* Max # of type identifier */
diff --git a/tools/include/uapi/linux/btf.h b/tools/include/uapi/linux/btf.h
index ec1798b6d3ff..94c1f4518249 100644
--- a/tools/include/uapi/linux/btf.h
+++ b/tools/include/uapi/linux/btf.h
@@ -8,6 +8,34 @@
 #define BTF_MAGIC	0xeB9F
 #define BTF_VERSION	1
 
+/* is this information required? If so it cannot be sanitized safely. */
+#define BTF_KIND_META_OPTIONAL		(1 << 0)
+
+struct btf_kind_meta {
+	__u32 name_off;		/* kind name string offset */
+	__u16 flags;		/* see BTF_KIND_META_* values above */
+	__u8 info_sz;		/* size of singular element after btf_type */
+	__u8 elem_sz;		/* size of each of btf_vlen(t) elements */
+};
+
+/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
+#define BTF_META_CRC_SET		(1 << 0)
+#define BTF_META_BASE_CRC_SET		(1 << 1)
+
+struct btf_metadata {
+	__u8	kind_meta_cnt;		/* number of struct btf_kind_meta */
+	__u32	flags;
+	__u32	description_off;	/* optional description string */
+	__u32	crc;			/* crc32 of BTF */
+	__u32	base_crc;		/* crc32 of base BTF */
+	struct btf_kind_meta kind_meta[];
+};
+
+struct btf_meta_header {
+	__u32	meta_off;	/* offset of metadata section */
+	__u32	meta_len;	/* length of metadata section */
+};
+
 struct btf_header {
 	__u16	magic;
 	__u8	version;
@@ -19,6 +47,7 @@ struct btf_header {
 	__u32	type_len;	/* length of type section	*/
 	__u32	str_off;	/* offset of string section	*/
 	__u32	str_len;	/* length of string section	*/
+	struct btf_meta_header meta_header;
 };
 
 /* Max # of type identifier */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC bpf-next 2/8] libbpf: support handling of metadata section in BTF
  2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
  2023-05-31 20:19 ` [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI Alan Maguire
@ 2023-05-31 20:19 ` Alan Maguire
  2023-06-05 11:01   ` Jiri Olsa
  2023-05-31 20:19 ` [RFC bpf-next 3/8] libbpf: use metadata to compute an unknown kind size Alan Maguire
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2023-05-31 20:19 UTC (permalink / raw)
  To: ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf, Alan Maguire

support reading in metadata, fixing endian issues on reading;
also support writing metadata section to raw BTF object.
There is not yet an API to populate the metadata with meaningful
information.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/btf.c | 141 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 111 insertions(+), 30 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 8484b563b53d..036dc1505969 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -16,6 +16,7 @@
 #include <linux/err.h>
 #include <linux/btf.h>
 #include <gelf.h>
+#include <zlib.h>
 #include "btf.h"
 #include "bpf.h"
 #include "libbpf.h"
@@ -39,36 +40,40 @@ struct btf {
 
 	/*
 	 * When BTF is loaded from an ELF or raw memory it is stored
-	 * in a contiguous memory block. The hdr, type_data, and, strs_data
-	 * point inside that memory region to their respective parts of BTF
-	 * representation:
+	 * in a contiguous memory block. The hdr, type_data, strs_data,
+	 * and optional meta_data point inside that memory region to their
+	 * respective parts of BTF representation:
 	 *
-	 * +--------------------------------+
-	 * |  Header  |  Types  |  Strings  |
-	 * +--------------------------------+
-	 * ^          ^         ^
-	 * |          |         |
-	 * hdr        |         |
-	 * types_data-+         |
-	 * strs_data------------+
+	 * +--------------------------------+----------+
+	 * |  Header  |  Types  |  Strings  | Metadata |
+	 * +--------------------------------+----------+
+	 * ^          ^         ^           ^
+	 * |          |         |           |
+	 * hdr        |         |           |
+	 * types_data-+         |           |
+	 * strs_data------------+           |
+	 * meta_data------------------------+
+	 *
+	 * meta_data is optional.
 	 *
 	 * If BTF data is later modified, e.g., due to types added or
 	 * removed, BTF deduplication performed, etc, this contiguous
-	 * representation is broken up into three independently allocated
-	 * memory regions to be able to modify them independently.
+	 * representation is broken up into three or four independently
+	 * allocated memory regions to be able to modify them independently.
 	 * raw_data is nulled out at that point, but can be later allocated
 	 * and cached again if user calls btf__raw_data(), at which point
-	 * raw_data will contain a contiguous copy of header, types, and
-	 * strings:
+	 * raw_data will contain a contiguous copy of header, types, strings
+	 * and (again optionally) metadata:
 	 *
-	 * +----------+  +---------+  +-----------+
-	 * |  Header  |  |  Types  |  |  Strings  |
-	 * +----------+  +---------+  +-----------+
-	 * ^             ^            ^
-	 * |             |            |
-	 * hdr           |            |
-	 * types_data----+            |
-	 * strset__data(strs_set)-----+
+	 * +----------+  +---------+  +-----------+  +----------+
+	 * |  Header  |  |  Types  |  |  Strings  |  | Metadata |
+	 * +----------+  +---------+  +-----------+  +---------_+
+	 * ^             ^            ^              ^
+	 * |             |            |              |
+	 * hdr           |            |              |
+	 * types_data----+            |              |
+	 * strset__data(strs_set)-----+              |
+	 * meta_data---------------------------------+
 	 *
 	 *               +----------+---------+-----------+
 	 *               |  Header  |  Types  |  Strings  |
@@ -116,6 +121,8 @@ struct btf {
 	/* whether strings are already deduplicated */
 	bool strs_deduped;
 
+	void *meta_data;
+
 	/* BTF object FD, if loaded into kernel */
 	int fd;
 
@@ -215,6 +222,11 @@ static void btf_bswap_hdr(struct btf_header *h)
 	h->type_len = bswap_32(h->type_len);
 	h->str_off = bswap_32(h->str_off);
 	h->str_len = bswap_32(h->str_len);
+	if (h->hdr_len >= sizeof(struct btf_header)) {
+		h->meta_header.meta_off = bswap_32(h->meta_header.meta_off);
+		h->meta_header.meta_len = bswap_32(h->meta_header.meta_len);
+	}
+
 }
 
 static int btf_parse_hdr(struct btf *btf)
@@ -222,14 +234,17 @@ static int btf_parse_hdr(struct btf *btf)
 	struct btf_header *hdr = btf->hdr;
 	__u32 meta_left;
 
-	if (btf->raw_size < sizeof(struct btf_header)) {
+	if (btf->raw_size < sizeof(struct btf_header) - sizeof(struct btf_meta_header)) {
 		pr_debug("BTF header not found\n");
 		return -EINVAL;
 	}
 
 	if (hdr->magic == bswap_16(BTF_MAGIC)) {
+		int swapped_len = bswap_32(hdr->hdr_len);
+
 		btf->swapped_endian = true;
-		if (bswap_32(hdr->hdr_len) != sizeof(struct btf_header)) {
+		if (swapped_len != sizeof(struct btf_header) &&
+		    swapped_len != sizeof(struct btf_header) - sizeof(struct btf_meta_header)) {
 			pr_warn("Can't load BTF with non-native endianness due to unsupported header length %u\n",
 				bswap_32(hdr->hdr_len));
 			return -ENOTSUP;
@@ -285,6 +300,42 @@ static int btf_parse_str_sec(struct btf *btf)
 	return 0;
 }
 
+static void btf_bswap_meta(struct btf_metadata *meta, int len)
+{
+	struct btf_kind_meta *m = &meta->kind_meta[0];
+	struct btf_kind_meta *end = (void *)meta + len;
+
+	meta->flags = bswap_32(meta->flags);
+	meta->crc = bswap_32(meta->crc);
+	meta->base_crc = bswap_32(meta->base_crc);
+	meta->description_off = bswap_32(meta->description_off);
+
+	while (m < end) {
+		m->name_off = bswap_32(m->name_off);
+		m->flags = bswap_16(m->flags);
+		m++;
+	}
+}
+
+static int btf_parse_meta_sec(struct btf *btf)
+{
+	const struct btf_header *hdr = btf->hdr;
+
+	if (hdr->hdr_len < sizeof(struct btf_header) ||
+	    !hdr->meta_header.meta_off || !hdr->meta_header.meta_len)
+		return 0;
+	if (hdr->meta_header.meta_len < sizeof(struct btf_metadata)) {
+		pr_debug("Invalid BTF metadata section\n");
+		return -EINVAL;
+	}
+	btf->meta_data = btf->raw_data + btf->hdr->hdr_len + btf->hdr->meta_header.meta_off;
+
+	if (btf->swapped_endian)
+		btf_bswap_meta(btf->meta_data, hdr->meta_header.meta_len);
+
+	return 0;
+}
+
 static int btf_type_size(const struct btf_type *t)
 {
 	const int base_size = sizeof(struct btf_type);
@@ -904,6 +955,7 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf)
 	err = err ?: btf_parse_type_sec(btf);
 	if (err)
 		goto done;
+	err = btf_parse_meta_sec(btf);
 
 done:
 	if (err) {
@@ -1267,6 +1319,11 @@ static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endi
 	}
 
 	data_sz = hdr->hdr_len + hdr->type_len + hdr->str_len;
+	if (btf->meta_data) {
+		data_sz = roundup(data_sz, 8);
+		data_sz += hdr->meta_header.meta_len;
+		hdr->meta_header.meta_off = roundup(hdr->type_len + hdr->str_len, 8);
+	}
 	data = calloc(1, data_sz);
 	if (!data)
 		return NULL;
@@ -1293,8 +1350,21 @@ static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endi
 	p += hdr->type_len;
 
 	memcpy(p, btf_strs_data(btf), hdr->str_len);
-	p += hdr->str_len;
+	/* round up to 8 byte alignment to match offset above */
+	p = data + hdr->hdr_len + roundup(hdr->type_len + hdr->str_len, 8);
+
+	if (btf->meta_data) {
+		struct btf_metadata *meta = p;
 
+		memcpy(p, btf->meta_data, hdr->meta_header.meta_len);
+		if (!swap_endian) {
+			meta->crc = crc32(0L, (const Bytef *)&data, sizeof(data));
+			meta->flags |= BTF_META_CRC_SET;
+		}
+		if (swap_endian)
+			btf_bswap_meta(p, hdr->meta_header.meta_len);
+		p += hdr->meta_header.meta_len;
+	}
 	*size = data_sz;
 	return data;
 err_out:
@@ -1425,13 +1495,13 @@ static void btf_invalidate_raw_data(struct btf *btf)
 	}
 }
 
-/* Ensure BTF is ready to be modified (by splitting into a three memory
- * regions for header, types, and strings). Also invalidate cached
- * raw_data, if any.
+/* Ensure BTF is ready to be modified (by splitting into a three or four memory
+ * regions for header, types, strings and optional metadata). Also invalidate
+ * cached raw_data, if any.
  */
 static int btf_ensure_modifiable(struct btf *btf)
 {
-	void *hdr, *types;
+	void *hdr, *types, *meta = NULL;
 	struct strset *set = NULL;
 	int err = -ENOMEM;
 
@@ -1446,9 +1516,17 @@ static int btf_ensure_modifiable(struct btf *btf)
 	types = malloc(btf->hdr->type_len);
 	if (!hdr || !types)
 		goto err_out;
+	if (btf->hdr->hdr_len >= sizeof(struct btf_header)  &&
+	    btf->hdr->meta_header.meta_off && btf->hdr->meta_header.meta_len) {
+		meta = calloc(1, btf->hdr->meta_header.meta_len);
+		if (!meta)
+			goto err_out;
+	}
 
 	memcpy(hdr, btf->hdr, btf->hdr->hdr_len);
 	memcpy(types, btf->types_data, btf->hdr->type_len);
+	if (meta)
+		memcpy(meta, btf->meta_data, btf->hdr->meta_header.meta_len);
 
 	/* build lookup index for all strings */
 	set = strset__new(BTF_MAX_STR_OFFSET, btf->strs_data, btf->hdr->str_len);
@@ -1463,6 +1541,8 @@ static int btf_ensure_modifiable(struct btf *btf)
 	btf->types_data_cap = btf->hdr->type_len;
 	btf->strs_data = NULL;
 	btf->strs_set = set;
+	btf->meta_data = meta;
+
 	/* if BTF was created from scratch, all strings are guaranteed to be
 	 * unique and deduplicated
 	 */
@@ -1480,6 +1560,7 @@ static int btf_ensure_modifiable(struct btf *btf)
 	strset__free(set);
 	free(hdr);
 	free(types);
+	free(meta);
 	return err;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC bpf-next 3/8] libbpf: use metadata to compute an unknown kind size
  2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
  2023-05-31 20:19 ` [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI Alan Maguire
  2023-05-31 20:19 ` [RFC bpf-next 2/8] libbpf: support handling of metadata section in BTF Alan Maguire
@ 2023-05-31 20:19 ` Alan Maguire
  2023-05-31 20:19 ` [RFC bpf-next 4/8] btf: support kernel parsing of BTF with metadata, use it to parse BTF with unknown kinds Alan Maguire
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Alan Maguire @ 2023-05-31 20:19 UTC (permalink / raw)
  To: ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf, Alan Maguire

This allows BTF parsing to proceed even if we do not know the
kind.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/btf.c | 50 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 036dc1505969..77a072716d58 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -336,7 +336,40 @@ static int btf_parse_meta_sec(struct btf *btf)
 	return 0;
 }
 
-static int btf_type_size(const struct btf_type *t)
+/* for unknown kinds, consult kind metadata. */
+static int btf_type_size_unknown(const struct btf *btf, const struct btf_type *t)
+{
+	int size = sizeof(struct btf_type);
+	struct btf_kind_meta *m = NULL;
+	__u16 vlen = btf_vlen(t);
+	__u8 kind = btf_kind(t);
+
+	if (btf->meta_data) {
+		struct btf_metadata *meta = btf->meta_data;
+
+		if (kind < meta->kind_meta_cnt)
+			m = &meta->kind_meta[kind];
+	}
+
+	if (!m || (void *)m > (btf->meta_data + btf->hdr->meta_header.meta_len)) {
+		pr_debug("Unsupported BTF_KIND: %u\n", btf_kind(t));
+		return -EINVAL;
+	}
+
+	if (!(m->flags & BTF_KIND_META_OPTIONAL)) {
+		/* a required kind, and we do not know about it.. */
+		pr_debug("unknown but required kind: %s(%u)\n",
+			 btf__name_by_offset(btf, m->name_off), kind);
+		return -EINVAL;
+	}
+
+	size += m->info_sz;
+	size += vlen * m->elem_sz;
+
+	return size;
+}
+
+static int btf_type_size(const struct btf *btf, const struct btf_type *t)
 {
 	const int base_size = sizeof(struct btf_type);
 	__u16 vlen = btf_vlen(t);
@@ -372,8 +405,7 @@ static int btf_type_size(const struct btf_type *t)
 	case BTF_KIND_DECL_TAG:
 		return base_size + sizeof(struct btf_decl_tag);
 	default:
-		pr_debug("Unsupported BTF_KIND:%u\n", btf_kind(t));
-		return -EINVAL;
+		return btf_type_size_unknown(btf, t);
 	}
 }
 
@@ -472,7 +504,7 @@ static int btf_parse_type_sec(struct btf *btf)
 		if (btf->swapped_endian)
 			btf_bswap_type_base(next_type);
 
-		type_size = btf_type_size(next_type);
+		type_size = btf_type_size(btf, next_type);
 		if (type_size < 0)
 			return type_size;
 		if (next_type + type_size > end_type) {
@@ -952,10 +984,8 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf)
 	btf->types_data = btf->raw_data + btf->hdr->hdr_len + btf->hdr->type_off;
 
 	err = btf_parse_str_sec(btf);
+	err = err ?: btf_parse_meta_sec(btf);
 	err = err ?: btf_parse_type_sec(btf);
-	if (err)
-		goto done;
-	err = btf_parse_meta_sec(btf);
 
 done:
 	if (err) {
@@ -1687,7 +1717,7 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t
 	struct btf_type *t;
 	int sz, err;
 
-	sz = btf_type_size(src_type);
+	sz = btf_type_size(src_btf, src_type);
 	if (sz < 0)
 		return libbpf_err(sz);
 
@@ -1768,7 +1798,7 @@ int btf__add_btf(struct btf *btf, const struct btf *src_btf)
 	memcpy(t, src_btf->types_data, data_sz);
 
 	for (i = 0; i < cnt; i++) {
-		sz = btf_type_size(t);
+		sz = btf_type_size(src_btf, t);
 		if (sz < 0) {
 			/* unlikely, has to be corrupted src_btf */
 			err = sz;
@@ -4764,7 +4794,7 @@ static int btf_dedup_compact_types(struct btf_dedup *d)
 			continue;
 
 		t = btf__type_by_id(d->btf, id);
-		len = btf_type_size(t);
+		len = btf_type_size(d->btf, t);
 		if (len < 0)
 			return len;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC bpf-next 4/8] btf: support kernel parsing of BTF with metadata, use it to parse BTF with unknown kinds
  2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
                   ` (2 preceding siblings ...)
  2023-05-31 20:19 ` [RFC bpf-next 3/8] libbpf: use metadata to compute an unknown kind size Alan Maguire
@ 2023-05-31 20:19 ` Alan Maguire
  2023-06-07 19:51   ` Eduard Zingerman
  2023-05-31 20:19 ` [RFC bpf-next 5/8] libbpf: add metadata encoding support Alan Maguire
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2023-05-31 20:19 UTC (permalink / raw)
  To: ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf, Alan Maguire

Validate metadata if present, and use it to parse BTF with
unrecognized kinds. Reject BTF that contains a type
of a kind that is not optional.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 kernel/bpf/btf.c | 102 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 85 insertions(+), 17 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index bd2cac057928..67f42d9ce099 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -257,6 +257,7 @@ struct btf {
 	struct btf_kfunc_set_tab *kfunc_set_tab;
 	struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
 	struct btf_struct_metas *struct_meta_tab;
+	struct btf_metadata *meta_data;
 
 	/* split BTF support */
 	struct btf *base_btf;
@@ -4965,22 +4966,41 @@ static s32 btf_check_meta(struct btf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	if (BTF_INFO_KIND(t->info) > BTF_KIND_MAX ||
-	    BTF_INFO_KIND(t->info) == BTF_KIND_UNKN) {
-		btf_verifier_log(env, "[%u] Invalid kind:%u",
-				 env->log_type_id, BTF_INFO_KIND(t->info));
-		return -EINVAL;
-	}
-
 	if (!btf_name_offset_valid(env->btf, t->name_off)) {
 		btf_verifier_log(env, "[%u] Invalid name_offset:%u",
 				 env->log_type_id, t->name_off);
 		return -EINVAL;
 	}
 
-	var_meta_size = btf_type_ops(t)->check_meta(env, t, meta_left);
-	if (var_meta_size < 0)
-		return var_meta_size;
+	if (BTF_INFO_KIND(t->info) == BTF_KIND_UNKN) {
+		btf_verifier_log(env, "[%u] Invalid kind:%u",
+				 env->log_type_id, BTF_INFO_KIND(t->info));
+		return -EINVAL;
+	}
+
+	if (BTF_INFO_KIND(t->info) > BTF_KIND_MAX && env->btf->meta_data &&
+	    BTF_INFO_KIND(t->info) < env->btf->meta_data->kind_meta_cnt) {
+		struct btf_kind_meta *m = &env->btf->meta_data->kind_meta[BTF_INFO_KIND(t->info)];
+
+		if (!(m->flags & BTF_KIND_META_OPTIONAL)) {
+			btf_verifier_log(env, "[%u] unknown but required kind '%s'(%u)",
+					 env->log_type_id,
+					 btf_name_by_offset(env->btf, m->name_off),
+					 BTF_INFO_KIND(t->info));
+			return -EINVAL;
+		}
+		var_meta_size = sizeof(struct btf_type);
+		var_meta_size += m->info_sz + (btf_type_vlen(t) * m->elem_sz);
+	} else {
+		if (BTF_INFO_KIND(t->info) > BTF_KIND_MAX) {
+			btf_verifier_log(env, "[%u] Invalid kind:%u",
+					 env->log_type_id, BTF_INFO_KIND(t->info));
+			return -EINVAL;
+		}
+		var_meta_size = btf_type_ops(t)->check_meta(env, t, meta_left);
+		if (var_meta_size < 0)
+			return var_meta_size;
+	}
 
 	meta_left -= var_meta_size;
 
@@ -5155,7 +5175,8 @@ static int btf_parse_str_sec(struct btf_verifier_env *env)
 	start = btf->nohdr_data + hdr->str_off;
 	end = start + hdr->str_len;
 
-	if (end != btf->data + btf->data_size) {
+	if (hdr->hdr_len < sizeof(struct btf_header) &&
+	    end != btf->data + btf->data_size) {
 		btf_verifier_log(env, "String section is not at the end");
 		return -EINVAL;
 	}
@@ -5176,9 +5197,47 @@ static int btf_parse_str_sec(struct btf_verifier_env *env)
 	return 0;
 }
 
+static int btf_parse_meta_sec(struct btf_verifier_env *env)
+{
+	const struct btf_header *hdr = &env->btf->hdr;
+	struct btf *btf = env->btf;
+	void *start, *end;
+
+	if (hdr->hdr_len < sizeof(struct btf_header) ||
+	    hdr->meta_header.meta_len == 0)
+		return 0;
+
+	/* Meta section must align to 8 bytes */
+	if (hdr->meta_header.meta_off & (sizeof(u64) - 1)) {
+		btf_verifier_log(env, "Unaligned meta_off");
+		return -EINVAL;
+	}
+	start = btf->nohdr_data + hdr->meta_header.meta_off;
+	end = start + hdr->meta_header.meta_len;
+
+	if (hdr->meta_header.meta_len < sizeof(struct btf_meta_header)) {
+		btf_verifier_log(env, "Metadata section is too small");
+		return -EINVAL;
+	}
+	if (end != btf->data + btf->data_size) {
+		btf_verifier_log(env, "Metadata section is not at the end");
+		return -EINVAL;
+	}
+	btf->meta_data = start;
+
+	if (hdr->meta_header.meta_len != sizeof(struct btf_metadata) +
+					(btf->meta_data->kind_meta_cnt *
+					 sizeof(struct btf_kind_meta))) {
+		btf_verifier_log(env, "Metadata section size mismatch");
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static const size_t btf_sec_info_offset[] = {
 	offsetof(struct btf_header, type_off),
 	offsetof(struct btf_header, str_off),
+	offsetof(struct btf_header, meta_header.meta_off),
 };
 
 static int btf_sec_info_cmp(const void *a, const void *b)
@@ -5193,15 +5252,19 @@ static int btf_check_sec_info(struct btf_verifier_env *env,
 			      u32 btf_data_size)
 {
 	struct btf_sec_info secs[ARRAY_SIZE(btf_sec_info_offset)];
-	u32 total, expected_total, i;
+	u32 nr_secs = ARRAY_SIZE(btf_sec_info_offset);
+	u32 total, expected_total, gap, i;
 	const struct btf_header *hdr;
 	const struct btf *btf;
 
 	btf = env->btf;
 	hdr = &btf->hdr;
 
+	if (hdr->hdr_len < sizeof(struct btf_header))
+		nr_secs--;
+
 	/* Populate the secs from hdr */
-	for (i = 0; i < ARRAY_SIZE(btf_sec_info_offset); i++)
+	for (i = 0; i < nr_secs; i++)
 		secs[i] = *(struct btf_sec_info *)((void *)hdr +
 						   btf_sec_info_offset[i]);
 
@@ -5216,9 +5279,10 @@ static int btf_check_sec_info(struct btf_verifier_env *env,
 			btf_verifier_log(env, "Invalid section offset");
 			return -EINVAL;
 		}
-		if (total < secs[i].off) {
-			/* gap */
-			btf_verifier_log(env, "Unsupported section found");
+		gap = secs[i].off - total;
+		if (gap >= 8) {
+			/* gap larger than alignment gap */
+			btf_verifier_log(env, "Unsupported section gap found");
 			return -EINVAL;
 		}
 		if (total > secs[i].off) {
@@ -5230,7 +5294,7 @@ static int btf_check_sec_info(struct btf_verifier_env *env,
 					 "Total section length too long");
 			return -EINVAL;
 		}
-		total += secs[i].len;
+		total += secs[i].len + gap;
 	}
 
 	/* There is data other than hdr and known sections */
@@ -5530,6 +5594,10 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat
 	if (err)
 		goto errout;
 
+	err = btf_parse_meta_sec(env);
+	if (err)
+		goto errout;
+
 	err = btf_parse_type_sec(env);
 	if (err)
 		goto errout;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC bpf-next 5/8] libbpf: add metadata encoding support
  2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
                   ` (3 preceding siblings ...)
  2023-05-31 20:19 ` [RFC bpf-next 4/8] btf: support kernel parsing of BTF with metadata, use it to parse BTF with unknown kinds Alan Maguire
@ 2023-05-31 20:19 ` Alan Maguire
  2023-05-31 20:19 ` [RFC bpf-next 6/8] btf: generate metadata for vmlinux/module BTF Alan Maguire
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Alan Maguire @ 2023-05-31 20:19 UTC (permalink / raw)
  To: ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf, Alan Maguire

Support encoding of BTF metadata via btf__new_empty_opts().
Current supported opts are base_btf, add_meta and description
for metadata.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/btf.c      | 108 +++++++++++++++++++++++++++++++++++++--
 tools/lib/bpf/btf.h      |  11 ++++
 tools/lib/bpf/libbpf.map |   1 +
 3 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 77a072716d58..4b85325336b4 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -122,6 +122,7 @@ struct btf {
 	bool strs_deduped;
 
 	void *meta_data;
+	size_t meta_sz;
 
 	/* BTF object FD, if loaded into kernel */
 	int fd;
@@ -896,8 +897,84 @@ void btf__free(struct btf *btf)
 	free(btf);
 }
 
-static struct btf *btf_new_empty(struct btf *base_btf)
+static void __btf_add_kind_meta(struct btf *btf, __u8 kind, const char *name,
+				__u16 flags, __u8 info_sz, __u8 elem_sz)
 {
+	struct btf_metadata *meta = btf->meta_data;
+	struct btf_kind_meta *kind_meta = &meta->kind_meta[kind];
+
+	kind_meta->name_off = btf__add_str(btf, name);
+	kind_meta->flags = flags;
+	kind_meta->info_sz = info_sz;
+	kind_meta->elem_sz = elem_sz;
+}
+
+#define btf_add_kind_meta(btf, kind, flags, info_sz, elem_sz)	\
+	__btf_add_kind_meta(btf, kind, #kind, flags, info_sz, elem_sz)
+
+static int btf_ensure_modifiable(struct btf *btf);
+
+static int btf_add_meta(struct btf *btf, struct btf_new_opts *opts)
+{
+	const char *description = OPTS_GET(opts, description, NULL);
+	struct btf_metadata *meta;
+
+	if (btf_ensure_modifiable(btf))
+		return libbpf_err(-ENOMEM);
+
+	btf->meta_sz = sizeof(struct btf_metadata) +
+		       (NR_BTF_KINDS * sizeof(struct btf_kind_meta));
+	btf->meta_data = calloc(1, btf->meta_sz);
+
+	if (!btf->meta_data) {
+		btf->meta_sz = 0;
+		return -ENOMEM;
+	}
+
+	meta = btf->meta_data;
+
+	if (btf->base_btf) {
+		struct btf_metadata *base_meta = btf->base_btf->meta_data;
+
+		if (base_meta && (base_meta->flags & BTF_META_CRC_SET)) {
+			meta->base_crc =  base_meta->crc;
+			meta->flags |= BTF_META_BASE_CRC_SET;
+		}
+	}
+
+	if (description)
+		meta->description_off = btf__add_str(btf, description);
+
+	meta->kind_meta_cnt = NR_BTF_KINDS;
+
+	/* all supported kinds should describe their format here. */
+	btf_add_kind_meta(btf, BTF_KIND_UNKN, 0, 0, 0);
+	btf_add_kind_meta(btf, BTF_KIND_INT, 0, sizeof(__u32), 0);
+	btf_add_kind_meta(btf, BTF_KIND_PTR, 0, 0, 0);
+	btf_add_kind_meta(btf, BTF_KIND_ARRAY, 0, sizeof(struct btf_array), 0);
+	btf_add_kind_meta(btf, BTF_KIND_STRUCT, 0, 0, sizeof(struct btf_member));
+	btf_add_kind_meta(btf, BTF_KIND_UNION, 0, 0, sizeof(struct btf_member));
+	btf_add_kind_meta(btf, BTF_KIND_ENUM, 0, 0, sizeof(struct btf_enum));
+	btf_add_kind_meta(btf, BTF_KIND_FWD, 0, 0, 0);
+	btf_add_kind_meta(btf, BTF_KIND_TYPEDEF, 0, 0, 0);
+	btf_add_kind_meta(btf, BTF_KIND_VOLATILE, 0, 0, 0);
+	btf_add_kind_meta(btf, BTF_KIND_CONST, 0, 0, 0);
+	btf_add_kind_meta(btf, BTF_KIND_RESTRICT, 0, 0, 0);
+	btf_add_kind_meta(btf, BTF_KIND_FUNC, 0, 0, 0);
+	btf_add_kind_meta(btf, BTF_KIND_FUNC_PROTO, 0, 0, sizeof(struct btf_param));
+	btf_add_kind_meta(btf, BTF_KIND_VAR, 0, sizeof(struct btf_var), 0);
+	btf_add_kind_meta(btf, BTF_KIND_DATASEC, 0, 0, sizeof(struct btf_var_secinfo));
+	btf_add_kind_meta(btf, BTF_KIND_FLOAT, 0, 0, 0);
+	btf_add_kind_meta(btf, BTF_KIND_DECL_TAG, BTF_KIND_META_OPTIONAL,
+							sizeof(struct btf_decl_tag), 0);
+	btf_add_kind_meta(btf, BTF_KIND_TYPE_TAG, BTF_KIND_META_OPTIONAL, 0, 0);
+	btf_add_kind_meta(btf, BTF_KIND_ENUM64, 0, 0, sizeof(struct btf_enum64));
+	return 0;
+}
+
+static struct btf *btf_new_empty(struct btf_new_opts *opts)
+{
+	struct btf *base_btf = OPTS_GET(opts, base_btf, NULL);
 	struct btf *btf;
 
 	btf = calloc(1, sizeof(*btf));
@@ -934,17 +1011,42 @@ static struct btf *btf_new_empty(struct btf *base_btf)
 	btf->strs_data = btf->raw_data + btf->hdr->hdr_len;
 	btf->hdr->str_len = base_btf ? 0 : 1; /* empty string at offset 0 */
 
+	if (opts->add_meta) {
+		int err = btf_add_meta(btf, opts);
+
+		if (err) {
+			free(btf->raw_data);
+			free(btf);
+			return ERR_PTR(err);
+		}
+		btf->hdr->meta_header.meta_len = btf->meta_sz;
+	}
+
 	return btf;
 }
 
 struct btf *btf__new_empty(void)
 {
-	return libbpf_ptr(btf_new_empty(NULL));
+	LIBBPF_OPTS(btf_new_opts, opts);
+
+	return libbpf_ptr(btf_new_empty(&opts));
 }
 
 struct btf *btf__new_empty_split(struct btf *base_btf)
 {
-	return libbpf_ptr(btf_new_empty(base_btf));
+	LIBBPF_OPTS(btf_new_opts, opts);
+
+	opts.base_btf = base_btf;
+
+	return libbpf_ptr(btf_new_empty(&opts));
+}
+
+struct btf *btf__new_empty_opts(struct btf_new_opts *opts)
+{
+	if (!OPTS_VALID(opts, btf_new_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	return libbpf_ptr(btf_new_empty(opts));
 }
 
 static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf)
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 8e6880d91c84..6de445225f02 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -107,6 +107,17 @@ LIBBPF_API struct btf *btf__new_empty(void);
  */
 LIBBPF_API struct btf *btf__new_empty_split(struct btf *base_btf);
 
+struct btf_new_opts {
+	size_t sz;
+	struct btf *base_btf;	/* optional base BTF */
+	bool add_meta;		/* add BTF metadata about encoding */
+	const char *description;/* add description to metadata */
+	size_t :0;
+};
+#define btf_new_opts__last_field description
+
+LIBBPF_API struct btf *btf__new_empty_opts(struct btf_new_opts *opts);
+
 LIBBPF_API struct btf *btf__parse(const char *path, struct btf_ext **btf_ext);
 LIBBPF_API struct btf *btf__parse_split(const char *path, struct btf *base_btf);
 LIBBPF_API struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 7521a2fb7626..edd8be4b21d0 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -395,4 +395,5 @@ LIBBPF_1.2.0 {
 LIBBPF_1.3.0 {
 	global:
 		bpf_obj_pin_opts;
+		btf__new_empty_opts;
 } LIBBPF_1.2.0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC bpf-next 6/8] btf: generate metadata for vmlinux/module BTF
  2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
                   ` (4 preceding siblings ...)
  2023-05-31 20:19 ` [RFC bpf-next 5/8] libbpf: add metadata encoding support Alan Maguire
@ 2023-05-31 20:19 ` Alan Maguire
  2023-05-31 20:19 ` [RFC bpf-next 7/8] bpftool: add BTF dump "format meta" to dump header/metadata Alan Maguire
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Alan Maguire @ 2023-05-31 20:19 UTC (permalink / raw)
  To: ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf, Alan Maguire

Generate BTF metadata for kernel and module BTF.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 scripts/pahole-flags.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/pahole-flags.sh b/scripts/pahole-flags.sh
index 728d55190d97..28fa9120a770 100755
--- a/scripts/pahole-flags.sh
+++ b/scripts/pahole-flags.sh
@@ -24,7 +24,7 @@ if [ "${pahole_ver}" -ge "124" ]; then
 	extra_paholeopt="${extra_paholeopt} --lang_exclude=rust"
 fi
 if [ "${pahole_ver}" -ge "125" ]; then
-	extra_paholeopt="${extra_paholeopt} --skip_encoding_btf_inconsistent_proto --btf_gen_optimized"
+	extra_paholeopt="${extra_paholeopt} --skip_encoding_btf_inconsistent_proto --btf_gen_optimized --btf_gen_meta"
 fi
 
 echo ${extra_paholeopt}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC bpf-next 7/8] bpftool: add BTF dump "format meta" to dump header/metadata
  2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
                   ` (5 preceding siblings ...)
  2023-05-31 20:19 ` [RFC bpf-next 6/8] btf: generate metadata for vmlinux/module BTF Alan Maguire
@ 2023-05-31 20:19 ` Alan Maguire
  2023-06-01 16:33   ` Quentin Monnet
  2023-06-02 16:57   ` Andrii Nakryiko
  2023-05-31 20:19 ` [RFC bpf-next 8/8] selftests/bpf: test kind encoding/decoding Alan Maguire
  2023-05-31 20:19 ` [RFC dwarves] dwarves: encode BTF metadata if --btf_gen_meta is set Alan Maguire
  8 siblings, 2 replies; 35+ messages in thread
From: Alan Maguire @ 2023-05-31 20:19 UTC (permalink / raw)
  To: ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf, Alan Maguire

Provide a way to dump BTF header and metadata info via
bpftool; for example

$ bpftool btf dump file vmliux format meta
BTF: data size 4963656
Header: magic 0xeb9f, version 1, flags 0x0, hdr_len 32
Types: len 2927556, offset 0
Strings: len 2035881, offset 2927556
Metadata header found: len 184, offset 4963440, flags 0x1
Description: 'generated by dwarves v1.25'
CRC 0x6da2a930 ; base CRC 0x0
Kind metadata for 20 kinds:
       BTF_KIND_UNKN[ 0] flags 0x0    info_sz  0 elem_sz  0
        BTF_KIND_INT[ 1] flags 0x0    info_sz  4 elem_sz  0
        BTF_KIND_PTR[ 2] flags 0x0    info_sz  0 elem_sz  0
      BTF_KIND_ARRAY[ 3] flags 0x0    info_sz 12 elem_sz  0
     BTF_KIND_STRUCT[ 4] flags 0x0    info_sz  0 elem_sz 12
      BTF_KIND_UNION[ 5] flags 0x0    info_sz  0 elem_sz 12
       BTF_KIND_ENUM[ 6] flags 0x0    info_sz  0 elem_sz  8
        BTF_KIND_FWD[ 7] flags 0x0    info_sz  0 elem_sz  0
    BTF_KIND_TYPEDEF[ 8] flags 0x0    info_sz  0 elem_sz  0
   BTF_KIND_VOLATILE[ 9] flags 0x0    info_sz  0 elem_sz  0
      BTF_KIND_CONST[10] flags 0x0    info_sz  0 elem_sz  0
   BTF_KIND_RESTRICT[11] flags 0x0    info_sz  0 elem_sz  0
       BTF_KIND_FUNC[12] flags 0x0    info_sz  0 elem_sz  0
 BTF_KIND_FUNC_PROTO[13] flags 0x0    info_sz  0 elem_sz  8
        BTF_KIND_VAR[14] flags 0x0    info_sz  4 elem_sz  0
    BTF_KIND_DATASEC[15] flags 0x0    info_sz  0 elem_sz 12
      BTF_KIND_FLOAT[16] flags 0x0    info_sz  0 elem_sz  0
   BTF_KIND_DECL_TAG[17] flags 0x1    info_sz  4 elem_sz  0
   BTF_KIND_TYPE_TAG[18] flags 0x1    info_sz  0 elem_sz  0
     BTF_KIND_ENUM64[19] flags 0x0    info_sz  0 elem_sz 12

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/bpf/bpftool/btf.c | 46 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index 91fcb75babe3..da4257e00ba8 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -504,6 +504,47 @@ static int dump_btf_c(const struct btf *btf,
 	return err;
 }
 
+static int dump_btf_meta(const struct btf *btf)
+{
+	const struct btf_header *hdr;
+	const struct btf_metadata *m;
+	const void *data;
+	__u32 data_sz;
+	__u8 i;
+
+	data = btf__raw_data(btf, &data_sz);
+	if (!data)
+		return -ENOMEM;
+	hdr = data;
+	printf("BTF: data size %u\n", data_sz);
+	printf("Header: magic 0x%x, version %d, flags 0x%x, hdr_len %u\n",
+	       hdr->magic, hdr->version, hdr->flags, hdr->hdr_len);
+	printf("Types: len %u, offset %u\n", hdr->type_len, hdr->type_off);
+	printf("Strings: len %u, offset %u\n", hdr->str_len, hdr->str_off);
+
+	if (hdr->hdr_len < sizeof(struct btf_header) ||
+	    hdr->meta_header.meta_len == 0 ||
+	    hdr->meta_header.meta_off == 0)
+		return 0;
+
+	m = (void *)hdr + hdr->hdr_len + hdr->meta_header.meta_off;
+
+	printf("Metadata header found: len %u, offset %u, flags 0x%x\n",
+	       hdr->meta_header.meta_len, hdr->meta_header.meta_off, m->flags);
+	if (m->description_off)
+		printf("Description: '%s'\n", btf__name_by_offset(btf, m->description_off));
+	printf("CRC 0x%x ; base CRC 0x%x\n", m->crc, m->base_crc);
+	printf("Kind metadata for %d kinds:\n", m->kind_meta_cnt);
+	for (i = 0; i < m->kind_meta_cnt; i++) {
+		printf("%20s[%2d] flags 0x%-4x info_sz %2d elem_sz %2d\n",
+		       btf__name_by_offset(btf, m->kind_meta[i].name_off),
+		       i, m->kind_meta[i].flags, m->kind_meta[i].info_sz,
+		       m->kind_meta[i].elem_sz);
+	}
+
+	return 0;
+}
+
 static const char sysfs_vmlinux[] = "/sys/kernel/btf/vmlinux";
 
 static struct btf *get_vmlinux_btf_from_sysfs(void)
@@ -553,6 +594,7 @@ static int do_dump(int argc, char **argv)
 	__u32 root_type_ids[2];
 	int root_type_cnt = 0;
 	bool dump_c = false;
+	bool dump_meta = false;
 	__u32 btf_id = -1;
 	const char *src;
 	int fd = -1;
@@ -654,6 +696,8 @@ static int do_dump(int argc, char **argv)
 			}
 			if (strcmp(*argv, "c") == 0) {
 				dump_c = true;
+			} else if (strcmp(*argv, "meta") == 0) {
+				dump_meta = true;
 			} else if (strcmp(*argv, "raw") == 0) {
 				dump_c = false;
 			} else {
@@ -692,6 +736,8 @@ static int do_dump(int argc, char **argv)
 			goto done;
 		}
 		err = dump_btf_c(btf, root_type_ids, root_type_cnt);
+	} else if (dump_meta) {
+		err = dump_btf_meta(btf);
 	} else {
 		err = dump_btf_raw(btf, root_type_ids, root_type_cnt);
 	}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC bpf-next 8/8] selftests/bpf: test kind encoding/decoding
  2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
                   ` (6 preceding siblings ...)
  2023-05-31 20:19 ` [RFC bpf-next 7/8] bpftool: add BTF dump "format meta" to dump header/metadata Alan Maguire
@ 2023-05-31 20:19 ` Alan Maguire
  2023-05-31 20:19 ` [RFC dwarves] dwarves: encode BTF metadata if --btf_gen_meta is set Alan Maguire
  8 siblings, 0 replies; 35+ messages in thread
From: Alan Maguire @ 2023-05-31 20:19 UTC (permalink / raw)
  To: ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf, Alan Maguire

verify btf__add_kinds() adds kind encodings for all kinds supported,
and after adding kind-related types for an unknown kind, ensure that
parsing uses this info when that kind is encountered rather than
giving up.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 .../selftests/bpf/prog_tests/btf_kind.c       | 138 ++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_kind.c

diff --git a/tools/testing/selftests/bpf/prog_tests/btf_kind.c b/tools/testing/selftests/bpf/prog_tests/btf_kind.c
new file mode 100644
index 000000000000..a928415c60ff
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/btf_kind.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023, Oracle and/or its affiliates. */
+
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include <bpf/libbpf.h>
+
+/* verify kind encoding exists for each kind */
+void test_btf_kind_encoding(struct btf *btf, char *description)
+{
+	const struct btf_header *hdr;
+	const struct btf_metadata *meta;
+	const void *raw_btf;
+	__u32 raw_size;
+	__u16 i;
+
+	raw_btf = btf__raw_data(btf, &raw_size);
+	if (!ASSERT_OK_PTR(raw_btf, "btf__raw_data"))
+		return;
+
+	hdr = raw_btf;
+	meta = raw_btf + hdr->hdr_len + hdr->meta_header.meta_off;
+
+	if (!ASSERT_EQ(meta->kind_meta_cnt, NR_BTF_KINDS, "unexpected kind_meta_cnt"))
+		return;
+
+	if (!ASSERT_EQ(strcmp(description, btf__name_by_offset(btf, meta->description_off)),
+		       0, "check meta description"))
+		return;
+
+	for (i = 0; i <= BTF_KIND_MAX; i++) {
+		const struct btf_kind_meta *k = &meta->kind_meta[i];
+
+		if (ASSERT_OK_PTR(btf__name_by_offset(btf, k->name_off), "kind_name_valid"))
+			return;
+	}
+}
+
+/* fabricate an unrecognized kind at BTF_KIND_MAX + 1, and after adding
+ * the appropriate struct/typedefs to the BTF such that it recognizes
+ * this kind, ensure that parsing of BTF containing the unrecognized kind
+ * can succeed.
+ */
+void test_btf_kind_decoding(struct btf *btf)
+{
+	__s32 int_id, unrec_id, id;
+	struct btf_type *t;
+	char btf_path[64];
+	const void *raw_btf;
+	void *new_raw_btf;
+	struct btf *new_btf;
+	struct btf_header *hdr;
+	struct btf_metadata *meta;
+	struct btf_kind_meta *k;
+	__u32 raw_size;
+	int fd;
+
+	int_id = btf__add_int(btf, "test_char", 1, BTF_INT_CHAR);
+	if (!ASSERT_GT(int_id, 0, "add_int_id"))
+		return;
+
+	/* now create our type with unrecognized kind by adding a typedef kind
+	 * we will overwrite it with our unrecognized kind value.
+	 */
+	unrec_id = btf__add_typedef(btf, "unrec_kind", int_id);
+	if (!ASSERT_GT(unrec_id, 0, "add_unrec_id"))
+		return;
+
+	/* add an id after it that we will look up to verify we can parse
+	 * beyond unrecognized kinds.
+	 */
+	id = btf__add_typedef(btf, "test_lookup", int_id);
+	if (!ASSERT_GT(id, 0, "add_test_lookup_id"))
+		return;
+
+	raw_btf = (void *)btf__raw_data(btf, &raw_size);
+	if (!ASSERT_OK_PTR(raw_btf, "btf__raw_data"))
+		return;
+
+	new_raw_btf = calloc(1, raw_size + sizeof(*k));
+	memcpy(new_raw_btf, raw_btf, raw_size);
+
+	/* add new metadata description */
+	hdr = new_raw_btf;
+	hdr->meta_header.meta_len += sizeof(*k);
+	meta = new_raw_btf + hdr->hdr_len + hdr->meta_header.meta_off;
+	meta->kind_meta_cnt += 1;
+	/* we will call our kinds UNKN, re-using the string offsets from BTF_KIND_UNKN */
+	k = &meta->kind_meta[NR_BTF_KINDS];
+	k->name_off = meta->kind_meta[0].name_off + strlen("BTF_KIND_");
+	k->flags = BTF_KIND_META_OPTIONAL;
+	k->info_sz = 0;
+	k->elem_sz = 0;
+
+	/* now modify our typedef added above to be an unrecognized kind. */
+	t = (void *)hdr + hdr->hdr_len + hdr->type_off + sizeof(struct btf_type) +
+		sizeof(__u32);
+	t->info = (NR_BTF_KINDS << 24);
+
+	/* now write our BTF to a raw file, ready for parsing. */
+	snprintf(btf_path, sizeof(btf_path), "/tmp/btf_kind.%d", getpid());
+	fd = open(btf_path, O_WRONLY | O_CREAT);
+	write(fd, new_raw_btf, raw_size + sizeof(*k));
+	close(fd);
+
+	/* verify parsing succeeds, and that we can read type info past
+	 * the unrecognized kind.
+	 */
+	new_btf = btf__parse_raw(btf_path);
+	if (ASSERT_OK_PTR(new_btf, "btf__parse_raw")) {
+		ASSERT_EQ(btf__find_by_name_kind(new_btf, "test_lookup",
+						 BTF_KIND_TYPEDEF), id,
+			  "verify_id_lookup");
+		/* verify the kernel can handle unrecognized kinds. */
+		ASSERT_EQ(btf__load_into_kernel(new_btf), 0, "btf_load_into_kernel");
+	}
+	unlink(btf_path);
+}
+
+void test_btf_kind(void)
+{
+	LIBBPF_OPTS(btf_new_opts, opts);
+	char *description = "testing metadata!";
+
+	opts.add_meta = true;
+	opts.description = description;
+
+	struct btf *btf = btf__new_empty_opts(&opts);
+
+	if (!ASSERT_OK_PTR(btf, "btf_new"))
+		return;
+
+	if (test__start_subtest("btf_kind_encoding"))
+		test_btf_kind_encoding(btf, description);
+	if (test__start_subtest("btf_kind_decoding"))
+		test_btf_kind_decoding(btf);
+	btf__free(btf);
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC dwarves] dwarves: encode BTF metadata if --btf_gen_meta is set
  2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
                   ` (7 preceding siblings ...)
  2023-05-31 20:19 ` [RFC bpf-next 8/8] selftests/bpf: test kind encoding/decoding Alan Maguire
@ 2023-05-31 20:19 ` Alan Maguire
  8 siblings, 0 replies; 35+ messages in thread
From: Alan Maguire @ 2023-05-31 20:19 UTC (permalink / raw)
  To: ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf, Alan Maguire

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 btf_encoder.c | 13 +++++++++++--
 btf_encoder.h |  2 +-
 lib/bpf       |  2 +-
 pahole.c      | 12 +++++++++++-
 4 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 65f6e71..d3d6c67 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1625,17 +1625,26 @@ out:
 	return err;
 }
 
-struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, bool verbose)
+struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, bool gen_meta, bool verbose)
 {
 	struct btf_encoder *encoder = zalloc(sizeof(*encoder));
 
 	if (encoder) {
+		LIBBPF_OPTS(btf_new_opts, opts);
+		char description[128];
+
+		snprintf(description, sizeof(description), "generated by dwarves v%u.%u",
+			 DWARVES_MAJOR_VERSION, DWARVES_MINOR_VERSION);
+		opts.base_btf = base_btf;
+		opts.add_meta = gen_meta;
+		opts.description = description;
+
 		encoder->raw_output = detached_filename != NULL;
 		encoder->filename = strdup(encoder->raw_output ? detached_filename : cu->filename);
 		if (encoder->filename == NULL)
 			goto out_delete;
 
-		encoder->btf = btf__new_empty_split(base_btf);
+		encoder->btf = btf__new_empty_opts(&opts);
 		if (encoder->btf == NULL)
 			goto out_delete;
 
diff --git a/btf_encoder.h b/btf_encoder.h
index 34516bb..0c8ea1a 100644
--- a/btf_encoder.h
+++ b/btf_encoder.h
@@ -16,7 +16,7 @@ struct btf;
 struct cu;
 struct list_head;
 
-struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, bool verbose);
+struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, bool gen_meta, bool verbose);
 void btf_encoder__delete(struct btf_encoder *encoder);
 
 int btf_encoder__encode(struct btf_encoder *encoder);
diff --git a/lib/bpf b/lib/bpf
index 6597330..3f591a6 160000
--- a/lib/bpf
+++ b/lib/bpf
@@ -1 +1 @@
-Subproject commit 6597330c45d185381900037f0130712cd326ae59
+Subproject commit d40a48fbfa2a357a6245c59e7bb811f72b5fc94d
diff --git a/pahole.c b/pahole.c
index 6fc4ed6..78fa778 100644
--- a/pahole.c
+++ b/pahole.c
@@ -40,6 +40,7 @@ static bool first_obj_only;
 static bool skip_encoding_btf_vars;
 static bool btf_encode_force;
 static const char *base_btf_file;
+static bool btf_gen_meta;
 
 static const char *prettify_input_filename;
 static FILE *prettify_input;
@@ -1232,6 +1233,7 @@ ARGP_PROGRAM_VERSION_HOOK_DEF = dwarves_print_version;
 #define ARGP_skip_emitting_atomic_typedefs 338
 #define ARGP_btf_gen_optimized  339
 #define ARGP_skip_encoding_btf_inconsistent_proto 340
+#define ARGP_btf_gen_meta          341
 
 static const struct argp_option pahole__options[] = {
 	{
@@ -1654,6 +1656,11 @@ static const struct argp_option pahole__options[] = {
 		.key = ARGP_skip_encoding_btf_inconsistent_proto,
 		.doc = "Skip functions that have multiple inconsistent function prototypes sharing the same name, or that use unexpected registers for parameter values."
 	},
+	{
+		.name = "btf_gen_meta",
+		.key  = ARGP_btf_gen_meta,
+		.doc  = "Generate BTF metadata including kinds available."
+	},
 	{
 		.name = NULL,
 	}
@@ -1829,6 +1836,8 @@ static error_t pahole__options_parser(int key, char *arg,
 		conf_load.btf_gen_optimized = true;		break;
 	case ARGP_skip_encoding_btf_inconsistent_proto:
 		conf_load.skip_encoding_btf_inconsistent_proto = true; break;
+	case ARGP_btf_gen_meta:
+		btf_gen_meta = true;			break;
 	default:
 		return ARGP_ERR_UNKNOWN;
 	}
@@ -3064,7 +3073,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 			 * create it.
 			 */
 			btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, skip_encoding_btf_vars,
-						       btf_encode_force, btf_gen_floats, global_verbose);
+						       btf_encode_force, btf_gen_floats, btf_gen_meta, global_verbose);
 			if (btf_encoder && thr_data) {
 				struct thread_data *thread = thr_data;
 
@@ -3096,6 +3105,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 							 skip_encoding_btf_vars,
 							 btf_encode_force,
 							 btf_gen_floats,
+							 btf_gen_meta,
 							 global_verbose);
 				thread->btf = btf_encoder__btf(thread->encoder);
 			}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-05-31 20:19 ` [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI Alan Maguire
@ 2023-06-01  3:53   ` Alexei Starovoitov
  2023-06-01 10:36     ` Alan Maguire
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-06-01  3:53 UTC (permalink / raw)
  To: Alan Maguire
  Cc: ast, daniel, andrii, acme, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, quentin, mykolal, bpf

On Wed, May 31, 2023 at 09:19:28PM +0100, Alan Maguire wrote:
> BTF kind metadata provides information to parse BTF kinds.
> By separating parsing BTF from using all the information
> it provides, we allow BTF to encode new features even if
> they cannot be used.  This is helpful in particular for
> cases where newer tools for BTF generation run on an
> older kernel; BTF kinds may be present that the kernel
> cannot yet use, but at least it can parse the BTF
> provided.  Meanwhile userspace tools with newer libbpf
> may be able to use the newer information.
> 
> The intent is to support encoding of kind metadata
> optionally so that tools like pahole can add this
> information.  So for each kind we record
> 
> - a kind name string
> - kind-related flags
> - length of singular element following struct btf_type
> - length of each of the btf_vlen() elements following
> 
> In addition we make space in the metadata for
> CRC32s computed over the BTF along with a CRC for
> the base BTF; this allows split BTF to identify
> a mismatch explicitly.  Finally we provide an
> offset for an optional description string.
> 
> The ideas here were discussed at [1] hence
> 
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> 
> [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
> ---
>  include/uapi/linux/btf.h       | 29 +++++++++++++++++++++++++++++
>  tools/include/uapi/linux/btf.h | 29 +++++++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)
> 
> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
> index ec1798b6d3ff..94c1f4518249 100644
> --- a/include/uapi/linux/btf.h
> +++ b/include/uapi/linux/btf.h
> @@ -8,6 +8,34 @@
>  #define BTF_MAGIC	0xeB9F
>  #define BTF_VERSION	1
>  
> +/* is this information required? If so it cannot be sanitized safely. */
> +#define BTF_KIND_META_OPTIONAL		(1 << 0)
> +
> +struct btf_kind_meta {
> +	__u32 name_off;		/* kind name string offset */
> +	__u16 flags;		/* see BTF_KIND_META_* values above */
> +	__u8 info_sz;		/* size of singular element after btf_type */
> +	__u8 elem_sz;		/* size of each of btf_vlen(t) elements */
> +};
> +
> +/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
> +#define BTF_META_CRC_SET		(1 << 0)
> +#define BTF_META_BASE_CRC_SET		(1 << 1)
> +
> +struct btf_metadata {
> +	__u8	kind_meta_cnt;		/* number of struct btf_kind_meta */
> +	__u32	flags;
> +	__u32	description_off;	/* optional description string */
> +	__u32	crc;			/* crc32 of BTF */
> +	__u32	base_crc;		/* crc32 of base BTF */
> +	struct btf_kind_meta kind_meta[];
> +};
> +
> +struct btf_meta_header {
> +	__u32	meta_off;	/* offset of metadata section */
> +	__u32	meta_len;	/* length of metadata section */
> +};
> +
>  struct btf_header {
>  	__u16	magic;
>  	__u8	version;
> @@ -19,6 +47,7 @@ struct btf_header {
>  	__u32	type_len;	/* length of type section	*/
>  	__u32	str_off;	/* offset of string section	*/
>  	__u32	str_len;	/* length of string section	*/
> +	struct btf_meta_header meta_header;
>  };
>  
>  /* Max # of type identifier */
> diff --git a/tools/include/uapi/linux/btf.h b/tools/include/uapi/linux/btf.h
> index ec1798b6d3ff..94c1f4518249 100644
> --- a/tools/include/uapi/linux/btf.h
> +++ b/tools/include/uapi/linux/btf.h
> @@ -8,6 +8,34 @@
>  #define BTF_MAGIC	0xeB9F
>  #define BTF_VERSION	1
>  
> +/* is this information required? If so it cannot be sanitized safely. */
> +#define BTF_KIND_META_OPTIONAL		(1 << 0)
> +
> +struct btf_kind_meta {
> +	__u32 name_off;		/* kind name string offset */
> +	__u16 flags;		/* see BTF_KIND_META_* values above */
> +	__u8 info_sz;		/* size of singular element after btf_type */
> +	__u8 elem_sz;		/* size of each of btf_vlen(t) elements */
> +};
> +
> +/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
> +#define BTF_META_CRC_SET		(1 << 0)
> +#define BTF_META_BASE_CRC_SET		(1 << 1)
> +
> +struct btf_metadata {
> +	__u8	kind_meta_cnt;		/* number of struct btf_kind_meta */

Overall, looks great.
Few small nits:
I'd make kind_meta_cnt u32, since padding we won't be able to reuse anyway
and would bump the BTF_VERSION to 2 to make it a 'milestone'.
v2 -> self described.

> +	__u32	flags;
> +	__u32	description_off;	/* optional description string */
> +	__u32	crc;			/* crc32 of BTF */
> +	__u32	base_crc;		/* crc32 of base BTF */

Hard coded CRC also gives me a pause.
Should it be an optional KIND like btf tags?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-01  3:53   ` Alexei Starovoitov
@ 2023-06-01 10:36     ` Alan Maguire
  2023-06-01 16:53       ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2023-06-01 10:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: ast, daniel, andrii, acme, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, quentin, mykolal, bpf

On 01/06/2023 04:53, Alexei Starovoitov wrote:
> On Wed, May 31, 2023 at 09:19:28PM +0100, Alan Maguire wrote:
>> BTF kind metadata provides information to parse BTF kinds.
>> By separating parsing BTF from using all the information
>> it provides, we allow BTF to encode new features even if
>> they cannot be used.  This is helpful in particular for
>> cases where newer tools for BTF generation run on an
>> older kernel; BTF kinds may be present that the kernel
>> cannot yet use, but at least it can parse the BTF
>> provided.  Meanwhile userspace tools with newer libbpf
>> may be able to use the newer information.
>>
>> The intent is to support encoding of kind metadata
>> optionally so that tools like pahole can add this
>> information.  So for each kind we record
>>
>> - a kind name string
>> - kind-related flags
>> - length of singular element following struct btf_type
>> - length of each of the btf_vlen() elements following
>>
>> In addition we make space in the metadata for
>> CRC32s computed over the BTF along with a CRC for
>> the base BTF; this allows split BTF to identify
>> a mismatch explicitly.  Finally we provide an
>> offset for an optional description string.
>>
>> The ideas here were discussed at [1] hence
>>
>> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>>
>> [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
>> ---
>>  include/uapi/linux/btf.h       | 29 +++++++++++++++++++++++++++++
>>  tools/include/uapi/linux/btf.h | 29 +++++++++++++++++++++++++++++
>>  2 files changed, 58 insertions(+)
>>
>> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
>> index ec1798b6d3ff..94c1f4518249 100644
>> --- a/include/uapi/linux/btf.h
>> +++ b/include/uapi/linux/btf.h
>> @@ -8,6 +8,34 @@
>>  #define BTF_MAGIC	0xeB9F
>>  #define BTF_VERSION	1
>>  
>> +/* is this information required? If so it cannot be sanitized safely. */
>> +#define BTF_KIND_META_OPTIONAL		(1 << 0)
>> +
>> +struct btf_kind_meta {
>> +	__u32 name_off;		/* kind name string offset */
>> +	__u16 flags;		/* see BTF_KIND_META_* values above */
>> +	__u8 info_sz;		/* size of singular element after btf_type */
>> +	__u8 elem_sz;		/* size of each of btf_vlen(t) elements */
>> +};
>> +
>> +/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
>> +#define BTF_META_CRC_SET		(1 << 0)
>> +#define BTF_META_BASE_CRC_SET		(1 << 1)
>> +
>> +struct btf_metadata {
>> +	__u8	kind_meta_cnt;		/* number of struct btf_kind_meta */
>> +	__u32	flags;
>> +	__u32	description_off;	/* optional description string */
>> +	__u32	crc;			/* crc32 of BTF */
>> +	__u32	base_crc;		/* crc32 of base BTF */
>> +	struct btf_kind_meta kind_meta[];
>> +};
>> +
>> +struct btf_meta_header {
>> +	__u32	meta_off;	/* offset of metadata section */
>> +	__u32	meta_len;	/* length of metadata section */
>> +};
>> +
>>  struct btf_header {
>>  	__u16	magic;
>>  	__u8	version;
>> @@ -19,6 +47,7 @@ struct btf_header {
>>  	__u32	type_len;	/* length of type section	*/
>>  	__u32	str_off;	/* offset of string section	*/
>>  	__u32	str_len;	/* length of string section	*/
>> +	struct btf_meta_header meta_header;
>>  };
>>  
>>  /* Max # of type identifier */
>> diff --git a/tools/include/uapi/linux/btf.h b/tools/include/uapi/linux/btf.h
>> index ec1798b6d3ff..94c1f4518249 100644
>> --- a/tools/include/uapi/linux/btf.h
>> +++ b/tools/include/uapi/linux/btf.h
>> @@ -8,6 +8,34 @@
>>  #define BTF_MAGIC	0xeB9F
>>  #define BTF_VERSION	1
>>  
>> +/* is this information required? If so it cannot be sanitized safely. */
>> +#define BTF_KIND_META_OPTIONAL		(1 << 0)
>> +
>> +struct btf_kind_meta {
>> +	__u32 name_off;		/* kind name string offset */
>> +	__u16 flags;		/* see BTF_KIND_META_* values above */
>> +	__u8 info_sz;		/* size of singular element after btf_type */
>> +	__u8 elem_sz;		/* size of each of btf_vlen(t) elements */
>> +};
>> +
>> +/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
>> +#define BTF_META_CRC_SET		(1 << 0)
>> +#define BTF_META_BASE_CRC_SET		(1 << 1)
>> +
>> +struct btf_metadata {
>> +	__u8	kind_meta_cnt;		/* number of struct btf_kind_meta */
> 
> Overall, looks great.
> Few small nits:
> I'd make kind_meta_cnt u32, since padding we won't be able to reuse anyway
> and would bump the BTF_VERSION to 2 to make it a 'milestone'.
> v2 -> self described.

sure, sounds good. One other change perhaps worth making; currently
we assume that the kind metadata is at the end of the struct
btf_metadata, but if we ever wanted to add metadata fields in the
future, we'd want so support both the current metadata structure and
any future structure which had additional fields.

With that in mind, it might make sense to go with something like

struct btf_metadata {
	__u32	kind_meta_cnt;
	__u32	kind_meta_offset;	/* kind_meta_cnt instances of struct
btf_kind_meta start here */
	__u32	flags;
	__u32	description_off;	/* optional description string*/
	__u32	crc;			/* crc32 of BTF */
	__u32	base_crc;		/* crc32 of base BTF */
};

For the original version, kind_meta_offset would just be
at meta_off + sizeof(struct btf_metadata), but if we had multiple
versions of the btf_metadata header to handle, they could all rely on
the kind_meta_offset being where kind metadata is stored.
For validation we'd have to make sure kind_meta_offset was within
the the metadata header range.

> 
>> +	__u32	flags;
>> +	__u32	description_off;	/* optional description string */
>> +	__u32	crc;			/* crc32 of BTF */
>> +	__u32	base_crc;		/* crc32 of base BTF */
> 
> Hard coded CRC also gives me a pause.
> Should it be an optional KIND like btf tags?

The goal of the CRC is really just to provide a unique identifier that
we can use for things like checking if there's a mismatch between
base and module BTF. If we want to ever do CRC validation (not sure
if there's a case for that) we probably need to think about cases like
BTF sanitization of BPF program BTF; this would likely only be an
issue if metadata support is added to BPF compilers.

The problem with adding it via a kind is that if we first compute
the CRC over the entire BTF object and then add the kind, the addition
of the kind breaks the CRC; as a result I _think_ we're stuck with
having to have it in the header.

That said I don't think CRC is necessarily the only identifier
we could use, and we don't even need to identify it as a
CRC in the UAPI, just as a "unique identifier"; that would deal
with issues about breaking the CRC during sanitization. All
depends on whether we ever see a need to verify BTF via CRC
really.

One note; I found that the changes in patch 4 break kernel BTF
parsing when using the original BTF header, so if anyone is trying this
out, make sure the dwarves changes to generate BTF metadata are in
place. I've got a fix and will send it with v2 but don't want to spam
the list with a whole new series, so the following diff will fix it:

--- kernel/bpf/btf.c.original	2023-06-01 11:20:39.418807425 +0100
+++ kernel/bpf/btf.c	2023-06-01 11:20:57.012807358 +0100
@@ -5260,13 +5260,13 @@
 		secs[i] = *(struct btf_sec_info *)((void *)hdr +
 						   btf_sec_info_offset[i]);

-	sort(secs, ARRAY_SIZE(btf_sec_info_offset),
+	sort(secs, nr_secs,
 	     sizeof(struct btf_sec_info), btf_sec_info_cmp, NULL);

 	/* Check for gaps and overlap among sections */
 	total = 0;
 	expected_total = btf_data_size - hdr->hdr_len;
-	for (i = 0; i < ARRAY_SIZE(btf_sec_info_offset); i++) {
+	for (i = 0; i < nr_secs; i++) {
 		if (expected_total < secs[i].off) {
 			btf_verifier_log(env, "Invalid section offset");
 			return -EINVAL;

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 7/8] bpftool: add BTF dump "format meta" to dump header/metadata
  2023-05-31 20:19 ` [RFC bpf-next 7/8] bpftool: add BTF dump "format meta" to dump header/metadata Alan Maguire
@ 2023-06-01 16:33   ` Quentin Monnet
  2023-06-02 16:57   ` Andrii Nakryiko
  1 sibling, 0 replies; 35+ messages in thread
From: Quentin Monnet @ 2023-06-01 16:33 UTC (permalink / raw)
  To: Alan Maguire, ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, mykolal, bpf

2023-05-31 21:19 UTC+0100 ~ Alan Maguire <alan.maguire@oracle.com>
> Provide a way to dump BTF header and metadata info via
> bpftool; for example
> 
> $ bpftool btf dump file vmliux format meta

Typo: vmliux

> BTF: data size 4963656
> Header: magic 0xeb9f, version 1, flags 0x0, hdr_len 32
> Types: len 2927556, offset 0
> Strings: len 2035881, offset 2927556
> Metadata header found: len 184, offset 4963440, flags 0x1
> Description: 'generated by dwarves v1.25'
> CRC 0x6da2a930 ; base CRC 0x0
> Kind metadata for 20 kinds:
>        BTF_KIND_UNKN[ 0] flags 0x0    info_sz  0 elem_sz  0
>         BTF_KIND_INT[ 1] flags 0x0    info_sz  4 elem_sz  0
>         BTF_KIND_PTR[ 2] flags 0x0    info_sz  0 elem_sz  0
>       BTF_KIND_ARRAY[ 3] flags 0x0    info_sz 12 elem_sz  0
>      BTF_KIND_STRUCT[ 4] flags 0x0    info_sz  0 elem_sz 12
>       BTF_KIND_UNION[ 5] flags 0x0    info_sz  0 elem_sz 12
>        BTF_KIND_ENUM[ 6] flags 0x0    info_sz  0 elem_sz  8
>         BTF_KIND_FWD[ 7] flags 0x0    info_sz  0 elem_sz  0
>     BTF_KIND_TYPEDEF[ 8] flags 0x0    info_sz  0 elem_sz  0
>    BTF_KIND_VOLATILE[ 9] flags 0x0    info_sz  0 elem_sz  0
>       BTF_KIND_CONST[10] flags 0x0    info_sz  0 elem_sz  0
>    BTF_KIND_RESTRICT[11] flags 0x0    info_sz  0 elem_sz  0
>        BTF_KIND_FUNC[12] flags 0x0    info_sz  0 elem_sz  0
>  BTF_KIND_FUNC_PROTO[13] flags 0x0    info_sz  0 elem_sz  8
>         BTF_KIND_VAR[14] flags 0x0    info_sz  4 elem_sz  0
>     BTF_KIND_DATASEC[15] flags 0x0    info_sz  0 elem_sz 12
>       BTF_KIND_FLOAT[16] flags 0x0    info_sz  0 elem_sz  0
>    BTF_KIND_DECL_TAG[17] flags 0x1    info_sz  4 elem_sz  0
>    BTF_KIND_TYPE_TAG[18] flags 0x1    info_sz  0 elem_sz  0
>      BTF_KIND_ENUM64[19] flags 0x0    info_sz  0 elem_sz 12
> 

Thanks for this! For the non-RFC, can you please add the following:

- JSON output
- btf.c's do_help() update
- Documentation/bpftool-btf.rst update (cmd summary, and description)
- bash-completion/bpftool update (should be straightforward, we just
need to offer "metadata" after "format", like we already offer "c" and
"raw".

> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/bpf/bpftool/btf.c | 46 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
> index 91fcb75babe3..da4257e00ba8 100644
> --- a/tools/bpf/bpftool/btf.c
> +++ b/tools/bpf/bpftool/btf.c
> @@ -504,6 +504,47 @@ static int dump_btf_c(const struct btf *btf,
>  	return err;
>  }
>  
> +static int dump_btf_meta(const struct btf *btf)
> +{
> +	const struct btf_header *hdr;
> +	const struct btf_metadata *m;
> +	const void *data;
> +	__u32 data_sz;
> +	__u8 i;
> +
> +	data = btf__raw_data(btf, &data_sz);
> +	if (!data)
> +		return -ENOMEM;
> +	hdr = data;
> +	printf("BTF: data size %u\n", data_sz);
> +	printf("Header: magic 0x%x, version %d, flags 0x%x, hdr_len %u\n",
> +	       hdr->magic, hdr->version, hdr->flags, hdr->hdr_len);

Nit: bpftool's output fields don't usually start with a capital letter
(save for acronyms, obviously). I don't mind much, though.

> +	printf("Types: len %u, offset %u\n", hdr->type_len, hdr->type_off);
> +	printf("Strings: len %u, offset %u\n", hdr->str_len, hdr->str_off);
> +
> +	if (hdr->hdr_len < sizeof(struct btf_header) ||
> +	    hdr->meta_header.meta_len == 0 ||
> +	    hdr->meta_header.meta_off == 0)
> +		return 0;
> +
> +	m = (void *)hdr + hdr->hdr_len + hdr->meta_header.meta_off;
> +
> +	printf("Metadata header found: len %u, offset %u, flags 0x%x\n",
> +	       hdr->meta_header.meta_len, hdr->meta_header.meta_off, m->flags);
> +	if (m->description_off)
> +		printf("Description: '%s'\n", btf__name_by_offset(btf, m->description_off));
> +	printf("CRC 0x%x ; base CRC 0x%x\n", m->crc, m->base_crc);
> +	printf("Kind metadata for %d kinds:\n", m->kind_meta_cnt);
> +	for (i = 0; i < m->kind_meta_cnt; i++) {
> +		printf("%20s[%2d] flags 0x%-4x info_sz %2d elem_sz %2d\n",
> +		       btf__name_by_offset(btf, m->kind_meta[i].name_off),
> +		       i, m->kind_meta[i].flags, m->kind_meta[i].info_sz,
> +		       m->kind_meta[i].elem_sz);

Nit: I would maybe add a double space for separating the different
field, especially because we have some left padding for values < 10 in
your example and it looks strange to have numbers closer to the next
field name (on their right) rather than their own (on their left).

> +	}
> +
> +	return 0;
> +}
> +
>  static const char sysfs_vmlinux[] = "/sys/kernel/btf/vmlinux";
>  
>  static struct btf *get_vmlinux_btf_from_sysfs(void)
> @@ -553,6 +594,7 @@ static int do_dump(int argc, char **argv)
>  	__u32 root_type_ids[2];
>  	int root_type_cnt = 0;
>  	bool dump_c = false;
> +	bool dump_meta = false;
>  	__u32 btf_id = -1;
>  	const char *src;
>  	int fd = -1;
> @@ -654,6 +696,8 @@ static int do_dump(int argc, char **argv)
>  			}
>  			if (strcmp(*argv, "c") == 0) {
>  				dump_c = true;
> +			} else if (strcmp(*argv, "meta") == 0) {
> +				dump_meta = true;

We could use is_prefix() instead of strcmp() (same for "raw" below, by
the way), to make it possible to pass the keyword by prefix (as in
"bpftool b d f vmlinux f m".

>  			} else if (strcmp(*argv, "raw") == 0) {
>  				dump_c = false;
>  			} else {
> @@ -692,6 +736,8 @@ static int do_dump(int argc, char **argv)
>  			goto done;
>  		}
>  		err = dump_btf_c(btf, root_type_ids, root_type_cnt);
> +	} else if (dump_meta) {
> +		err = dump_btf_meta(btf);
>  	} else {
>  		err = dump_btf_raw(btf, root_type_ids, root_type_cnt);
>  	}


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-01 10:36     ` Alan Maguire
@ 2023-06-01 16:53       ` Alexei Starovoitov
  2023-06-02 16:32         ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-06-01 16:53 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Arnaldo Carvalho de Melo, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Mykola Lysenko, bpf

On Thu, Jun 1, 2023 at 3:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 01/06/2023 04:53, Alexei Starovoitov wrote:
> > On Wed, May 31, 2023 at 09:19:28PM +0100, Alan Maguire wrote:
> >> BTF kind metadata provides information to parse BTF kinds.
> >> By separating parsing BTF from using all the information
> >> it provides, we allow BTF to encode new features even if
> >> they cannot be used.  This is helpful in particular for
> >> cases where newer tools for BTF generation run on an
> >> older kernel; BTF kinds may be present that the kernel
> >> cannot yet use, but at least it can parse the BTF
> >> provided.  Meanwhile userspace tools with newer libbpf
> >> may be able to use the newer information.
> >>
> >> The intent is to support encoding of kind metadata
> >> optionally so that tools like pahole can add this
> >> information.  So for each kind we record
> >>
> >> - a kind name string
> >> - kind-related flags
> >> - length of singular element following struct btf_type
> >> - length of each of the btf_vlen() elements following
> >>
> >> In addition we make space in the metadata for
> >> CRC32s computed over the BTF along with a CRC for
> >> the base BTF; this allows split BTF to identify
> >> a mismatch explicitly.  Finally we provide an
> >> offset for an optional description string.
> >>
> >> The ideas here were discussed at [1] hence
> >>
> >> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >>
> >> [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
> >> ---
> >>  include/uapi/linux/btf.h       | 29 +++++++++++++++++++++++++++++
> >>  tools/include/uapi/linux/btf.h | 29 +++++++++++++++++++++++++++++
> >>  2 files changed, 58 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
> >> index ec1798b6d3ff..94c1f4518249 100644
> >> --- a/include/uapi/linux/btf.h
> >> +++ b/include/uapi/linux/btf.h
> >> @@ -8,6 +8,34 @@
> >>  #define BTF_MAGIC   0xeB9F
> >>  #define BTF_VERSION 1
> >>
> >> +/* is this information required? If so it cannot be sanitized safely. */
> >> +#define BTF_KIND_META_OPTIONAL              (1 << 0)
> >> +
> >> +struct btf_kind_meta {
> >> +    __u32 name_off;         /* kind name string offset */
> >> +    __u16 flags;            /* see BTF_KIND_META_* values above */
> >> +    __u8 info_sz;           /* size of singular element after btf_type */
> >> +    __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
> >> +};
> >> +
> >> +/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
> >> +#define BTF_META_CRC_SET            (1 << 0)
> >> +#define BTF_META_BASE_CRC_SET               (1 << 1)
> >> +
> >> +struct btf_metadata {
> >> +    __u8    kind_meta_cnt;          /* number of struct btf_kind_meta */
> >> +    __u32   flags;
> >> +    __u32   description_off;        /* optional description string */
> >> +    __u32   crc;                    /* crc32 of BTF */
> >> +    __u32   base_crc;               /* crc32 of base BTF */
> >> +    struct btf_kind_meta kind_meta[];
> >> +};
> >> +
> >> +struct btf_meta_header {
> >> +    __u32   meta_off;       /* offset of metadata section */
> >> +    __u32   meta_len;       /* length of metadata section */
> >> +};
> >> +
> >>  struct btf_header {
> >>      __u16   magic;
> >>      __u8    version;
> >> @@ -19,6 +47,7 @@ struct btf_header {
> >>      __u32   type_len;       /* length of type section       */
> >>      __u32   str_off;        /* offset of string section     */
> >>      __u32   str_len;        /* length of string section     */
> >> +    struct btf_meta_header meta_header;
> >>  };
> >>
> >>  /* Max # of type identifier */
> >> diff --git a/tools/include/uapi/linux/btf.h b/tools/include/uapi/linux/btf.h
> >> index ec1798b6d3ff..94c1f4518249 100644
> >> --- a/tools/include/uapi/linux/btf.h
> >> +++ b/tools/include/uapi/linux/btf.h
> >> @@ -8,6 +8,34 @@
> >>  #define BTF_MAGIC   0xeB9F
> >>  #define BTF_VERSION 1
> >>
> >> +/* is this information required? If so it cannot be sanitized safely. */
> >> +#define BTF_KIND_META_OPTIONAL              (1 << 0)
> >> +
> >> +struct btf_kind_meta {
> >> +    __u32 name_off;         /* kind name string offset */
> >> +    __u16 flags;            /* see BTF_KIND_META_* values above */
> >> +    __u8 info_sz;           /* size of singular element after btf_type */
> >> +    __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
> >> +};
> >> +
> >> +/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
> >> +#define BTF_META_CRC_SET            (1 << 0)
> >> +#define BTF_META_BASE_CRC_SET               (1 << 1)
> >> +
> >> +struct btf_metadata {
> >> +    __u8    kind_meta_cnt;          /* number of struct btf_kind_meta */
> >
> > Overall, looks great.
> > Few small nits:
> > I'd make kind_meta_cnt u32, since padding we won't be able to reuse anyway
> > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
> > v2 -> self described.
>
> sure, sounds good. One other change perhaps worth making; currently
> we assume that the kind metadata is at the end of the struct
> btf_metadata, but if we ever wanted to add metadata fields in the
> future, we'd want so support both the current metadata structure and
> any future structure which had additional fields.
>
> With that in mind, it might make sense to go with something like
>
> struct btf_metadata {
>         __u32   kind_meta_cnt;
>         __u32   kind_meta_offset;       /* kind_meta_cnt instances of struct
> btf_kind_meta start here */
>         __u32   flags;
>         __u32   description_off;        /* optional description string*/
>         __u32   crc;                    /* crc32 of BTF */
>         __u32   base_crc;               /* crc32 of base BTF */
> };
>
> For the original version, kind_meta_offset would just be
> at meta_off + sizeof(struct btf_metadata), but if we had multiple
> versions of the btf_metadata header to handle, they could all rely on
> the kind_meta_offset being where kind metadata is stored.
> For validation we'd have to make sure kind_meta_offset was within
> the the metadata header range.

kind_meta_offset is an ok idea, but I don't quite see why we'd have
multiple 'struct btf_metadata' pointing to the same set of 'struct
btf_kind_meta'.

Also why do we need description_off ? Shouldn't string go into
btf_header->str_off ?

> >
> >> +    __u32   flags;
> >> +    __u32   description_off;        /* optional description string */
> >> +    __u32   crc;                    /* crc32 of BTF */
> >> +    __u32   base_crc;               /* crc32 of base BTF */
> >
> > Hard coded CRC also gives me a pause.
> > Should it be an optional KIND like btf tags?
>
> The goal of the CRC is really just to provide a unique identifier that
> we can use for things like checking if there's a mismatch between
> base and module BTF. If we want to ever do CRC validation (not sure
> if there's a case for that) we probably need to think about cases like
> BTF sanitization of BPF program BTF; this would likely only be an
> issue if metadata support is added to BPF compilers.
>
> The problem with adding it via a kind is that if we first compute
> the CRC over the entire BTF object and then add the kind, the addition
> of the kind breaks the CRC; as a result I _think_ we're stuck with
> having to have it in the header.

Hmm. libbpf can add BTF_KIND_CRC with zero-ed u32 crc field
and later fill it in.
It's really not different than u32 crc field inside 'struct btf_metadata'.

> That said I don't think CRC is necessarily the only identifier
> we could use, and we don't even need to identify it as a
> CRC in the UAPI, just as a "unique identifier"; that would deal
> with issues about breaking the CRC during sanitization. All
> depends on whether we ever see a need to verify BTF via CRC
> really.

Right. It could be sha or anything else, but user space and kernel
need to agree on the math to compute it, so something got to indicate
that this 32-bit is a crc.
Hence KIND_CRC, KIND_SHA fit better.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-01 16:53       ` Alexei Starovoitov
@ 2023-06-02 16:32         ` Andrii Nakryiko
  2023-06-02 16:34           ` Andrii Nakryiko
  2023-06-02 18:11           ` Alexei Starovoitov
  0 siblings, 2 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-06-02 16:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Thu, Jun 1, 2023 at 9:54 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jun 1, 2023 at 3:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> > On 01/06/2023 04:53, Alexei Starovoitov wrote:
> > > On Wed, May 31, 2023 at 09:19:28PM +0100, Alan Maguire wrote:
> > >> BTF kind metadata provides information to parse BTF kinds.
> > >> By separating parsing BTF from using all the information
> > >> it provides, we allow BTF to encode new features even if
> > >> they cannot be used.  This is helpful in particular for
> > >> cases where newer tools for BTF generation run on an
> > >> older kernel; BTF kinds may be present that the kernel
> > >> cannot yet use, but at least it can parse the BTF
> > >> provided.  Meanwhile userspace tools with newer libbpf
> > >> may be able to use the newer information.
> > >>
> > >> The intent is to support encoding of kind metadata
> > >> optionally so that tools like pahole can add this
> > >> information.  So for each kind we record
> > >>
> > >> - a kind name string
> > >> - kind-related flags
> > >> - length of singular element following struct btf_type
> > >> - length of each of the btf_vlen() elements following
> > >>
> > >> In addition we make space in the metadata for
> > >> CRC32s computed over the BTF along with a CRC for
> > >> the base BTF; this allows split BTF to identify
> > >> a mismatch explicitly.  Finally we provide an
> > >> offset for an optional description string.
> > >>
> > >> The ideas here were discussed at [1] hence
> > >>
> > >> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> > >>
> > >> [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
> > >> ---
> > >>  include/uapi/linux/btf.h       | 29 +++++++++++++++++++++++++++++
> > >>  tools/include/uapi/linux/btf.h | 29 +++++++++++++++++++++++++++++
> > >>  2 files changed, 58 insertions(+)
> > >>
> > >> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
> > >> index ec1798b6d3ff..94c1f4518249 100644
> > >> --- a/include/uapi/linux/btf.h
> > >> +++ b/include/uapi/linux/btf.h
> > >> @@ -8,6 +8,34 @@
> > >>  #define BTF_MAGIC   0xeB9F
> > >>  #define BTF_VERSION 1
> > >>
> > >> +/* is this information required? If so it cannot be sanitized safely. */
> > >> +#define BTF_KIND_META_OPTIONAL              (1 << 0)

Another flag I was thinking about was a flag whether struct btf_type's
type/size field is a type or a size (or something else). E.g., let's
say we haven't had btf_type_tag yet and were adding it after we had
this new metadata. We could say that type_tag's type/size field is
actually a type ID, and generic tools like bpftool could basically
skip type_tag and resolve to underlying type. This way, optional
modifier/decorator KINDs won't even have to break applications using
old libbpf's when it comes to calculating type sizes and resolving
them.

> > >> +
> > >> +struct btf_kind_meta {
> > >> +    __u32 name_off;         /* kind name string offset */

I'm not sure why we'd need to record this for every KIND? The tool
that doesn't know about this new kind can't do much about it anyways,
so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
doesn't make much difference?

> > >> +    __u16 flags;            /* see BTF_KIND_META_* values above */
> > >> +    __u8 info_sz;           /* size of singular element after btf_type */
> > >> +    __u8 elem_sz;           /* size of each of btf_vlen(t) elements */

on the other hand, reserving __u32 as "extra" might be useful if we
ever want to have some more stuff here (worst case we can define it as
string offset which describes some more metadata


> > >> +};
> > >> +
> > >> +/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
> > >> +#define BTF_META_CRC_SET            (1 << 0)
> > >> +#define BTF_META_BASE_CRC_SET               (1 << 1)
> > >> +
> > >> +struct btf_metadata {
> > >> +    __u8    kind_meta_cnt;          /* number of struct btf_kind_meta */
> > >> +    __u32   flags;
> > >> +    __u32   description_off;        /* optional description string */
> > >> +    __u32   crc;                    /* crc32 of BTF */
> > >> +    __u32   base_crc;               /* crc32 of base BTF */
> > >> +    struct btf_kind_meta kind_meta[];
> > >> +};
> > >> +
> > >> +struct btf_meta_header {
> > >> +    __u32   meta_off;       /* offset of metadata section */
> > >> +    __u32   meta_len;       /* length of metadata section */
> > >> +};
> > >> +
> > >>  struct btf_header {
> > >>      __u16   magic;
> > >>      __u8    version;
> > >> @@ -19,6 +47,7 @@ struct btf_header {
> > >>      __u32   type_len;       /* length of type section       */
> > >>      __u32   str_off;        /* offset of string section     */
> > >>      __u32   str_len;        /* length of string section     */
> > >> +    struct btf_meta_header meta_header;

looking through all this, it seems like it would be better to have a
separate "metadata" section, just like we have types and strings? And
then the contents are described by btf_metadata struct?


> > >>  };
> > >>
> > >>  /* Max # of type identifier */
> > >> diff --git a/tools/include/uapi/linux/btf.h b/tools/include/uapi/linux/btf.h
> > >> index ec1798b6d3ff..94c1f4518249 100644
> > >> --- a/tools/include/uapi/linux/btf.h
> > >> +++ b/tools/include/uapi/linux/btf.h
> > >> @@ -8,6 +8,34 @@
> > >>  #define BTF_MAGIC   0xeB9F
> > >>  #define BTF_VERSION 1
> > >>
> > >> +/* is this information required? If so it cannot be sanitized safely. */
> > >> +#define BTF_KIND_META_OPTIONAL              (1 << 0)
> > >> +
> > >> +struct btf_kind_meta {
> > >> +    __u32 name_off;         /* kind name string offset */
> > >> +    __u16 flags;            /* see BTF_KIND_META_* values above */
> > >> +    __u8 info_sz;           /* size of singular element after btf_type */
> > >> +    __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
> > >> +};
> > >> +
> > >> +/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
> > >> +#define BTF_META_CRC_SET            (1 << 0)
> > >> +#define BTF_META_BASE_CRC_SET               (1 << 1)
> > >> +
> > >> +struct btf_metadata {
> > >> +    __u8    kind_meta_cnt;          /* number of struct btf_kind_meta */
> > >
> > > Overall, looks great.
> > > Few small nits:
> > > I'd make kind_meta_cnt u32, since padding we won't be able to reuse anyway
> > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.

Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
existing kernels (and potentially many tools that parse BTF). Given we
can actually extend BTF in backwards compatible way by just adding an
optional two fields to btf_header + extra bytes for metadata sections,
why making our lives harder by bumping this version?

> > > v2 -> self described.
> >
> > sure, sounds good. One other change perhaps worth making; currently
> > we assume that the kind metadata is at the end of the struct
> > btf_metadata, but if we ever wanted to add metadata fields in the
> > future, we'd want so support both the current metadata structure and
> > any future structure which had additional fields.

see above, another reason to make metadata a separate section, in
addition to types and strings

> >
> > With that in mind, it might make sense to go with something like
> >
> > struct btf_metadata {
> >         __u32   kind_meta_cnt;
> >         __u32   kind_meta_offset;       /* kind_meta_cnt instances of struct
> > btf_kind_meta start here */
> >         __u32   flags;
> >         __u32   description_off;        /* optional description string*/
> >         __u32   crc;                    /* crc32 of BTF */
> >         __u32   base_crc;               /* crc32 of base BTF */
> > };
> >
> > For the original version, kind_meta_offset would just be
> > at meta_off + sizeof(struct btf_metadata), but if we had multiple
> > versions of the btf_metadata header to handle, they could all rely on
> > the kind_meta_offset being where kind metadata is stored.
> > For validation we'd have to make sure kind_meta_offset was within
> > the the metadata header range.
>
> kind_meta_offset is an ok idea, but I don't quite see why we'd have
> multiple 'struct btf_metadata' pointing to the same set of 'struct
> btf_kind_meta'.
>
> Also why do we need description_off ? Shouldn't string go into
> btf_header->str_off ?
>
> > >
> > >> +    __u32   flags;
> > >> +    __u32   description_off;        /* optional description string */
> > >> +    __u32   crc;                    /* crc32 of BTF */
> > >> +    __u32   base_crc;               /* crc32 of base BTF */
> > >
> > > Hard coded CRC also gives me a pause.
> > > Should it be an optional KIND like btf tags?
> >
> > The goal of the CRC is really just to provide a unique identifier that
> > we can use for things like checking if there's a mismatch between
> > base and module BTF. If we want to ever do CRC validation (not sure
> > if there's a case for that) we probably need to think about cases like
> > BTF sanitization of BPF program BTF; this would likely only be an
> > issue if metadata support is added to BPF compilers.
> >
> > The problem with adding it via a kind is that if we first compute
> > the CRC over the entire BTF object and then add the kind, the addition
> > of the kind breaks the CRC; as a result I _think_ we're stuck with
> > having to have it in the header.
>
> Hmm. libbpf can add BTF_KIND_CRC with zero-ed u32 crc field
> and later fill it in.
> It's really not different than u32 crc field inside 'struct btf_metadata'.
>
> > That said I don't think CRC is necessarily the only identifier
> > we could use, and we don't even need to identify it as a
> > CRC in the UAPI, just as a "unique identifier"; that would deal
> > with issues about breaking the CRC during sanitization. All
> > depends on whether we ever see a need to verify BTF via CRC
> > really.
>
> Right. It could be sha or anything else, but user space and kernel
> need to agree on the math to compute it, so something got to indicate
> that this 32-bit is a crc.
> Hence KIND_CRC, KIND_SHA fit better.

what if instead of crc and base_src fields, we have

__u32 id_str_off;
__u32 base_id_str_off;

and they are offsets into a string section. We can then define that
those strings have to be something like "crc:<crc-value>" or
"sha:<sha-value". This will be a generic ID, and extensible (and more
easily extensible, probably), but won't require new KIND.

This also has a good property that 0 means "no ID", which helps with
the base BTF case. Current "__u32 crc;" doesn't have this property and
requires a flag.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-02 16:32         ` Andrii Nakryiko
@ 2023-06-02 16:34           ` Andrii Nakryiko
  2023-06-02 18:11           ` Alexei Starovoitov
  1 sibling, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-06-02 16:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Fri, Jun 2, 2023 at 9:32 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Jun 1, 2023 at 9:54 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jun 1, 2023 at 3:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > >
> > > On 01/06/2023 04:53, Alexei Starovoitov wrote:
> > > > On Wed, May 31, 2023 at 09:19:28PM +0100, Alan Maguire wrote:
> > > >> BTF kind metadata provides information to parse BTF kinds.
> > > >> By separating parsing BTF from using all the information
> > > >> it provides, we allow BTF to encode new features even if
> > > >> they cannot be used.  This is helpful in particular for
> > > >> cases where newer tools for BTF generation run on an
> > > >> older kernel; BTF kinds may be present that the kernel
> > > >> cannot yet use, but at least it can parse the BTF
> > > >> provided.  Meanwhile userspace tools with newer libbpf
> > > >> may be able to use the newer information.
> > > >>
> > > >> The intent is to support encoding of kind metadata
> > > >> optionally so that tools like pahole can add this
> > > >> information.  So for each kind we record
> > > >>
> > > >> - a kind name string
> > > >> - kind-related flags
> > > >> - length of singular element following struct btf_type
> > > >> - length of each of the btf_vlen() elements following
> > > >>
> > > >> In addition we make space in the metadata for
> > > >> CRC32s computed over the BTF along with a CRC for
> > > >> the base BTF; this allows split BTF to identify
> > > >> a mismatch explicitly.  Finally we provide an
> > > >> offset for an optional description string.
> > > >>
> > > >> The ideas here were discussed at [1] hence
> > > >>
> > > >> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > > >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> > > >>
> > > >> [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
> > > >> ---
> > > >>  include/uapi/linux/btf.h       | 29 +++++++++++++++++++++++++++++
> > > >>  tools/include/uapi/linux/btf.h | 29 +++++++++++++++++++++++++++++
> > > >>  2 files changed, 58 insertions(+)
> > > >>
> > > >> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
> > > >> index ec1798b6d3ff..94c1f4518249 100644
> > > >> --- a/include/uapi/linux/btf.h
> > > >> +++ b/include/uapi/linux/btf.h
> > > >> @@ -8,6 +8,34 @@
> > > >>  #define BTF_MAGIC   0xeB9F
> > > >>  #define BTF_VERSION 1
> > > >>
> > > >> +/* is this information required? If so it cannot be sanitized safely. */
> > > >> +#define BTF_KIND_META_OPTIONAL              (1 << 0)
>
> Another flag I was thinking about was a flag whether struct btf_type's
> type/size field is a type or a size (or something else). E.g., let's
> say we haven't had btf_type_tag yet and were adding it after we had
> this new metadata. We could say that type_tag's type/size field is
> actually a type ID, and generic tools like bpftool could basically
> skip type_tag and resolve to underlying type. This way, optional
> modifier/decorator KINDs won't even have to break applications using
> old libbpf's when it comes to calculating type sizes and resolving
> them.
>
> > > >> +
> > > >> +struct btf_kind_meta {
> > > >> +    __u32 name_off;         /* kind name string offset */
>
> I'm not sure why we'd need to record this for every KIND? The tool
> that doesn't know about this new kind can't do much about it anyways,
> so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
> doesn't make much difference?
>
> > > >> +    __u16 flags;            /* see BTF_KIND_META_* values above */
> > > >> +    __u8 info_sz;           /* size of singular element after btf_type */
> > > >> +    __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
>
> on the other hand, reserving __u32 as "extra" might be useful if we
> ever want to have some more stuff here (worst case we can define it as
> string offset which describes some more metadata
>
>
> > > >> +};
> > > >> +
> > > >> +/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
> > > >> +#define BTF_META_CRC_SET            (1 << 0)
> > > >> +#define BTF_META_BASE_CRC_SET               (1 << 1)
> > > >> +
> > > >> +struct btf_metadata {
> > > >> +    __u8    kind_meta_cnt;          /* number of struct btf_kind_meta */
> > > >> +    __u32   flags;
> > > >> +    __u32   description_off;        /* optional description string */
> > > >> +    __u32   crc;                    /* crc32 of BTF */
> > > >> +    __u32   base_crc;               /* crc32 of base BTF */
> > > >> +    struct btf_kind_meta kind_meta[];
> > > >> +};
> > > >> +
> > > >> +struct btf_meta_header {
> > > >> +    __u32   meta_off;       /* offset of metadata section */
> > > >> +    __u32   meta_len;       /* length of metadata section */
> > > >> +};
> > > >> +
> > > >>  struct btf_header {
> > > >>      __u16   magic;
> > > >>      __u8    version;
> > > >> @@ -19,6 +47,7 @@ struct btf_header {
> > > >>      __u32   type_len;       /* length of type section       */
> > > >>      __u32   str_off;        /* offset of string section     */
> > > >>      __u32   str_len;        /* length of string section     */
> > > >> +    struct btf_meta_header meta_header;
>
> looking through all this, it seems like it would be better to have a
> separate "metadata" section, just like we have types and strings? And
> then the contents are described by btf_metadata struct?

Ok, wait, I'm dumb. You are effectively already doing this, but for
some reason added unnecessary struct around meta_off, meta_len fields.
Why? Let's keep it consistent with {type,str}_{off,len}?

>
>
> > > >>  };
> > > >>
> > > >>  /* Max # of type identifier */
> > > >> diff --git a/tools/include/uapi/linux/btf.h b/tools/include/uapi/linux/btf.h
> > > >> index ec1798b6d3ff..94c1f4518249 100644
> > > >> --- a/tools/include/uapi/linux/btf.h
> > > >> +++ b/tools/include/uapi/linux/btf.h
> > > >> @@ -8,6 +8,34 @@
> > > >>  #define BTF_MAGIC   0xeB9F
> > > >>  #define BTF_VERSION 1
> > > >>
> > > >> +/* is this information required? If so it cannot be sanitized safely. */
> > > >> +#define BTF_KIND_META_OPTIONAL              (1 << 0)
> > > >> +
> > > >> +struct btf_kind_meta {
> > > >> +    __u32 name_off;         /* kind name string offset */
> > > >> +    __u16 flags;            /* see BTF_KIND_META_* values above */
> > > >> +    __u8 info_sz;           /* size of singular element after btf_type */
> > > >> +    __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
> > > >> +};
> > > >> +
> > > >> +/* for CRCs for BTF, base BTF to be considered usable, flags must be set. */
> > > >> +#define BTF_META_CRC_SET            (1 << 0)
> > > >> +#define BTF_META_BASE_CRC_SET               (1 << 1)
> > > >> +
> > > >> +struct btf_metadata {
> > > >> +    __u8    kind_meta_cnt;          /* number of struct btf_kind_meta */
> > > >
> > > > Overall, looks great.
> > > > Few small nits:
> > > > I'd make kind_meta_cnt u32, since padding we won't be able to reuse anyway
> > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
>
> Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
> existing kernels (and potentially many tools that parse BTF). Given we
> can actually extend BTF in backwards compatible way by just adding an
> optional two fields to btf_header + extra bytes for metadata sections,
> why making our lives harder by bumping this version?
>
> > > > v2 -> self described.
> > >
> > > sure, sounds good. One other change perhaps worth making; currently
> > > we assume that the kind metadata is at the end of the struct
> > > btf_metadata, but if we ever wanted to add metadata fields in the
> > > future, we'd want so support both the current metadata structure and
> > > any future structure which had additional fields.
>
> see above, another reason to make metadata a separate section, in
> addition to types and strings
>
> > >
> > > With that in mind, it might make sense to go with something like
> > >
> > > struct btf_metadata {
> > >         __u32   kind_meta_cnt;
> > >         __u32   kind_meta_offset;       /* kind_meta_cnt instances of struct
> > > btf_kind_meta start here */
> > >         __u32   flags;
> > >         __u32   description_off;        /* optional description string*/
> > >         __u32   crc;                    /* crc32 of BTF */
> > >         __u32   base_crc;               /* crc32 of base BTF */
> > > };
> > >
> > > For the original version, kind_meta_offset would just be
> > > at meta_off + sizeof(struct btf_metadata), but if we had multiple
> > > versions of the btf_metadata header to handle, they could all rely on
> > > the kind_meta_offset being where kind metadata is stored.
> > > For validation we'd have to make sure kind_meta_offset was within
> > > the the metadata header range.
> >
> > kind_meta_offset is an ok idea, but I don't quite see why we'd have
> > multiple 'struct btf_metadata' pointing to the same set of 'struct
> > btf_kind_meta'.
> >
> > Also why do we need description_off ? Shouldn't string go into
> > btf_header->str_off ?
> >
> > > >
> > > >> +    __u32   flags;
> > > >> +    __u32   description_off;        /* optional description string */
> > > >> +    __u32   crc;                    /* crc32 of BTF */
> > > >> +    __u32   base_crc;               /* crc32 of base BTF */
> > > >
> > > > Hard coded CRC also gives me a pause.
> > > > Should it be an optional KIND like btf tags?
> > >
> > > The goal of the CRC is really just to provide a unique identifier that
> > > we can use for things like checking if there's a mismatch between
> > > base and module BTF. If we want to ever do CRC validation (not sure
> > > if there's a case for that) we probably need to think about cases like
> > > BTF sanitization of BPF program BTF; this would likely only be an
> > > issue if metadata support is added to BPF compilers.
> > >
> > > The problem with adding it via a kind is that if we first compute
> > > the CRC over the entire BTF object and then add the kind, the addition
> > > of the kind breaks the CRC; as a result I _think_ we're stuck with
> > > having to have it in the header.
> >
> > Hmm. libbpf can add BTF_KIND_CRC with zero-ed u32 crc field
> > and later fill it in.
> > It's really not different than u32 crc field inside 'struct btf_metadata'.
> >
> > > That said I don't think CRC is necessarily the only identifier
> > > we could use, and we don't even need to identify it as a
> > > CRC in the UAPI, just as a "unique identifier"; that would deal
> > > with issues about breaking the CRC during sanitization. All
> > > depends on whether we ever see a need to verify BTF via CRC
> > > really.
> >
> > Right. It could be sha or anything else, but user space and kernel
> > need to agree on the math to compute it, so something got to indicate
> > that this 32-bit is a crc.
> > Hence KIND_CRC, KIND_SHA fit better.
>
> what if instead of crc and base_src fields, we have
>
> __u32 id_str_off;
> __u32 base_id_str_off;
>
> and they are offsets into a string section. We can then define that
> those strings have to be something like "crc:<crc-value>" or
> "sha:<sha-value". This will be a generic ID, and extensible (and more
> easily extensible, probably), but won't require new KIND.
>
> This also has a good property that 0 means "no ID", which helps with
> the base BTF case. Current "__u32 crc;" doesn't have this property and
> requires a flag.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 7/8] bpftool: add BTF dump "format meta" to dump header/metadata
  2023-05-31 20:19 ` [RFC bpf-next 7/8] bpftool: add BTF dump "format meta" to dump header/metadata Alan Maguire
  2023-06-01 16:33   ` Quentin Monnet
@ 2023-06-02 16:57   ` Andrii Nakryiko
  1 sibling, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-06-02 16:57 UTC (permalink / raw)
  To: Alan Maguire
  Cc: ast, daniel, andrii, acme, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, quentin, mykolal, bpf

On Wed, May 31, 2023 at 1:21 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Provide a way to dump BTF header and metadata info via
> bpftool; for example
>
> $ bpftool btf dump file vmliux format meta
> BTF: data size 4963656
> Header: magic 0xeb9f, version 1, flags 0x0, hdr_len 32
> Types: len 2927556, offset 0
> Strings: len 2035881, offset 2927556
> Metadata header found: len 184, offset 4963440, flags 0x1
> Description: 'generated by dwarves v1.25'
> CRC 0x6da2a930 ; base CRC 0x0
> Kind metadata for 20 kinds:
>        BTF_KIND_UNKN[ 0] flags 0x0    info_sz  0 elem_sz  0
>         BTF_KIND_INT[ 1] flags 0x0    info_sz  4 elem_sz  0
>         BTF_KIND_PTR[ 2] flags 0x0    info_sz  0 elem_sz  0
>       BTF_KIND_ARRAY[ 3] flags 0x0    info_sz 12 elem_sz  0
>      BTF_KIND_STRUCT[ 4] flags 0x0    info_sz  0 elem_sz 12
>       BTF_KIND_UNION[ 5] flags 0x0    info_sz  0 elem_sz 12
>        BTF_KIND_ENUM[ 6] flags 0x0    info_sz  0 elem_sz  8
>         BTF_KIND_FWD[ 7] flags 0x0    info_sz  0 elem_sz  0
>     BTF_KIND_TYPEDEF[ 8] flags 0x0    info_sz  0 elem_sz  0
>    BTF_KIND_VOLATILE[ 9] flags 0x0    info_sz  0 elem_sz  0
>       BTF_KIND_CONST[10] flags 0x0    info_sz  0 elem_sz  0
>    BTF_KIND_RESTRICT[11] flags 0x0    info_sz  0 elem_sz  0
>        BTF_KIND_FUNC[12] flags 0x0    info_sz  0 elem_sz  0
>  BTF_KIND_FUNC_PROTO[13] flags 0x0    info_sz  0 elem_sz  8
>         BTF_KIND_VAR[14] flags 0x0    info_sz  4 elem_sz  0
>     BTF_KIND_DATASEC[15] flags 0x0    info_sz  0 elem_sz 12
>       BTF_KIND_FLOAT[16] flags 0x0    info_sz  0 elem_sz  0
>    BTF_KIND_DECL_TAG[17] flags 0x1    info_sz  4 elem_sz  0
>    BTF_KIND_TYPE_TAG[18] flags 0x1    info_sz  0 elem_sz  0
>      BTF_KIND_ENUM64[19] flags 0x0    info_sz  0 elem_sz 12

nit: looks very weird to be right aligned for the "BTF_KIND_xxx"
column, let's align it left here?

Also, btfdump ([0]) emits stats on per-kind basis, and I found it
quite useful on multiple occasions, do you think it would be
worthwhile to add that to bpftool as well. It looks like this in
btfdump's case:

BTF types
=======================================
Total        2293996 bytes (90517 types)
FuncProto:    731420 bytes (21455 types)
Struct:       625944 bytes (7575 types)
Func:         480876 bytes (40073 types)
Enum:         137652 bytes (1721 types)
Ptr:          132180 bytes (11015 types)
Array:         67440 bytes (2810 types)
Union:         57744 bytes (1348 types)
Const:         28140 bytes (2345 types)
Typedef:       20964 bytes (1747 types)
Var:            5024 bytes (314 types)
Datasec:        3780 bytes (1 types)
Enum64:         1500 bytes (7 types)
Fwd:             828 bytes (69 types)
Int:             240 bytes (15 types)
Volatile:        228 bytes (19 types)
Restrict:         24 bytes (2 types)
Float:            12 bytes (1 types)


  [0] https://github.com/anakryiko/btfdump

>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/bpf/bpftool/btf.c | 46 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
> index 91fcb75babe3..da4257e00ba8 100644
> --- a/tools/bpf/bpftool/btf.c
> +++ b/tools/bpf/bpftool/btf.c
> @@ -504,6 +504,47 @@ static int dump_btf_c(const struct btf *btf,
>         return err;
>  }
>

[...]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-02 16:32         ` Andrii Nakryiko
  2023-06-02 16:34           ` Andrii Nakryiko
@ 2023-06-02 18:11           ` Alexei Starovoitov
  2023-06-02 20:33             ` Andrii Nakryiko
  1 sibling, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-06-02 18:11 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Fri, Jun 2, 2023 at 9:32 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Jun 1, 2023 at 9:54 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jun 1, 2023 at 3:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > >
> > > On 01/06/2023 04:53, Alexei Starovoitov wrote:
> > > > On Wed, May 31, 2023 at 09:19:28PM +0100, Alan Maguire wrote:
> > > >> BTF kind metadata provides information to parse BTF kinds.
> > > >> By separating parsing BTF from using all the information
> > > >> it provides, we allow BTF to encode new features even if
> > > >> they cannot be used.  This is helpful in particular for
> > > >> cases where newer tools for BTF generation run on an
> > > >> older kernel; BTF kinds may be present that the kernel
> > > >> cannot yet use, but at least it can parse the BTF
> > > >> provided.  Meanwhile userspace tools with newer libbpf
> > > >> may be able to use the newer information.
> > > >>
> > > >> The intent is to support encoding of kind metadata
> > > >> optionally so that tools like pahole can add this
> > > >> information.  So for each kind we record
> > > >>
> > > >> - a kind name string
> > > >> - kind-related flags
> > > >> - length of singular element following struct btf_type
> > > >> - length of each of the btf_vlen() elements following
> > > >>
> > > >> In addition we make space in the metadata for
> > > >> CRC32s computed over the BTF along with a CRC for
> > > >> the base BTF; this allows split BTF to identify
> > > >> a mismatch explicitly.  Finally we provide an
> > > >> offset for an optional description string.
> > > >>
> > > >> The ideas here were discussed at [1] hence
> > > >>
> > > >> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > > >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> > > >>
> > > >> [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
> > > >> ---
> > > >>  include/uapi/linux/btf.h       | 29 +++++++++++++++++++++++++++++
> > > >>  tools/include/uapi/linux/btf.h | 29 +++++++++++++++++++++++++++++
> > > >>  2 files changed, 58 insertions(+)
> > > >>
> > > >> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
> > > >> index ec1798b6d3ff..94c1f4518249 100644
> > > >> --- a/include/uapi/linux/btf.h
> > > >> +++ b/include/uapi/linux/btf.h
> > > >> @@ -8,6 +8,34 @@
> > > >>  #define BTF_MAGIC   0xeB9F
> > > >>  #define BTF_VERSION 1
> > > >>
> > > >> +/* is this information required? If so it cannot be sanitized safely. */
> > > >> +#define BTF_KIND_META_OPTIONAL              (1 << 0)
>
> Another flag I was thinking about was a flag whether struct btf_type's
> type/size field is a type or a size (or something else). E.g., let's
> say we haven't had btf_type_tag yet and were adding it after we had
> this new metadata. We could say that type_tag's type/size field is
> actually a type ID, and generic tools like bpftool could basically
> skip type_tag and resolve to underlying type. This way, optional
> modifier/decorator KINDs won't even have to break applications using
> old libbpf's when it comes to calculating type sizes and resolving
> them.

+1

> > > >> +
> > > >> +struct btf_kind_meta {
> > > >> +    __u32 name_off;         /* kind name string offset */
>
> I'm not sure why we'd need to record this for every KIND? The tool
> that doesn't know about this new kind can't do much about it anyways,
> so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
> doesn't make much difference?

The name is certainly more meaningful than 123.
bpftool output is consumed by humans who will be able to tell the difference.
I'd keep the name here.

> > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
>
> Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
> existing kernels (and potentially many tools that parse BTF). Given we
> can actually extend BTF in backwards compatible way by just adding an
> optional two fields to btf_header + extra bytes for metadata sections,
> why making our lives harder by bumping this version?

I fail to see how bumping the version makes it harder.
libbpf needs to sanitize meta* fields in the struct btf_header on
older kernels anway. At the same time sanitizing the version from 2 to
1
in the same header is one extra line of code in libbpf.
What am I missing?

>
> > > > v2 -> self described.
> > >
> > > sure, sounds good. One other change perhaps worth making; currently
> > > we assume that the kind metadata is at the end of the struct
> > > btf_metadata, but if we ever wanted to add metadata fields in the
> > > future, we'd want so support both the current metadata structure and
> > > any future structure which had additional fields.
>
> see above, another reason to make metadata a separate section, in
> addition to types and strings
>
> > >
> > > With that in mind, it might make sense to go with something like
> > >
> > > struct btf_metadata {
> > >         __u32   kind_meta_cnt;
> > >         __u32   kind_meta_offset;       /* kind_meta_cnt instances of struct
> > > btf_kind_meta start here */
> > >         __u32   flags;
> > >         __u32   description_off;        /* optional description string*/
> > >         __u32   crc;                    /* crc32 of BTF */
> > >         __u32   base_crc;               /* crc32 of base BTF */
> > > };
> > >
> > > For the original version, kind_meta_offset would just be
> > > at meta_off + sizeof(struct btf_metadata), but if we had multiple
> > > versions of the btf_metadata header to handle, they could all rely on
> > > the kind_meta_offset being where kind metadata is stored.
> > > For validation we'd have to make sure kind_meta_offset was within
> > > the the metadata header range.
> >
> > kind_meta_offset is an ok idea, but I don't quite see why we'd have
> > multiple 'struct btf_metadata' pointing to the same set of 'struct
> > btf_kind_meta'.
> >
> > Also why do we need description_off ? Shouldn't string go into
> > btf_header->str_off ?
> >
> > > >
> > > >> +    __u32   flags;
> > > >> +    __u32   description_off;        /* optional description string */
> > > >> +    __u32   crc;                    /* crc32 of BTF */
> > > >> +    __u32   base_crc;               /* crc32 of base BTF */
> > > >
> > > > Hard coded CRC also gives me a pause.
> > > > Should it be an optional KIND like btf tags?
> > >
> > > The goal of the CRC is really just to provide a unique identifier that
> > > we can use for things like checking if there's a mismatch between
> > > base and module BTF. If we want to ever do CRC validation (not sure
> > > if there's a case for that) we probably need to think about cases like
> > > BTF sanitization of BPF program BTF; this would likely only be an
> > > issue if metadata support is added to BPF compilers.
> > >
> > > The problem with adding it via a kind is that if we first compute
> > > the CRC over the entire BTF object and then add the kind, the addition
> > > of the kind breaks the CRC; as a result I _think_ we're stuck with
> > > having to have it in the header.
> >
> > Hmm. libbpf can add BTF_KIND_CRC with zero-ed u32 crc field
> > and later fill it in.
> > It's really not different than u32 crc field inside 'struct btf_metadata'.
> >
> > > That said I don't think CRC is necessarily the only identifier
> > > we could use, and we don't even need to identify it as a
> > > CRC in the UAPI, just as a "unique identifier"; that would deal
> > > with issues about breaking the CRC during sanitization. All
> > > depends on whether we ever see a need to verify BTF via CRC
> > > really.
> >
> > Right. It could be sha or anything else, but user space and kernel
> > need to agree on the math to compute it, so something got to indicate
> > that this 32-bit is a crc.
> > Hence KIND_CRC, KIND_SHA fit better.
>
> what if instead of crc and base_src fields, we have
>
> __u32 id_str_off;
> __u32 base_id_str_off;
>
> and they are offsets into a string section. We can then define that
> those strings have to be something like "crc:<crc-value>" or
> "sha:<sha-value". This will be a generic ID, and extensible (and more
> easily extensible, probably), but won't require new KIND.

Encoding binary data in strings with \0 and other escape chars?
Ouch. Please no.
We can have variable size KIND_ID and encode crc vs sha in flags,
but binary data better stay binary.

> This also has a good property that 0 means "no ID", which helps with
> the base BTF case. Current "__u32 crc;" doesn't have this property and
> requires a flag.

imo this crc addition is a litmus test for this self-described format.
If we cannot encode it as a new KIND* it means this self-described
idea is broken.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-02 18:11           ` Alexei Starovoitov
@ 2023-06-02 20:33             ` Andrii Nakryiko
  2023-06-05 16:14               ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-06-02 20:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Fri, Jun 2, 2023 at 11:12 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jun 2, 2023 at 9:32 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Jun 1, 2023 at 9:54 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Jun 1, 2023 at 3:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > > >
> > > > On 01/06/2023 04:53, Alexei Starovoitov wrote:
> > > > > On Wed, May 31, 2023 at 09:19:28PM +0100, Alan Maguire wrote:
> > > > >> BTF kind metadata provides information to parse BTF kinds.
> > > > >> By separating parsing BTF from using all the information
> > > > >> it provides, we allow BTF to encode new features even if
> > > > >> they cannot be used.  This is helpful in particular for
> > > > >> cases where newer tools for BTF generation run on an
> > > > >> older kernel; BTF kinds may be present that the kernel
> > > > >> cannot yet use, but at least it can parse the BTF
> > > > >> provided.  Meanwhile userspace tools with newer libbpf
> > > > >> may be able to use the newer information.
> > > > >>
> > > > >> The intent is to support encoding of kind metadata
> > > > >> optionally so that tools like pahole can add this
> > > > >> information.  So for each kind we record
> > > > >>
> > > > >> - a kind name string
> > > > >> - kind-related flags
> > > > >> - length of singular element following struct btf_type
> > > > >> - length of each of the btf_vlen() elements following
> > > > >>
> > > > >> In addition we make space in the metadata for
> > > > >> CRC32s computed over the BTF along with a CRC for
> > > > >> the base BTF; this allows split BTF to identify
> > > > >> a mismatch explicitly.  Finally we provide an
> > > > >> offset for an optional description string.
> > > > >>
> > > > >> The ideas here were discussed at [1] hence
> > > > >>
> > > > >> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > > > >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> > > > >>
> > > > >> [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
> > > > >> ---
> > > > >>  include/uapi/linux/btf.h       | 29 +++++++++++++++++++++++++++++
> > > > >>  tools/include/uapi/linux/btf.h | 29 +++++++++++++++++++++++++++++
> > > > >>  2 files changed, 58 insertions(+)
> > > > >>
> > > > >> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
> > > > >> index ec1798b6d3ff..94c1f4518249 100644
> > > > >> --- a/include/uapi/linux/btf.h
> > > > >> +++ b/include/uapi/linux/btf.h
> > > > >> @@ -8,6 +8,34 @@
> > > > >>  #define BTF_MAGIC   0xeB9F
> > > > >>  #define BTF_VERSION 1
> > > > >>
> > > > >> +/* is this information required? If so it cannot be sanitized safely. */
> > > > >> +#define BTF_KIND_META_OPTIONAL              (1 << 0)
> >
> > Another flag I was thinking about was a flag whether struct btf_type's
> > type/size field is a type or a size (or something else). E.g., let's
> > say we haven't had btf_type_tag yet and were adding it after we had
> > this new metadata. We could say that type_tag's type/size field is
> > actually a type ID, and generic tools like bpftool could basically
> > skip type_tag and resolve to underlying type. This way, optional
> > modifier/decorator KINDs won't even have to break applications using
> > old libbpf's when it comes to calculating type sizes and resolving
> > them.
>
> +1
>
> > > > >> +
> > > > >> +struct btf_kind_meta {
> > > > >> +    __u32 name_off;         /* kind name string offset */
> >
> > I'm not sure why we'd need to record this for every KIND? The tool
> > that doesn't know about this new kind can't do much about it anyways,
> > so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
> > doesn't make much difference?
>
> The name is certainly more meaningful than 123.
> bpftool output is consumed by humans who will be able to tell the difference.
> I'd keep the name here.

Ok, it's fine. When I was originally proposing this compact metadata,
I was trying to make it as minimal as possible, so adding 80 bytes
just for string offset fields (plus a bunch of strings) felt like an
unnecessary overhead. But it's not a big deal.

>
> > > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
> >
> > Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
> > existing kernels (and potentially many tools that parse BTF). Given we
> > can actually extend BTF in backwards compatible way by just adding an
> > optional two fields to btf_header + extra bytes for metadata sections,
> > why making our lives harder by bumping this version?
>
> I fail to see how bumping the version makes it harder.
> libbpf needs to sanitize meta* fields in the struct btf_header on
> older kernels anway. At the same time sanitizing the version from 2 to
> 1
> in the same header is one extra line of code in libbpf.
> What am I missing?

So I checked libbpf code, and libbpf doesn't really check the version
field. So for the most part this BTF_VERSION bump wouldn't matter for
any tool that's based on libbpf's struct btf API. But if libbpf did
check version (as it probably should have), then by upgrading to newer
Clang that would emit BTF with this metadata (but no new fancy
BTF_KIND_KERNEL_FUNC or anything like that), we automatically make
such BTF incompatible with all those tools.

Kernel is a bit different because it's extremely strict about BTF. I'm
more worried about tools like bpftool (but we don't check BTF_VERSION
there due to libbpf), llvm-objdump (when it supports BTF), etc.

On the other hand, what do we gain by bumping this BTF_VERSION?

>
> >
> > > > > v2 -> self described.
> > > >
> > > > sure, sounds good. One other change perhaps worth making; currently
> > > > we assume that the kind metadata is at the end of the struct
> > > > btf_metadata, but if we ever wanted to add metadata fields in the
> > > > future, we'd want so support both the current metadata structure and
> > > > any future structure which had additional fields.
> >
> > see above, another reason to make metadata a separate section, in
> > addition to types and strings
> >
> > > >
> > > > With that in mind, it might make sense to go with something like
> > > >
> > > > struct btf_metadata {
> > > >         __u32   kind_meta_cnt;
> > > >         __u32   kind_meta_offset;       /* kind_meta_cnt instances of struct
> > > > btf_kind_meta start here */
> > > >         __u32   flags;
> > > >         __u32   description_off;        /* optional description string*/
> > > >         __u32   crc;                    /* crc32 of BTF */
> > > >         __u32   base_crc;               /* crc32 of base BTF */
> > > > };
> > > >
> > > > For the original version, kind_meta_offset would just be
> > > > at meta_off + sizeof(struct btf_metadata), but if we had multiple
> > > > versions of the btf_metadata header to handle, they could all rely on
> > > > the kind_meta_offset being where kind metadata is stored.
> > > > For validation we'd have to make sure kind_meta_offset was within
> > > > the the metadata header range.
> > >
> > > kind_meta_offset is an ok idea, but I don't quite see why we'd have
> > > multiple 'struct btf_metadata' pointing to the same set of 'struct
> > > btf_kind_meta'.
> > >
> > > Also why do we need description_off ? Shouldn't string go into
> > > btf_header->str_off ?
> > >
> > > > >
> > > > >> +    __u32   flags;
> > > > >> +    __u32   description_off;        /* optional description string */
> > > > >> +    __u32   crc;                    /* crc32 of BTF */
> > > > >> +    __u32   base_crc;               /* crc32 of base BTF */
> > > > >
> > > > > Hard coded CRC also gives me a pause.
> > > > > Should it be an optional KIND like btf tags?
> > > >
> > > > The goal of the CRC is really just to provide a unique identifier that
> > > > we can use for things like checking if there's a mismatch between
> > > > base and module BTF. If we want to ever do CRC validation (not sure
> > > > if there's a case for that) we probably need to think about cases like
> > > > BTF sanitization of BPF program BTF; this would likely only be an
> > > > issue if metadata support is added to BPF compilers.
> > > >
> > > > The problem with adding it via a kind is that if we first compute
> > > > the CRC over the entire BTF object and then add the kind, the addition
> > > > of the kind breaks the CRC; as a result I _think_ we're stuck with
> > > > having to have it in the header.
> > >
> > > Hmm. libbpf can add BTF_KIND_CRC with zero-ed u32 crc field
> > > and later fill it in.
> > > It's really not different than u32 crc field inside 'struct btf_metadata'.
> > >
> > > > That said I don't think CRC is necessarily the only identifier
> > > > we could use, and we don't even need to identify it as a
> > > > CRC in the UAPI, just as a "unique identifier"; that would deal
> > > > with issues about breaking the CRC during sanitization. All
> > > > depends on whether we ever see a need to verify BTF via CRC
> > > > really.
> > >
> > > Right. It could be sha or anything else, but user space and kernel
> > > need to agree on the math to compute it, so something got to indicate
> > > that this 32-bit is a crc.
> > > Hence KIND_CRC, KIND_SHA fit better.
> >
> > what if instead of crc and base_src fields, we have
> >
> > __u32 id_str_off;
> > __u32 base_id_str_off;
> >
> > and they are offsets into a string section. We can then define that
> > those strings have to be something like "crc:<crc-value>" or
> > "sha:<sha-value". This will be a generic ID, and extensible (and more
> > easily extensible, probably), but won't require new KIND.
>
> Encoding binary data in strings with \0 and other escape chars?
> Ouch. Please no.
> We can have variable size KIND_ID and encode crc vs sha in flags,
> but binary data better stay binary.

It's just a string representation of a hex dump of a byte array (or
u32 in crc32 case)? Only one of [0123456789abcdef], that's all. Just
like we have Git SHA string representation.

We don't have variable-sized KINDs, unless you are proposing to use
vlen as "number of bytes" of ID payload. And another KIND is
automatically breaking backwards compat for all existing tools. For no
good reason. This whole metadata is completely optional right now for
anything that's using libbpf for BTF parsing. But adding KIND_ID makes
it not optional at all.

Not sure what new KIND gives us in this case. This ID (whether it's
"crc:abcdef" or just "661866dbea52bfac7420cd35d0e502d4ccc11bb6" or
whatever) can be used by all application as is for comparison, you
don't need to understand how it is generated at all.

>
> > This also has a good property that 0 means "no ID", which helps with
> > the base BTF case. Current "__u32 crc;" doesn't have this property and
> > requires a flag.
>
> imo this crc addition is a litmus test for this self-described format.
> If we cannot encode it as a new KIND* it means this self-described
> idea is broken.

We can, but this can be a straightforward and simple *opaque* string
ID, so the new kind just seems unnecessary.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 2/8] libbpf: support handling of metadata section in BTF
  2023-05-31 20:19 ` [RFC bpf-next 2/8] libbpf: support handling of metadata section in BTF Alan Maguire
@ 2023-06-05 11:01   ` Jiri Olsa
  2023-06-05 21:40     ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Jiri Olsa @ 2023-06-05 11:01 UTC (permalink / raw)
  To: Alan Maguire
  Cc: ast, daniel, andrii, acme, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, quentin, mykolal, bpf

On Wed, May 31, 2023 at 09:19:29PM +0100, Alan Maguire wrote:
> support reading in metadata, fixing endian issues on reading;
> also support writing metadata section to raw BTF object.
> There is not yet an API to populate the metadata with meaningful
> information.
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/lib/bpf/btf.c | 141 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 111 insertions(+), 30 deletions(-)
> 
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 8484b563b53d..036dc1505969 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -16,6 +16,7 @@
>  #include <linux/err.h>
>  #include <linux/btf.h>
>  #include <gelf.h>
> +#include <zlib.h>
>  #include "btf.h"
>  #include "bpf.h"
>  #include "libbpf.h"
> @@ -39,36 +40,40 @@ struct btf {
>  
>  	/*
>  	 * When BTF is loaded from an ELF or raw memory it is stored
> -	 * in a contiguous memory block. The hdr, type_data, and, strs_data
> -	 * point inside that memory region to their respective parts of BTF
> -	 * representation:
> +	 * in a contiguous memory block. The hdr, type_data, strs_data,
> +	 * and optional meta_data point inside that memory region to their
> +	 * respective parts of BTF representation:
>  	 *
> -	 * +--------------------------------+
> -	 * |  Header  |  Types  |  Strings  |
> -	 * +--------------------------------+
> -	 * ^          ^         ^
> -	 * |          |         |
> -	 * hdr        |         |
> -	 * types_data-+         |
> -	 * strs_data------------+
> +	 * +--------------------------------+----------+
> +	 * |  Header  |  Types  |  Strings  | Metadata |
> +	 * +--------------------------------+----------+
> +	 * ^          ^         ^           ^
> +	 * |          |         |           |
> +	 * hdr        |         |           |
> +	 * types_data-+         |           |
> +	 * strs_data------------+           |
> +	 * meta_data------------------------+
> +	 *
> +	 * meta_data is optional.
>  	 *
>  	 * If BTF data is later modified, e.g., due to types added or
>  	 * removed, BTF deduplication performed, etc, this contiguous
> -	 * representation is broken up into three independently allocated
> -	 * memory regions to be able to modify them independently.
> +	 * representation is broken up into three or four independently
> +	 * allocated memory regions to be able to modify them independently.
>  	 * raw_data is nulled out at that point, but can be later allocated
>  	 * and cached again if user calls btf__raw_data(), at which point
> -	 * raw_data will contain a contiguous copy of header, types, and
> -	 * strings:
> +	 * raw_data will contain a contiguous copy of header, types, strings
> +	 * and (again optionally) metadata:
>  	 *
> -	 * +----------+  +---------+  +-----------+
> -	 * |  Header  |  |  Types  |  |  Strings  |
> -	 * +----------+  +---------+  +-----------+
> -	 * ^             ^            ^
> -	 * |             |            |
> -	 * hdr           |            |
> -	 * types_data----+            |
> -	 * strset__data(strs_set)-----+
> +	 * +----------+  +---------+  +-----------+  +----------+
> +	 * |  Header  |  |  Types  |  |  Strings  |  | Metadata |
> +	 * +----------+  +---------+  +-----------+  +---------_+
> +	 * ^             ^            ^              ^
> +	 * |             |            |              |
> +	 * hdr           |            |              |
> +	 * types_data----+            |              |
> +	 * strset__data(strs_set)-----+              |
> +	 * meta_data---------------------------------+
>  	 *
>  	 *               +----------+---------+-----------+
>  	 *               |  Header  |  Types  |  Strings  |
> @@ -116,6 +121,8 @@ struct btf {
>  	/* whether strings are already deduplicated */
>  	bool strs_deduped;
>  
> +	void *meta_data;
> +
>  	/* BTF object FD, if loaded into kernel */
>  	int fd;
>  
> @@ -215,6 +222,11 @@ static void btf_bswap_hdr(struct btf_header *h)
>  	h->type_len = bswap_32(h->type_len);
>  	h->str_off = bswap_32(h->str_off);
>  	h->str_len = bswap_32(h->str_len);
> +	if (h->hdr_len >= sizeof(struct btf_header)) {
> +		h->meta_header.meta_off = bswap_32(h->meta_header.meta_off);
> +		h->meta_header.meta_len = bswap_32(h->meta_header.meta_len);
> +	}
> +
>  }
>  
>  static int btf_parse_hdr(struct btf *btf)
> @@ -222,14 +234,17 @@ static int btf_parse_hdr(struct btf *btf)
>  	struct btf_header *hdr = btf->hdr;
>  	__u32 meta_left;
>  
> -	if (btf->raw_size < sizeof(struct btf_header)) {
> +	if (btf->raw_size < sizeof(struct btf_header) - sizeof(struct btf_meta_header)) {
>  		pr_debug("BTF header not found\n");
>  		return -EINVAL;
>  	}
>  
>  	if (hdr->magic == bswap_16(BTF_MAGIC)) {
> +		int swapped_len = bswap_32(hdr->hdr_len);
> +
>  		btf->swapped_endian = true;
> -		if (bswap_32(hdr->hdr_len) != sizeof(struct btf_header)) {
> +		if (swapped_len != sizeof(struct btf_header) &&
> +		    swapped_len != sizeof(struct btf_header) - sizeof(struct btf_meta_header)) {
>  			pr_warn("Can't load BTF with non-native endianness due to unsupported header length %u\n",
>  				bswap_32(hdr->hdr_len));
>  			return -ENOTSUP;
> @@ -285,6 +300,42 @@ static int btf_parse_str_sec(struct btf *btf)
>  	return 0;
>  }
>  
> +static void btf_bswap_meta(struct btf_metadata *meta, int len)
> +{
> +	struct btf_kind_meta *m = &meta->kind_meta[0];
> +	struct btf_kind_meta *end = (void *)meta + len;
> +
> +	meta->flags = bswap_32(meta->flags);
> +	meta->crc = bswap_32(meta->crc);
> +	meta->base_crc = bswap_32(meta->base_crc);
> +	meta->description_off = bswap_32(meta->description_off);
> +
> +	while (m < end) {
> +		m->name_off = bswap_32(m->name_off);
> +		m->flags = bswap_16(m->flags);
> +		m++;
> +	}
> +}
> +
> +static int btf_parse_meta_sec(struct btf *btf)
> +{
> +	const struct btf_header *hdr = btf->hdr;
> +
> +	if (hdr->hdr_len < sizeof(struct btf_header) ||
> +	    !hdr->meta_header.meta_off || !hdr->meta_header.meta_len)
> +		return 0;

I'm trying to figure out how is the meta data optional, and it seems to be
in here, right? but hdr->meta_header.meta_off or hdr->meta_header.meta_len
must be set NULL or zero

I'm getting crash when running btf test and it seems like correption when
parsing BTF generated from clang, which does not have this meta support

we do need clang support for this right? or bump version?

thanks,
jirka


> +	if (hdr->meta_header.meta_len < sizeof(struct btf_metadata)) {
> +		pr_debug("Invalid BTF metadata section\n");
> +		return -EINVAL;
> +	}
> +	btf->meta_data = btf->raw_data + btf->hdr->hdr_len + btf->hdr->meta_header.meta_off;
> +
> +	if (btf->swapped_endian)
> +		btf_bswap_meta(btf->meta_data, hdr->meta_header.meta_len);
> +
> +	return 0;
> +}
> +
>  static int btf_type_size(const struct btf_type *t)
>  {
>  	const int base_size = sizeof(struct btf_type);
> @@ -904,6 +955,7 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf)
>  	err = err ?: btf_parse_type_sec(btf);
>  	if (err)
>  		goto done;
> +	err = btf_parse_meta_sec(btf);
>  
>  done:
>  	if (err) {
> @@ -1267,6 +1319,11 @@ static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endi
>  	}
>  
>  	data_sz = hdr->hdr_len + hdr->type_len + hdr->str_len;
> +	if (btf->meta_data) {
> +		data_sz = roundup(data_sz, 8);
> +		data_sz += hdr->meta_header.meta_len;
> +		hdr->meta_header.meta_off = roundup(hdr->type_len + hdr->str_len, 8);
> +	}
>  	data = calloc(1, data_sz);
>  	if (!data)
>  		return NULL;
> @@ -1293,8 +1350,21 @@ static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endi
>  	p += hdr->type_len;
>  
>  	memcpy(p, btf_strs_data(btf), hdr->str_len);
> -	p += hdr->str_len;
> +	/* round up to 8 byte alignment to match offset above */
> +	p = data + hdr->hdr_len + roundup(hdr->type_len + hdr->str_len, 8);
> +
> +	if (btf->meta_data) {
> +		struct btf_metadata *meta = p;
>  
> +		memcpy(p, btf->meta_data, hdr->meta_header.meta_len);
> +		if (!swap_endian) {
> +			meta->crc = crc32(0L, (const Bytef *)&data, sizeof(data));
> +			meta->flags |= BTF_META_CRC_SET;
> +		}
> +		if (swap_endian)
> +			btf_bswap_meta(p, hdr->meta_header.meta_len);
> +		p += hdr->meta_header.meta_len;
> +	}
>  	*size = data_sz;
>  	return data;
>  err_out:
> @@ -1425,13 +1495,13 @@ static void btf_invalidate_raw_data(struct btf *btf)
>  	}
>  }
>  
> -/* Ensure BTF is ready to be modified (by splitting into a three memory
> - * regions for header, types, and strings). Also invalidate cached
> - * raw_data, if any.
> +/* Ensure BTF is ready to be modified (by splitting into a three or four memory
> + * regions for header, types, strings and optional metadata). Also invalidate
> + * cached raw_data, if any.
>   */
>  static int btf_ensure_modifiable(struct btf *btf)
>  {
> -	void *hdr, *types;
> +	void *hdr, *types, *meta = NULL;
>  	struct strset *set = NULL;
>  	int err = -ENOMEM;
>  
> @@ -1446,9 +1516,17 @@ static int btf_ensure_modifiable(struct btf *btf)
>  	types = malloc(btf->hdr->type_len);
>  	if (!hdr || !types)
>  		goto err_out;
> +	if (btf->hdr->hdr_len >= sizeof(struct btf_header)  &&
> +	    btf->hdr->meta_header.meta_off && btf->hdr->meta_header.meta_len) {
> +		meta = calloc(1, btf->hdr->meta_header.meta_len);
> +		if (!meta)
> +			goto err_out;
> +	}
>  
>  	memcpy(hdr, btf->hdr, btf->hdr->hdr_len);
>  	memcpy(types, btf->types_data, btf->hdr->type_len);
> +	if (meta)
> +		memcpy(meta, btf->meta_data, btf->hdr->meta_header.meta_len);
>  
>  	/* build lookup index for all strings */
>  	set = strset__new(BTF_MAX_STR_OFFSET, btf->strs_data, btf->hdr->str_len);
> @@ -1463,6 +1541,8 @@ static int btf_ensure_modifiable(struct btf *btf)
>  	btf->types_data_cap = btf->hdr->type_len;
>  	btf->strs_data = NULL;
>  	btf->strs_set = set;
> +	btf->meta_data = meta;
> +
>  	/* if BTF was created from scratch, all strings are guaranteed to be
>  	 * unique and deduplicated
>  	 */
> @@ -1480,6 +1560,7 @@ static int btf_ensure_modifiable(struct btf *btf)
>  	strset__free(set);
>  	free(hdr);
>  	free(types);
> +	free(meta);
>  	return err;
>  }
>  
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-02 20:33             ` Andrii Nakryiko
@ 2023-06-05 16:14               ` Alexei Starovoitov
  2023-06-05 22:38                 ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-06-05 16:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Fri, Jun 2, 2023 at 1:34 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> >
> > > > > >> +
> > > > > >> +struct btf_kind_meta {
> > > > > >> +    __u32 name_off;         /* kind name string offset */
> > >
> > > I'm not sure why we'd need to record this for every KIND? The tool
> > > that doesn't know about this new kind can't do much about it anyways,
> > > so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
> > > doesn't make much difference?
> >
> > The name is certainly more meaningful than 123.
> > bpftool output is consumed by humans who will be able to tell the difference.
> > I'd keep the name here.
>
> Ok, it's fine. When I was originally proposing this compact metadata,
> I was trying to make it as minimal as possible, so adding 80 bytes
> just for string offset fields (plus a bunch of strings) felt like an
> unnecessary overhead. But it's not a big deal.

Exactly. It's just 4 * num_kinds bytes in and ~20 * num_kinds for
names, but it implements 'self description'.
Otherwise the names become an external knowledge and BTF is not self described.


> >
> > > > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
> > >
> > > Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
> > > existing kernels (and potentially many tools that parse BTF). Given we
> > > can actually extend BTF in backwards compatible way by just adding an
> > > optional two fields to btf_header + extra bytes for metadata sections,
> > > why making our lives harder by bumping this version?
> >
> > I fail to see how bumping the version makes it harder.
> > libbpf needs to sanitize meta* fields in the struct btf_header on
> > older kernels anway. At the same time sanitizing the version from 2 to
> > 1
> > in the same header is one extra line of code in libbpf.
> > What am I missing?
>
> So I checked libbpf code, and libbpf doesn't really check the version
> field. So for the most part this BTF_VERSION bump wouldn't matter for
> any tool that's based on libbpf's struct btf API. But if libbpf did
> check version (as it probably should have), then by upgrading to newer
> Clang that would emit BTF with this metadata (but no new fancy
> BTF_KIND_KERNEL_FUNC or anything like that), we automatically make
> such BTF incompatible with all those tools.
>
> Kernel is a bit different because it's extremely strict about BTF. I'm
> more worried about tools like bpftool (but we don't check BTF_VERSION
> there due to libbpf), llvm-objdump (when it supports BTF), etc.
>
> On the other hand, what do we gain by bumping this BTF_VERSION?

The version bump will be an indication that
v2 of BTF has enough info in the format for any tool/kernel to consume it.
With v2 we should make BTF_KIND_META description mandatory.
If we keep it as v1 then the presence of BTF_KIND_META would be
an indication of 'self described' format.
Which is also ok-ish, but seems less clean.
zero vs not-zero of meta_off in btf_header is pretty much v1 vs v2.

>
> We don't have variable-sized KINDs, unless you are proposing to use
> vlen as "number of bytes" of ID payload.

Exactly. I'm proposing BTF_KIND_CHECKSUM and use vlen for size.

> And another KIND is
> automatically breaking backwards compat for all existing tools.

No. That's the whole point of 'self described'.
New kinds will not be breaking.

> For no
> good reason. This whole metadata is completely optional right now for
> anything that's using libbpf for BTF parsing. But adding KIND_ID makes
> it not optional at all.

and that's the crux of our disagreement.
If BTF_KIND_META are optional it's just a glorified comment inside BTF and
not a new 'self described' format.
If it's just a comment I'd rather not add it to BTF.
Such debug info can go to BTF.ext or don't do it at all.

The self described BTF would mean that struct btf_kind_meta-s contain
enough info for tools to parse from now on.
Imagine we didn't need CRC right now.
The self described format lands and now we want to add CRC.
If we're saying: "let's add a few hard coded fields to struct btf_header
or struct btf_metadata" then we failed.
It's not a self described format if we still need to extend hard coded
structs.
The idea of self description is that struct btf_kind_meta-s describe
absolutely everything about BTF from now on.
Meaning that not only things like ENUM64 and FUNC with addresses
become new KINDs, but crc-s and everything else is a new kind too,
because that's the only thing that btf_kind_meta-s can describe.

> Not sure what new KIND gives us in this case. This ID (whether it's
> "crc:abcdef" or just "661866dbea52bfac7420cd35d0e502d4ccc11bb6" or
> whatever) can be used by all application as is for comparison, you
> don't need to understand how it is generated at all.

That's fine. tools don't need to parse it.
With BTF_KIND_CHECKSUM the tools will just compare vlen sized binary data.

> >
> > > This also has a good property that 0 means "no ID", which helps with
> > > the base BTF case. Current "__u32 crc;" doesn't have this property and
> > > requires a flag.
> >
> > imo this crc addition is a litmus test for this self-described format.
> > If we cannot encode it as a new KIND* it means this self-described
> > idea is broken.
>
> We can, but this can be a straightforward and simple *opaque* string
> ID, so the new kind just seems unnecessary.

BTF_KIND_CHECKSUM can have a string in vlen part of it, but
it feels wrong to encode binary data as a string while everything else
in BTF is binary.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 2/8] libbpf: support handling of metadata section in BTF
  2023-06-05 11:01   ` Jiri Olsa
@ 2023-06-05 21:40     ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-06-05 21:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alan Maguire, ast, daniel, andrii, acme, martin.lau, song, yhs,
	john.fastabend, kpsingh, sdf, haoluo, quentin, mykolal, bpf

On Mon, Jun 5, 2023 at 4:01 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, May 31, 2023 at 09:19:29PM +0100, Alan Maguire wrote:
> > support reading in metadata, fixing endian issues on reading;
> > also support writing metadata section to raw BTF object.
> > There is not yet an API to populate the metadata with meaningful
> > information.
> >
> > Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> > ---
> >  tools/lib/bpf/btf.c | 141 ++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 111 insertions(+), 30 deletions(-)
> >
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index 8484b563b53d..036dc1505969 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -16,6 +16,7 @@
> >  #include <linux/err.h>
> >  #include <linux/btf.h>
> >  #include <gelf.h>
> > +#include <zlib.h>
> >  #include "btf.h"
> >  #include "bpf.h"
> >  #include "libbpf.h"
> > @@ -39,36 +40,40 @@ struct btf {
> >
> >       /*
> >        * When BTF is loaded from an ELF or raw memory it is stored
> > -      * in a contiguous memory block. The hdr, type_data, and, strs_data
> > -      * point inside that memory region to their respective parts of BTF
> > -      * representation:
> > +      * in a contiguous memory block. The hdr, type_data, strs_data,
> > +      * and optional meta_data point inside that memory region to their
> > +      * respective parts of BTF representation:
> >        *
> > -      * +--------------------------------+
> > -      * |  Header  |  Types  |  Strings  |
> > -      * +--------------------------------+
> > -      * ^          ^         ^
> > -      * |          |         |
> > -      * hdr        |         |
> > -      * types_data-+         |
> > -      * strs_data------------+
> > +      * +--------------------------------+----------+
> > +      * |  Header  |  Types  |  Strings  | Metadata |
> > +      * +--------------------------------+----------+
> > +      * ^          ^         ^           ^
> > +      * |          |         |           |
> > +      * hdr        |         |           |
> > +      * types_data-+         |           |
> > +      * strs_data------------+           |
> > +      * meta_data------------------------+
> > +      *
> > +      * meta_data is optional.
> >        *
> >        * If BTF data is later modified, e.g., due to types added or
> >        * removed, BTF deduplication performed, etc, this contiguous
> > -      * representation is broken up into three independently allocated
> > -      * memory regions to be able to modify them independently.
> > +      * representation is broken up into three or four independently
> > +      * allocated memory regions to be able to modify them independently.
> >        * raw_data is nulled out at that point, but can be later allocated
> >        * and cached again if user calls btf__raw_data(), at which point
> > -      * raw_data will contain a contiguous copy of header, types, and
> > -      * strings:
> > +      * raw_data will contain a contiguous copy of header, types, strings
> > +      * and (again optionally) metadata:
> >        *
> > -      * +----------+  +---------+  +-----------+
> > -      * |  Header  |  |  Types  |  |  Strings  |
> > -      * +----------+  +---------+  +-----------+
> > -      * ^             ^            ^
> > -      * |             |            |
> > -      * hdr           |            |
> > -      * types_data----+            |
> > -      * strset__data(strs_set)-----+
> > +      * +----------+  +---------+  +-----------+  +----------+
> > +      * |  Header  |  |  Types  |  |  Strings  |  | Metadata |
> > +      * +----------+  +---------+  +-----------+  +---------_+
> > +      * ^             ^            ^              ^
> > +      * |             |            |              |
> > +      * hdr           |            |              |
> > +      * types_data----+            |              |
> > +      * strset__data(strs_set)-----+              |
> > +      * meta_data---------------------------------+
> >        *
> >        *               +----------+---------+-----------+
> >        *               |  Header  |  Types  |  Strings  |
> > @@ -116,6 +121,8 @@ struct btf {
> >       /* whether strings are already deduplicated */
> >       bool strs_deduped;
> >
> > +     void *meta_data;
> > +
> >       /* BTF object FD, if loaded into kernel */
> >       int fd;
> >
> > @@ -215,6 +222,11 @@ static void btf_bswap_hdr(struct btf_header *h)
> >       h->type_len = bswap_32(h->type_len);
> >       h->str_off = bswap_32(h->str_off);
> >       h->str_len = bswap_32(h->str_len);
> > +     if (h->hdr_len >= sizeof(struct btf_header)) {
> > +             h->meta_header.meta_off = bswap_32(h->meta_header.meta_off);
> > +             h->meta_header.meta_len = bswap_32(h->meta_header.meta_len);
> > +     }
> > +
> >  }
> >
> >  static int btf_parse_hdr(struct btf *btf)
> > @@ -222,14 +234,17 @@ static int btf_parse_hdr(struct btf *btf)
> >       struct btf_header *hdr = btf->hdr;
> >       __u32 meta_left;
> >
> > -     if (btf->raw_size < sizeof(struct btf_header)) {
> > +     if (btf->raw_size < sizeof(struct btf_header) - sizeof(struct btf_meta_header)) {
> >               pr_debug("BTF header not found\n");
> >               return -EINVAL;
> >       }
> >
> >       if (hdr->magic == bswap_16(BTF_MAGIC)) {
> > +             int swapped_len = bswap_32(hdr->hdr_len);
> > +
> >               btf->swapped_endian = true;
> > -             if (bswap_32(hdr->hdr_len) != sizeof(struct btf_header)) {
> > +             if (swapped_len != sizeof(struct btf_header) &&
> > +                 swapped_len != sizeof(struct btf_header) - sizeof(struct btf_meta_header)) {
> >                       pr_warn("Can't load BTF with non-native endianness due to unsupported header length %u\n",
> >                               bswap_32(hdr->hdr_len));
> >                       return -ENOTSUP;
> > @@ -285,6 +300,42 @@ static int btf_parse_str_sec(struct btf *btf)
> >       return 0;
> >  }
> >
> > +static void btf_bswap_meta(struct btf_metadata *meta, int len)
> > +{
> > +     struct btf_kind_meta *m = &meta->kind_meta[0];
> > +     struct btf_kind_meta *end = (void *)meta + len;
> > +
> > +     meta->flags = bswap_32(meta->flags);
> > +     meta->crc = bswap_32(meta->crc);
> > +     meta->base_crc = bswap_32(meta->base_crc);
> > +     meta->description_off = bswap_32(meta->description_off);
> > +
> > +     while (m < end) {
> > +             m->name_off = bswap_32(m->name_off);
> > +             m->flags = bswap_16(m->flags);
> > +             m++;
> > +     }
> > +}
> > +
> > +static int btf_parse_meta_sec(struct btf *btf)
> > +{
> > +     const struct btf_header *hdr = btf->hdr;
> > +
> > +     if (hdr->hdr_len < sizeof(struct btf_header) ||
> > +         !hdr->meta_header.meta_off || !hdr->meta_header.meta_len)
> > +             return 0;
>
> I'm trying to figure out how is the meta data optional, and it seems to be
> in here, right? but hdr->meta_header.meta_off or hdr->meta_header.meta_len
> must be set NULL or zero
>
> I'm getting crash when running btf test and it seems like correption when
> parsing BTF generated from clang, which does not have this meta support
>
> we do need clang support for this right? or bump version?

libbpf doesn't enforce that btf_header is all zeroes after last known
field (which until now was str_len). So the whole design of adding
fields to btf_header is backwards compatible from libbpf standpoint.

From the implementation standpoint, though, we should make our lives
simpler by not using btf->raw_data for btf_header, and instead do what
we do in kernel with bpf_attr. That is, have a local zero-initialized
copy, and overwrite it with data found in btf.


>
> thanks,
> jirka
>
>
> > +     if (hdr->meta_header.meta_len < sizeof(struct btf_metadata)) {
> > +             pr_debug("Invalid BTF metadata section\n");
> > +             return -EINVAL;
> > +     }
> > +     btf->meta_data = btf->raw_data + btf->hdr->hdr_len + btf->hdr->meta_header.meta_off;
> > +
> > +     if (btf->swapped_endian)
> > +             btf_bswap_meta(btf->meta_data, hdr->meta_header.meta_len);
> > +
> > +     return 0;
> > +}
> > +
> >  static int btf_type_size(const struct btf_type *t)
> >  {
> >       const int base_size = sizeof(struct btf_type);
> > @@ -904,6 +955,7 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf)
> >       err = err ?: btf_parse_type_sec(btf);
> >       if (err)
> >               goto done;
> > +     err = btf_parse_meta_sec(btf);
> >
> >  done:
> >       if (err) {
> > @@ -1267,6 +1319,11 @@ static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endi
> >       }
> >
> >       data_sz = hdr->hdr_len + hdr->type_len + hdr->str_len;
> > +     if (btf->meta_data) {
> > +             data_sz = roundup(data_sz, 8);
> > +             data_sz += hdr->meta_header.meta_len;
> > +             hdr->meta_header.meta_off = roundup(hdr->type_len + hdr->str_len, 8);
> > +     }
> >       data = calloc(1, data_sz);
> >       if (!data)
> >               return NULL;
> > @@ -1293,8 +1350,21 @@ static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endi
> >       p += hdr->type_len;
> >
> >       memcpy(p, btf_strs_data(btf), hdr->str_len);
> > -     p += hdr->str_len;
> > +     /* round up to 8 byte alignment to match offset above */
> > +     p = data + hdr->hdr_len + roundup(hdr->type_len + hdr->str_len, 8);
> > +
> > +     if (btf->meta_data) {
> > +             struct btf_metadata *meta = p;
> >
> > +             memcpy(p, btf->meta_data, hdr->meta_header.meta_len);
> > +             if (!swap_endian) {
> > +                     meta->crc = crc32(0L, (const Bytef *)&data, sizeof(data));
> > +                     meta->flags |= BTF_META_CRC_SET;
> > +             }
> > +             if (swap_endian)
> > +                     btf_bswap_meta(p, hdr->meta_header.meta_len);
> > +             p += hdr->meta_header.meta_len;
> > +     }
> >       *size = data_sz;
> >       return data;
> >  err_out:
> > @@ -1425,13 +1495,13 @@ static void btf_invalidate_raw_data(struct btf *btf)
> >       }
> >  }
> >
> > -/* Ensure BTF is ready to be modified (by splitting into a three memory
> > - * regions for header, types, and strings). Also invalidate cached
> > - * raw_data, if any.
> > +/* Ensure BTF is ready to be modified (by splitting into a three or four memory
> > + * regions for header, types, strings and optional metadata). Also invalidate
> > + * cached raw_data, if any.
> >   */
> >  static int btf_ensure_modifiable(struct btf *btf)
> >  {
> > -     void *hdr, *types;
> > +     void *hdr, *types, *meta = NULL;
> >       struct strset *set = NULL;
> >       int err = -ENOMEM;
> >
> > @@ -1446,9 +1516,17 @@ static int btf_ensure_modifiable(struct btf *btf)
> >       types = malloc(btf->hdr->type_len);
> >       if (!hdr || !types)
> >               goto err_out;
> > +     if (btf->hdr->hdr_len >= sizeof(struct btf_header)  &&
> > +         btf->hdr->meta_header.meta_off && btf->hdr->meta_header.meta_len) {
> > +             meta = calloc(1, btf->hdr->meta_header.meta_len);
> > +             if (!meta)
> > +                     goto err_out;
> > +     }
> >
> >       memcpy(hdr, btf->hdr, btf->hdr->hdr_len);
> >       memcpy(types, btf->types_data, btf->hdr->type_len);
> > +     if (meta)
> > +             memcpy(meta, btf->meta_data, btf->hdr->meta_header.meta_len);
> >
> >       /* build lookup index for all strings */
> >       set = strset__new(BTF_MAX_STR_OFFSET, btf->strs_data, btf->hdr->str_len);
> > @@ -1463,6 +1541,8 @@ static int btf_ensure_modifiable(struct btf *btf)
> >       btf->types_data_cap = btf->hdr->type_len;
> >       btf->strs_data = NULL;
> >       btf->strs_set = set;
> > +     btf->meta_data = meta;
> > +
> >       /* if BTF was created from scratch, all strings are guaranteed to be
> >        * unique and deduplicated
> >        */
> > @@ -1480,6 +1560,7 @@ static int btf_ensure_modifiable(struct btf *btf)
> >       strset__free(set);
> >       free(hdr);
> >       free(types);
> > +     free(meta);
> >       return err;
> >  }
> >
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-05 16:14               ` Alexei Starovoitov
@ 2023-06-05 22:38                 ` Andrii Nakryiko
  2023-06-06  2:46                   ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-06-05 22:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Mon, Jun 5, 2023 at 9:15 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jun 2, 2023 at 1:34 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > >
> > > > > > >> +
> > > > > > >> +struct btf_kind_meta {
> > > > > > >> +    __u32 name_off;         /* kind name string offset */
> > > >
> > > > I'm not sure why we'd need to record this for every KIND? The tool
> > > > that doesn't know about this new kind can't do much about it anyways,
> > > > so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
> > > > doesn't make much difference?
> > >
> > > The name is certainly more meaningful than 123.
> > > bpftool output is consumed by humans who will be able to tell the difference.
> > > I'd keep the name here.
> >
> > Ok, it's fine. When I was originally proposing this compact metadata,
> > I was trying to make it as minimal as possible, so adding 80 bytes
> > just for string offset fields (plus a bunch of strings) felt like an
> > unnecessary overhead. But it's not a big deal.
>
> Exactly. It's just 4 * num_kinds bytes in and ~20 * num_kinds for
> names, but it implements 'self description'.
> Otherwise the names become an external knowledge and BTF is not self described.
>
>
> > >
> > > > > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
> > > >
> > > > Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
> > > > existing kernels (and potentially many tools that parse BTF). Given we
> > > > can actually extend BTF in backwards compatible way by just adding an
> > > > optional two fields to btf_header + extra bytes for metadata sections,
> > > > why making our lives harder by bumping this version?
> > >
> > > I fail to see how bumping the version makes it harder.
> > > libbpf needs to sanitize meta* fields in the struct btf_header on
> > > older kernels anway. At the same time sanitizing the version from 2 to
> > > 1
> > > in the same header is one extra line of code in libbpf.
> > > What am I missing?
> >
> > So I checked libbpf code, and libbpf doesn't really check the version
> > field. So for the most part this BTF_VERSION bump wouldn't matter for
> > any tool that's based on libbpf's struct btf API. But if libbpf did
> > check version (as it probably should have), then by upgrading to newer
> > Clang that would emit BTF with this metadata (but no new fancy
> > BTF_KIND_KERNEL_FUNC or anything like that), we automatically make
> > such BTF incompatible with all those tools.
> >
> > Kernel is a bit different because it's extremely strict about BTF. I'm
> > more worried about tools like bpftool (but we don't check BTF_VERSION
> > there due to libbpf), llvm-objdump (when it supports BTF), etc.
> >
> > On the other hand, what do we gain by bumping this BTF_VERSION?
>
> The version bump will be an indication that
> v2 of BTF has enough info in the format for any tool/kernel to consume it.
> With v2 we should make BTF_KIND_META description mandatory.
> If we keep it as v1 then the presence of BTF_KIND_META would be
> an indication of 'self described' format.
> Which is also ok-ish, but seems less clean.
> zero vs not-zero of meta_off in btf_header is pretty much v1 vs v2.
>

We had a long offline discussion w/ Alexei about this whole
self-describing BTF, and I will try to summarize it here a bit,
because I think we both think about "self-describing" differently, and
as a result few different aspects are conflated with each other (there
are at least 3(!) different things here).

From my perspective, this self-describing BTF metadata was purely
designed to allow tools without latest BTF knowledge to be able to
skip over unknown BTF_KIND_xxx, at most being able to tell whether
it's critical for understanding BTF (that's the OPTIONAL flag) or not.
I.e., with older bpftool (but the one that knows about btf_metadata,
of course), it would still be possible to `bpftool btf dump file
<file-with-newer-btf-kinds>` just fine, except for new KINDS (which
would be just emitted as "unknown BTF_KIND_XXX, skipping...".

I think this problem is solved with this fixed + per-vlen sz and those
few extra flags. No one seems to deny this.

Now, crc/id and self-describing BTF in Alexei's sense. We can merge
them together, or we can skip them separately. Let's start with
Alexei's self-describing BTF in general.

His view is that we need some sort of way to keep adding new
information about BTF itself in a way that will be parsable by older
tools, and at the very least printable by those tools so that humans
can get this information. Sort of like a generic JSON-like format, but
encoded in BTF. As one of the example of using that might be `"crc" :
123123123` which would describe BTF's checksum.

Alexei's proposal is to add new BTF_KIND_xxx for each such new piece
of information, and given part #1 (information how to skip over this
new KIND), it would be an extensible mechanism. And while I agree that
it's one way to do this, I don't see how a generic tool like bpftool
can make any of that really printable, short of just hex-dumping an
entire new KIND contents, which doesn't seem to be very
human-readable.

But. If we think we need something like this generic metadata format,
I propose we add *one* new kind to describe it. Let's call it the
BTF_KIND_METADATA (or whatever) kind. It will/can have a name
(btf_type.name_off), and it has vlen items. Each item can be described
as:

struct bpf_metadata_item {
    __u32 key_off; /* key string */
    union {
        __u32 value_off; /* value string */
        __u32 value_type_id; /* type ID of a value of this item */
    }
    __u32 flags;
};


Each metadata_item represents one key:value pair, where key is always
described with a string ("id", "base_id", "something_fancy", etc), and
value can be interpreted based on flags. If one flag is set (e.g.,
BTF_METADATA_STR), it's a string offset, if, say, BTF_METADATA_TYPE,
then value_type_id points to another KIND describing value (it could
be another BTF_KIND_METADATA to create nested structures, if
necessary). We can reuse value_off/value_type_id also for 4-byte
integer value and whatever else, all based on flags. We can anticipate
arrays, if we want to. The point is not to completely design it here,
though.

Such approach will allow us to teach all tools about this data
structure once and just keep adding new items and codifying their key
names and semantics of the value. And we won't have to change bpftool
and such just to teach about new "some_fancy_key2" item.

With that, I'd argue we should also add new `__u32 metadata_id;` field
to `struct btf_metadata_header` to directly point to such root
type/kind, instead of doing linear search (which is especially
problematic with nested JSON-like structure, if we choose to do it).

This should satisfy Alexei's desire to have self-describing BTF in his
"self-describing" meaning. We'll get a generic bag of key:value pairs,
including nested objects.


Now, third. Regarding CRC, ID, etc. While the above metadata can
describe that, and we can just codify that, say "id" key is BTF's own
checksum/id, and "base_id" is checksum/id of expected base BTF. I
think the concept of ID is simple enough and important enough for
modules BTF and base/split BTF itself that it justifies having
dedicated two fields in btf_metadata_header for it, without
indirection of extra new KINDs.

As for binary vs string. I think most people don't think about GIT SHA
as array of bytes, and rather it's just a unique string, because we
work with them as strings in everyday life. So I think similarly here
it would be totally fine to just say that ID is a string (and we can
just hard-code SHA as an algorithm, or use this "algo:value" format of
a string), and keep things simple and straightforward for ID and base
ID.

Similarly, I think the whole bumping of V2 and adding new kinds just
for ID is just adding more complications for existing tooling, while
we can easily make new BTF be still consumable by old bpftool and
other tools. All that because libbpf is ignoring stuff in btf_header
past str_len field, so even if pahole or Clang start emitting metadata
describing btf_type byte size (fixed + per-vlen sizes), all that will
be backwards compatible with existing versions of tools.

The beauty of such an approach is that we can teach Clang to emit this
size metadata + ID today without breaking any of the existing
libbpf-based tooling (and I suspect other tooling dealing with BTF as
well, unless they chose to be as strict as kernel with unknown BTF
header fields). And libbpf will sanitize such BTF for old kernels
automatically like it does today with various KINDs that old kernel
don't understand. It will be a no-brainer to enable this
functionality, we won't even need a new option for Clang or pahole.
And then when we need to add yet more generic metadata, we can start
utilizing BTF_KIND_METADATA.

Sorry for the wall of text. At least I think I also addressed all the
below comments and won't write more text. :)


Thoughts?


> >
> > We don't have variable-sized KINDs, unless you are proposing to use
> > vlen as "number of bytes" of ID payload.
>
> Exactly. I'm proposing BTF_KIND_CHECKSUM and use vlen for size.
>
> > And another KIND is
> > automatically breaking backwards compat for all existing tools.
>
> No. That's the whole point of 'self described'.
> New kinds will not be breaking.
>
> > For no
> > good reason. This whole metadata is completely optional right now for
> > anything that's using libbpf for BTF parsing. But adding KIND_ID makes
> > it not optional at all.
>
> and that's the crux of our disagreement.
> If BTF_KIND_META are optional it's just a glorified comment inside BTF and
> not a new 'self described' format.
> If it's just a comment I'd rather not add it to BTF.
> Such debug info can go to BTF.ext or don't do it at all.
>
> The self described BTF would mean that struct btf_kind_meta-s contain
> enough info for tools to parse from now on.
> Imagine we didn't need CRC right now.
> The self described format lands and now we want to add CRC.
> If we're saying: "let's add a few hard coded fields to struct btf_header
> or struct btf_metadata" then we failed.
> It's not a self described format if we still need to extend hard coded
> structs.
> The idea of self description is that struct btf_kind_meta-s describe
> absolutely everything about BTF from now on.
> Meaning that not only things like ENUM64 and FUNC with addresses
> become new KINDs, but crc-s and everything else is a new kind too,
> because that's the only thing that btf_kind_meta-s can describe.
>
> > Not sure what new KIND gives us in this case. This ID (whether it's
> > "crc:abcdef" or just "661866dbea52bfac7420cd35d0e502d4ccc11bb6" or
> > whatever) can be used by all application as is for comparison, you
> > don't need to understand how it is generated at all.
>
> That's fine. tools don't need to parse it.
> With BTF_KIND_CHECKSUM the tools will just compare vlen sized binary data.
>
> > >
> > > > This also has a good property that 0 means "no ID", which helps with
> > > > the base BTF case. Current "__u32 crc;" doesn't have this property and
> > > > requires a flag.
> > >
> > > imo this crc addition is a litmus test for this self-described format.
> > > If we cannot encode it as a new KIND* it means this self-described
> > > idea is broken.
> >
> > We can, but this can be a straightforward and simple *opaque* string
> > ID, so the new kind just seems unnecessary.
>
> BTF_KIND_CHECKSUM can have a string in vlen part of it, but
> it feels wrong to encode binary data as a string while everything else
> in BTF is binary.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-05 22:38                 ` Andrii Nakryiko
@ 2023-06-06  2:46                   ` Alexei Starovoitov
  2023-06-06 11:30                     ` Toke Høiland-Jørgensen
  2023-06-06 16:50                     ` Andrii Nakryiko
  0 siblings, 2 replies; 35+ messages in thread
From: Alexei Starovoitov @ 2023-06-06  2:46 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Mon, Jun 5, 2023 at 3:38 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jun 5, 2023 at 9:15 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Jun 2, 2023 at 1:34 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > >
> > > > > > > >> +
> > > > > > > >> +struct btf_kind_meta {
> > > > > > > >> +    __u32 name_off;         /* kind name string offset */
> > > > >
> > > > > I'm not sure why we'd need to record this for every KIND? The tool
> > > > > that doesn't know about this new kind can't do much about it anyways,
> > > > > so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
> > > > > doesn't make much difference?
> > > >
> > > > The name is certainly more meaningful than 123.
> > > > bpftool output is consumed by humans who will be able to tell the difference.
> > > > I'd keep the name here.
> > >
> > > Ok, it's fine. When I was originally proposing this compact metadata,
> > > I was trying to make it as minimal as possible, so adding 80 bytes
> > > just for string offset fields (plus a bunch of strings) felt like an
> > > unnecessary overhead. But it's not a big deal.
> >
> > Exactly. It's just 4 * num_kinds bytes in and ~20 * num_kinds for
> > names, but it implements 'self description'.
> > Otherwise the names become an external knowledge and BTF is not self described.
> >
> >
> > > >
> > > > > > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
> > > > >
> > > > > Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
> > > > > existing kernels (and potentially many tools that parse BTF). Given we
> > > > > can actually extend BTF in backwards compatible way by just adding an
> > > > > optional two fields to btf_header + extra bytes for metadata sections,
> > > > > why making our lives harder by bumping this version?
> > > >
> > > > I fail to see how bumping the version makes it harder.
> > > > libbpf needs to sanitize meta* fields in the struct btf_header on
> > > > older kernels anway. At the same time sanitizing the version from 2 to
> > > > 1
> > > > in the same header is one extra line of code in libbpf.
> > > > What am I missing?
> > >
> > > So I checked libbpf code, and libbpf doesn't really check the version
> > > field. So for the most part this BTF_VERSION bump wouldn't matter for
> > > any tool that's based on libbpf's struct btf API. But if libbpf did
> > > check version (as it probably should have), then by upgrading to newer
> > > Clang that would emit BTF with this metadata (but no new fancy
> > > BTF_KIND_KERNEL_FUNC or anything like that), we automatically make
> > > such BTF incompatible with all those tools.
> > >
> > > Kernel is a bit different because it's extremely strict about BTF. I'm
> > > more worried about tools like bpftool (but we don't check BTF_VERSION
> > > there due to libbpf), llvm-objdump (when it supports BTF), etc.
> > >
> > > On the other hand, what do we gain by bumping this BTF_VERSION?
> >
> > The version bump will be an indication that
> > v2 of BTF has enough info in the format for any tool/kernel to consume it.
> > With v2 we should make BTF_KIND_META description mandatory.
> > If we keep it as v1 then the presence of BTF_KIND_META would be
> > an indication of 'self described' format.
> > Which is also ok-ish, but seems less clean.
> > zero vs not-zero of meta_off in btf_header is pretty much v1 vs v2.
> >
>
> We had a long offline discussion w/ Alexei about this whole
> self-describing BTF, and I will try to summarize it here a bit,
> because I think we both think about "self-describing" differently, and
> as a result few different aspects are conflated with each other (there
> are at least 3(!) different things here).

Thanks for summarizing. All correct.

> From my perspective, this self-describing BTF metadata was purely
> designed to allow tools without latest BTF knowledge to be able to
> skip over unknown BTF_KIND_xxx, at most being able to tell whether
> it's critical for understanding BTF (that's the OPTIONAL flag) or not.
> I.e., with older bpftool (but the one that knows about btf_metadata,
> of course), it would still be possible to `bpftool btf dump file
> <file-with-newer-btf-kinds>` just fine, except for new KINDS (which
> would be just emitted as "unknown BTF_KIND_XXX, skipping...".
>
> I think this problem is solved with this fixed + per-vlen sz and those
> few extra flags.

I'm fine with this approach as long as we don't fool ourselves that
we're doing a "self described" format.
We have a "size" field in btf_header. With this btf_metadata extension
we're effectively adding "size" fields for each btf kind and its vlen part.
So what Alan proposed:
+struct btf_kind_meta {
+       __u16 flags;            /* see BTF_KIND_META_* values above */
+       __u8 info_sz;           /* size of singular element after btf_type */
+       __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
+};

_without_ name_off it makes the most sense.

As soon as we're trying to add 'name_off' to the kind we're falling into
the trap of thinking that we're adding "self described" format and
btf_kind_meta needs to actually describe it for printing (with
real name and not just integer id) and further trying to describe
semantics of unknown kind with another flag that Andrii's proposed:
"Another flag I was thinking about was a flag whether struct btf_type's
type/size field is a type or a size (or something else)."

imo name_off and that other flag in addition to optional_or_not flag
are carrying the concept too far.

We should just say upfront that this "struct btf_kind_meta" is to be
able to extend BTF easier and nothing else.
"old" bpftool will be able to skip unknown kinds, but dedup
probably won't be able to skip much anyway.

I'd also call it "struct btf_kind_description|layout|sizes"
to narrow the scope.
This BTF extension is not going to describe semantics of unknown kinds.
Instead of "best effort" attempts with flags like "what type/size means"
let's not even go there.

If we go this simple route I'm fine with hard coded crc and base_crc
fields. They probably should go to btf_header though.
We don't need "struct btf_metadata" as well.
It's making things sound beyond what it actually is.
btf_header can point to an array of struct btf_kind_description.
As simple as it can get.
No need for json like format and key/value things either.
We're not creating a self described BTF format.
We're just adding a few size fields.
The kernel/libbpf/dedup still needs to known semantics of future kinds
to be able to print/operate on them.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-06  2:46                   ` Alexei Starovoitov
@ 2023-06-06 11:30                     ` Toke Høiland-Jørgensen
  2023-06-07 11:55                       ` Eduard Zingerman
  2023-06-06 16:50                     ` Andrii Nakryiko
  1 sibling, 1 reply; 35+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-06-06 11:30 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Jun 5, 2023 at 3:38 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> On Mon, Jun 5, 2023 at 9:15 AM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> >
>> > On Fri, Jun 2, 2023 at 1:34 PM Andrii Nakryiko
>> > <andrii.nakryiko@gmail.com> wrote:
>> > >
>> > > >
>> > > > > > > >> +
>> > > > > > > >> +struct btf_kind_meta {
>> > > > > > > >> +    __u32 name_off;         /* kind name string offset */
>> > > > >
>> > > > > I'm not sure why we'd need to record this for every KIND? The tool
>> > > > > that doesn't know about this new kind can't do much about it anyways,
>> > > > > so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
>> > > > > doesn't make much difference?
>> > > >
>> > > > The name is certainly more meaningful than 123.
>> > > > bpftool output is consumed by humans who will be able to tell the difference.
>> > > > I'd keep the name here.
>> > >
>> > > Ok, it's fine. When I was originally proposing this compact metadata,
>> > > I was trying to make it as minimal as possible, so adding 80 bytes
>> > > just for string offset fields (plus a bunch of strings) felt like an
>> > > unnecessary overhead. But it's not a big deal.
>> >
>> > Exactly. It's just 4 * num_kinds bytes in and ~20 * num_kinds for
>> > names, but it implements 'self description'.
>> > Otherwise the names become an external knowledge and BTF is not self described.
>> >
>> >
>> > > >
>> > > > > > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
>> > > > >
>> > > > > Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
>> > > > > existing kernels (and potentially many tools that parse BTF). Given we
>> > > > > can actually extend BTF in backwards compatible way by just adding an
>> > > > > optional two fields to btf_header + extra bytes for metadata sections,
>> > > > > why making our lives harder by bumping this version?
>> > > >
>> > > > I fail to see how bumping the version makes it harder.
>> > > > libbpf needs to sanitize meta* fields in the struct btf_header on
>> > > > older kernels anway. At the same time sanitizing the version from 2 to
>> > > > 1
>> > > > in the same header is one extra line of code in libbpf.
>> > > > What am I missing?
>> > >
>> > > So I checked libbpf code, and libbpf doesn't really check the version
>> > > field. So for the most part this BTF_VERSION bump wouldn't matter for
>> > > any tool that's based on libbpf's struct btf API. But if libbpf did
>> > > check version (as it probably should have), then by upgrading to newer
>> > > Clang that would emit BTF with this metadata (but no new fancy
>> > > BTF_KIND_KERNEL_FUNC or anything like that), we automatically make
>> > > such BTF incompatible with all those tools.
>> > >
>> > > Kernel is a bit different because it's extremely strict about BTF. I'm
>> > > more worried about tools like bpftool (but we don't check BTF_VERSION
>> > > there due to libbpf), llvm-objdump (when it supports BTF), etc.
>> > >
>> > > On the other hand, what do we gain by bumping this BTF_VERSION?
>> >
>> > The version bump will be an indication that
>> > v2 of BTF has enough info in the format for any tool/kernel to consume it.
>> > With v2 we should make BTF_KIND_META description mandatory.
>> > If we keep it as v1 then the presence of BTF_KIND_META would be
>> > an indication of 'self described' format.
>> > Which is also ok-ish, but seems less clean.
>> > zero vs not-zero of meta_off in btf_header is pretty much v1 vs v2.
>> >
>>
>> We had a long offline discussion w/ Alexei about this whole
>> self-describing BTF, and I will try to summarize it here a bit,
>> because I think we both think about "self-describing" differently, and
>> as a result few different aspects are conflated with each other (there
>> are at least 3(!) different things here).
>
> Thanks for summarizing. All correct.
>
>> From my perspective, this self-describing BTF metadata was purely
>> designed to allow tools without latest BTF knowledge to be able to
>> skip over unknown BTF_KIND_xxx, at most being able to tell whether
>> it's critical for understanding BTF (that's the OPTIONAL flag) or not.
>> I.e., with older bpftool (but the one that knows about btf_metadata,
>> of course), it would still be possible to `bpftool btf dump file
>> <file-with-newer-btf-kinds>` just fine, except for new KINDS (which
>> would be just emitted as "unknown BTF_KIND_XXX, skipping...".
>>
>> I think this problem is solved with this fixed + per-vlen sz and those
>> few extra flags.
>
> I'm fine with this approach as long as we don't fool ourselves that
> we're doing a "self described" format.
> We have a "size" field in btf_header. With this btf_metadata extension
> we're effectively adding "size" fields for each btf kind and its vlen part.
> So what Alan proposed:
> +struct btf_kind_meta {
> +       __u16 flags;            /* see BTF_KIND_META_* values above */
> +       __u8 info_sz;           /* size of singular element after btf_type */
> +       __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
> +};
>
> _without_ name_off it makes the most sense.
>
> As soon as we're trying to add 'name_off' to the kind we're falling into
> the trap of thinking that we're adding "self described" format and
> btf_kind_meta needs to actually describe it for printing (with
> real name and not just integer id) and further trying to describe
> semantics of unknown kind with another flag that Andrii's proposed:
> "Another flag I was thinking about was a flag whether struct btf_type's
> type/size field is a type or a size (or something else)."
>
> imo name_off and that other flag in addition to optional_or_not flag
> are carrying the concept too far.
>
> We should just say upfront that this "struct btf_kind_meta" is to be
> able to extend BTF easier and nothing else.
> "old" bpftool will be able to skip unknown kinds, but dedup
> probably won't be able to skip much anyway.
>
> I'd also call it "struct btf_kind_description|layout|sizes"
> to narrow the scope.
> This BTF extension is not going to describe semantics of unknown kinds.
> Instead of "best effort" attempts with flags like "what type/size means"
> let's not even go there.
>
> If we go this simple route I'm fine with hard coded crc and base_crc
> fields. They probably should go to btf_header though.
> We don't need "struct btf_metadata" as well.
> It's making things sound beyond what it actually is.
> btf_header can point to an array of struct btf_kind_description.
> As simple as it can get.
> No need for json like format and key/value things either.
> We're not creating a self described BTF format.
> We're just adding a few size fields.
> The kernel/libbpf/dedup still needs to known semantics of future kinds
> to be able to print/operate on them.

I've only been following this discussion on the sidelines, but FWIW I
agree that it is futile to try to describe semantics of fields inside
the format. Anything that needs to do transformations on the whole of
the BTF is going to have to understand the semantics anyway. And a
pretty-printer can just skip over the fields it doesn't understand and
emit a "unknown type XXX" message when doing so.

I'll also add that I am thrilled with the effort to make sure new BTF
kinds always embed their length so parsers can skip over them; the fact
that the older ones don't is, IMO, one of the biggest flaws of the BTF
format, and I'm thrilled to see it fixed! The "type-length-value with a
'required' flag" is also a pretty standard way to do this in, e.g.,
network protocols.

As for bumping the version number, I don't think it's a good idea to
deliberately break compatibility this way unless it's absolutely
necessary. With "absolutely necessary" meaning "things will break in
subtle ways in any case, so it's better to make the breakage obvious".
But it libbpf is not checking the version field anyway, that becomes
kind of a moot point, as bumping it doesn't really gain us anything,
then...

-Toke

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-06  2:46                   ` Alexei Starovoitov
  2023-06-06 11:30                     ` Toke Høiland-Jørgensen
@ 2023-06-06 16:50                     ` Andrii Nakryiko
  2023-06-07  1:16                       ` Alexei Starovoitov
  1 sibling, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-06-06 16:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Mon, Jun 5, 2023 at 7:46 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jun 5, 2023 at 3:38 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jun 5, 2023 at 9:15 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Jun 2, 2023 at 1:34 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > >
> > > > > > > > >> +
> > > > > > > > >> +struct btf_kind_meta {
> > > > > > > > >> +    __u32 name_off;         /* kind name string offset */
> > > > > >
> > > > > > I'm not sure why we'd need to record this for every KIND? The tool
> > > > > > that doesn't know about this new kind can't do much about it anyways,
> > > > > > so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
> > > > > > doesn't make much difference?
> > > > >
> > > > > The name is certainly more meaningful than 123.
> > > > > bpftool output is consumed by humans who will be able to tell the difference.
> > > > > I'd keep the name here.
> > > >
> > > > Ok, it's fine. When I was originally proposing this compact metadata,
> > > > I was trying to make it as minimal as possible, so adding 80 bytes
> > > > just for string offset fields (plus a bunch of strings) felt like an
> > > > unnecessary overhead. But it's not a big deal.
> > >
> > > Exactly. It's just 4 * num_kinds bytes in and ~20 * num_kinds for
> > > names, but it implements 'self description'.
> > > Otherwise the names become an external knowledge and BTF is not self described.
> > >
> > >
> > > > >
> > > > > > > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
> > > > > >
> > > > > > Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
> > > > > > existing kernels (and potentially many tools that parse BTF). Given we
> > > > > > can actually extend BTF in backwards compatible way by just adding an
> > > > > > optional two fields to btf_header + extra bytes for metadata sections,
> > > > > > why making our lives harder by bumping this version?
> > > > >
> > > > > I fail to see how bumping the version makes it harder.
> > > > > libbpf needs to sanitize meta* fields in the struct btf_header on
> > > > > older kernels anway. At the same time sanitizing the version from 2 to
> > > > > 1
> > > > > in the same header is one extra line of code in libbpf.
> > > > > What am I missing?
> > > >
> > > > So I checked libbpf code, and libbpf doesn't really check the version
> > > > field. So for the most part this BTF_VERSION bump wouldn't matter for
> > > > any tool that's based on libbpf's struct btf API. But if libbpf did
> > > > check version (as it probably should have), then by upgrading to newer
> > > > Clang that would emit BTF with this metadata (but no new fancy
> > > > BTF_KIND_KERNEL_FUNC or anything like that), we automatically make
> > > > such BTF incompatible with all those tools.
> > > >
> > > > Kernel is a bit different because it's extremely strict about BTF. I'm
> > > > more worried about tools like bpftool (but we don't check BTF_VERSION
> > > > there due to libbpf), llvm-objdump (when it supports BTF), etc.
> > > >
> > > > On the other hand, what do we gain by bumping this BTF_VERSION?
> > >
> > > The version bump will be an indication that
> > > v2 of BTF has enough info in the format for any tool/kernel to consume it.
> > > With v2 we should make BTF_KIND_META description mandatory.
> > > If we keep it as v1 then the presence of BTF_KIND_META would be
> > > an indication of 'self described' format.
> > > Which is also ok-ish, but seems less clean.
> > > zero vs not-zero of meta_off in btf_header is pretty much v1 vs v2.
> > >
> >
> > We had a long offline discussion w/ Alexei about this whole
> > self-describing BTF, and I will try to summarize it here a bit,
> > because I think we both think about "self-describing" differently, and
> > as a result few different aspects are conflated with each other (there
> > are at least 3(!) different things here).
>
> Thanks for summarizing. All correct.
>
> > From my perspective, this self-describing BTF metadata was purely
> > designed to allow tools without latest BTF knowledge to be able to
> > skip over unknown BTF_KIND_xxx, at most being able to tell whether
> > it's critical for understanding BTF (that's the OPTIONAL flag) or not.
> > I.e., with older bpftool (but the one that knows about btf_metadata,
> > of course), it would still be possible to `bpftool btf dump file
> > <file-with-newer-btf-kinds>` just fine, except for new KINDS (which
> > would be just emitted as "unknown BTF_KIND_XXX, skipping...".
> >
> > I think this problem is solved with this fixed + per-vlen sz and those
> > few extra flags.
>
> I'm fine with this approach as long as we don't fool ourselves that
> we're doing a "self described" format.
> We have a "size" field in btf_header. With this btf_metadata extension
> we're effectively adding "size" fields for each btf kind and its vlen part.
> So what Alan proposed:
> +struct btf_kind_meta {
> +       __u16 flags;            /* see BTF_KIND_META_* values above */
> +       __u8 info_sz;           /* size of singular element after btf_type */
> +       __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
> +};
>
> _without_ name_off it makes the most sense.

Yep. I didn't see much need for name_off as well.

>
> As soon as we're trying to add 'name_off' to the kind we're falling into
> the trap of thinking that we're adding "self described" format and
> btf_kind_meta needs to actually describe it for printing (with
> real name and not just integer id) and further trying to describe
> semantics of unknown kind with another flag that Andrii's proposed:
> "Another flag I was thinking about was a flag whether struct btf_type's
> type/size field is a type or a size (or something else)."
>
> imo name_off and that other flag in addition to optional_or_not flag
> are carrying the concept too far.
>
> We should just say upfront that this "struct btf_kind_meta" is to be
> able to extend BTF easier and nothing else.

Yep, that was precisely (at least my) intent. It's great that we are
converging on this.


> "old" bpftool will be able to skip unknown kinds, but dedup
> probably won't be able to skip much anyway.

Agreed, I don't think we can ever make BTF dedup work reliably with
KINDs it doesn't understand. I wouldn't even try. I'd also say that
kernel should keep being strict about this (even if we add
"is-it-optional" field, kernel can't trust it). Libbpf and other
libraries will have to keep sanitizing BTF anyways.

I'm also ok with dropping "optional_or_not" flag. Same for
"does-it-point-to-type" flag. I can see some use for the latter (e.g.,
still being able to calculate sizes and stuff), but it's nothing super
critical, IMO.

>
> I'd also call it "struct btf_kind_description|layout|sizes"

I like btf_kind_layout, it's short and to the point.

> to narrow the scope.
> This BTF extension is not going to describe semantics of unknown kinds.
> Instead of "best effort" attempts with flags like "what type/size means"
> let's not even go there.
>

Ack.

> If we go this simple route I'm fine with hard coded crc and base_crc
> fields. They probably should go to btf_header though.

Yep, on btf_header fields. But I'd not hardcode "crc" name. If we are
doing them as strings (which I think we should instead of dooming them
to 32-bit integer crc32 value only), then can we just say generically
that it's either "id" or "checksum"?

But I guess crc32 would be fine in practice as well. So not something
I strongly feel about.

> We don't need "struct btf_metadata" as well.
> It's making things sound beyond what it actually is.
> btf_header can point to an array of struct btf_kind_description.
> As simple as it can get.

Agreed. Still, it's a third section, and we should at least have a
count of those btf_kind_layout items somewhere.

> No need for json like format and key/value things either.
> We're not creating a self described BTF format.
> We're just adding a few size fields.

Ack, great.

> The kernel/libbpf/dedup still needs to known semantics of future kinds
> to be able to print/operate on them.

Yes.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-06 16:50                     ` Andrii Nakryiko
@ 2023-06-07  1:16                       ` Alexei Starovoitov
  2023-06-07 21:43                         ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-06-07  1:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Tue, Jun 6, 2023 at 9:50 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> Agreed, I don't think we can ever make BTF dedup work reliably with
> KINDs it doesn't understand. I wouldn't even try. I'd also say that
> kernel should keep being strict about this (even if we add
> "is-it-optional" field, kernel can't trust it). Libbpf and other
> libraries will have to keep sanitizing BTF anyways.

Good point. "it-is-optional" flag should be for user space only.

> > If we go this simple route I'm fine with hard coded crc and base_crc
> > fields. They probably should go to btf_header though.
>
> Yep, on btf_header fields. But I'd not hardcode "crc" name. If we are
> doing them as strings (which I think we should instead of dooming them
> to 32-bit integer crc32 value only), then can we just say generically
> that it's either "id" or "checksum"?
>
> But I guess crc32 would be fine in practice as well. So not something
> I strongly feel about.

I still fail to see how generic string "id" helps.
We have to standardize on a way to checksum BTF-s.
Say, we pick crc32.
pahole/clang would have to use the same algorithm.
Then kernel during BTF_LOAD should check that crc32 matches
to make sure btf data didn't get corrupted between its creation
and loading into the kernel.
Just like btrfs uses crc32 to make sure data doesn't get corrupted by disk.
libbpf doing sanitization would need to tweak crc32 too.
So it's going to be hard coded semantics at every level.
id and especially string id would be cumbersome for all these layers
to deal with.


> > We don't need "struct btf_metadata" as well.
> > It's making things sound beyond what it actually is.
> > btf_header can point to an array of struct btf_kind_description.
> > As simple as it can get.
>
> Agreed. Still, it's a third section, and we should at least have a
> count of those btf_kind_layout items somewhere.

of course.
In btf_header we have
        __u32   type_off;       /* offset of type section       */
        __u32   type_len;       /* length of type section       */
I meant we add:
        __u32   kind_layouts_off;
        __u32   kind_layouts_len;

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-06 11:30                     ` Toke Høiland-Jørgensen
@ 2023-06-07 11:55                       ` Eduard Zingerman
  2023-06-07 15:29                         ` Yonghong Song
  0 siblings, 1 reply; 35+ messages in thread
From: Eduard Zingerman @ 2023-06-07 11:55 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Alexei Starovoitov, Andrii Nakryiko
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Tue, 2023-06-06 at 13:30 +0200, Toke Høiland-Jørgensen wrote:
[...]
> 
> As for bumping the version number, I don't think it's a good idea to
> deliberately break compatibility this way unless it's absolutely
> necessary. With "absolutely necessary" meaning "things will break in
> subtle ways in any case, so it's better to make the breakage obvious".
> But it libbpf is not checking the version field anyway, that becomes
> kind of a moot point, as bumping it doesn't really gain us anything,
> then...

It seems to me that in terms of backward compatibility, the ability to
specify the size for each kind entry is more valuable than the
capability to add new BTF kinds:
- The former allows for extending kind records in
  a backward-compatible manner, such as adding a function address to
  BTF_KIND_FUNC.
- The latter is much more fragile. Types refer to each other,
  compatibility is already lost once a new "unknown" tag is introduced
  in a type chain.

However, changing the size of existing BTF kinds is itself a
backward-incompatible change. Therefore, a version bump may be
warranted in this regard.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-07 11:55                       ` Eduard Zingerman
@ 2023-06-07 15:29                         ` Yonghong Song
  2023-06-07 16:14                           ` Eduard Zingerman
  0 siblings, 1 reply; 35+ messages in thread
From: Yonghong Song @ 2023-06-07 15:29 UTC (permalink / raw)
  To: Eduard Zingerman, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Andrii Nakryiko
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf



On 6/7/23 4:55 AM, Eduard Zingerman wrote:
> On Tue, 2023-06-06 at 13:30 +0200, Toke Høiland-Jørgensen wrote:
> [...]
>>
>> As for bumping the version number, I don't think it's a good idea to
>> deliberately break compatibility this way unless it's absolutely
>> necessary. With "absolutely necessary" meaning "things will break in
>> subtle ways in any case, so it's better to make the breakage obvious".
>> But it libbpf is not checking the version field anyway, that becomes
>> kind of a moot point, as bumping it doesn't really gain us anything,
>> then...
> 
> It seems to me that in terms of backward compatibility, the ability to
> specify the size for each kind entry is more valuable than the
> capability to add new BTF kinds:
> - The former allows for extending kind records in
>    a backward-compatible manner, such as adding a function address to
>    BTF_KIND_FUNC.

Eduard, the new proposal is to add new kind, e.g., BTF_KIND_KFUNC, which
will have an 'address' field. BTF_KIND_KFUNC is for kernel functions.
So we will not have size compatibility issue for BTF_KIND_FUNC.

> - The latter is much more fragile. Types refer to each other,
>    compatibility is already lost once a new "unknown" tag is introduced
>    in a type chain.
> 
> However, changing the size of existing BTF kinds is itself a
> backward-incompatible change. Therefore, a version bump may be
> warranted in this regard.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-07 15:29                         ` Yonghong Song
@ 2023-06-07 16:14                           ` Eduard Zingerman
  2023-06-07 21:47                             ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Eduard Zingerman @ 2023-06-07 16:14 UTC (permalink / raw)
  To: Yonghong Song, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Andrii Nakryiko
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Wed, 2023-06-07 at 08:29 -0700, Yonghong Song wrote:
> 
> On 6/7/23 4:55 AM, Eduard Zingerman wrote:
> > On Tue, 2023-06-06 at 13:30 +0200, Toke Høiland-Jørgensen wrote:
> > [...]
> > > 
> > > As for bumping the version number, I don't think it's a good idea to
> > > deliberately break compatibility this way unless it's absolutely
> > > necessary. With "absolutely necessary" meaning "things will break in
> > > subtle ways in any case, so it's better to make the breakage obvious".
> > > But it libbpf is not checking the version field anyway, that becomes
> > > kind of a moot point, as bumping it doesn't really gain us anything,
> > > then...
> > 
> > It seems to me that in terms of backward compatibility, the ability to
> > specify the size for each kind entry is more valuable than the
> > capability to add new BTF kinds:
> > - The former allows for extending kind records in
> >    a backward-compatible manner, such as adding a function address to
> >    BTF_KIND_FUNC.
> 
> Eduard, the new proposal is to add new kind, e.g., BTF_KIND_KFUNC, which
> will have an 'address' field. BTF_KIND_KFUNC is for kernel functions.
> So we will not have size compatibility issue for BTF_KIND_FUNC.

Well, actually this might be a way to avoid BTF_KIND_KFUNC :)
What I wanted to say is that any use of this feature leads to
incompatibility with current BTF parsers, as either size of existing
kinds would be changed or a new kind with unknown size would be added.
It seems to me that this warrants version bump (or some other way to
signal existing parsers that format is incompatible).

> 
> > - The latter is much more fragile. Types refer to each other,
> >    compatibility is already lost once a new "unknown" tag is introduced
> >    in a type chain.
> > 
> > However, changing the size of existing BTF kinds is itself a
> > backward-incompatible change. Therefore, a version bump may be
> > warranted in this regard.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 4/8] btf: support kernel parsing of BTF with metadata, use it to parse BTF with unknown kinds
  2023-05-31 20:19 ` [RFC bpf-next 4/8] btf: support kernel parsing of BTF with metadata, use it to parse BTF with unknown kinds Alan Maguire
@ 2023-06-07 19:51   ` Eduard Zingerman
  0 siblings, 0 replies; 35+ messages in thread
From: Eduard Zingerman @ 2023-06-07 19:51 UTC (permalink / raw)
  To: Alan Maguire, ast, daniel, andrii, acme
  Cc: martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, quentin, mykolal, bpf

On Wed, 2023-05-31 at 21:19 +0100, Alan Maguire wrote:
> Validate metadata if present, and use it to parse BTF with
> unrecognized kinds. Reject BTF that contains a type
> of a kind that is not optional.
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  kernel/bpf/btf.c | 102 +++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 85 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index bd2cac057928..67f42d9ce099 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -257,6 +257,7 @@ struct btf {
>  	struct btf_kfunc_set_tab *kfunc_set_tab;
>  	struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
>  	struct btf_struct_metas *struct_meta_tab;
> +	struct btf_metadata *meta_data;
>  
>  	/* split BTF support */
>  	struct btf *base_btf;
> @@ -4965,22 +4966,41 @@ static s32 btf_check_meta(struct btf_verifier_env *env,
>  		return -EINVAL;
>  	}
>  
> -	if (BTF_INFO_KIND(t->info) > BTF_KIND_MAX ||
> -	    BTF_INFO_KIND(t->info) == BTF_KIND_UNKN) {
> -		btf_verifier_log(env, "[%u] Invalid kind:%u",
> -				 env->log_type_id, BTF_INFO_KIND(t->info));
> -		return -EINVAL;
> -	}
> -
>  	if (!btf_name_offset_valid(env->btf, t->name_off)) {
>  		btf_verifier_log(env, "[%u] Invalid name_offset:%u",
>  				 env->log_type_id, t->name_off);
>  		return -EINVAL;
>  	}
>  
> -	var_meta_size = btf_type_ops(t)->check_meta(env, t, meta_left);
> -	if (var_meta_size < 0)
> -		return var_meta_size;
> +	if (BTF_INFO_KIND(t->info) == BTF_KIND_UNKN) {
> +		btf_verifier_log(env, "[%u] Invalid kind:%u",
> +				 env->log_type_id, BTF_INFO_KIND(t->info));
> +		return -EINVAL;
> +	}
> +
> +	if (BTF_INFO_KIND(t->info) > BTF_KIND_MAX && env->btf->meta_data &&
> +	    BTF_INFO_KIND(t->info) < env->btf->meta_data->kind_meta_cnt) {
> +		struct btf_kind_meta *m = &env->btf->meta_data->kind_meta[BTF_INFO_KIND(t->info)];
> +
> +		if (!(m->flags & BTF_KIND_META_OPTIONAL)) {
> +			btf_verifier_log(env, "[%u] unknown but required kind '%s'(%u)",
> +					 env->log_type_id,
> +					 btf_name_by_offset(env->btf, m->name_off),
> +					 BTF_INFO_KIND(t->info));
> +			return -EINVAL;
> +		}
> +		var_meta_size = sizeof(struct btf_type);
> +		var_meta_size += m->info_sz + (btf_type_vlen(t) * m->elem_sz);
> +	} else {
> +		if (BTF_INFO_KIND(t->info) > BTF_KIND_MAX) {
> +			btf_verifier_log(env, "[%u] Invalid kind:%u",
> +					 env->log_type_id, BTF_INFO_KIND(t->info));
> +			return -EINVAL;
> +		}
> +		var_meta_size = btf_type_ops(t)->check_meta(env, t, meta_left);
> +		if (var_meta_size < 0)
> +			return var_meta_size;
> +	}
>  
>  	meta_left -= var_meta_size;
>  
> @@ -5155,7 +5175,8 @@ static int btf_parse_str_sec(struct btf_verifier_env *env)
>  	start = btf->nohdr_data + hdr->str_off;
>  	end = start + hdr->str_len;
>  
> -	if (end != btf->data + btf->data_size) {
> +	if (hdr->hdr_len < sizeof(struct btf_header) &&
> +	    end != btf->data + btf->data_size) {
>  		btf_verifier_log(env, "String section is not at the end");
>  		return -EINVAL;
>  	}
> @@ -5176,9 +5197,47 @@ static int btf_parse_str_sec(struct btf_verifier_env *env)
>  	return 0;
>  }
>  
> +static int btf_parse_meta_sec(struct btf_verifier_env *env)
> +{
> +	const struct btf_header *hdr = &env->btf->hdr;
> +	struct btf *btf = env->btf;
> +	void *start, *end;
> +
> +	if (hdr->hdr_len < sizeof(struct btf_header) ||
> +	    hdr->meta_header.meta_len == 0)
> +		return 0;
> +
> +	/* Meta section must align to 8 bytes */
> +	if (hdr->meta_header.meta_off & (sizeof(u64) - 1)) {
> +		btf_verifier_log(env, "Unaligned meta_off");
> +		return -EINVAL;
> +	}
> +	start = btf->nohdr_data + hdr->meta_header.meta_off;
> +	end = start + hdr->meta_header.meta_len;
> +
> +	if (hdr->meta_header.meta_len < sizeof(struct btf_meta_header)) {
> +		btf_verifier_log(env, "Metadata section is too small");
> +		return -EINVAL;
> +	}
> +	if (end != btf->data + btf->data_size) {
> +		btf_verifier_log(env, "Metadata section is not at the end");
> +		return -EINVAL;
> +	}
> +	btf->meta_data = start;
> +
> +	if (hdr->meta_header.meta_len != sizeof(struct btf_metadata) +
> +					(btf->meta_data->kind_meta_cnt *
> +					 sizeof(struct btf_kind_meta))) {
> +		btf_verifier_log(env, "Metadata section size mismatch");
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
>  static const size_t btf_sec_info_offset[] = {
>  	offsetof(struct btf_header, type_off),
>  	offsetof(struct btf_header, str_off),
> +	offsetof(struct btf_header, meta_header.meta_off),
>  };
>  
>  static int btf_sec_info_cmp(const void *a, const void *b)
> @@ -5193,15 +5252,19 @@ static int btf_check_sec_info(struct btf_verifier_env *env,
>  			      u32 btf_data_size)
>  {
>  	struct btf_sec_info secs[ARRAY_SIZE(btf_sec_info_offset)];
> -	u32 total, expected_total, i;
> +	u32 nr_secs = ARRAY_SIZE(btf_sec_info_offset);
> +	u32 total, expected_total, gap, i;
>  	const struct btf_header *hdr;
>  	const struct btf *btf;
>  
>  	btf = env->btf;
>  	hdr = &btf->hdr;
>  
> +	if (hdr->hdr_len < sizeof(struct btf_header))
> +		nr_secs--;
> +
>  	/* Populate the secs from hdr */
> -	for (i = 0; i < ARRAY_SIZE(btf_sec_info_offset); i++)
> +	for (i = 0; i < nr_secs; i++)
>  		secs[i] = *(struct btf_sec_info *)((void *)hdr +
>  						   btf_sec_info_offset[i]);
>  
> @@ -5216,9 +5279,10 @@ static int btf_check_sec_info(struct btf_verifier_env *env,
>  			btf_verifier_log(env, "Invalid section offset");
>  			return -EINVAL;
>  		}
> -		if (total < secs[i].off) {
> -			/* gap */
> -			btf_verifier_log(env, "Unsupported section found");
> +		gap = secs[i].off - total;
> +		if (gap >= 8) {
> +			/* gap larger than alignment gap */
> +			btf_verifier_log(env, "Unsupported section gap found");

I have two `prog_tests` tests failing with this patch applied:
* btf/btf_header test. Gap between hdr and type
  Here expected string should be updated:

--- a/tools/testing/selftests/bpf/prog_tests/btf.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf.c
@@ -1579,7 +1579,7 @@ static struct btf_raw_test raw_tests[] = {
        .max_entries = 4,
        .btf_load_err = true,
        .type_off_delta = 4,
-       .err_str = "Unsupported section found",
+       .err_str = "Unsupported section gap found",
 },

* btf/btf_header test. Overlap between type and str
  This test expects error string "Section overlap found",
  but instead "Section overlap found" is printed.
  This happens with the following values of local variables:
    
    total=20, secs[2].off=16, gap=-4

  (`gap` is printed as signed using %d).

>  			return -EINVAL;
>  		}
>  		if (total > secs[i].off) {
> @@ -5230,7 +5294,7 @@ static int btf_check_sec_info(struct btf_verifier_env *env,
>  					 "Total section length too long");
>  			return -EINVAL;
>  		}
> -		total += secs[i].len;
> +		total += secs[i].len + gap;
>  	}
>  
>  	/* There is data other than hdr and known sections */
> @@ -5530,6 +5594,10 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat
>  	if (err)
>  		goto errout;
>  
> +	err = btf_parse_meta_sec(env);
> +	if (err)
> +		goto errout;
> +
>  	err = btf_parse_type_sec(env);
>  	if (err)
>  		goto errout;


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-07  1:16                       ` Alexei Starovoitov
@ 2023-06-07 21:43                         ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-06-07 21:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Arnaldo Carvalho de Melo, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Tue, Jun 6, 2023 at 6:16 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jun 6, 2023 at 9:50 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > Agreed, I don't think we can ever make BTF dedup work reliably with
> > KINDs it doesn't understand. I wouldn't even try. I'd also say that
> > kernel should keep being strict about this (even if we add
> > "is-it-optional" field, kernel can't trust it). Libbpf and other
> > libraries will have to keep sanitizing BTF anyways.
>
> Good point. "it-is-optional" flag should be for user space only.
>
> > > If we go this simple route I'm fine with hard coded crc and base_crc
> > > fields. They probably should go to btf_header though.
> >
> > Yep, on btf_header fields. But I'd not hardcode "crc" name. If we are
> > doing them as strings (which I think we should instead of dooming them
> > to 32-bit integer crc32 value only), then can we just say generically
> > that it's either "id" or "checksum"?
> >
> > But I guess crc32 would be fine in practice as well. So not something
> > I strongly feel about.
>
> I still fail to see how generic string "id" helps.
> We have to standardize on a way to checksum BTF-s.
> Say, we pick crc32.
> pahole/clang would have to use the same algorithm.
> Then kernel during BTF_LOAD should check that crc32 matches
> to make sure btf data didn't get corrupted between its creation
> and loading into the kernel.
> Just like btrfs uses crc32 to make sure data doesn't get corrupted by disk.
> libbpf doing sanitization would need to tweak crc32 too.
> So it's going to be hard coded semantics at every level.
> id and especially string id would be cumbersome for all these layers
> to deal with.

Ok, that's totally fine with me. For me this whole checksumming was
less about checksum and validating content of BTF wasn't corrupted. It
was more about making sure that split BTF matches base BTF. And for
that readers (libbpf, tools, etc) wouldn't recalculate checksums on
their own. They'd get those checksums/IDs and just compare them.
Whether it's opaque string or int is absolutely irrelevant for this
use case (which was the main one for me).

But as I said, I'm fine either way. Let's hard-code crc32, it's
simpler to generate for sure.

>
>
> > > We don't need "struct btf_metadata" as well.
> > > It's making things sound beyond what it actually is.
> > > btf_header can point to an array of struct btf_kind_description.
> > > As simple as it can get.
> >
> > Agreed. Still, it's a third section, and we should at least have a
> > count of those btf_kind_layout items somewhere.
>
> of course.
> In btf_header we have
>         __u32   type_off;       /* offset of type section       */
>         __u32   type_len;       /* length of type section       */
> I meant we add:
>         __u32   kind_layouts_off;
>         __u32   kind_layouts_len;

ok, sounds good

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-07 16:14                           ` Eduard Zingerman
@ 2023-06-07 21:47                             ` Andrii Nakryiko
  2023-06-07 22:05                               ` Eduard Zingerman
  0 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-06-07 21:47 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Yonghong Song, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Alan Maguire, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Arnaldo Carvalho de Melo,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Wed, Jun 7, 2023 at 9:14 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2023-06-07 at 08:29 -0700, Yonghong Song wrote:
> >
> > On 6/7/23 4:55 AM, Eduard Zingerman wrote:
> > > On Tue, 2023-06-06 at 13:30 +0200, Toke Høiland-Jørgensen wrote:
> > > [...]
> > > >
> > > > As for bumping the version number, I don't think it's a good idea to
> > > > deliberately break compatibility this way unless it's absolutely
> > > > necessary. With "absolutely necessary" meaning "things will break in
> > > > subtle ways in any case, so it's better to make the breakage obvious".
> > > > But it libbpf is not checking the version field anyway, that becomes
> > > > kind of a moot point, as bumping it doesn't really gain us anything,
> > > > then...
> > >
> > > It seems to me that in terms of backward compatibility, the ability to
> > > specify the size for each kind entry is more valuable than the
> > > capability to add new BTF kinds:
> > > - The former allows for extending kind records in
> > >    a backward-compatible manner, such as adding a function address to
> > >    BTF_KIND_FUNC.
> >
> > Eduard, the new proposal is to add new kind, e.g., BTF_KIND_KFUNC, which
> > will have an 'address' field. BTF_KIND_KFUNC is for kernel functions.
> > So we will not have size compatibility issue for BTF_KIND_FUNC.
>
> Well, actually this might be a way to avoid BTF_KIND_KFUNC :)
> What I wanted to say is that any use of this feature leads to
> incompatibility with current BTF parsers, as either size of existing
> kinds would be changed or a new kind with unknown size would be added.
> It seems to me that this warrants version bump (or some other way to
> signal existing parsers that format is incompatible).

It is probably too late to have existing KINDs changing their size. If
this layout metadata was mandatory from the very beginning, then we
could have relied on it for determining new extra fields for
BTF_KIND_FUNC.

As things stand right now, new BTF_KIND_KFUNC is both a signal of
newer format (for kernel-side BTF; nothing changes for BPF object file
BTFs, which is great side-effect making backwards compat pain
smaller), and is a simpler and safer way to add extra information.

>
> >
> > > - The latter is much more fragile. Types refer to each other,
> > >    compatibility is already lost once a new "unknown" tag is introduced
> > >    in a type chain.
> > >
> > > However, changing the size of existing BTF kinds is itself a
> > > backward-incompatible change. Therefore, a version bump may be
> > > warranted in this regard.
>

See above and previous emails. Not having to bump version means we can
start emitting this layout info from Clang and pahole with no extra
opt-in flags, and not worry about breaking existing tools and apps.
This is great, so let's not ruin that property :)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-07 21:47                             ` Andrii Nakryiko
@ 2023-06-07 22:05                               ` Eduard Zingerman
  2023-06-07 22:34                                 ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Eduard Zingerman @ 2023-06-07 22:05 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Yonghong Song, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Alan Maguire, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Arnaldo Carvalho de Melo,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Wed, 2023-06-07 at 14:47 -0700, Andrii Nakryiko wrote:
> On Wed, Jun 7, 2023 at 9:14 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Wed, 2023-06-07 at 08:29 -0700, Yonghong Song wrote:
> > > 
> > > On 6/7/23 4:55 AM, Eduard Zingerman wrote:
> > > > On Tue, 2023-06-06 at 13:30 +0200, Toke Høiland-Jørgensen wrote:
> > > > [...]
> > > > > 
> > > > > As for bumping the version number, I don't think it's a good idea to
> > > > > deliberately break compatibility this way unless it's absolutely
> > > > > necessary. With "absolutely necessary" meaning "things will break in
> > > > > subtle ways in any case, so it's better to make the breakage obvious".
> > > > > But it libbpf is not checking the version field anyway, that becomes
> > > > > kind of a moot point, as bumping it doesn't really gain us anything,
> > > > > then...
> > > > 
> > > > It seems to me that in terms of backward compatibility, the ability to
> > > > specify the size for each kind entry is more valuable than the
> > > > capability to add new BTF kinds:
> > > > - The former allows for extending kind records in
> > > >    a backward-compatible manner, such as adding a function address to
> > > >    BTF_KIND_FUNC.
> > > 
> > > Eduard, the new proposal is to add new kind, e.g., BTF_KIND_KFUNC, which
> > > will have an 'address' field. BTF_KIND_KFUNC is for kernel functions.
> > > So we will not have size compatibility issue for BTF_KIND_FUNC.
> > 
> > Well, actually this might be a way to avoid BTF_KIND_KFUNC :)
> > What I wanted to say is that any use of this feature leads to
> > incompatibility with current BTF parsers, as either size of existing
> > kinds would be changed or a new kind with unknown size would be added.
> > It seems to me that this warrants version bump (or some other way to
> > signal existing parsers that format is incompatible).
> 
> It is probably too late to have existing KINDs changing their size. If
> this layout metadata was mandatory from the very beginning, then we
> could have relied on it for determining new extra fields for
> BTF_KIND_FUNC.
> 
> As things stand right now, new BTF_KIND_KFUNC is both a signal of
> newer format (for kernel-side BTF; nothing changes for BPF object file
> BTFs, which is great side-effect making backwards compat pain
> smaller), and is a simpler and safer way to add extra information.
> 
> > 
> > > 
> > > > - The latter is much more fragile. Types refer to each other,
> > > >    compatibility is already lost once a new "unknown" tag is introduced
> > > >    in a type chain.
> > > > 
> > > > However, changing the size of existing BTF kinds is itself a
> > > > backward-incompatible change. Therefore, a version bump may be
> > > > warranted in this regard.
> > 
> 
> See above and previous emails. Not having to bump version means we can
> start emitting this layout info from Clang and pahole with no extra
> opt-in flags, and not worry about breaking existing tools and apps.
> This is great, so let's not ruin that property :)

I'm not sure I understand how this would help:
- If no new kinds are added, absence or presence of metadata section
  does not matter. Old parsers would ignore it, new parsers would work
  as old parsers, so there is no added value in generating metadata.
- As soon as new kind is added old parsers are broken.

What am I missing?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI
  2023-06-07 22:05                               ` Eduard Zingerman
@ 2023-06-07 22:34                                 ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-06-07 22:34 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Yonghong Song, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Alan Maguire, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Arnaldo Carvalho de Melo,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
	Mykola Lysenko, bpf

On Wed, Jun 7, 2023 at 3:05 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2023-06-07 at 14:47 -0700, Andrii Nakryiko wrote:
> > On Wed, Jun 7, 2023 at 9:14 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Wed, 2023-06-07 at 08:29 -0700, Yonghong Song wrote:
> > > >
> > > > On 6/7/23 4:55 AM, Eduard Zingerman wrote:
> > > > > On Tue, 2023-06-06 at 13:30 +0200, Toke Høiland-Jørgensen wrote:
> > > > > [...]
> > > > > >
> > > > > > As for bumping the version number, I don't think it's a good idea to
> > > > > > deliberately break compatibility this way unless it's absolutely
> > > > > > necessary. With "absolutely necessary" meaning "things will break in
> > > > > > subtle ways in any case, so it's better to make the breakage obvious".
> > > > > > But it libbpf is not checking the version field anyway, that becomes
> > > > > > kind of a moot point, as bumping it doesn't really gain us anything,
> > > > > > then...
> > > > >
> > > > > It seems to me that in terms of backward compatibility, the ability to
> > > > > specify the size for each kind entry is more valuable than the
> > > > > capability to add new BTF kinds:
> > > > > - The former allows for extending kind records in
> > > > >    a backward-compatible manner, such as adding a function address to
> > > > >    BTF_KIND_FUNC.
> > > >
> > > > Eduard, the new proposal is to add new kind, e.g., BTF_KIND_KFUNC, which
> > > > will have an 'address' field. BTF_KIND_KFUNC is for kernel functions.
> > > > So we will not have size compatibility issue for BTF_KIND_FUNC.
> > >
> > > Well, actually this might be a way to avoid BTF_KIND_KFUNC :)
> > > What I wanted to say is that any use of this feature leads to
> > > incompatibility with current BTF parsers, as either size of existing
> > > kinds would be changed or a new kind with unknown size would be added.
> > > It seems to me that this warrants version bump (or some other way to
> > > signal existing parsers that format is incompatible).
> >
> > It is probably too late to have existing KINDs changing their size. If
> > this layout metadata was mandatory from the very beginning, then we
> > could have relied on it for determining new extra fields for
> > BTF_KIND_FUNC.
> >
> > As things stand right now, new BTF_KIND_KFUNC is both a signal of
> > newer format (for kernel-side BTF; nothing changes for BPF object file
> > BTFs, which is great side-effect making backwards compat pain
> > smaller), and is a simpler and safer way to add extra information.
> >
> > >
> > > >
> > > > > - The latter is much more fragile. Types refer to each other,
> > > > >    compatibility is already lost once a new "unknown" tag is introduced
> > > > >    in a type chain.
> > > > >
> > > > > However, changing the size of existing BTF kinds is itself a
> > > > > backward-incompatible change. Therefore, a version bump may be
> > > > > warranted in this regard.
> > >
> >
> > See above and previous emails. Not having to bump version means we can
> > start emitting this layout info from Clang and pahole with no extra
> > opt-in flags, and not worry about breaking existing tools and apps.
> > This is great, so let's not ruin that property :)
>
> I'm not sure I understand how this would help:
> - If no new kinds are added, absence or presence of metadata section
>   does not matter. Old parsers would ignore it, new parsers would work
>   as old parsers, so there is no added value in generating metadata.
> - As soon as new kind is added old parsers are broken.
>
> What am I missing?

I was arguing both against changing BTF_KIND_FUNC (not adding new
fields to id, not changing its size based on klag, etc) and against
bumping BTF_VERSION to 2.

For kernel-side BTF breakage is unavoidable, unfortunately, either if
we extend BTF_KIND_FUNC with addr or add new kind BTF_KIND_KFUNC. Any
application that would want to open such new kernel BTF would need to
upgrade to latest libbpf to be able to do it.

What I'm trying to avoid is also breaking (unnecessarily) BPF object
file-side BTF generated by Clang. BPF object BTF also has
BTF_KIND_FUNC generated for each entry program and subprogram. So if
we change anything about BTF_KIND_FUNC, we break existing tools, so we
need to be careful about that.


If we are talking about this btf_layout information separately from
extending kernel-side function info. By adding just that, we can keep
both kernel and BPF object BTFs backwards compatible with existing
tooling (unless someone decided to be very strict about checking
BTF_VERSION or btf_header bytes after last known field).


So tl;dr:
  - btf_layout is useful and can be done in a backwards compatible
way, if we don't bump BTF_VERSION;
  - we can start emitting it from Clang and pahole unconditionally, if
done this way;
  - adding addrs to either new BTF_KIND_KFUNC or extending existing
BTF_KIND_FUNC is separate from btf_layout (it just prompted btf_layout
prioritization to help avoid unnecessary tooling breakages in the
future), and I'm leaning towards new BTF_KIND_KFUNC instead of trying
to extend existing BTF_KIND_FUNC.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2023-06-07 22:34 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-31 20:19 [RFC bpf-next 0/8] bpf: support BTF kind metadata to separate Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI Alan Maguire
2023-06-01  3:53   ` Alexei Starovoitov
2023-06-01 10:36     ` Alan Maguire
2023-06-01 16:53       ` Alexei Starovoitov
2023-06-02 16:32         ` Andrii Nakryiko
2023-06-02 16:34           ` Andrii Nakryiko
2023-06-02 18:11           ` Alexei Starovoitov
2023-06-02 20:33             ` Andrii Nakryiko
2023-06-05 16:14               ` Alexei Starovoitov
2023-06-05 22:38                 ` Andrii Nakryiko
2023-06-06  2:46                   ` Alexei Starovoitov
2023-06-06 11:30                     ` Toke Høiland-Jørgensen
2023-06-07 11:55                       ` Eduard Zingerman
2023-06-07 15:29                         ` Yonghong Song
2023-06-07 16:14                           ` Eduard Zingerman
2023-06-07 21:47                             ` Andrii Nakryiko
2023-06-07 22:05                               ` Eduard Zingerman
2023-06-07 22:34                                 ` Andrii Nakryiko
2023-06-06 16:50                     ` Andrii Nakryiko
2023-06-07  1:16                       ` Alexei Starovoitov
2023-06-07 21:43                         ` Andrii Nakryiko
2023-05-31 20:19 ` [RFC bpf-next 2/8] libbpf: support handling of metadata section in BTF Alan Maguire
2023-06-05 11:01   ` Jiri Olsa
2023-06-05 21:40     ` Andrii Nakryiko
2023-05-31 20:19 ` [RFC bpf-next 3/8] libbpf: use metadata to compute an unknown kind size Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 4/8] btf: support kernel parsing of BTF with metadata, use it to parse BTF with unknown kinds Alan Maguire
2023-06-07 19:51   ` Eduard Zingerman
2023-05-31 20:19 ` [RFC bpf-next 5/8] libbpf: add metadata encoding support Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 6/8] btf: generate metadata for vmlinux/module BTF Alan Maguire
2023-05-31 20:19 ` [RFC bpf-next 7/8] bpftool: add BTF dump "format meta" to dump header/metadata Alan Maguire
2023-06-01 16:33   ` Quentin Monnet
2023-06-02 16:57   ` Andrii Nakryiko
2023-05-31 20:19 ` [RFC bpf-next 8/8] selftests/bpf: test kind encoding/decoding Alan Maguire
2023-05-31 20:19 ` [RFC dwarves] dwarves: encode BTF metadata if --btf_gen_meta is set Alan Maguire

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.