All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC bpf-next 00/13] bpf: support resilient split BTF
@ 2024-03-22 10:24 Alan Maguire
  2024-03-22 10:24 ` [RFC dwarves] btf_encoder: add base_ref BTF feature to generate split BTF with base refs Alan Maguire
                   ` (13 more replies)
  0 siblings, 14 replies; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Split BPF Type Format (BTF) provides huge advantages in that kernel
modules only have to provide type information for types that they do not
share with the core kernel; for core kernel types, split BTF refers to
core kernel BTF type ids.  So for a STRUCT sk_buff, a module that
uses that structure (or a pointer to it) simply needs to refer to the
core kernel type id, saving the need to define the structure and its many
dependents.  This cuts down on duplication and makes BTF as compact
as possible.

However, there is a downside.  This scheme requires the references from
split BTF to base BTF to be valid not just at encoding time, but at use
time (when the module is loaded).  Even a small change in kernel types
can perturb the type ids in core kernel BTF, and due to pahole's
parallel processing of compilation units, even an unchanged kernel can
have different type ids if BTF is re-generated.  So we have a robustness
problem for split BTF for cases where a module is not always compiled at
the same time as the kernel.  This problem is particularly acute for
distros which generally want module builders to be able to compile a
module for the lifetime of a Linux stable-based release, and have it
continue to be valid over the lifetime of that release, even as changes
in data structures (and hence BTF types) accrue.  Today it's not
possible to generate BTF for modules that works beyond the initial
kernel it is compiled against - kernel bugfixes etc invalidate the split
BTF references to vmlinux BTF, and BTF is no longer usable for the
module.

The goal of this series is to provide options to provide additional
context for cases like this.  That context comes in the form of "base
reference" BTF; it stands in for the base BTF, and contains
information about the types referenced from split BTF, but not their
full descriptions.  The modified split BTF will refer to type ids in
this .BTF.base_ref section, and when the kernel loads such modules it
will use the base reference BTF to map references from split BTF to the
current vmlinux BTF - a process of reconciling split BTF with the
currently-running kernel's vmlinux base BTF.

A module builder - using this series along with the pahole changes -
can then build a module with base reference BTF via

BTF_BASE_REF=1 make -C . M=path/2/module

For this to work, pahole will have to be built with libbpf based on
this series and with the patch titled

[RFC dwarves] btf_encoder: add base_ref BTF feature to generate split
BTF with base refs

The module will have a .BTF section (the split BTF) and a
.BTF.base_ref section.  The latter is small in size - base reference
BTF does not need full struct/union/enum information for named
types for example.  For 2667 modules built with base reference BTF,
the average size observed was 1556 bytes (stddev 1563).

Note that for the in-tree modules, this approach is not needed as
split and base BTF in the case of in-tree modules are always built
and re-built together.

The series first focuses on generating split BTF with base reference
BTF, and provides btf__parse_opts() which allows specification
of the section name from which to read BTF data, since we now have
both .BTF and .BTF.base_ref sections that can contain such data.

Then we add support to resolve_btfids for generating the .BTF.ids
section with reference to the .BTF.base_ref section - this ensures the
.BTF.ids match those used in the split/base reference BTF.

Finally the series provides the mechanism for reconciling split BTF with
a new base; the base reference BTF is used to map the references to base
BTF in the split BTF to the new base.  For the kernel, this
reconciliation process happens at module load time, and we reconcile
split BTF references to base BTF with the current vmlinux BTF.  .BTF.ids
need to be reconciled also.

So concretely, what happens is

- we generate split BTF in the .BTF section of a module that refers to
  types in the .BTF.base_ref section as base types; these are not full
  type descriptions but provide information about the base type.  So
  a STRUCT sk_buff would be represented as a FWD struct sk_buff in
  base reference BTF for example.
- when the module is loaded, the split BTF is reconciled with vmlinux
  BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff
  in vmlinux BTF and map all split BTF references to the base reference
  FWD sk_buff to the vmlinux BTF STRUCT sk_buff.

Support is also added to bpftool to be able to display split BTF
relative to its .BTF.base_ref section and display the reconciled form.

A previous approach to this problem [1] utilized standalone BTF for such
cases - where the BTF is not defined relative to base BTF so there is no
reconciliation required.  The problem with that approach is that from
the verifier perspective, some types are special, and having a custom
representation of a core kernel type that did not necessarily match the
current representation is not tenable.  So the approach taken here was
to preserve the split BTF model while minimizing the representation of
the context needed to reconcile split and current vmlinux BTF.

[1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/

Alan Maguire (13):
  libbpf: add support to btf__add_fwd() for ENUM64
  libbpf: add btf__new_split_base_ref() creating split BTF with
    reference base BTF
  selftests/bpf: test split base reference BTF generation
  libbpf: add btf__parse_opts() API for flexible BTF parsing
  bpftool: support displaying raw split BTF using base reference BTF as
    base
  kbuild,bpf: switch to using --btf_features for pahole v1.26 and later
  resolve_btfids: use .BTF.base_ref BTF as base BTF if -r option is used
  kbuild, bpf: add module-specific pahole/resolve_btfids flags for base
    reference BTF
  libbpf: split BTF reconciliation
  module, bpf: store BTF base reference pointer in struct module
  libbpf,bpf: share BTF reconcile-related code with kernel
  selftests/bpf: extend base reference tests cover BTF reconciliation
  bpftool: support displaying reconciled-with-base split BTF

 include/linux/btf.h                           |  29 +
 include/linux/module.h                        |   2 +
 kernel/bpf/Makefile                           |   8 +
 kernel/bpf/btf.c                              | 197 +++++-
 kernel/module/main.c                          |   2 +
 scripts/Makefile.btf                          |  12 +-
 scripts/Makefile.modfinal                     |   4 +-
 .../bpf/bpftool/Documentation/bpftool-btf.rst |  17 +
 tools/bpf/bpftool/btf.c                       |  33 +-
 tools/bpf/bpftool/main.c                      |  14 +-
 tools/bpf/bpftool/main.h                      |   2 +
 tools/bpf/resolve_btfids/main.c               |  17 +-
 tools/lib/bpf/Build                           |   2 +-
 tools/lib/bpf/btf.c                           | 434 +++++++++----
 tools/lib/bpf/btf.h                           |  56 ++
 tools/lib/bpf/btf_common.c                    | 146 +++++
 tools/lib/bpf/btf_reconcile.c                 | 614 ++++++++++++++++++
 tools/lib/bpf/libbpf.map                      |   3 +
 tools/lib/bpf/libbpf_internal.h               |   2 +
 .../bpf/prog_tests/btf_split_base_ref.c       | 254 ++++++++
 20 files changed, 1668 insertions(+), 180 deletions(-)
 create mode 100644 tools/lib/bpf/btf_common.c
 create mode 100644 tools/lib/bpf/btf_reconcile.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_split_base_ref.c

-- 
2.39.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC dwarves] btf_encoder: add base_ref BTF feature to generate split BTF with base refs
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-22 10:24 ` [RFC bpf-next 01/13] libbpf: add support to btf__add_fwd() for ENUM64 Alan Maguire
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Adding "base_ref" to --btf_features when generating split BTF will generate
split and base reference BTF - the latter allows us to map references from
split BTF to base BTF, even if that base BTF has changed.  It does this
by providing just enough information about the base types in the
.BTF.base_ref section.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 btf_encoder.c | 43 ++++++++++++++++++++++++++++++-------------
 dwarves.h     |  1 +
 pahole.c      | 28 ++++++++++++++++------------
 3 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index e1e3529..ec5fa87 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -75,7 +75,8 @@ struct btf_encoder {
 			  verbose,
 			  force,
 			  gen_floats,
-			  is_rel;
+			  is_rel,
+			  gen_base_ref;
 	uint32_t	  array_index_id;
 	struct {
 		struct var_info *vars;
@@ -1255,9 +1256,9 @@ static int btf_encoder__write_raw_file(struct btf_encoder *encoder)
 	return err;
 }
 
-static int btf_encoder__write_elf(struct btf_encoder *encoder)
+static int btf_encoder__write_elf(struct btf_encoder *encoder, const struct btf *btf,
+				  const char *btf_secname)
 {
-	struct btf *btf = encoder->btf;
 	const char *filename = encoder->filename;
 	GElf_Shdr shdr_mem, *shdr;
 	Elf_Data *btf_data = NULL;
@@ -1297,7 +1298,7 @@ static int btf_encoder__write_elf(struct btf_encoder *encoder)
 		if (shdr == NULL)
 			continue;
 		char *secname = elf_strptr(elf, strndx, shdr->sh_name);
-		if (strcmp(secname, ".BTF") == 0) {
+		if (strcmp(secname, btf_secname) == 0) {
 			btf_data = elf_getdata(scn, btf_data);
 			break;
 		}
@@ -1341,11 +1342,11 @@ static int btf_encoder__write_elf(struct btf_encoder *encoder)
 			goto unlink;
 		}
 
-		snprintf(cmd, sizeof(cmd), "%s --add-section .BTF=%s %s",
-			 llvm_objcopy, tmp_fn, filename);
+		snprintf(cmd, sizeof(cmd), "%s --add-section %s=%s %s",
+			 llvm_objcopy, btf_secname, tmp_fn, filename);
 		if (system(cmd)) {
-			fprintf(stderr, "%s: failed to add .BTF section to '%s': %d!\n",
-				__func__, filename, errno);
+			fprintf(stderr, "%s: failed to add %s section to '%s': %d!\n",
+				__func__, btf_secname, filename, errno);
 			goto unlink;
 		}
 
@@ -1380,12 +1381,27 @@ int btf_encoder__encode(struct btf_encoder *encoder)
 		fprintf(stderr, "%s: btf__dedup failed!\n", __func__);
 		return -1;
 	}
-
-	if (encoder->raw_output)
+	if (encoder->raw_output) {
 		err = btf_encoder__write_raw_file(encoder);
-	else
-		err = btf_encoder__write_elf(encoder);
-
+	} else {
+		struct btf *btf = encoder->btf;
+
+		if (encoder->gen_base_ref) {
+			btf = btf__new_split_base_ref(encoder->btf);
+			if (!btf) {
+				fprintf(stderr, "could not generate base reference split BTF: %s\n",
+					strerror(errno));
+				return -1;
+			}
+		}
+		err = btf_encoder__write_elf(encoder, btf, BTF_ELF_SEC);
+		if (!err && encoder->gen_base_ref)
+			err = btf_encoder__write_elf(encoder, btf__base_btf(btf), BTF_BASE_REF_ELF_SEC);
+		if (btf != encoder->btf) {
+			btf__free((struct btf *)btf__base_btf(btf));
+			btf__free(btf);
+		}
+	}
 	return err;
 }
 
@@ -1659,6 +1675,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 		encoder->force		 = conf_load->btf_encode_force;
 		encoder->gen_floats	 = conf_load->btf_gen_floats;
 		encoder->skip_encoding_vars = conf_load->skip_encoding_btf_vars;
+		encoder->gen_base_ref = conf_load->btf_gen_base_ref;
 		encoder->verbose	 = verbose;
 		encoder->has_index_type  = false;
 		encoder->need_index_type = false;
diff --git a/dwarves.h b/dwarves.h
index be0e29a..ad01979 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -87,6 +87,7 @@ struct conf_load {
 	bool			skip_encoding_btf_vars;
 	bool			btf_gen_floats;
 	bool			btf_encode_force;
+	bool			btf_gen_base_ref;
 	uint8_t			hashtable_bits;
 	uint8_t			max_hashtable_bits;
 	uint16_t		kabi_prefix_len;
diff --git a/pahole.c b/pahole.c
index 0b9c2de..965285d 100644
--- a/pahole.c
+++ b/pahole.c
@@ -1264,23 +1264,25 @@ ARGP_PROGRAM_VERSION_HOOK_DEF = dwarves_print_version;
  * BTF encoding apply; we encode type/decl tags, do not encode
  * floats, etc.  This ensures backwards compatibility.
  */
-#define BTF_FEATURE(name, alias, default_value)			\
-	{ #name, #alias, &conf_load.alias, default_value }
+#define BTF_FEATURE(name, alias, default_value, enable_for_all)		\
+	{ #name, #alias, &conf_load.alias, default_value, enable_for_all }
 
 struct btf_feature {
 	const char      *name;
 	const char      *option_alias;
 	bool		*conf_value;
 	bool		default_value;
+	bool		enable_for_all;
 } btf_features[] = {
-	BTF_FEATURE(encode_force, btf_encode_force, false),
-	BTF_FEATURE(var, skip_encoding_btf_vars, true),
-	BTF_FEATURE(float, btf_gen_floats, false),
-	BTF_FEATURE(decl_tag, skip_encoding_btf_decl_tag, true),
-	BTF_FEATURE(type_tag, skip_encoding_btf_type_tag, true),
-	BTF_FEATURE(enum64, skip_encoding_btf_enum64, true),
-	BTF_FEATURE(optimized_func, btf_gen_optimized, false),
-	BTF_FEATURE(consistent_func, skip_encoding_btf_inconsistent_proto, false),
+	BTF_FEATURE(encode_force, btf_encode_force, false, true),
+	BTF_FEATURE(var, skip_encoding_btf_vars, true, true),
+	BTF_FEATURE(float, btf_gen_floats, false, true),
+	BTF_FEATURE(decl_tag, skip_encoding_btf_decl_tag, true, true),
+	BTF_FEATURE(type_tag, skip_encoding_btf_type_tag, true, true),
+	BTF_FEATURE(enum64, skip_encoding_btf_enum64, true, true),
+	BTF_FEATURE(optimized_func, btf_gen_optimized, false, true),
+	BTF_FEATURE(consistent_func, skip_encoding_btf_inconsistent_proto, false, true),
+	BTF_FEATURE(base_ref, btf_gen_base_ref, false, false),
 };
 
 #define BTF_MAX_FEATURE_STR	1024
@@ -1348,8 +1350,10 @@ static void parse_btf_features(const char *features, bool strict)
 	if (strcmp(features, "all") == 0) {
 		int i;
 
-		for (i = 0; i < ARRAY_SIZE(btf_features); i++)
-			enable_btf_feature(&btf_features[i]);
+		for (i = 0; i < ARRAY_SIZE(btf_features); i++) {
+			if (btf_features[i].enable_for_all)
+				enable_btf_feature(&btf_features[i]);
+		}
 		return;
 	}
 
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 01/13] libbpf: add support to btf__add_fwd() for ENUM64
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
  2024-03-22 10:24 ` [RFC dwarves] btf_encoder: add base_ref BTF feature to generate split BTF with base refs Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-29 21:59   ` Andrii Nakryiko
  2024-03-22 10:24 ` [RFC bpf-next 02/13] libbpf: add btf__new_split_base_ref() creating split BTF with reference base BTF Alan Maguire
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Forward declaration of BTF_KIND_ENUM64 is added by supporting BTF_FWD_ENUM64
as an enumerated value for btf_fwd_kind; an ENUM64 forward is an 8-byte
signed enum64 with no values.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/btf.c | 7 ++++++-
 tools/lib/bpf/btf.h | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 2d0840ef599a..dcff6c29c851 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -2418,7 +2418,7 @@ int btf__add_enum64_value(struct btf *btf, const char *name, __u64 value)
  * Append new BTF_KIND_FWD type with:
  *   - *name*, non-empty/non-NULL name;
  *   - *fwd_kind*, kind of forward declaration, one of BTF_FWD_STRUCT,
- *     BTF_FWD_UNION, or BTF_FWD_ENUM;
+ *     BTF_FWD_UNION, BTF_FWD_ENUM or BTF_FWD_ENUM64;
  * Returns:
  *   - >0, type ID of newly added BTF type;
  *   - <0, on error.
@@ -2446,6 +2446,11 @@ int btf__add_fwd(struct btf *btf, const char *name, enum btf_fwd_kind fwd_kind)
 		 * values; we also assume a standard 4-byte size for it
 		 */
 		return btf__add_enum(btf, name, sizeof(int));
+	case BTF_FWD_ENUM64:
+		/* enum64 forward is similarly just an enum64 with no enum
+		 * values; assume 8 byte size, signed.
+		 */
+		return btf__add_enum64(btf, name, 8, true);
 	default:
 		return libbpf_err(-EINVAL);
 	}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 8e6880d91c84..47d3e00b25c7 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -194,6 +194,7 @@ enum btf_fwd_kind {
 	BTF_FWD_STRUCT = 0,
 	BTF_FWD_UNION = 1,
 	BTF_FWD_ENUM = 2,
+	BTF_FWD_ENUM64 = 3,
 };
 
 LIBBPF_API int btf__add_fwd(struct btf *btf, const char *name, enum btf_fwd_kind fwd_kind);
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 02/13] libbpf: add btf__new_split_base_ref() creating split BTF with reference base BTF
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
  2024-03-22 10:24 ` [RFC dwarves] btf_encoder: add base_ref BTF feature to generate split BTF with base refs Alan Maguire
  2024-03-22 10:24 ` [RFC bpf-next 01/13] libbpf: add support to btf__add_fwd() for ENUM64 Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-29 22:00   ` Andrii Nakryiko
  2024-03-22 10:24 ` [RFC bpf-next 03/13] selftests/bpf: test split base reference BTF generation Alan Maguire
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

To support more robust split BTF, adding supplemental context for the
base BTF type ids that split BTF refers to is required.  Without such
references, a simple shuffling of base BTF type ids (without any other
significant change) invalidates the split BTF.  Here the attempt is made
to store additional context to make split BTF more robust.

This context comes in the form of "reference" base BTF - this base BTF
constitutes the minimal BTF representation needed to disambiguate split BTF
references to base BTF.  So for example if a struct is referred to from
split -> base BTF, a forward representation of that struct is retained
in reference base BTF, and split BTF refers to it.  Base types are kept
intact in reference base BTF, and reference types like pointers are stored
as-is.  Avoiding struct/union/enum/enum64 expansion is important to keep
reference base BTF representation to a minimum size; however anonymous
struct, union and enum[64] types are represented in full since type details
are needed to disambiguate the reference - the name is not enough in those
cases since there is no name.  In practice these are rare; in sample
cases where reference base BTF was generated for in-tree kernel modules,
only a few were needed in base reference BTF.  These represent the
anonymous struct/unions that are used by the module but were de-duplicated
to use base vmlinux BTF ids instead.

When successful, a new representation of the split BTF passed in is
returned.  It refers to the reference BTF as its base, and both returned
split and base BTF need to be freed by the caller.

So to take a simple example, with split BTF with a type referring
to "struct sk_buff", we will generate base reference BTF with a
FWD struct sk_buff, and the split BTF will refer to it instead.

Tools like pahole can utilize such split BTF to popuate the .BTF section
(split BTF) and an additional .BTF.base_ref section (base reference BTF).
Then when the split BTF is loaded, the base reference BTF can be used
to reconcile split BTF and the current - and possibly changed - base BTF.

So for example if "struct sk_buff" was id 502 when the split BTF was
originally generated,  we can use the reference base BTF to see that
id 502 refers to a "struct sk_buff" and replace instances of id 502
with the current (reconciled) base BTF sk_buff type id.

Base reference BTF is small; when building a kernel with all modules
using base reference BTF as a test, the average size for module
base reference BTF is 1555 bytes (standard deviation 1563).  The
maximum base reference BTF size across ~2700 modules was 37895 bytes.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/btf.c      | 215 +++++++++++++++++++++++++++++++++++++--
 tools/lib/bpf/btf.h      |  15 +++
 tools/lib/bpf/libbpf.map |   1 +
 3 files changed, 225 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index dcff6c29c851..f02477af3d79 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1771,9 +1771,8 @@ static int btf_rewrite_str(__u32 *str_off, void *ctx)
 	return 0;
 }
 
-int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
+static int btf_add_type(struct btf_pipe *p, const struct btf_type *src_type)
 {
-	struct btf_pipe p = { .src = src_btf, .dst = btf };
 	struct btf_type *t;
 	int sz, err;
 
@@ -1782,20 +1781,27 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t
 		return libbpf_err(sz);
 
 	/* deconstruct BTF, if necessary, and invalidate raw_data */
-	if (btf_ensure_modifiable(btf))
+	if (btf_ensure_modifiable(p->dst))
 		return libbpf_err(-ENOMEM);
 
-	t = btf_add_type_mem(btf, sz);
+	t = btf_add_type_mem(p->dst, sz);
 	if (!t)
 		return libbpf_err(-ENOMEM);
 
 	memcpy(t, src_type, sz);
 
-	err = btf_type_visit_str_offs(t, btf_rewrite_str, &p);
+	err = btf_type_visit_str_offs(t, btf_rewrite_str, p);
 	if (err)
 		return libbpf_err(err);
 
-	return btf_commit_type(btf, sz);
+	return btf_commit_type(p->dst, sz);
+}
+
+int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
+{
+	struct btf_pipe p = { .src = src_btf, .dst = btf };
+
+	return btf_add_type(&p, src_type);
 }
 
 static int btf_rewrite_type_ids(__u32 *type_id, void *ctx)
@@ -5217,3 +5223,200 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
 
 	return 0;
 }
+
+struct btf_ref {
+	struct btf_pipe pipe;
+	__u32 *ids;
+	unsigned int nr_base_types;
+	unsigned int diff_id;
+};
+
+static bool btf_is_eligible_named_fwd(const struct btf_type *t)
+{
+	return (btf_is_composite(t) || btf_is_any_enum(t)) && t->name_off != 0;
+}
+
+static int btf_add_ref_type_ids(__u32 *id, void *ctx)
+{
+	struct btf_ref *ref = ctx;
+	struct btf_type *t = btf_type_by_id(ref->pipe.src, *id);
+	int ret;
+
+	/* split BTF id, not needed */
+	if (*id > ref->nr_base_types)
+		return 0;
+	/* already added ? */
+	if (ref->ids[*id] <= BTF_MAX_NR_TYPES)
+		return 0;
+	ref->ids[*id] = *id;
+
+	/* struct/union members not needed, except for anonymous structs
+	 * and unions, which we need since name won't help us determine
+	 * matches; so if a named struct/union, no need to recurse
+	 * into members.
+	 */
+	if (btf_is_eligible_named_fwd(t))
+		return 0;
+
+	/* ensure references in type are added also. */
+	ret = btf_type_visit_type_ids(t, btf_add_ref_type_ids, ctx);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+/* All split BTF ids will be shifted downwards since there are less base BTF
+ * in reference base BTF, and for those that refer to base BTF, we use the
+ * reference map to map from original base BTF to reference base BTF id.
+ */
+static int btf_update_ref_type_ids(__u32 *id, void *ctx)
+{
+	struct btf_ref *ref = ctx;
+
+	if (*id >= ref->nr_base_types)
+		*id -= ref->diff_id;
+	else
+		*id = ref->ids[*id];
+	return 0;
+}
+
+/* Create updated split BTF with "reference" base BTF; reference base BTF
+ * consists of BTF information required to clarify the types that split
+ * BTF refers to, omitting unneeded details.  Specifically it will contain
+ * base types and forward declarations of structs, unions and enumerated
+ * types, along with associated reference types like pointers, arrays etc.
+ *
+ * The only case where structs, unions or enumerated types are fully represented
+ * is when they are anonymous; in such cases, info about type content is needed
+ * to clarify type references.
+ *
+ * We return newly-created split BTF where the split BTf refers to a newly-created
+ * base reference BTF. Both must be freed separately by the caller.
+ *
+ * When creating the BTF representation for a module and provided with the
+ * base_ref option, pahole will create split BTF using this API, and store
+ * the base reference BTF in the .BTF.base.reference section.
+ */
+struct btf *btf__new_split_base_ref(struct btf *src_btf)
+{
+	unsigned int n = btf__type_cnt(src_btf);
+	const struct btf_type *t;
+	struct btf_ref ref = {};
+	struct btf *base_btf = NULL, *split_btf = NULL;
+	__u32 i, id = 0;
+	int ret = 0;
+
+	/* src BTF must be split BTF. */
+	if (!btf__base_btf(src_btf)) {
+		errno = -EINVAL;
+		return NULL;
+	}
+	base_btf = btf__new_empty();
+	if (!base_btf)
+		return NULL;
+	ref.ids = calloc(n, sizeof(*ref.ids));
+	if (!ref.ids) {
+		ret = -ENOMEM;
+		goto err_out;
+	}
+	for (i = 1; i < n; i++)
+		ref.ids[i] = BTF_UNPROCESSED_ID;
+	ref.pipe.src = src_btf;
+	ref.pipe.dst = base_btf;
+	ref.pipe.str_off_map = hashmap__new(btf_dedup_identity_hash_fn, btf_dedup_equal_fn, NULL);
+	if (IS_ERR(ref.pipe.str_off_map)) {
+		ret = -ENOMEM;
+		goto err_out;
+	}
+	ref.nr_base_types = btf__type_cnt(btf__base_btf(src_btf));
+
+	/* Pass over src split BTF; generate the list of base BTF
+	 * type ids it references; these will constitute our base
+	 * reference BTF set.
+	 */
+	for (i = src_btf->start_id; i < n; i++) {
+		t = btf__type_by_id(src_btf, i);
+
+		ret = btf_type_visit_type_ids((struct btf_type *)t,  btf_add_ref_type_ids, &ref);
+		if (ret < 0)
+			goto err_out;
+	}
+	/* Next add types for each of the required references. */
+	for (i = 1; i < src_btf->start_id; i++) {
+		if (ref.ids[i] > BTF_MAX_NR_TYPES)
+			continue;
+		t = btf_type_by_id(src_btf, i);
+		/* for named struct/unions/enums, use a fwd since we can match
+		 * via name without details.
+		 */
+		if (btf_is_eligible_named_fwd(t)) {
+			enum btf_fwd_kind fwd_kind;
+
+			switch (btf_kind(t)) {
+			case BTF_KIND_STRUCT:
+				fwd_kind = BTF_FWD_STRUCT;
+				break;
+			case BTF_KIND_UNION:
+				fwd_kind = BTF_FWD_UNION;
+				break;
+			case BTF_KIND_ENUM:
+				fwd_kind = BTF_FWD_ENUM;
+				break;
+			case BTF_KIND_ENUM64:
+				fwd_kind = BTF_FWD_ENUM64;
+				break;
+			default:
+				pr_warn("unexpected kind [%u] when creating base reference BTF.\n",
+					btf_kind(t));
+				goto err_out;
+			}
+			ret = btf__add_fwd(base_btf, btf__name_by_offset(src_btf, t->name_off),
+					   fwd_kind);
+		} else {
+			ret = btf_add_type(&ref.pipe, t);
+		}
+		if (ret < 0)
+			goto err_out;
+		ref.ids[i] = ++id;
+	}
+	/* now create new split BTF with reference base BTF as its base; we end up with
+	 * split BTF that has base BTF that represents enough about its base references
+	 * to allow it to be reconciled with the base BTF available.
+	 */
+	split_btf = btf__new_empty_split(base_btf);
+	if (!split_btf)
+		goto err_out;
+
+	ref.pipe.dst = split_btf;
+	/* all split BTF ids will be shifted downwards since there are less base BTF ids
+	 * in reference base BTF.
+	 */
+	ref.diff_id = ref.nr_base_types - btf__type_cnt(base_btf);
+
+	/* First add all split types */
+	for (i = src_btf->start_id; i < n; i++) {
+		t = btf_type_by_id(src_btf, i);
+		ret = btf_add_type(&ref.pipe, t);
+		if (ret < 0)
+			goto err_out;
+	}
+	/* Now update base/split BTF ids. */
+	for (i = 1; i < btf__type_cnt(split_btf); i++) {
+		t = btf_type_by_id(split_btf, i);
+
+		ret = btf_type_visit_type_ids((struct btf_type *)t,  btf_update_ref_type_ids, &ref);
+		if (ret < 0)
+			goto err_out;
+	}
+	free(ref.ids);
+	hashmap__free(ref.pipe.str_off_map);
+	return split_btf;
+err_out:
+	free(ref.ids);
+	hashmap__free(ref.pipe.str_off_map);
+	btf__free(split_btf);
+	btf__free(base_btf);
+	if (ret < 0)
+		errno = -ret;
+	return NULL;
+}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 47d3e00b25c7..f2a853d6fa55 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -107,6 +107,21 @@ LIBBPF_API struct btf *btf__new_empty(void);
  */
 LIBBPF_API struct btf *btf__new_empty_split(struct btf *base_btf);
 
+/**
+ * @brief **btf__new_split_base_ref()** creates a new version of the
+ * split BTF *src_btf* passed in.  Its new base BTF will only contain
+ * the types needed to improve robustness of the split BTF to
+ * small changes in base BTF.  When that split BTF is loaded against
+ * a (possibly changed) base, this reference base BTF will help update
+ * references to that (possibly changed) base BTF.
+ *
+ * Both the new split and its reference base BTF it is built on top of
+ * must be freed by the caller.
+ *
+ * On error, NULL is returned and errno set.
+ */
+LIBBPF_API struct btf *btf__new_split_base_ref(struct btf *src_btf);
+
 LIBBPF_API struct btf *btf__parse(const char *path, struct btf_ext **btf_ext);
 LIBBPF_API struct btf *btf__parse_split(const char *path, struct btf *base_btf);
 LIBBPF_API struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 51732ecb1385..3d53b1781af1 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -414,5 +414,6 @@ LIBBPF_1.4.0 {
 		bpf_raw_tracepoint_open_opts;
 		bpf_token_create;
 		btf__new_split;
+		btf__new_split_base_ref;
 		btf_ext__raw_data;
 } LIBBPF_1.3.0;
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 03/13] selftests/bpf: test split base reference BTF generation
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (2 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 02/13] libbpf: add btf__new_split_base_ref() creating split BTF with reference base BTF Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-22 10:24 ` [RFC bpf-next 04/13] libbpf: add btf__parse_opts() API for flexible BTF parsing Alan Maguire
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Test generation of split "base reference" BTF, ensuring that

- FWDs are used in place of full named struct/union declarations
- FWDs are used in place of full named enum declarations
- anonymous struct/unions are represented in full
- anonymous enums are represented in full
- types unreferenced from split BTF are not present in base reference
  BTF

Also test that with vmlinux BTF and split BTF based upon it,
we only represent needed base types referenced from split BTF.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 .../bpf/prog_tests/btf_split_base_ref.c       | 219 ++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_split_base_ref.c

diff --git a/tools/testing/selftests/bpf/prog_tests/btf_split_base_ref.c b/tools/testing/selftests/bpf/prog_tests/btf_split_base_ref.c
new file mode 100644
index 000000000000..d611167fa4b2
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/btf_split_base_ref.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Oracle and/or its affiliates. */
+
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include "btf_helpers.h"
+
+/* Fabricate base, split BTF with references to base types needed; then create
+ * split BTF with reference base BTF and ensure expectations are met:
+ *  - only referenced base types from split BTF are present
+ *  - struct/union/enum are represented as FWDs unless anonymous, when they
+ *    are represented in full
+ */
+static void test_split_base_ref(void)
+{
+	struct btf *btf1 = NULL, *btf2 = NULL, *btf3 = NULL, *btf4 = NULL;
+
+	btf1 = btf__new_empty();
+	if (!ASSERT_OK_PTR(btf1, "empty_main_btf"))
+		return;
+
+	btf__add_int(btf1, "int", 4, BTF_INT_SIGNED);	/* [1] int */
+	btf__add_ptr(btf1, 1);				/* [2] ptr to int */
+	btf__add_struct(btf1, "s1", 8);			/* [3] struct s1 { */
+	btf__add_field(btf1, "f1", 2, 0, 0);		/*      int *f1; */
+							/* } */
+	btf__add_struct(btf1, "", 12);			/* [4] struct { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	int f1; */
+	btf__add_field(btf1, "f2", 3, 32, 0);		/*	struct s1 f2; */
+							/* } */
+	btf__add_int(btf1, "unsigned int", 4, 0);	/* [5] unsigned int */
+	btf__add_union(btf1, "u1", 12);			/* [6] union u1 { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	int f1; */
+	btf__add_field(btf1, "f2", 2, 0, 0);		/*	int *f2; */
+							/* } */
+	btf__add_union(btf1, "", 4);			/* [7] union { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	int f1; */
+							/* } */
+	btf__add_enum(btf1, "e1", 4);			/* [8] enum e1 { */
+	btf__add_enum_value(btf1, "v1", 1);		/*	v1 = 1; */
+							/* } */
+	btf__add_enum(btf1, "", 4);			/* [9] enum { */
+	btf__add_enum_value(btf1, "av1", 2);		/*	av1 = 2; */
+							/* } */
+	btf__add_enum64(btf1, "e641", 8, true);		/* [10] enum64 { */
+	btf__add_enum64_value(btf1, "v1", 1024);	/*	v1 = 1024; */
+							/* } */
+	btf__add_enum64(btf1, "", 8, true);		/* [11] enum64 { */
+	btf__add_enum64_value(btf1, "v1", 1025);	/*	v1 = 1025; */
+							/* } */
+	btf__add_struct(btf1, "unneeded", 4);		/* struct unneeded { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	int f1; */
+							/* } */
+	VALIDATE_RAW_BTF(
+		btf1,
+		"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+		"[2] PTR '(anon)' type_id=1",
+		"[3] STRUCT 's1' size=8 vlen=1\n"
+		"\t'f1' type_id=2 bits_offset=0",
+		"[4] STRUCT '(anon)' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=3 bits_offset=32",
+		"[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)",
+		"[6] UNION 'u1' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=2 bits_offset=0",
+		"[7] UNION '(anon)' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'v1' val=1",
+		"[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'av1' val=2",
+		"[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1024",
+		"[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1025",
+		"[12] STRUCT 'unneeded' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0");
+
+	btf2 = btf__new_empty_split(btf1);
+	if (!ASSERT_OK_PTR(btf2, "empty_split_btf"))
+		goto cleanup;
+
+	btf__add_ptr(btf2, 3);				/* [9] ptr to struct s1 */
+	/* add ptr to struct anon */
+	btf__add_ptr(btf2, 4);				/* [10] ptr to struct (anon) */
+	btf__add_const(btf2, 6);			/* [11] const union u1 */
+	btf__add_restrict(btf2, 7);			/* [12] restrict union (anon) */
+	btf__add_volatile(btf2, 8);			/* [13] volatile enum e1 */
+	btf__add_typedef(btf2, "et", 9);		/* [14] typedef enum (anon) */
+	btf__add_const(btf2, 10);			/* [15] const enum64 e641 */
+	btf__add_ptr(btf2, 11);				/* [16] restrict enum64 (anon) */
+
+	VALIDATE_RAW_BTF(
+		btf2,
+		"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+		"[2] PTR '(anon)' type_id=1",
+		"[3] STRUCT 's1' size=8 vlen=1\n"
+		"\t'f1' type_id=2 bits_offset=0",
+		"[4] STRUCT '(anon)' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=3 bits_offset=32",
+		"[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)",
+		"[6] UNION 'u1' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=2 bits_offset=0",
+		"[7] UNION '(anon)' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'v1' val=1",
+		"[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'av1' val=2",
+		"[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1024",
+		"[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1025",
+		"[12] STRUCT 'unneeded' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[13] PTR '(anon)' type_id=3",
+		"[14] PTR '(anon)' type_id=4",
+		"[15] CONST '(anon)' type_id=6",
+		"[16] RESTRICT '(anon)' type_id=7",
+		"[17] VOLATILE '(anon)' type_id=8",
+		"[18] TYPEDEF 'et' type_id=9",
+		"[19] CONST '(anon)' type_id=10",
+		"[20] PTR '(anon)' type_id=11");
+
+	btf3 = btf__new_split_base_ref(btf2);
+	if (!ASSERT_OK_PTR(btf1, "new_split_base_ref"))
+		goto cleanup;
+
+	btf4 = (struct btf *)btf__base_btf(btf3);
+	if (!ASSERT_OK_PTR(btf4, "base_ref_btf"))
+		goto cleanup;
+
+	VALIDATE_RAW_BTF(
+		btf3,
+		"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+		"[2] FWD 's1' fwd_kind=struct",
+		"[3] STRUCT '(anon)' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=2 bits_offset=32",
+		"[4] FWD 'u1' fwd_kind=union",
+		"[5] UNION '(anon)' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[6] ENUM 'e1' encoding=UNSIGNED size=4 vlen=0",
+		"[7] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'av1' val=2",
+		"[8] ENUM64 'e641' encoding=SIGNED size=8 vlen=0",
+		"[9] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1025",
+		"[10] PTR '(anon)' type_id=2",
+		"[11] PTR '(anon)' type_id=3",
+		"[12] CONST '(anon)' type_id=4",
+		"[13] RESTRICT '(anon)' type_id=5",
+		"[14] VOLATILE '(anon)' type_id=6",
+		"[15] TYPEDEF 'et' type_id=7",
+		"[16] CONST '(anon)' type_id=8",
+		"[17] PTR '(anon)' type_id=9");
+
+cleanup:
+	btf__free(btf4);
+	btf__free(btf3);
+	btf__free(btf2);
+	btf__free(btf1);
+}
+
+/* create split reference BTF from vmlinux + split BTF with a few type references;
+ * ensure the resultant split reference BTF is as expected, containing only types
+ * needed to disambiguate references from split BTF.
+ */
+static void test_split_base_ref_vmlinux(void)
+{
+	struct btf *split_btf = NULL, *vmlinux_btf = btf__load_vmlinux_btf();
+	struct btf *split_ref = NULL, *base_ref = NULL;
+	__s32 int_id, sk_buff_id;
+
+	if (!ASSERT_OK_PTR(vmlinux_btf, "load_vmlinux"))
+		return;
+	int_id = btf__find_by_name_kind(vmlinux_btf, "int", BTF_KIND_INT);
+	if (!ASSERT_GT(int_id, 0, "find_int"))
+		goto cleanup;
+	sk_buff_id = btf__find_by_name_kind(vmlinux_btf, "sk_buff", BTF_KIND_STRUCT);
+	if (!ASSERT_GT(sk_buff_id, 0, "find_sk_buff_id"))
+		goto cleanup;
+	split_btf = btf__new_empty_split(vmlinux_btf);
+	if (!ASSERT_OK_PTR(split_btf, "new_split"))
+		goto cleanup;
+	btf__add_typedef(split_btf, "myint", int_id);
+	btf__add_ptr(split_btf, sk_buff_id);
+
+	split_ref = btf__new_split_base_ref(split_btf);
+	if (!ASSERT_OK_PTR(split_ref, "new_split_base_ref"))
+		goto cleanup;
+
+	base_ref = (struct btf *)btf__base_btf(split_ref);
+	if (!ASSERT_OK_PTR(split_ref, "base_ref_btf"))
+		goto cleanup;
+	VALIDATE_RAW_BTF(
+		split_ref,
+		"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+		"[2] FWD 'sk_buff' fwd_kind=struct",
+		"[3] TYPEDEF 'myint' type_id=1",
+		"[4] PTR '(anon)' type_id=2");
+
+cleanup:
+	btf__free(split_ref);
+	btf__free(base_ref);
+	btf__free(split_btf);
+	btf__free(vmlinux_btf);
+}
+
+void test_btf_split_base_ref(void)
+{
+	if (test__start_subtest("split_base_ref"))
+		test_split_base_ref();
+	if (test__start_subtest("split_base_ref_vmlinux"))
+		test_split_base_ref_vmlinux();
+}
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 04/13] libbpf: add btf__parse_opts() API for flexible BTF parsing
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (3 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 03/13] selftests/bpf: test split base reference BTF generation Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-29 22:00   ` Andrii Nakryiko
  2024-03-22 10:24 ` [RFC bpf-next 05/13] bpftool: support displaying raw split BTF using base reference BTF as base Alan Maguire
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Options cover existing parsing scenarios (ELF, raw, retrieving
.BTF.ext) and also allow specification of the ELF section name
containing BTF.  This will allow consumers to retrieve BTF from
.BTF.base_ref sections (BTF_BASE_REF_ELF_SEC) also.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/btf.c      | 24 ++++++++++++++++++------
 tools/lib/bpf/btf.h      | 32 ++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index f02477af3d79..d43c3eba27e0 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1084,7 +1084,7 @@ struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf)
 	return libbpf_ptr(btf_new(data, size, base_btf));
 }
 
-static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
+static struct btf *btf_parse_elf(const char *path, const char *btf_sec, struct btf *base_btf,
 				 struct btf_ext **btf_ext)
 {
 	Elf_Data *btf_data = NULL, *btf_ext_data = NULL;
@@ -1146,7 +1146,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
 				idx, path);
 			goto done;
 		}
-		if (strcmp(name, BTF_ELF_SEC) == 0) {
+		if (strcmp(name, btf_sec) == 0) {
 			btf_data = elf_getdata(scn, 0);
 			if (!btf_data) {
 				pr_warn("failed to get section(%d, %s) data from %s\n",
@@ -1166,7 +1166,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
 	}
 
 	if (!btf_data) {
-		pr_warn("failed to find '%s' ELF section in %s\n", BTF_ELF_SEC, path);
+		pr_warn("failed to find '%s' ELF section in %s\n", btf_sec, path);
 		err = -ENODATA;
 		goto done;
 	}
@@ -1212,12 +1212,12 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
 
 struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext)
 {
-	return libbpf_ptr(btf_parse_elf(path, NULL, btf_ext));
+	return libbpf_ptr(btf_parse_elf(path, BTF_ELF_SEC, NULL, btf_ext));
 }
 
 struct btf *btf__parse_elf_split(const char *path, struct btf *base_btf)
 {
-	return libbpf_ptr(btf_parse_elf(path, base_btf, NULL));
+	return libbpf_ptr(btf_parse_elf(path, BTF_ELF_SEC, base_btf, NULL));
 }
 
 static struct btf *btf_parse_raw(const char *path, struct btf *base_btf)
@@ -1293,6 +1293,18 @@ struct btf *btf__parse_raw_split(const char *path, struct btf *base_btf)
 	return libbpf_ptr(btf_parse_raw(path, base_btf));
 }
 
+struct btf *btf__parse_opts(const char *path, struct btf_parse_opts *opts)
+{
+	if (!OPTS_VALID(opts, btf_parse_opts))
+		return libbpf_err_ptr(-EINVAL);
+	if (opts && !opts->btf_sec)
+		return btf__parse_raw_split(path, opts ? opts->base_btf : NULL);
+	return libbpf_ptr(btf_parse_elf(path,
+					opts ? opts->btf_sec : BTF_ELF_SEC,
+					opts ? opts->base_btf : NULL,
+					opts ? &opts->btf_ext : NULL));
+}
+
 static struct btf *btf_parse(const char *path, struct btf *base_btf, struct btf_ext **btf_ext)
 {
 	struct btf *btf;
@@ -1307,7 +1319,7 @@ static struct btf *btf_parse(const char *path, struct btf *base_btf, struct btf_
 		return btf;
 	if (err != -EPROTO)
 		return ERR_PTR(err);
-	return btf_parse_elf(path, base_btf, btf_ext);
+	return btf_parse_elf(path, BTF_ELF_SEC, base_btf, btf_ext);
 }
 
 struct btf *btf__parse(const char *path, struct btf_ext **btf_ext)
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index f2a853d6fa55..c63043520149 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -18,6 +18,7 @@ extern "C" {
 
 #define BTF_ELF_SEC ".BTF"
 #define BTF_EXT_ELF_SEC ".BTF.ext"
+#define BTF_BASE_REF_ELF_SEC ".BTF.base_ref"
 #define MAPS_ELF_SEC ".maps"
 
 struct btf;
@@ -129,6 +130,37 @@ LIBBPF_API struct btf *btf__parse_elf_split(const char *path, struct btf *base_b
 LIBBPF_API struct btf *btf__parse_raw(const char *path);
 LIBBPF_API struct btf *btf__parse_raw_split(const char *path, struct btf *base_btf);
 
+struct btf_parse_opts {
+	size_t sz;
+	/* use base BTF to parse split BTF */
+	struct btf *base_btf;
+	/* retrieve optional .BTF.ext info */
+	struct btf_ext *btf_ext;
+	/* BTF section name */
+	const char *btf_sec;
+        size_t:0;
+};
+
+#define btf_parse_opts__last_field btf_sec
+
+/* @brief **btf__parse_opts()** parses BTF information from either a
+ * raw BTF file (*btf_sec* is NULL) or from the specified BTF section,
+ * also retrieving  .BTF.ext info if *btf_ext* is non-NULL.  If
+ * *base_btf* is specified, use it to parse split BTF from the
+ * specified location.
+ *
+ * @return new BTF object instance which has to be eventually freed with
+ * **btf__free()**
+ *
+ * On error, error-code-encoded-as-pointer is returned, not a NULL. To extract
+ * error code from such a pointer `libbpf_get_error()` should be used. If
+ * `libbpf_set_strict_mode(LIBBPF_STRICT_CLEAN_PTRS)` is enabled, NULL is
+ * returned on error instead. In both cases thread-local `errno` variable is
+ * always set to error code as well.
+ */
+
+LIBBPF_API struct btf *btf__parse_opts(const char *path, struct btf_parse_opts *opts);
+
 LIBBPF_API struct btf *btf__load_vmlinux_btf(void);
 LIBBPF_API struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_btf);
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 3d53b1781af1..e9a7cb9c3c5b 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -415,5 +415,6 @@ LIBBPF_1.4.0 {
 		bpf_token_create;
 		btf__new_split;
 		btf__new_split_base_ref;
+		btf__parse_opts;
 		btf_ext__raw_data;
 } LIBBPF_1.3.0;
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 05/13] bpftool: support displaying raw split BTF using base reference BTF as base
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (4 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 04/13] libbpf: add btf__parse_opts() API for flexible BTF parsing Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-22 10:24 ` [RFC bpf-next 06/13] kbuild,bpf: switch to using --btf_features for pahole v1.26 and later Alan Maguire
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

If no base BTF can be found, fall back to checking for the .BTF.base_ref
section and use it to display raw split BTF.  Note that C format output
is not supported for base reference BTF as structs and unions are generally
represented as FWDs.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/bpf/bpftool/btf.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index 91fcb75babe3..f1e0ea06f05f 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -552,6 +552,7 @@ static int do_dump(int argc, char **argv)
 	struct btf *btf = NULL, *base = NULL;
 	__u32 root_type_ids[2];
 	int root_type_cnt = 0;
+	bool c_compat = true;
 	bool dump_c = false;
 	__u32 btf_id = -1;
 	const char *src;
@@ -631,6 +632,21 @@ static int do_dump(int argc, char **argv)
 			base = get_vmlinux_btf_from_sysfs();
 
 		btf = btf__parse_split(*argv, base ?: base_btf);
+		/* Finally check for presence of "base reference" base BTF -
+		 * this will allow dumping in raw but not C format, since
+		 * base reference BTF will not contain fully-described
+		 * structs/unions.
+		 */
+		if (!btf && !base && !base_btf) {
+			LIBBPF_OPTS(btf_parse_opts, optp);
+
+			optp.btf_sec = BTF_BASE_REF_ELF_SEC;
+			base_btf = btf__parse_opts(*argv, &optp);
+			if (base_btf)
+				btf = btf__parse_split(*argv, base_btf);
+			if (btf)
+				c_compat = false;
+		}
 		if (!btf) {
 			err = -errno;
 			p_err("failed to load BTF from %s: %s",
@@ -686,6 +702,11 @@ static int do_dump(int argc, char **argv)
 	}
 
 	if (dump_c) {
+		if (!c_compat) {
+			p_err("Cannot represent split BTF with base reference BTF in C format");
+			err = -ENOTSUP;
+			goto done;
+		}
 		if (json_output) {
 			p_err("JSON output for C-syntax dump is not supported");
 			err = -ENOTSUP;
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 06/13] kbuild,bpf: switch to using --btf_features for pahole v1.26 and later
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (5 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 05/13] bpftool: support displaying raw split BTF using base reference BTF as base Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-29 22:01   ` Andrii Nakryiko
  2024-03-22 10:24 ` [RFC bpf-next 07/13] resolve_btfids: use .BTF.base_ref BTF as base BTF if -r option is used Alan Maguire
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

The btf_features list can be used for pahole v1.26 and later -
it is useful because if a feature is not yet implemented it will
not exit with a failure message.  This will allow us to add feature
requests to the pahole options without having to check pahole versions
in future; if the version of pahole supports the feature it will be
added.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 scripts/Makefile.btf | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index 82377e470aed..9694ca3c5252 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -12,8 +12,11 @@ pahole-flags-$(call test-ge, $(pahole-ver), 121)	+= --btf_gen_floats
 
 pahole-flags-$(call test-ge, $(pahole-ver), 122)	+= -j
 
-pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)		+= --lang_exclude=rust
-
 pahole-flags-$(call test-ge, $(pahole-ver), 125)	+= --skip_encoding_btf_inconsistent_proto --btf_gen_optimized
 
+# Switch to using --btf_features for v1.26 and later.
+pahole-flags-$(call test-ge, $(pahole-ver), 126)	= -j --btf_features=encode_force,var,float,emum64,decl_tag,type_tag,optimized_func,consistent_func
+
+pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)		+= --lang_exclude=rust
+
 export PAHOLE_FLAGS := $(pahole-flags-y)
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 07/13] resolve_btfids: use .BTF.base_ref BTF as base BTF if -r option is used
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (6 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 06/13] kbuild,bpf: switch to using --btf_features for pahole v1.26 and later Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-22 10:24 ` [RFC bpf-next 08/13] kbuild, bpf: add module-specific pahole/resolve_btfids flags for base reference BTF Alan Maguire
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

When resolving BTF ids, use the base reference BTF in the module
when passed the -r option.  Both references to base BTF from split
BTF and BTF ids will be reconciled with base vmlinux on module
load.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/bpf/resolve_btfids/main.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/resolve_btfids/main.c b/tools/bpf/resolve_btfids/main.c
index d9520cb826b3..cdab366bd4d6 100644
--- a/tools/bpf/resolve_btfids/main.c
+++ b/tools/bpf/resolve_btfids/main.c
@@ -115,6 +115,7 @@ struct object {
 	const char *path;
 	const char *btf;
 	const char *base_btf_path;
+	int base_ref;
 
 	struct {
 		int		 fd;
@@ -532,11 +533,21 @@ static int symbols_resolve(struct object *obj)
 	__u32 nr_types;
 
 	if (obj->base_btf_path) {
-		base_btf = btf__parse(obj->base_btf_path, NULL);
+		LIBBPF_OPTS(btf_parse_opts, optp);
+		const char *path;
+
+		if (obj->base_ref) {
+			optp.btf_sec = BTF_BASE_REF_ELF_SEC;
+			path = obj->path;
+		} else {
+			optp.btf_sec = BTF_ELF_SEC;
+			path = obj->base_btf_path;
+		}
+		base_btf = btf__parse_opts(path, &optp);
 		err = libbpf_get_error(base_btf);
 		if (err) {
 			pr_err("FAILED: load base BTF from %s: %s\n",
-			       obj->base_btf_path, strerror(-err));
+			       path, strerror(-err));
 			return -1;
 		}
 	}
@@ -781,6 +792,8 @@ int main(int argc, const char **argv)
 			   "BTF data"),
 		OPT_STRING('b', "btf_base", &obj.base_btf_path, "file",
 			   "path of file providing base BTF"),
+		OPT_INCR('r', "base_ref", &obj.base_ref,
+			 "use base reference BTF as base"),
 		OPT_END()
 	};
 	int err = -1;
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 08/13] kbuild, bpf: add module-specific pahole/resolve_btfids flags for base reference BTF
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (7 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 07/13] resolve_btfids: use .BTF.base_ref BTF as base BTF if -r option is used Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-23  2:50   ` Alexei Starovoitov
  2024-03-22 10:24 ` [RFC bpf-next 09/13] libbpf: split BTF reconciliation Alan Maguire
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Support creation of module BTF along with base reference BTF;
the latter is stored in a .BTF.base_ref ELF section and supplements
split BTF references to base BTF with information about base types,
allowing for later reconciliation of split BTF with a (possibly
changed) base.  resolve_btfids uses the "-r" option to specify
that the BTF.ids section should be populated with split BTF
relative to the added .BTF.base_ref section rather than relative
to the vmlinux base.

Modules using base reference BTF can be built via

BTF_BASE_REF=1 make -C. -M=path2/module

The default is still to use split BTF relative to vmlinux.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 scripts/Makefile.btf      | 7 +++++++
 scripts/Makefile.modfinal | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index 9694ca3c5252..c8212b2ab7ca 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -19,4 +19,11 @@ pahole-flags-$(call test-ge, $(pahole-ver), 126)	= -j --btf_features=encode_forc
 
 pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)		+= --lang_exclude=rust
 
+ifeq ($(BTF_BASE_REF),1)
+module-pahole-flags-$(call test-ge, $(pahole-ver), 126)	+= --btf_features=base_ref
+module-resolve-btfids-flags-$(call test-ge, $(pahole-ver), 126) = -r
+endif
+
 export PAHOLE_FLAGS := $(pahole-flags-y)
+export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)
+export MODULE_RESOLVE_BTFIDS_FLAGS := $(module-resolve-btfids-flags-y)
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 8568d256d6fb..22f5bb0a60a6 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -39,8 +39,8 @@ quiet_cmd_btf_ko = BTF [M] $@
 	if [ ! -f vmlinux ]; then					\
 		printf "Skipping BTF generation for %s due to unavailability of vmlinux\n" $@ 1>&2; \
 	else								\
-		LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) --btf_base vmlinux $@; \
-		$(RESOLVE_BTFIDS) -b vmlinux $@; 			\
+		LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) $(MODULE_PAHOLE_FLAGS) --btf_base vmlinux $@; \
+		$(RESOLVE_BTFIDS) $(MODULE_RESOLVE_BTFIDS_FLAGS) -b vmlinux $@; 			\
 	fi;
 
 # Same as newer-prereqs, but allows to exclude specified extra dependencies
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 09/13] libbpf: split BTF reconciliation
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (8 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 08/13] kbuild, bpf: add module-specific pahole/resolve_btfids flags for base reference BTF Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-29 22:01   ` Andrii Nakryiko
  2024-03-22 10:24 ` [RFC bpf-next 10/13] module, bpf: store BTF base reference pointer in struct module Alan Maguire
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Map base reference BTF type ids referenced in split BTF and their
references to the base BTF passed in, and if the mapping succeeds,
reparent the split BTF to the base BTF.

Reconciliation rules are

- base types must match exactly
- enum[64] types should match all value name/value pairs, but the
  to-be-reconciled enum[64] can also define additional name/value pairs
- named fwds match to the correspondingly-named struct/union/enum/enum64
- anon struct/unions must have field names/offsets specified in base
  reference BTF matched by those in base BTF we are matching with

Reconciliation can not recurse, since it will be used in-kernel also and
we do not want to blow up the kernel stack when carrying out type
compatibility checks.  Hence we use a stack for reference type
reconciliation rather then recursive function calls.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/Build             |   2 +-
 tools/lib/bpf/btf.c             |  58 ++++
 tools/lib/bpf/btf.h             |   8 +
 tools/lib/bpf/btf_reconcile.c   | 590 ++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map        |   1 +
 tools/lib/bpf/libbpf_internal.h |   2 +
 6 files changed, 660 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/bpf/btf_reconcile.c

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index b6619199a706..c05cc524ca30 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1,4 +1,4 @@
 libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
 	    netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
 	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
-	    usdt.o zip.o elf.o features.o
+	    usdt.o zip.o elf.o features.o btf_reconcile.o
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index d43c3eba27e0..09a11954cad8 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -5432,3 +5432,61 @@ struct btf *btf__new_split_base_ref(struct btf *src_btf)
 		errno = -ret;
 	return NULL;
 }
+
+struct btf_rewrite_strs {
+	struct btf *btf;
+	const struct btf *old_base_btf;
+	int str_start;
+	int str_diff;
+};
+
+static int btf_rewrite_strs(__u32 *str_off, void *ctx)
+{
+	struct btf_rewrite_strs *r = ctx;
+	const char *s;
+	int off;
+
+	if (!*str_off)
+		return 0;
+	if (*str_off >= r->str_start) {
+		*str_off += r->str_diff;
+	} else {
+		s = btf__str_by_offset(r->old_base_btf, *str_off);
+		if (!s)
+			return -ENOENT;
+		off = btf__add_str(r->btf, s);
+		if (off < 0)
+			return off;
+		*str_off = off;
+	}
+	return 0;
+}
+
+int btf_set_base_btf(struct btf *btf, struct btf *base_btf)
+{
+	struct btf_rewrite_strs r = {};
+	struct btf_type *t;
+	int i, err;
+
+	r.old_base_btf = btf__base_btf(btf);
+	if (!r.old_base_btf)
+		return -EINVAL;
+	r.btf = btf;
+	r.str_start = r.old_base_btf->hdr->str_len;
+	r.str_diff = base_btf->hdr->str_len - r.old_base_btf->hdr->str_len;
+	btf->base_btf = base_btf;
+	btf->start_id = btf__type_cnt(base_btf);
+	btf->start_str_off = base_btf->hdr->str_len;
+	for (i = 0; i < btf->nr_types; i++) {
+		t = (struct btf_type *)btf__type_by_id(btf, i + btf->start_id);
+		err = btf_type_visit_str_offs((struct btf_type *)t, btf_rewrite_strs, &r);
+		if (err)
+			break;
+	}
+	return err;
+}
+
+int btf__reconcile(struct btf *btf, const struct btf *base_btf)
+{
+	return btf_reconcile(btf, base_btf, NULL);
+}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index c63043520149..4cda17cbcde9 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -279,6 +279,14 @@ struct btf_dedup_opts {
 
 LIBBPF_API int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts);
 
+/**
+ * @brief **btf__reconcile()** will check the split BTF *btf* for references
+ * to base BTF kinds, and verify those references are compatible with
+ * *base_btf*; if they are, *btf* is adjusted such that is re-parented to
+ * *base_btf* and type ids and strings are adjusted to accommodate this.
+ */
+LIBBPF_API int btf__reconcile(struct btf *btf, const struct btf *base_btf);
+
 struct btf_dump;
 
 struct btf_dump_opts {
diff --git a/tools/lib/bpf/btf_reconcile.c b/tools/lib/bpf/btf_reconcile.c
new file mode 100644
index 000000000000..84787328a1de
--- /dev/null
+++ b/tools/lib/bpf/btf_reconcile.c
@@ -0,0 +1,590 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Oracle and/or its affiliates. */
+
+#include "btf.h"
+#include "bpf.h"
+#include "libbpf.h"
+#include "libbpf_internal.h"
+
+struct btf;
+
+#define BTF_MAX_NR_TYPES 0x7fffffffU
+#define BTF_UNPROCESSED_ID ((__u32)-1)
+
+struct btf_reconcile {
+	struct btf *btf;
+	const struct btf *base_btf;
+	const struct btf *base_ref_btf;
+	unsigned int nr_base_types;
+	__u32 *map;
+	__u32 *stack;
+	unsigned int stack_size;
+	unsigned int stack_limit;
+};
+
+/* Find next type after *id in base BTF that matches kind of type t passed in
+ * and name (if it is specified).  Match fwd kinds to appropriate kind also.
+ */
+static int btf_reconcile_find_next(struct btf_reconcile *r, const struct btf_type *t,
+				   __u32 *id, const struct btf_type **tp)
+{
+	const struct btf_type *nt;
+	int kind, tkind = btf_kind(t);
+	int tkflag = btf_kflag(t);
+	__u32 i;
+
+	for (i = *id + 1; i < r->nr_base_types; i++) {
+		nt = btf__type_by_id(r->base_btf, i);
+		kind = btf_kind(nt);
+		if (tkind != BTF_KIND_FWD) {
+			if (kind != tkind)
+				continue;
+		} else  {
+			switch (kind) {
+			case BTF_KIND_FWD:
+				continue;
+			case BTF_KIND_STRUCT:
+				if (tkflag)
+					continue;
+				break;
+			case BTF_KIND_UNION:
+				if (!tkflag)
+					continue;
+				break;
+			default:
+				break;
+			}
+		}
+		/* either names must match or both be anon. */
+		if (t->name_off && nt->name_off) {
+			if (strcmp(btf__name_by_offset(r->btf, t->name_off),
+				   btf__name_by_offset(r->base_btf, nt->name_off)))
+				continue;
+		} else if (t->name_off != nt->name_off) {
+			continue;
+		}
+		*tp = nt;
+		*id = i;
+		return 0;
+	}
+	return -ENOENT;
+}
+
+static int btf_reconcile_int(struct btf_reconcile *r, const char *name,
+			     const struct btf_type *t, const struct btf_type *bt)
+{
+	__u32 *info = (__u32 *)(t + 1);
+	__u32 *binfo = (__u32 *)(bt + 1);
+
+	if (t->size != bt->size) {
+		pr_warn("INT types '%s' disagree on size; reference base BTF says %d; base BTF says %d\n",
+			name, t->size, bt->size);
+		return -EINVAL;
+	}
+	if (BTF_INT_ENCODING(*info) != BTF_INT_ENCODING(*binfo)) {
+		pr_warn("INT types '%s' disagree on encoding; reference base BTF says '(%s/%s/%s); base BTF says '(%s/%s/%s)'\n",
+			name,
+			BTF_INT_ENCODING(*info) & BTF_INT_SIGNED ? "signed" : "unsigned",
+			BTF_INT_ENCODING(*info) & BTF_INT_CHAR ? "char" : "nonchar",
+			BTF_INT_ENCODING(*info) & BTF_INT_BOOL ? "bool" : "nonbool",
+			BTF_INT_ENCODING(*binfo) & BTF_INT_SIGNED ? "signed" : "unsigned",
+			BTF_INT_ENCODING(*binfo) & BTF_INT_CHAR ? "char" : "nonchar",
+			BTF_INT_ENCODING(*binfo) & BTF_INT_BOOL ? "bool" : "nonbool");
+		return -EINVAL;
+	}
+	if (BTF_INT_BITS(*info) != BTF_INT_BITS(*binfo)) {
+		pr_warn("INT types '%s' disagree on bit size; reference base BTF says %d; base BTF says %d\n",
+			name, BTF_INT_BITS(*info), BTF_INT_BITS(*binfo));
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int btf_reconcile_float(struct btf_reconcile *r, const char *name,
+			       const struct btf_type *t, const struct btf_type *bt)
+{
+
+	if (t->size != bt->size) {
+		pr_warn("float types '%s' disagree on size; reference base BTF says %d; base BTF says %d\n",
+			name, t->size, bt->size);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/* Ensure each enum value in type t has equivalent in base BTF and that values (if any) match. */
+static int btf_reconcile_enum(struct btf_reconcile *r, const char *name,
+			      const struct btf_type *t, const struct btf_type *bt)
+{
+	struct btf_enum *v = (struct btf_enum *)(t + 1);
+	struct btf_enum *bv = (struct btf_enum *)(bt + 1);
+	bool found, match;
+	int i, j;
+
+	for (i = 0; i < btf_vlen(t); i++, v++) {
+		found = match = false;
+
+		if (!v->name_off)
+			continue;
+		for (j = 0; j < btf_vlen(bt); j++, bv++) {
+			if (!bv->name_off)
+				continue;
+			if (strcmp(btf__name_by_offset(r->base_ref_btf, v->name_off),
+				   btf__name_by_offset(r->base_btf, bv->name_off)) != 0)
+				continue;
+			found = true;
+			match = (v->val == bv->val);
+			break;
+		}
+		if (!found) {
+			if (t->name_off)
+				pr_warn("ENUM types '%s' disagree; base reference BTF has enum value '%s' (%d), base BTF does not have that value.\n",
+					name,
+					btf__name_by_offset(r->base_ref_btf, v->name_off),
+					v->val);
+			return -EINVAL;
+		}
+		if (!match) {
+			if (t->name_off)
+				pr_warn("ENUM types '%s' disagree on enum value '%s'; reference base BTF specifies value %d; base BTF specifies value %d\n",
+					name,
+					btf__name_by_offset(r->base_ref_btf, v->name_off),
+					v->val, bv->val);
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+/* ensure each enum64 value in type t has equivalent in base BTF and that values match */
+static int btf_reconcile_enum64(struct btf_reconcile *r, const char *name,
+			      const struct btf_type *t, const struct btf_type *bt)
+{
+	struct btf_enum64 *v = (struct btf_enum64 *)(t + 1);
+	struct btf_enum64 *bv = (struct btf_enum64 *)(bt + 1);
+	bool found, match;
+	int i, j;
+
+	for (i = 0; i < btf_vlen(t); i++, v++) {
+		found = match = false;
+
+		if (!v->name_off)
+			continue;
+		for (j = 0; j < btf_vlen(bt); j++, bv++) {
+			if (!bv->name_off)
+				continue;
+			if (strcmp(btf__name_by_offset(r->base_ref_btf, v->name_off),
+				   btf__name_by_offset(r->base_btf, bv->name_off)) != 0)
+				continue;
+			found = true;
+			match = (btf_enum64_value(v) == btf_enum64_value(bv));
+			break;
+		}
+		if (!found) {
+			if (t->name_off)
+				pr_warn("ENUM64 types '%s' disagree; reference base BTF has enum64 value '%s' (%lld), base BTF does not have that value.\n",
+					name, btf__name_by_offset(r->base_ref_btf, v->name_off),
+					btf_enum64_value(v));
+			return -EINVAL;
+		}
+		if (!match) {
+			if (t->name_off)
+				pr_warn("ENUM64 types '%s' disagree on enum value '%s'; reference base BTF specifies value %lld; base BTF specifies value %lld\n",
+					name, btf__name_by_offset(r->base_ref_btf, v->name_off),
+					btf_enum64_value(v), btf_enum64_value(bv));
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+/* reconcile base types (int, float, enum, enum64 and fwd) */
+static int btf_reconcile_base_type(struct btf_reconcile *r, __u32 id)
+{
+	const struct btf_type *t = btf_type_by_id(r->base_ref_btf, id);
+	const char *name = btf__name_by_offset(r->base_ref_btf, t->name_off);
+	const struct btf_type *bt = NULL;
+	__u32 base_id = 0;
+	int err = 0;
+
+	switch (btf_kind(t)) {
+	case BTF_KIND_INT:
+	case BTF_KIND_ENUM:
+	case BTF_KIND_FLOAT:
+	case BTF_KIND_ENUM64:
+	case BTF_KIND_FWD:
+		break;
+	default:
+		return 0;
+	}
+
+	if (r->map[id] <= BTF_MAX_NR_TYPES)
+		return 0;
+
+	while ((err = btf_reconcile_find_next(r, t, &base_id, &bt)) != -ENOENT) {
+		bt = btf_type_by_id(r->base_btf, base_id);
+		switch (btf_kind(t)) {
+		case BTF_KIND_INT:
+			err = btf_reconcile_int(r, name, t, bt);
+			break;
+		case BTF_KIND_ENUM:
+			err = btf_reconcile_enum(r, name, t, bt);
+			break;
+		case BTF_KIND_FLOAT:
+			err = btf_reconcile_float(r, name, t, bt);
+			break;
+		case BTF_KIND_ENUM64:
+			err = btf_reconcile_enum64(r, name, t, bt);
+			break;
+		case BTF_KIND_FWD:
+			err = 0;
+			break;
+		default:
+			return 0;
+		}
+		if (!err) {
+			r->map[id] = base_id;
+			return 0;
+		}
+	}
+	return err;
+}
+
+/* all reference BTF members must be in base BTF equivalent. */
+static int btf_reconcile_check_member(struct btf_reconcile *r, const char *name,
+				      struct btf_member *m, const struct btf_type *bt,
+				      bool verbose)
+{
+	struct btf_member *bm = (struct btf_member *)(bt + 1);
+	const char *kindstr = btf_kind(bt) == BTF_KIND_STRUCT ? "STRUCT" : "UNION";
+	const char *mname, *bmname;
+	int i, bvlen = btf_vlen(bt);
+
+	mname = btf__name_by_offset(r->base_ref_btf, m->name_off);
+	for (i = 0; i < bvlen; i++, bm++) {
+		bmname = btf__name_by_offset(r->base_btf, bm->name_off);
+
+		if (!m->name_off || !bm->name_off) {
+			if (m->name_off != bm->name_off)
+				continue;
+			if (bm->offset != m->offset)
+				continue;
+		} else {
+			if (strcmp(mname, bmname) != 0)
+				continue;
+			if (bm->offset != m->offset) {
+				if (verbose) {
+					pr_warn("%s '%s' member '%s' disagrees about offset; %d in base reference BTF versus %d in base BTF\n",
+						kindstr, name, mname, bm->offset, m->offset);
+					return -EINVAL;
+				}
+			}
+		}
+		return 0;
+	}
+	if (verbose)
+		pr_warn("%s '%s' missing member '%s' found in base reference BTF\n",
+			kindstr, name, mname);
+	return -EINVAL;
+}
+
+static int btf_reconcile_struct_type(struct btf_reconcile *r, __u32 id)
+{
+	const struct btf_type *t = btf_type_by_id(r->base_ref_btf, id);
+	const char *name = btf__name_by_offset(r->base_ref_btf, t->name_off);
+	const struct btf_type *bt = NULL;
+	struct btf_member *m;
+	const char *kindstr;
+	int i, vlen, err = 0;
+	__u32 base_id = 0;
+
+	switch (btf_kind(t)) {
+	case BTF_KIND_STRUCT:
+		kindstr = "STRUCT";
+		break;
+	case BTF_KIND_UNION:
+		kindstr = "UNION";
+		break;
+	default:
+		return 0;
+	}
+
+	if (r->map[id] <= BTF_MAX_NR_TYPES)
+		return 0;
+
+	vlen = btf_vlen(t);
+
+	while ((err = btf_reconcile_find_next(r, t, &base_id, &bt)) != -ENOENT) {
+		/* must be at least as big */
+		if (bt->size < t->size) {
+			if (t->name_off) {
+				pr_warn("%s '%s' disagrees about size with reference base BTF (%d); base BTF is smaller (%d)\n",
+					kindstr, name, t->size, bt->size);
+				return -EINVAL;
+			}
+			continue;
+		}
+		/* must have at least as many elements */
+		if (btf_vlen(bt) < vlen) {
+			if (t->name_off) {
+				pr_warn("%s '%s' disagrees about number of members with reference base BTF (%d); base BTF has less (%d)\n",
+					kindstr, name, vlen, btf_vlen(bt));
+				return -EINVAL;
+			}
+			continue;
+		}
+		m = (struct btf_member *)(t + 1);
+		for (i = 0; i < vlen; i++, m++) {
+			if (btf_reconcile_check_member(r, name, m, bt, t->name_off != 0)) {
+				if (t->name_off)
+					return -EINVAL;
+				err = -EINVAL;
+				break;
+			}
+		}
+		if (!err) {
+			r->map[id] = base_id;
+			return 0;
+		}
+	}
+	return err;
+}
+
+/* Use a stack rather than recursion to manage dependent reference types.
+ * When a reference type with dependents is encountered, the approach we
+ * take depends on whether the dependents have been resolved to base
+ * BTF references via the map[].  If they all have, we can simply search
+ * for the base BTF type that has those references.  If the references
+ * are not resolved, we need to push the type and its dependents onto
+ * the stack for later resolution.  We first pop the dependents, and
+ * once these have been resolved we pop the reference type with dependents
+ * now resolved.
+ */
+static int btf_reconcile_push(struct btf_reconcile *r, __u32 id)
+{
+	if (r->stack_size >= r->stack_limit)
+		return -ENOSPC;
+	r->stack[r->stack_size++] = id;
+	return 0;
+}
+
+static __u32 btf_reconcile_pop(struct btf_reconcile *r)
+{
+	if (r->stack_size > 0)
+		return r->stack[--r->stack_size];
+	return BTF_UNPROCESSED_ID;
+}
+
+static int btf_reconcile_ref_type(struct btf_reconcile *r, __u32 id)
+{
+	const struct btf_type *t;
+	const struct btf_type *bt;
+	__u32 base_id;
+	int err = 0;
+
+	do {
+		if (r->map[id] <= BTF_MAX_NR_TYPES)
+			continue;
+		t = btf_type_by_id(r->base_ref_btf, id);
+		switch (btf_kind(t)) {
+		case BTF_KIND_CONST:
+		case BTF_KIND_VOLATILE:
+		case BTF_KIND_RESTRICT:
+		case BTF_KIND_PTR:
+		case BTF_KIND_TYPEDEF:
+		case BTF_KIND_FUNC:
+		case BTF_KIND_TYPE_TAG:
+		case BTF_KIND_DECL_TAG:
+			if (r->map[t->type] <= BTF_MAX_NR_TYPES) {
+				bt = NULL;
+				base_id = 0;
+				while ((err = btf_reconcile_find_next(r, t, &base_id, &bt))
+				       != -ENOENT) {
+					if (btf_kind(t) == BTF_KIND_DECL_TAG) {
+						if (btf_decl_tag(t) != btf_decl_tag(bt))
+							continue;
+					}
+					if (bt->type != r->map[t->type])
+						continue;
+					r->map[id] = base_id;
+					break;
+				}
+				if (err) {
+					pr_warn("could not find base BTF type for base reference type[%u]\n",
+						id);
+					return err;
+				}
+			} else {
+				if (btf_reconcile_push(r, id) < 0 ||
+				    btf_reconcile_push(r, t->type) < 0)
+					return -ENOSPC;
+			}
+			break;
+		case BTF_KIND_ARRAY: {
+			struct btf_array *ba, *a = btf_array(t);
+
+			if (r->map[a->type] <= BTF_MAX_NR_TYPES &&
+			    r->map[a->index_type] <= BTF_MAX_NR_TYPES) {
+				bt = NULL;
+				base_id = 0;
+				while ((err = btf_reconcile_find_next(r, t, &base_id, &bt))
+				       != -ENOENT) {
+					ba = btf_array(bt);
+					if (r->map[a->type] != ba->type ||
+					    r->map[a->index_type] != ba->index_type)
+						continue;
+					r->map[id] = base_id;
+					break;
+				}
+				if (err) {
+					pr_warn("could not matching find base BTF ARRAY for base reference ARRAY[%u]\n",
+						id);
+					return err;
+				}
+			} else {
+				if (btf_reconcile_push(r, id) < 0 ||
+				    btf_reconcile_push(r, a->type) < 0 ||
+				    btf_reconcile_push(r, a->index_type) < 0)
+					return -ENOSPC;
+			}
+			break;
+		}
+		case BTF_KIND_FUNC_PROTO: {
+			struct btf_param *p = btf_params(t);
+			int i, vlen = btf_vlen(t);
+
+			for (i = 0; i < vlen; i++, p++) {
+				if (r->map[p->type] > BTF_MAX_NR_TYPES)
+					break;
+			}
+			if (i == vlen && r->map[t->type] <= BTF_MAX_NR_TYPES) {
+				while ((err = btf_reconcile_find_next(r, t, &base_id, &bt))
+				       != -ENOENT) {
+					struct btf_param *bp = btf_params(bt);
+					int bvlen = btf_vlen(bt);
+					int j;
+
+					if (bvlen != vlen)
+						continue;
+					if (r->map[t->type] != bt->type)
+						continue;
+					for (j = 0, p = btf_params(t); j < bvlen; j++, bp++, p++) {
+						if (r->map[p->type] != bp->type)
+							break;
+					}
+					if (j < bvlen)
+						continue;
+					r->map[id] = base_id;
+					break;
+				}
+				if (err) {
+					pr_warn("could not find matching base BTF FUNC_PROTO for base reference FUNC_PROTO[%u]\n",
+						id);
+					return err;
+				}
+			} else {
+				if (btf_reconcile_push(r, id) < 0 ||
+				    btf_reconcile_push(r, t->type) < 0)
+					return -ENOSPC;
+				for (i = 0, p = btf_params(t); i < btf_vlen(t); i++, p++) {
+					if (btf_reconcile_push(r, p->type) < 0)
+						return -ENOSPC;
+				}
+			}
+			break;
+		}
+		default:
+			return -EINVAL;
+		}
+	} while ((id = btf_reconcile_pop(r)) <= BTF_MAX_NR_TYPES);
+
+	return 0;
+}
+
+static int btf_reconcile_rewrite_type_id(__u32 *id, void *ctx)
+{
+	struct btf_reconcile *r = ctx;
+
+	*id = r->map[*id];
+	return 0;
+}
+
+/* If successful, output of reconciliation is updated BTF with base BTF pointing
+ * at base_btf, and type ids, strings adjusted accordingly
+ */
+int btf_reconcile(struct btf *btf, const struct btf *base_btf, __u32 **map_ids)
+{
+	const struct btf *base_ref_btf = btf__base_btf(btf);
+	unsigned int nr_split_types, nr_base_ref_types;
+	unsigned int nr_types = btf__type_cnt(btf);
+	struct btf_reconcile r = {};
+	const struct btf_type *t;
+	int diff_id, err = 0;
+	__u32 id, i;
+
+	if (!base_btf || base_ref_btf == base_btf)
+		return 0;
+
+	nr_base_ref_types = btf__type_cnt(base_ref_btf);
+	r.nr_base_types = btf__type_cnt(base_btf);
+	nr_split_types = nr_types - nr_base_ref_types;
+	r.map = calloc(nr_types, sizeof(*r.map));
+	r.stack_limit = nr_base_ref_types;
+	r.stack = calloc(r.stack_limit, sizeof(*r.stack));
+	if (!r.map || !r.stack) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+	diff_id = r.nr_base_types - nr_base_ref_types;
+	for (id = 1; id < nr_base_ref_types; id++)
+		r.map[id] = BTF_UNPROCESSED_ID;
+	for (id = nr_base_ref_types; id < nr_types; id++)
+		r.map[id] = id + diff_id;
+
+	r.btf = btf;
+	r.base_ref_btf = base_ref_btf;
+	r.base_btf = base_btf;
+
+	/* Build a map from base references to actual base BTF ids; it is used
+	 * to track the state of comparisons.  First map base types and fwds,
+	 * next structs/unions, and finally reference types (const, restrict,
+	 * ptr, array, func, func_proto etc).
+	 */
+	for (id = 1; id < nr_base_ref_types; id++) {
+		err = btf_reconcile_base_type(&r, id);
+		if (err)
+			goto err_out;
+	}
+	for (id = 1; id < nr_base_ref_types; id++) {
+		err = btf_reconcile_struct_type(&r, id);
+		if (err)
+			goto err_out;
+	}
+	for (id = 1; id < nr_base_ref_types; id++) {
+		err = btf_reconcile_ref_type(&r, id);
+		if (err)
+			goto err_out;
+	}
+	/* Next, rewrite type ids in split BTF, replacing split ids with updated
+	 * ids based on number of types in base BTF, and base ids with
+	 * reconciled ids from base_btf.
+	 */
+	for (i = 0, id = nr_base_ref_types; i < nr_split_types; i++, id++) {
+		t = btf__type_by_id(btf, id);
+		err = btf_type_visit_type_ids((struct btf_type *)t,
+					      btf_reconcile_rewrite_type_id, &r);
+		if (err)
+			goto err_out;
+	}
+	/* Finally reset base BTF to base_btf; as part of this operation, string
+	 * offsets are also updated, and we are done.
+	 */
+	err = btf_set_base_btf(r.btf, (struct btf *)r.base_btf);
+err_out:
+	if (!err && map_ids)
+		*map_ids = r.map;
+	else
+		free(r.map);
+	free(r.stack);
+	return err;
+}
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index e9a7cb9c3c5b..ef61985204ab 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -416,5 +416,6 @@ LIBBPF_1.4.0 {
 		btf__new_split;
 		btf__new_split_base_ref;
 		btf__parse_opts;
+		btf__reconcile;
 		btf_ext__raw_data;
 } LIBBPF_1.3.0;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 864b36177424..097d4d1e4bf7 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -234,6 +234,8 @@ struct btf_type;
 struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id);
 const char *btf_kind_str(const struct btf_type *t);
 const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id);
+int btf_set_base_btf(struct btf *btf, struct btf *base_btf);
+int btf_reconcile(struct btf *btf, const struct btf *base_btf, __u32 **map_ids);
 
 static inline enum btf_func_linkage btf_func_linkage(const struct btf_type *t)
 {
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 10/13] module, bpf: store BTF base reference pointer in struct module
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (9 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 09/13] libbpf: split BTF reconciliation Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-22 10:24 ` [RFC bpf-next 11/13] libbpf,bpf: share BTF reconcile-related code with kernel Alan Maguire
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

...as this will allow split BTF modules with a base reference BTF
representation (rather than the full vmlinux BTF at time of
BTF encoding) to resolve their references to kernel types in a
way that is more resilient to small changes in kernel types.

This will allow modules that are not built every time the kernel
is to provide more resilient BTF, rather than have it invalidated
every time BTF ids for core kernel types change.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 include/linux/module.h | 2 ++
 kernel/module/main.c   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..4167175b580b 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -510,6 +510,8 @@ struct module {
 #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 	unsigned int btf_data_size;
 	void *btf_data;
+	unsigned int btf_base_ref_data_size;
+	void *btf_base_ref_data;
 #endif
 #ifdef CONFIG_JUMP_LABEL
 	struct jump_entry *jump_entries;
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 36681911c05a..d116ca4a5794 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2148,6 +2148,8 @@ static int find_module_sections(struct module *mod, struct load_info *info)
 #endif
 #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 	mod->btf_data = any_section_objs(info, ".BTF", 1, &mod->btf_data_size);
+	mod->btf_base_ref_data = any_section_objs(info, ".BTF.base_ref", 1,
+						  &mod->btf_base_ref_data_size);
 #endif
 #ifdef CONFIG_JUMP_LABEL
 	mod->jump_entries = section_objs(info, "__jump_table",
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 11/13] libbpf,bpf: share BTF reconcile-related code with kernel
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (10 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 10/13] module, bpf: store BTF base reference pointer in struct module Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-29 22:04   ` Andrii Nakryiko
  2024-03-22 10:24 ` [RFC bpf-next 12/13] selftests/bpf: extend base reference tests cover BTF reconciliation Alan Maguire
  2024-03-22 10:24 ` [RFC bpf-next 13/13] bpftool: support displaying reconciled-with-base split BTF Alan Maguire
  13 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Share reconciliation implementation with the kernel.  As part of this,
we also need the type/string visitation functions so add them to a
btf_common.c file that also gets shared with the kernel. Reconciliation
code between kernel and userspace is identical save for the
implementation of the reparenting of split BTF to the reconciled base
BTF; this depends on struct btf internals so is different in kernel and
userspace.

One other wrinkle on the kernel side is we have to map .BTF.ids in
modules as they were generated with the type ids used at BTF encoding
time. btf_reconcile() optionally returns an array mapping from old BTF
ids to reconciled ids, so we use that to fix up these references where
needed for kfuncs.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 include/linux/btf.h           |  29 +++++
 kernel/bpf/Makefile           |   8 ++
 kernel/bpf/btf.c              | 197 +++++++++++++++++++++++++++++-----
 tools/lib/bpf/Build           |   2 +-
 tools/lib/bpf/btf.c           | 130 ----------------------
 tools/lib/bpf/btf_common.c    | 146 +++++++++++++++++++++++++
 tools/lib/bpf/btf_reconcile.c |  24 +++++
 7 files changed, 376 insertions(+), 160 deletions(-)
 create mode 100644 tools/lib/bpf/btf_common.c

diff --git a/include/linux/btf.h b/include/linux/btf.h
index f9e56fd12a9f..09b12b5bd936 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -214,6 +214,7 @@ bool btf_is_kernel(const struct btf *btf);
 bool btf_is_module(const struct btf *btf);
 struct module *btf_try_get_module(const struct btf *btf);
 u32 btf_nr_types(const struct btf *btf);
+struct btf *btf_base_btf(const struct btf *btf);
 bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   const struct btf_member *m,
 			   u32 expected_offset, u32 expected_size);
@@ -515,8 +516,15 @@ static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *
 }
 #endif
 
+typedef int (*type_id_visit_fn)(__u32 *type_id, void *ctx);
+typedef int (*str_off_visit_fn)(__u32 *str_off, void *ctx);
+
 #ifdef CONFIG_BPF_SYSCALL
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
+int btf_set_base_btf(struct btf *btf, struct btf *base_btf);
+int btf_reconcile(struct btf *btf, const struct btf *base_btf, __u32 **map_ids);
+int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx);
+int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx);
 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
 struct btf *btf_parse_vmlinux(void);
 struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
@@ -543,6 +551,27 @@ static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
 {
 	return NULL;
 }
+static inline int btf_set_base_btf(struct btf *btf, struct btf *base_btf)
+{
+	return 0;
+}
+static inline int btf_reconcile(void *log, struct btf *btf, const struct btf *base_btf,
+				__u32 **map_ids)
+{
+	return 0;
+}
+static inline int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit,
+					  void *ctx)
+{
+	return 0;
+}
+
+static inline int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit,
+					  void *ctx)
+{
+	return 0;
+}
+
 static inline const char *btf_name_by_offset(const struct btf *btf,
 					     u32 offset)
 {
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 368c5d86b5b7..4183b8d68573 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -49,3 +49,11 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
 obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
 $(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
 	$(call if_changed_rule,cc_o_c)
+
+obj-$(CONFIG_BPF_SYSCALL) += btf_common.o
+$(obj)/btf_common.o: $(srctree)/tools/lib/bpf/btf_common.c FORCE
+	$(call if_changed_rule,cc_o_c)
+
+obj-$(CONFIG_BPF_SYSCALL) += btf_reconcile.o
+$(obj)/btf_reconcile.o: $(srctree)/tools/lib/bpf/btf_reconcile.c FORCE
+	$(call if_changed_rule,cc_o_c)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 90c4a32d89ff..66fb32b6d948 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -273,6 +273,7 @@ struct btf {
 	u32 start_str_off; /* first string offset (0 for base BTF) */
 	char name[MODULE_NAME_LEN];
 	bool kernel_btf;
+	__u32 *base_ref_map; /* map from base reference -> vmlinux BTF ids */
 };
 
 enum verifier_phase {
@@ -1734,7 +1735,10 @@ static void btf_free(struct btf *btf)
 	kvfree(btf->types);
 	kvfree(btf->resolved_sizes);
 	kvfree(btf->resolved_ids);
-	kvfree(btf->data);
+	/* only split BTF allocates BTF data */
+	if (btf->base_btf)
+		kvfree(btf->data);
+	kvfree(btf->base_ref_map);
 	kfree(btf);
 }
 
@@ -1763,6 +1767,90 @@ void btf_put(struct btf *btf)
 	}
 }
 
+struct btf *btf_base_btf(const struct btf *btf)
+{
+	return btf->base_btf;
+}
+
+struct btf_rewrite_strs {
+	struct btf *btf;
+	const struct btf *old_base_btf;
+	int str_start;
+	int str_diff;
+	__u32 *str_map;
+};
+
+static __u32 btf_find_str(struct btf *btf, const char *s)
+{
+	__u32 offset = 0;
+
+	while (offset < btf->hdr.str_len) {
+		while (!btf->strings[offset])
+			offset++;
+		if (strcmp(s, &btf->strings[offset]) == 0)
+			return offset;
+		while (btf->strings[offset])
+			offset++;
+	}
+	return -ENOENT;
+}
+
+static int btf_rewrite_strs(__u32 *str_off, void *ctx)
+{
+	struct btf_rewrite_strs *r = ctx;
+	const char *s;
+	int off;
+
+	if (!*str_off)
+		return 0;
+	if (*str_off >= r->str_start) {
+		*str_off += r->str_diff;
+	} else {
+		s = btf_str_by_offset(r->old_base_btf, *str_off);
+		if (!s)
+			return -ENOENT;
+		if (r->str_map[*str_off]) {
+			off = r->str_map[*str_off];
+		} else {
+			off = btf_find_str(r->btf->base_btf, s);
+			if (off < 0)
+				return off;
+			r->str_map[*str_off] = off;
+		}
+		*str_off = off;
+	}
+	return 0;
+}
+
+int btf_set_base_btf(struct btf *btf, struct btf *base_btf)
+{
+	struct btf_rewrite_strs r = {};
+	struct btf_type *t;
+	int i, err;
+
+	r.old_base_btf = btf_base_btf(btf);
+	if (!r.old_base_btf)
+		return -EINVAL;
+	r.btf = btf;
+	r.str_start = r.old_base_btf->hdr.str_len;
+	r.str_diff = base_btf->hdr.str_len - r.old_base_btf->hdr.str_len;
+	r.str_map = kvcalloc(r.old_base_btf->hdr.str_len, sizeof(*r.str_map),
+			     GFP_KERNEL | __GFP_NOWARN);
+	if (!r.str_map)
+		return -ENOMEM;
+	btf->base_btf = base_btf;
+	btf->start_id = btf_nr_types(base_btf);
+	btf->start_str_off = base_btf->hdr.str_len;
+	for (i = 0; i < btf->nr_types; i++) {
+		t = (struct btf_type *)btf_type_by_id(btf, i + btf->start_id);
+		err = btf_type_visit_str_offs((struct btf_type *)t, btf_rewrite_strs, &r);
+		if (err)
+			break;
+	}
+	kvfree(r.str_map);
+	return err;
+}
+
 static int env_resolve_init(struct btf_verifier_env *env)
 {
 	struct btf *btf = env->btf;
@@ -5964,20 +6052,12 @@ int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty
 BTF_ID_LIST(bpf_ctx_convert_btf_id)
 BTF_ID(struct, bpf_ctx_convert)
 
-struct btf *btf_parse_vmlinux(void)
+static struct btf *btf_parse_base(struct btf_verifier_env *env, const char *name,
+				  void *data, unsigned int data_size)
 {
-	struct btf_verifier_env *env = NULL;
-	struct bpf_verifier_log *log;
 	struct btf *btf = NULL;
 	int err;
 
-	env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
-	if (!env)
-		return ERR_PTR(-ENOMEM);
-
-	log = &env->log;
-	log->level = BPF_LOG_KERNEL;
-
 	btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
 	if (!btf) {
 		err = -ENOMEM;
@@ -5985,10 +6065,10 @@ struct btf *btf_parse_vmlinux(void)
 	}
 	env->btf = btf;
 
-	btf->data = __start_BTF;
-	btf->data_size = __stop_BTF - __start_BTF;
+	btf->data = data;
+	btf->data_size = data_size;
 	btf->kernel_btf = true;
-	snprintf(btf->name, sizeof(btf->name), "vmlinux");
+	snprintf(btf->name, sizeof(btf->name), name);
 
 	err = btf_parse_hdr(env);
 	if (err)
@@ -6008,7 +6088,7 @@ struct btf *btf_parse_vmlinux(void)
 	if (err)
 		goto errout;
 
-	/* btf_parse_vmlinux() runs under bpf_verifier_lock */
+	/* btf_parse_base() runs under bpf_verifier_lock */
 	bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
 
 	refcount_set(&btf->refcnt, 1);
@@ -6017,11 +6097,8 @@ struct btf *btf_parse_vmlinux(void)
 	if (err)
 		goto errout;
 
-	btf_verifier_env_free(env);
 	return btf;
-
 errout:
-	btf_verifier_env_free(env);
 	if (btf) {
 		kvfree(btf->types);
 		kfree(btf);
@@ -6029,19 +6106,49 @@ struct btf *btf_parse_vmlinux(void)
 	return ERR_PTR(err);
 }
 
+struct btf *btf_parse_vmlinux(void)
+{
+	struct btf_verifier_env *env = NULL;
+	struct bpf_verifier_log *log;
+	struct btf *btf;
+
+	env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
+	if (!env)
+		return ERR_PTR(-ENOMEM);
+
+	log = &env->log;
+	log->level = BPF_LOG_KERNEL;
+	btf = btf_parse_base(env, "vmlinux", __start_BTF, __stop_BTF - __start_BTF);
+	btf_verifier_env_free(env);
+	return btf;
+}
+
 #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 
-static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
+/* If .BTF_ids section was created with base reference BTF, both base and
+ * split BTF ids will need to be mapped to actual base/split ids for
+ * BTF now that it has been reconciled.
+ */
+static __u32 btf_id_map(const struct btf *btf, __u32 id)
+{
+	if (!btf->base_btf || !btf->base_ref_map)
+		return id;
+	return btf->base_ref_map[id];
+}
+
+static struct btf *btf_parse_module(const char *module_name, const void *data,
+				    unsigned int data_size, void *base_ref_data,
+				    unsigned int base_ref_data_size)
 {
+	struct btf *btf = NULL, *vmlinux_btf, *base_btf = NULL;
 	struct btf_verifier_env *env = NULL;
 	struct bpf_verifier_log *log;
-	struct btf *btf = NULL, *base_btf;
 	int err;
 
-	base_btf = bpf_get_btf_vmlinux();
-	if (IS_ERR(base_btf))
-		return base_btf;
-	if (!base_btf)
+	vmlinux_btf = bpf_get_btf_vmlinux();
+	if (IS_ERR(vmlinux_btf))
+		return vmlinux_btf;
+	if (!vmlinux_btf)
 		return ERR_PTR(-EINVAL);
 
 	env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
@@ -6051,6 +6158,19 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
 	log = &env->log;
 	log->level = BPF_LOG_KERNEL;
 
+	if (base_ref_data) {
+		char base_ref_name[MODULE_NAME_LEN];
+
+		snprintf(base_ref_name, sizeof(base_ref_name), "%s.BTF.base_ref", module_name);
+		base_btf = btf_parse_base(env, base_ref_name, base_ref_data, base_ref_data_size);
+		if (IS_ERR(base_btf))
+			return base_btf;
+		if (!base_btf)
+			return ERR_PTR(-EINVAL);
+	} else {
+		base_btf = vmlinux_btf;
+	}
+
 	btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
 	if (!btf) {
 		err = -ENOMEM;
@@ -6090,12 +6210,21 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
 	if (err)
 		goto errout;
 
+	if (base_btf != vmlinux_btf) {
+		err = btf_reconcile(btf, vmlinux_btf, &btf->base_ref_map);
+		if (err)
+			goto errout;
+		btf_free(base_btf);
+		base_btf = vmlinux_btf;
+	}
 	btf_verifier_env_free(env);
 	refcount_set(&btf->refcnt, 1);
 	return btf;
 
 errout:
 	btf_verifier_env_free(env);
+	if (base_btf != vmlinux_btf)
+		btf_free(base_btf);
 	if (btf) {
 		kvfree(btf->data);
 		kvfree(btf->types);
@@ -7648,7 +7777,8 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
 			err = -ENOMEM;
 			goto out;
 		}
-		btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
+		btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size,
+				       mod->btf_base_ref_data, mod->btf_base_ref_data_size);
 		if (IS_ERR(btf)) {
 			kfree(btf_mod);
 			if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) {
@@ -7972,7 +8102,7 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
 	bool add_filter = !!kset->filter;
 	struct btf_kfunc_set_tab *tab;
 	struct btf_id_set8 *set;
-	u32 set_cnt;
+	u32 set_cnt, i;
 	int ret;
 
 	if (hook >= BTF_KFUNC_HOOK_MAX) {
@@ -8062,6 +8192,9 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
 
 	/* Concatenate the two sets */
 	memcpy(set->pairs + set->cnt, add_set->pairs, add_set->cnt * sizeof(set->pairs[0]));
+	for (i = set->cnt; i < set->cnt + add_set->cnt; i++)
+		set->pairs[i].id = btf_id_map(btf, set->pairs[i].id);
+
 	set->cnt += add_set->cnt;
 
 	sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL);
@@ -8184,7 +8317,7 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
 		return PTR_ERR(btf);
 
 	for (i = 0; i < kset->set->cnt; i++) {
-		ret = btf_check_kfunc_protos(btf, kset->set->pairs[i].id,
+		ret = btf_check_kfunc_protos(btf, btf_id_map(btf, kset->set->pairs[i].id),
 					     kset->set->pairs[i].flags);
 		if (ret)
 			goto err_out;
@@ -8248,7 +8381,7 @@ static int btf_check_dtor_kfuncs(struct btf *btf, const struct btf_id_dtor_kfunc
 	u32 nr_args, i;
 
 	for (i = 0; i < cnt; i++) {
-		dtor_btf_id = dtors[i].kfunc_btf_id;
+		dtor_btf_id = btf_id_map(btf, dtors[i].kfunc_btf_id);
 
 		dtor_func = btf_type_by_id(btf, dtor_btf_id);
 		if (!dtor_func || !btf_type_is_func(dtor_func))
@@ -8283,7 +8416,7 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c
 {
 	struct btf_id_dtor_kfunc_tab *tab;
 	struct btf *btf;
-	u32 tab_cnt;
+	u32 tab_cnt, i;
 	int ret;
 
 	btf = btf_get_module_btf(owner);
@@ -8334,6 +8467,12 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c
 	btf->dtor_kfunc_tab = tab;
 
 	memcpy(tab->dtors + tab->cnt, dtors, add_cnt * sizeof(tab->dtors[0]));
+
+	/* remap BTF ids based on BTF reconciliation (if any) */
+	for (i = tab_cnt; i < tab_cnt + add_cnt; i++) {
+		tab->dtors[i].btf_id = btf_id_map(btf, tab->dtors[i].btf_id);
+		tab->dtors[i].kfunc_btf_id = btf_id_map(btf, tab->dtors[i].kfunc_btf_id);
+	}
 	tab->cnt += add_cnt;
 
 	sort(tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func, NULL);
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index c05cc524ca30..0179b3cfe95f 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1,4 +1,4 @@
 libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
 	    netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
 	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
-	    usdt.o zip.o elf.o features.o btf_reconcile.o
+	    usdt.o zip.o elf.o features.o btf_common.o btf_reconcile.o
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 09a11954cad8..e034f0c26c96 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -5026,136 +5026,6 @@ struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_bt
 	return btf__parse_split(path, vmlinux_btf);
 }
 
-int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx)
-{
-	int i, n, err;
-
-	switch (btf_kind(t)) {
-	case BTF_KIND_INT:
-	case BTF_KIND_FLOAT:
-	case BTF_KIND_ENUM:
-	case BTF_KIND_ENUM64:
-		return 0;
-
-	case BTF_KIND_FWD:
-	case BTF_KIND_CONST:
-	case BTF_KIND_VOLATILE:
-	case BTF_KIND_RESTRICT:
-	case BTF_KIND_PTR:
-	case BTF_KIND_TYPEDEF:
-	case BTF_KIND_FUNC:
-	case BTF_KIND_VAR:
-	case BTF_KIND_DECL_TAG:
-	case BTF_KIND_TYPE_TAG:
-		return visit(&t->type, ctx);
-
-	case BTF_KIND_ARRAY: {
-		struct btf_array *a = btf_array(t);
-
-		err = visit(&a->type, ctx);
-		err = err ?: visit(&a->index_type, ctx);
-		return err;
-	}
-
-	case BTF_KIND_STRUCT:
-	case BTF_KIND_UNION: {
-		struct btf_member *m = btf_members(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->type, ctx);
-			if (err)
-				return err;
-		}
-		return 0;
-	}
-
-	case BTF_KIND_FUNC_PROTO: {
-		struct btf_param *m = btf_params(t);
-
-		err = visit(&t->type, ctx);
-		if (err)
-			return err;
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->type, ctx);
-			if (err)
-				return err;
-		}
-		return 0;
-	}
-
-	case BTF_KIND_DATASEC: {
-		struct btf_var_secinfo *m = btf_var_secinfos(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->type, ctx);
-			if (err)
-				return err;
-		}
-		return 0;
-	}
-
-	default:
-		return -EINVAL;
-	}
-}
-
-int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx)
-{
-	int i, n, err;
-
-	err = visit(&t->name_off, ctx);
-	if (err)
-		return err;
-
-	switch (btf_kind(t)) {
-	case BTF_KIND_STRUCT:
-	case BTF_KIND_UNION: {
-		struct btf_member *m = btf_members(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->name_off, ctx);
-			if (err)
-				return err;
-		}
-		break;
-	}
-	case BTF_KIND_ENUM: {
-		struct btf_enum *m = btf_enum(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->name_off, ctx);
-			if (err)
-				return err;
-		}
-		break;
-	}
-	case BTF_KIND_ENUM64: {
-		struct btf_enum64 *m = btf_enum64(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->name_off, ctx);
-			if (err)
-				return err;
-		}
-		break;
-	}
-	case BTF_KIND_FUNC_PROTO: {
-		struct btf_param *m = btf_params(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->name_off, ctx);
-			if (err)
-				return err;
-		}
-		break;
-	}
-	default:
-		break;
-	}
-
-	return 0;
-}
-
 int btf_ext_visit_type_ids(struct btf_ext *btf_ext, type_id_visit_fn visit, void *ctx)
 {
 	const struct btf_ext_info *seg;
diff --git a/tools/lib/bpf/btf_common.c b/tools/lib/bpf/btf_common.c
new file mode 100644
index 000000000000..57932f07bb62
--- /dev/null
+++ b/tools/lib/bpf/btf_common.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+/* Copyright (c) 2021 Facebook */
+/* Copyright (c) 2024, Oracle and/or its affiliates. */
+
+#ifdef __KERNEL__
+#include <linux/bpf.h>
+#include <linux/btf.h>
+
+static inline struct btf_var_secinfo *btf_var_secinfos(const struct btf_type *t)
+{
+	return (struct btf_var_secinfo *)(t + 1);
+}
+
+#else
+#include "btf.h"
+#include "libbpf_internal.h"
+#endif
+
+int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx)
+{
+	int i, n, err;
+
+	switch (btf_kind(t)) {
+	case BTF_KIND_INT:
+	case BTF_KIND_FLOAT:
+	case BTF_KIND_ENUM:
+	case BTF_KIND_ENUM64:
+		return 0;
+
+	case BTF_KIND_FWD:
+	case BTF_KIND_CONST:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_PTR:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_FUNC:
+	case BTF_KIND_VAR:
+	case BTF_KIND_DECL_TAG:
+	case BTF_KIND_TYPE_TAG:
+		return visit(&t->type, ctx);
+
+	case BTF_KIND_ARRAY: {
+		struct btf_array *a = btf_array(t);
+
+		err = visit(&a->type, ctx);
+		err = err ?: visit(&a->index_type, ctx);
+		return err;
+	}
+
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION: {
+		struct btf_member *m = btf_members(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->type, ctx);
+			if (err)
+				return err;
+		}
+		return 0;
+	}
+	case BTF_KIND_FUNC_PROTO: {
+		struct btf_param *m = btf_params(t);
+
+		err = visit(&t->type, ctx);
+		if (err)
+			return err;
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->type, ctx);
+			if (err)
+				return err;
+		}
+		return 0;
+	}
+
+	case BTF_KIND_DATASEC: {
+		struct btf_var_secinfo *m = btf_var_secinfos(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->type, ctx);
+			if (err)
+				return err;
+		}
+		return 0;
+	}
+
+	default:
+		return -EINVAL;
+	}
+}
+
+int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx)
+{
+	int i, n, err;
+
+	err = visit(&t->name_off, ctx);
+	if (err)
+		return err;
+
+	switch (btf_kind(t)) {
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION: {
+		struct btf_member *m = btf_members(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->name_off, ctx);
+			if (err)
+				return err;
+		}
+		break;
+	}
+	case BTF_KIND_ENUM: {
+		struct btf_enum *m = btf_enum(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->name_off, ctx);
+			if (err)
+				return err;
+		}
+		break;
+	}
+	case BTF_KIND_ENUM64: {
+		struct btf_enum64 *m = btf_enum64(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->name_off, ctx);
+			if (err)
+				return err;
+		}
+		break;
+	}
+	case BTF_KIND_FUNC_PROTO: {
+		struct btf_param *m = btf_params(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->name_off, ctx);
+			if (err)
+				return err;
+		}
+		break;
+	}
+	default:
+		break;
+	}
+
+	return 0;
+}
diff --git a/tools/lib/bpf/btf_reconcile.c b/tools/lib/bpf/btf_reconcile.c
index 84787328a1de..9057c3ed4c66 100644
--- a/tools/lib/bpf/btf_reconcile.c
+++ b/tools/lib/bpf/btf_reconcile.c
@@ -1,11 +1,35 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2024, Oracle and/or its affiliates. */
 
+#ifdef __KERNEL__
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/string.h>
+#include <linux/bpf_verifier.h>
+
+#define btf__type_by_id		btf_type_by_id
+#define btf__type_cnt		btf_nr_types
+#define btf__base_btf		btf_base_btf
+#define btf__name_by_offset	btf_name_by_offset
+#define btf_kflag		btf_type_kflag
+
+#define calloc(nmemb, size)	kvcalloc(nmemb, size, GFP_KERNEL | __GFP_NOWARN)
+#define free(ptr)		kvfree(ptr)
+
+static inline struct btf_decl_tag *btf_decl_tag(const struct btf_type *t)
+{
+	return (struct btf_decl_tag *)(t + 1);
+}
+
+#else
+
 #include "btf.h"
 #include "bpf.h"
 #include "libbpf.h"
 #include "libbpf_internal.h"
 
+#endif /* __KERNEL__ */
+
 struct btf;
 
 #define BTF_MAX_NR_TYPES 0x7fffffffU
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 12/13] selftests/bpf: extend base reference tests cover BTF reconciliation
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (11 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 11/13] libbpf,bpf: share BTF reconcile-related code with kernel Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  2024-03-22 10:24 ` [RFC bpf-next 13/13] bpftool: support displaying reconciled-with-base split BTF Alan Maguire
  13 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Ensure reconciled BTF looks as expected; in this case identical to
original split BTF.  More tests needed here for positive/negative
cases..

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 .../bpf/prog_tests/btf_split_base_ref.c       | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/btf_split_base_ref.c b/tools/testing/selftests/bpf/prog_tests/btf_split_base_ref.c
index d611167fa4b2..3f4382651fc8 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_split_base_ref.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_split_base_ref.c
@@ -158,6 +158,41 @@ static void test_split_base_ref(void)
 		"[16] CONST '(anon)' type_id=8",
 		"[17] PTR '(anon)' type_id=9");
 
+	if (!ASSERT_EQ(btf__reconcile(btf3, btf1), 0, "reconcile_split"))
+		goto cleanup;
+	VALIDATE_RAW_BTF(
+		btf3,
+		"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+		"[2] PTR '(anon)' type_id=1",
+		"[3] STRUCT 's1' size=8 vlen=1\n"
+		"\t'f1' type_id=2 bits_offset=0",
+		"[4] STRUCT '(anon)' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=3 bits_offset=32",
+		"[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)",
+		"[6] UNION 'u1' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=2 bits_offset=0",
+		"[7] UNION '(anon)' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'v1' val=1",
+		"[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'av1' val=2",
+		"[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1024",
+		"[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1025",
+		"[12] STRUCT 'unneeded' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[13] PTR '(anon)' type_id=3",
+		"[14] PTR '(anon)' type_id=4",
+		"[15] CONST '(anon)' type_id=6",
+		"[16] RESTRICT '(anon)' type_id=7",
+		"[17] VOLATILE '(anon)' type_id=8",
+		"[18] TYPEDEF 'et' type_id=9",
+		"[19] CONST '(anon)' type_id=10",
+		"[20] PTR '(anon)' type_id=11");
 cleanup:
 	btf__free(btf4);
 	btf__free(btf3);
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next 13/13] bpftool: support displaying reconciled-with-base split BTF
  2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
                   ` (12 preceding siblings ...)
  2024-03-22 10:24 ` [RFC bpf-next 12/13] selftests/bpf: extend base reference tests cover BTF reconciliation Alan Maguire
@ 2024-03-22 10:24 ` Alan Maguire
  13 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-03-22 10:24 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

If the -R <base_btf> option is used, we can display BTF
that has been generated with base reference BTF in its
reconciled form.  For example if we build a module with
base reference BTF:

BTF_BASE_REF=1 make -C. M=path/2/module

...we can display its content relative to that base
reference BTF via

bpftool btf dump file path/2/module

Alternatively, we can display content reconciled with
(a possibly changed) base BTF via

bpftool btf dump -R vmlinux path/2/module

The latter mirrors how the kernel will handle such split
BTF; it reconciles its representation with the running
kernel, and if successful, renumbers BTF ids to reference
the current vmlinux BTF.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/bpf/bpftool/Documentation/bpftool-btf.rst | 17 +++++++++++++++++
 tools/bpf/bpftool/btf.c                         | 14 ++++++++++++--
 tools/bpf/bpftool/main.c                        | 14 +++++++++++++-
 tools/bpf/bpftool/main.h                        |  2 ++
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-btf.rst b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
index 342716f74ec4..97b6426367e6 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-btf.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
@@ -90,6 +90,23 @@ OPTIONS
 		  to autodetect the path for the base object, and passing
 		  this option is optional. When the main BTF object is passed
 		  through other handles, this option becomes necessary.
+        -R, --reconcile-base-btf *FILE*
+                  When split BTF is generated with base reference BTF,
+                  the latter is stored in a .BTF.base_ref section and allows
+                  us to later reconcile split BTF and a potentially-changed
+                  base BTF by noting that type ids in the split BTF that
+                  refer to base BTF refer to specific integers, structs etc.
+                  By passing base BTF via the -R option, we can carry out
+                  that reconciliation operation between split/base reference
+                  BTF and base BTF.  If successful, the split BTF will then
+                  refer to the base BTF types from *reconcile-base-btf*.
+
+                  If this argument is not used, split BTF is shown relative
+                  to .BTF.base_ref data as a base; however in this mode
+                  it is not possible to generate "C" output, since the
+                  information retained in .BTF.base_ref is just enough to
+                  clarify what a type id refers to, but does not contain
+                  full type details for all types.
 
 EXAMPLES
 ========
diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index f1e0ea06f05f..355b56eff9f2 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -644,8 +644,17 @@ static int do_dump(int argc, char **argv)
 			base_btf = btf__parse_opts(*argv, &optp);
 			if (base_btf)
 				btf = btf__parse_split(*argv, base_btf);
-			if (btf)
+			if (btf && reconcile_base_btf) {
+				err = btf__reconcile(btf, reconcile_base_btf);
+				if (err) {
+					p_err("coud not reconcile BTF from '%s' with base BTF '%s': %s\n",
+					      *argv, reconcile_base_btf_path,
+					      strerror(-err));
+					goto done;
+				}
+			} else {
 				c_compat = false;
+			}
 		}
 		if (!btf) {
 			err = -errno;
@@ -1088,7 +1097,8 @@ static int do_help(int argc, char **argv)
 		"       " HELP_SPEC_MAP "\n"
 		"       " HELP_SPEC_PROGRAM "\n"
 		"       " HELP_SPEC_OPTIONS " |\n"
-		"                    {-B|--base-btf} }\n"
+		"                    {-B|--base-btf}\n"
+		"                    {-R|--reconcile-base-btf}}\n"
 		"",
 		bin_name, "btf");
 
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 08d0ac543c67..33ac0ca77162 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -32,6 +32,8 @@ bool verifier_logs;
 bool relaxed_maps;
 bool use_loader;
 struct btf *base_btf;
+struct btf *reconcile_base_btf;
+const char *reconcile_base_btf_path;
 struct hashmap *refs_table;
 
 static void __noreturn clean_and_exit(int i)
@@ -448,6 +450,7 @@ int main(int argc, char **argv)
 		{ "debug",	no_argument,	NULL,	'd' },
 		{ "use-loader",	no_argument,	NULL,	'L' },
 		{ "base-btf",	required_argument, NULL, 'B' },
+		{ "reconcile-base-btf", required_argument, NULL, 'R' },
 		{ 0 }
 	};
 	bool version_requested = false;
@@ -473,7 +476,7 @@ int main(int argc, char **argv)
 	bin_name = "bpftool";
 
 	opterr = 0;
-	while ((opt = getopt_long(argc, argv, "VhpjfLmndB:l",
+	while ((opt = getopt_long(argc, argv, "VhpjfLmndB:lR:",
 				  options, NULL)) >= 0) {
 		switch (opt) {
 		case 'V':
@@ -516,6 +519,15 @@ int main(int argc, char **argv)
 				return -1;
 			}
 			break;
+		case 'R':
+			reconcile_base_btf_path = optarg;
+			reconcile_base_btf = btf__parse(optarg, NULL);
+			if (!reconcile_base_btf) {
+				p_err("failed to parse base BTF for reconciliation at '%s': %d\n",
+				      optarg, -errno);
+				return -1;
+			}
+			break;
 		case 'L':
 			use_loader = true;
 			break;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index b8bb08d10dec..c84bfc1d9135 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -83,6 +83,8 @@ extern bool verifier_logs;
 extern bool relaxed_maps;
 extern bool use_loader;
 extern struct btf *base_btf;
+extern struct btf *reconcile_base_btf;
+extern const char *reconcile_base_btf_path;
 extern struct hashmap *refs_table;
 
 void __printf(1, 2) p_err(const char *fmt, ...);
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 08/13] kbuild, bpf: add module-specific pahole/resolve_btfids flags for base reference BTF
  2024-03-22 10:24 ` [RFC bpf-next 08/13] kbuild, bpf: add module-specific pahole/resolve_btfids flags for base reference BTF Alan Maguire
@ 2024-03-23  2:50   ` Alexei Starovoitov
  2024-03-25  9:51     ` Alan Maguire
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2024-03-23  2:50 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, Jiri Olsa, Arnaldo Carvalho de Melo,
	Quentin Monnet, Eddy Z, Mykola Lysenko, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Hou Tao,
	bpf, Masahiro Yamada, Luis R. Rodriguez, Nathan Chancellor

On Fri, Mar 22, 2024 at 3:26 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Support creation of module BTF along with base reference BTF;
> the latter is stored in a .BTF.base_ref ELF section and supplements
> split BTF references to base BTF with information about base types,
> allowing for later reconciliation of split BTF with a (possibly
> changed) base.  resolve_btfids uses the "-r" option to specify
> that the BTF.ids section should be populated with split BTF
> relative to the added .BTF.base_ref section rather than relative
> to the vmlinux base.
>
> Modules using base reference BTF can be built via
>
> BTF_BASE_REF=1 make -C. -M=path2/module
>
> The default is still to use split BTF relative to vmlinux.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  scripts/Makefile.btf      | 7 +++++++
>  scripts/Makefile.modfinal | 4 ++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
> index 9694ca3c5252..c8212b2ab7ca 100644
> --- a/scripts/Makefile.btf
> +++ b/scripts/Makefile.btf
> @@ -19,4 +19,11 @@ pahole-flags-$(call test-ge, $(pahole-ver), 126)     = -j --btf_features=encode_forc
>
>  pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)         += --lang_exclude=rust
>
> +ifeq ($(BTF_BASE_REF),1)
> +module-pahole-flags-$(call test-ge, $(pahole-ver), 126)        += --btf_features=base_ref
> +module-resolve-btfids-flags-$(call test-ge, $(pahole-ver), 126) = -r
> +endif

The patch set looks great to me.
I wonder whether we should drop this extra BTF_BASE_REF flag
and do it unconditionally.
Currently btf_parse_module() doesn't have any real checks
whether module's btf matches base_btf.
All the btf_check_*() might succeed by luck even if vmlinux btf
is different.
Since .BTF.base_ref is small we can always emit it and
during module load the btf_reconcile() step will be that safety check.
Less build options, less moving pieces and doing it all the time
will make the whole process robust.
Conditional BTF_BASE_REF=1 will likely mean that there will be
bugs that only few folks will hit.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 08/13] kbuild, bpf: add module-specific pahole/resolve_btfids flags for base reference BTF
  2024-03-23  2:50   ` Alexei Starovoitov
@ 2024-03-25  9:51     ` Alan Maguire
  2024-03-25 16:41       ` Alexei Starovoitov
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-03-25  9:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Jiri Olsa, Arnaldo Carvalho de Melo,
	Quentin Monnet, Eddy Z, Mykola Lysenko, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Hou Tao,
	bpf, Masahiro Yamada, Luis R. Rodriguez, Nathan Chancellor

On 23/03/2024 02:50, Alexei Starovoitov wrote:
> On Fri, Mar 22, 2024 at 3:26 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> Support creation of module BTF along with base reference BTF;
>> the latter is stored in a .BTF.base_ref ELF section and supplements
>> split BTF references to base BTF with information about base types,
>> allowing for later reconciliation of split BTF with a (possibly
>> changed) base.  resolve_btfids uses the "-r" option to specify
>> that the BTF.ids section should be populated with split BTF
>> relative to the added .BTF.base_ref section rather than relative
>> to the vmlinux base.
>>
>> Modules using base reference BTF can be built via
>>
>> BTF_BASE_REF=1 make -C. -M=path2/module
>>
>> The default is still to use split BTF relative to vmlinux.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>>  scripts/Makefile.btf      | 7 +++++++
>>  scripts/Makefile.modfinal | 4 ++--
>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
>> index 9694ca3c5252..c8212b2ab7ca 100644
>> --- a/scripts/Makefile.btf
>> +++ b/scripts/Makefile.btf
>> @@ -19,4 +19,11 @@ pahole-flags-$(call test-ge, $(pahole-ver), 126)     = -j --btf_features=encode_forc
>>
>>  pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)         += --lang_exclude=rust
>>
>> +ifeq ($(BTF_BASE_REF),1)
>> +module-pahole-flags-$(call test-ge, $(pahole-ver), 126)        += --btf_features=base_ref
>> +module-resolve-btfids-flags-$(call test-ge, $(pahole-ver), 126) = -r
>> +endif
> 
> The patch set looks great to me.
> I wonder whether we should drop this extra BTF_BASE_REF flag
> and do it unconditionally.
> Currently btf_parse_module() doesn't have any real checks
> whether module's btf matches base_btf.
> All the btf_check_*() might succeed by luck even if vmlinux btf
> is different.
> Since .BTF.base_ref is small we can always emit it and
> during module load the btf_reconcile() step will be that safety check.
> Less build options, less moving pieces and doing it all the time
> will make the whole process robust.
> Conditional BTF_BASE_REF=1 will likely mean that there will be
> bugs that only few folks will hit.

We could certainly do it unconditionally for out-of-tree module builds
(the KBUILD_EXTMOD case); so anytime a user does "make -C.
M=path/2/module", they would get base reference BTF. With this series,
that would just mean applying this change:

diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index c8212b2ab7ca..0322f8450a89 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -19,7 +19,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 126)
= -j --btf_features=encode_forc

 pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)         +=
--lang_exclude=rust

-ifeq ($(BTF_BASE_REF),1)
+ifneq ($(KBUILD_EXTMOD),)
 module-pahole-flags-$(call test-ge, $(pahole-ver), 126)        +=
--btf_features=base_ref
 module-resolve-btfids-flags-$(call test-ge, $(pahole-ver), 126) = -r
 endif

So with this approach, in-tree modules don't add base reference BTF
since they don't need it, while out-of-tree module builds always do.
We could arrange for bpf_testmod to be built both ways perhaps to ensure
both approaches get tested in selftests along with kfunc addition etc.

For adding base reference BTF in all cases, I ran the numbers on adding
base reference BTF across all kernel modules and for x86_64 with 2667
modules. Base reference BTF adds around 4MB in total. Not huge, but for
in-tree builds, the only cases we've seen that triggered BTF mismatches
have been broken cases where vmlinux got built twice during kernel
build. So my suggestion would be to limit base reference BTF addition to
the KBUILD_EXTMOD case, and make it unconditional for that case only.

Thanks!

Alan

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 08/13] kbuild, bpf: add module-specific pahole/resolve_btfids flags for base reference BTF
  2024-03-25  9:51     ` Alan Maguire
@ 2024-03-25 16:41       ` Alexei Starovoitov
  0 siblings, 0 replies; 29+ messages in thread
From: Alexei Starovoitov @ 2024-03-25 16:41 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, Jiri Olsa, Arnaldo Carvalho de Melo,
	Quentin Monnet, Eddy Z, Mykola Lysenko, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Hou Tao,
	bpf, Masahiro Yamada, Luis R. Rodriguez, Nathan Chancellor

On Mon, Mar 25, 2024 at 2:51 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 23/03/2024 02:50, Alexei Starovoitov wrote:
> > On Fri, Mar 22, 2024 at 3:26 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> Support creation of module BTF along with base reference BTF;
> >> the latter is stored in a .BTF.base_ref ELF section and supplements
> >> split BTF references to base BTF with information about base types,
> >> allowing for later reconciliation of split BTF with a (possibly
> >> changed) base.  resolve_btfids uses the "-r" option to specify
> >> that the BTF.ids section should be populated with split BTF
> >> relative to the added .BTF.base_ref section rather than relative
> >> to the vmlinux base.
> >>
> >> Modules using base reference BTF can be built via
> >>
> >> BTF_BASE_REF=1 make -C. -M=path2/module
> >>
> >> The default is still to use split BTF relative to vmlinux.
> >>
> >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >> ---
> >>  scripts/Makefile.btf      | 7 +++++++
> >>  scripts/Makefile.modfinal | 4 ++--
> >>  2 files changed, 9 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
> >> index 9694ca3c5252..c8212b2ab7ca 100644
> >> --- a/scripts/Makefile.btf
> >> +++ b/scripts/Makefile.btf
> >> @@ -19,4 +19,11 @@ pahole-flags-$(call test-ge, $(pahole-ver), 126)     = -j --btf_features=encode_forc
> >>
> >>  pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)         += --lang_exclude=rust
> >>
> >> +ifeq ($(BTF_BASE_REF),1)
> >> +module-pahole-flags-$(call test-ge, $(pahole-ver), 126)        += --btf_features=base_ref
> >> +module-resolve-btfids-flags-$(call test-ge, $(pahole-ver), 126) = -r
> >> +endif
> >
> > The patch set looks great to me.
> > I wonder whether we should drop this extra BTF_BASE_REF flag
> > and do it unconditionally.
> > Currently btf_parse_module() doesn't have any real checks
> > whether module's btf matches base_btf.
> > All the btf_check_*() might succeed by luck even if vmlinux btf
> > is different.
> > Since .BTF.base_ref is small we can always emit it and
> > during module load the btf_reconcile() step will be that safety check.
> > Less build options, less moving pieces and doing it all the time
> > will make the whole process robust.
> > Conditional BTF_BASE_REF=1 will likely mean that there will be
> > bugs that only few folks will hit.
>
> We could certainly do it unconditionally for out-of-tree module builds
> (the KBUILD_EXTMOD case); so anytime a user does "make -C.
> M=path/2/module", they would get base reference BTF. With this series,
> that would just mean applying this change:
>
> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
> index c8212b2ab7ca..0322f8450a89 100644
> --- a/scripts/Makefile.btf
> +++ b/scripts/Makefile.btf
> @@ -19,7 +19,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 126)
> = -j --btf_features=encode_forc
>
>  pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)         +=
> --lang_exclude=rust
>
> -ifeq ($(BTF_BASE_REF),1)
> +ifneq ($(KBUILD_EXTMOD),)
>  module-pahole-flags-$(call test-ge, $(pahole-ver), 126)        +=
> --btf_features=base_ref
>  module-resolve-btfids-flags-$(call test-ge, $(pahole-ver), 126) = -r
>  endif
>
> So with this approach, in-tree modules don't add base reference BTF
> since they don't need it, while out-of-tree module builds always do.
> We could arrange for bpf_testmod to be built both ways perhaps to ensure
> both approaches get tested in selftests along with kfunc addition etc.
>
> For adding base reference BTF in all cases, I ran the numbers on adding
> base reference BTF across all kernel modules and for x86_64 with 2667
> modules. Base reference BTF adds around 4MB in total. Not huge, but for
> in-tree builds, the only cases we've seen that triggered BTF mismatches
> have been broken cases where vmlinux got built twice during kernel
> build. So my suggestion would be to limit base reference BTF addition to
> the KBUILD_EXTMOD case, and make it unconditional for that case only.

So 4M across 2667 modules is ~1.5Kbyte per module ?
And how many modules out of 2667 will be loaded?
Even for 100 loaded modules it's just 150Kbyte of extra memory.

But I'm fine with sticking to KBUILD_EXTMOD only.
Let's make bpf_testmod with it though unconditionally.
No need to build it twice.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 01/13] libbpf: add support to btf__add_fwd() for ENUM64
  2024-03-22 10:24 ` [RFC bpf-next 01/13] libbpf: add support to btf__add_fwd() for ENUM64 Alan Maguire
@ 2024-03-29 21:59   ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2024-03-29 21:59 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, Mar 22, 2024 at 3:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Forward declaration of BTF_KIND_ENUM64 is added by supporting BTF_FWD_ENUM64
> as an enumerated value for btf_fwd_kind; an ENUM64 forward is an 8-byte
> signed enum64 with no values.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/lib/bpf/btf.c | 7 ++++++-
>  tools/lib/bpf/btf.h | 1 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 2d0840ef599a..dcff6c29c851 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -2418,7 +2418,7 @@ int btf__add_enum64_value(struct btf *btf, const char *name, __u64 value)
>   * Append new BTF_KIND_FWD type with:
>   *   - *name*, non-empty/non-NULL name;
>   *   - *fwd_kind*, kind of forward declaration, one of BTF_FWD_STRUCT,
> - *     BTF_FWD_UNION, or BTF_FWD_ENUM;
> + *     BTF_FWD_UNION, BTF_FWD_ENUM or BTF_FWD_ENUM64;
>   * Returns:
>   *   - >0, type ID of newly added BTF type;
>   *   - <0, on error.
> @@ -2446,6 +2446,11 @@ int btf__add_fwd(struct btf *btf, const char *name, enum btf_fwd_kind fwd_kind)
>                  * values; we also assume a standard 4-byte size for it
>                  */
>                 return btf__add_enum(btf, name, sizeof(int));
> +       case BTF_FWD_ENUM64:
> +               /* enum64 forward is similarly just an enum64 with no enum
> +                * values; assume 8 byte size, signed.
> +                */
> +               return btf__add_enum64(btf, name, 8, true);

nit: let's do sizeof(__u64) to keep it consistent with the FWD_ENUM case


>         default:
>                 return libbpf_err(-EINVAL);
>         }
> diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> index 8e6880d91c84..47d3e00b25c7 100644
> --- a/tools/lib/bpf/btf.h
> +++ b/tools/lib/bpf/btf.h
> @@ -194,6 +194,7 @@ enum btf_fwd_kind {
>         BTF_FWD_STRUCT = 0,
>         BTF_FWD_UNION = 1,
>         BTF_FWD_ENUM = 2,
> +       BTF_FWD_ENUM64 = 3,
>  };
>
>  LIBBPF_API int btf__add_fwd(struct btf *btf, const char *name, enum btf_fwd_kind fwd_kind);
> --
> 2.39.3
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 02/13] libbpf: add btf__new_split_base_ref() creating split BTF with reference base BTF
  2024-03-22 10:24 ` [RFC bpf-next 02/13] libbpf: add btf__new_split_base_ref() creating split BTF with reference base BTF Alan Maguire
@ 2024-03-29 22:00   ` Andrii Nakryiko
  2024-04-04 15:21     ` Alan Maguire
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2024-03-29 22:00 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, Mar 22, 2024 at 3:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> To support more robust split BTF, adding supplemental context for the
> base BTF type ids that split BTF refers to is required.  Without such
> references, a simple shuffling of base BTF type ids (without any other
> significant change) invalidates the split BTF.  Here the attempt is made
> to store additional context to make split BTF more robust.
>
> This context comes in the form of "reference" base BTF - this base BTF
> constitutes the minimal BTF representation needed to disambiguate split BTF
> references to base BTF.  So for example if a struct is referred to from
> split -> base BTF, a forward representation of that struct is retained

so you decided to not bother with recording expected size? Remember we
were talking about situations where vmlinux struct can be embedded
inside the kernel module's struct. In this case *exact* struct size is
very important and can't be changed without breaking compatibility.
Shouldn't we keep track of this?

> in reference base BTF, and split BTF refers to it.  Base types are kept
> intact in reference base BTF, and reference types like pointers are stored
> as-is.  Avoiding struct/union/enum/enum64 expansion is important to keep
> reference base BTF representation to a minimum size; however anonymous
> struct, union and enum[64] types are represented in full since type details
> are needed to disambiguate the reference - the name is not enough in those
> cases since there is no name.  In practice these are rare; in sample
> cases where reference base BTF was generated for in-tree kernel modules,
> only a few were needed in base reference BTF.  These represent the
> anonymous struct/unions that are used by the module but were de-duplicated
> to use base vmlinux BTF ids instead.
>
> When successful, a new representation of the split BTF passed in is
> returned.  It refers to the reference BTF as its base, and both returned
> split and base BTF need to be freed by the caller.
>
> So to take a simple example, with split BTF with a type referring
> to "struct sk_buff", we will generate base reference BTF with a
> FWD struct sk_buff, and the split BTF will refer to it instead.
>
> Tools like pahole can utilize such split BTF to popuate the .BTF section

typo: populate

> (split BTF) and an additional .BTF.base_ref section (base reference BTF).
> Then when the split BTF is loaded, the base reference BTF can be used
> to reconcile split BTF and the current - and possibly changed - base BTF.
>
> So for example if "struct sk_buff" was id 502 when the split BTF was
> originally generated,  we can use the reference base BTF to see that
> id 502 refers to a "struct sk_buff" and replace instances of id 502
> with the current (reconciled) base BTF sk_buff type id.
>
> Base reference BTF is small; when building a kernel with all modules
> using base reference BTF as a test, the average size for module
> base reference BTF is 1555 bytes (standard deviation 1563).  The
> maximum base reference BTF size across ~2700 modules was 37895 bytes.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/lib/bpf/btf.c      | 215 +++++++++++++++++++++++++++++++++++++--
>  tools/lib/bpf/btf.h      |  15 +++
>  tools/lib/bpf/libbpf.map |   1 +
>  3 files changed, 225 insertions(+), 6 deletions(-)
>

[...]

> +static int btf_add_ref_type_ids(__u32 *id, void *ctx)
> +{
> +       struct btf_ref *ref = ctx;
> +       struct btf_type *t = btf_type_by_id(ref->pipe.src, *id);
> +       int ret;
> +
> +       /* split BTF id, not needed */
> +       if (*id > ref->nr_base_types)
> +               return 0;
> +       /* already added ? */
> +       if (ref->ids[*id] <= BTF_MAX_NR_TYPES)
> +               return 0;
> +       ref->ids[*id] = *id;
> +
> +       /* struct/union members not needed, except for anonymous structs
> +        * and unions, which we need since name won't help us determine
> +        * matches; so if a named struct/union, no need to recurse
> +        * into members.
> +        */
> +       if (btf_is_eligible_named_fwd(t))
> +               return 0;

Let's be a bit more robust here. There are three cases for BTF types
(kinds) in "base reference BTF": forward-declared (non-anonymous
structs/unions/enums), copied-as-is (anonymous
struct/union/enum/func_proto, plus INT/FLOAT; maybe something else),
but also there is a third kind that shouldn't be referenced from split
BTF (datasec, funcs, vars, etc). I'd like us to be strict in checking
that we don't accidentally and silently support the latter. It should
mean a misuse of this approach and we should error out in this case.

> +
> +       /* ensure references in type are added also. */
> +       ret = btf_type_visit_type_ids(t, btf_add_ref_type_ids, ctx);
> +       if (ret < 0)
> +               return ret;
> +       return 0;
> +}
> +
> +/* All split BTF ids will be shifted downwards since there are less base BTF
> + * in reference base BTF, and for those that refer to base BTF, we use the
> + * reference map to map from original base BTF to reference base BTF id.
> + */
> +static int btf_update_ref_type_ids(__u32 *id, void *ctx)
> +{
> +       struct btf_ref *ref = ctx;
> +
> +       if (*id >= ref->nr_base_types)
> +               *id -= ref->diff_id;
> +       else
> +               *id = ref->ids[*id];
> +       return 0;
> +}
> +
> +/* Create updated split BTF with "reference" base BTF; reference base BTF
> + * consists of BTF information required to clarify the types that split
> + * BTF refers to, omitting unneeded details.  Specifically it will contain
> + * base types and forward declarations of structs, unions and enumerated
> + * types, along with associated reference types like pointers, arrays etc.
> + *
> + * The only case where structs, unions or enumerated types are fully represented
> + * is when they are anonymous; in such cases, info about type content is needed
> + * to clarify type references.
> + *
> + * We return newly-created split BTF where the split BTf refers to a newly-created
> + * base reference BTF. Both must be freed separately by the caller.
> + *
> + * When creating the BTF representation for a module and provided with the
> + * base_ref option, pahole will create split BTF using this API, and store
> + * the base reference BTF in the .BTF.base.reference section.
> + */
> +struct btf *btf__new_split_base_ref(struct btf *src_btf)

I find this name very confusing, for some reason. btf__new*/btf__parse
is generally used to either create a new empty instance, or just
parse/index pre-existing BTF information. In this case I feel like
it's actually a different operation, where we create a derived BTF, so
I'd name it distinctively. How do you feel about something with
"distill" terminology as an action performed here. E.g.,
btf__distill_base() or something like that?

The API contract is also suboptimal, given back split BTF where base
BTF is owned (and has to be freed). Let's keep it a bit more explicit,
something like:

int btf__distill_base(struct btf *new_base_btf, struct btf *new_split_btf);

WDYT?

> +{
> +       unsigned int n = btf__type_cnt(src_btf);
> +       const struct btf_type *t;
> +       struct btf_ref ref = {};
> +       struct btf *base_btf = NULL, *split_btf = NULL;
> +       __u32 i, id = 0;
> +       int ret = 0;
> +
> +       /* src BTF must be split BTF. */
> +       if (!btf__base_btf(src_btf)) {
> +               errno = -EINVAL;
> +               return NULL;
> +       }
> +       base_btf = btf__new_empty();

I was confused, it's baseref_btf or something, we already have
"base_btf", which is coming from btf__base_btf(src_btf), so let's
distinguish all those "base" BTFs?

> +       if (!base_btf)
> +               return NULL;
> +       ref.ids = calloc(n, sizeof(*ref.ids));
> +       if (!ref.ids) {
> +               ret = -ENOMEM;
> +               goto err_out;
> +       }
> +       for (i = 1; i < n; i++)
> +               ref.ids[i] = BTF_UNPROCESSED_ID;

by now we generally expect BTF IDs to be a positive integer, so I'd
use int for ids and just use explicit -1, and then a bit more natural
check for >= 0 in btf_add_ref_type_ids

> +       ref.pipe.src = src_btf;
> +       ref.pipe.dst = base_btf;
> +       ref.pipe.str_off_map = hashmap__new(btf_dedup_identity_hash_fn, btf_dedup_equal_fn, NULL);
> +       if (IS_ERR(ref.pipe.str_off_map)) {
> +               ret = -ENOMEM;
> +               goto err_out;
> +       }

[...]

> +                       default:
> +                               pr_warn("unexpected kind [%u] when creating base reference BTF.\n",
> +                                       btf_kind(t));
> +                               goto err_out;
> +                       }
> +                       ret = btf__add_fwd(base_btf, btf__name_by_offset(src_btf, t->name_off),
> +                                          fwd_kind);
> +               } else {
> +                       ret = btf_add_type(&ref.pipe, t);

see comment above, we should detect erroneous uses of referencing
"non-relocatable" kinds like VAR/DATASEC

> +               }
> +               if (ret < 0)
> +                       goto err_out;
> +               ref.ids[i] = ++id;
> +       }
> +       /* now create new split BTF with reference base BTF as its base; we end up with
> +        * split BTF that has base BTF that represents enough about its base references
> +        * to allow it to be reconciled with the base BTF available.
> +        */
> +       split_btf = btf__new_empty_split(base_btf);
> +       if (!split_btf)
> +               goto err_out;
> +
> +       ref.pipe.dst = split_btf;
> +       /* all split BTF ids will be shifted downwards since there are less base BTF ids
> +        * in reference base BTF.
> +        */
> +       ref.diff_id = ref.nr_base_types - btf__type_cnt(base_btf);
> +
> +       /* First add all split types */
> +       for (i = src_btf->start_id; i < n; i++) {
> +               t = btf_type_by_id(src_btf, i);
> +               ret = btf_add_type(&ref.pipe, t);
> +               if (ret < 0)
> +                       goto err_out;
> +       }
> +       /* Now update base/split BTF ids. */
> +       for (i = 1; i < btf__type_cnt(split_btf); i++) {

nit: cache this btf__type_cnt() result, no need to keep calling it all the time

> +               t = btf_type_by_id(split_btf, i);
> +
> +               ret = btf_type_visit_type_ids((struct btf_type *)t,  btf_update_ref_type_ids, &ref);

nit: it's a bit annoying to cast t here, given btf_type_by_id()
returns non-const type as you need it. Just mark t as non-const and
use btf_type_by_id() everywhere?



> +               if (ret < 0)
> +                       goto err_out;
> +       }
> +       free(ref.ids);
> +       hashmap__free(ref.pipe.str_off_map);
> +       return split_btf;

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 04/13] libbpf: add btf__parse_opts() API for flexible BTF parsing
  2024-03-22 10:24 ` [RFC bpf-next 04/13] libbpf: add btf__parse_opts() API for flexible BTF parsing Alan Maguire
@ 2024-03-29 22:00   ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2024-03-29 22:00 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, Mar 22, 2024 at 3:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Options cover existing parsing scenarios (ELF, raw, retrieving
> .BTF.ext) and also allow specification of the ELF section name
> containing BTF.  This will allow consumers to retrieve BTF from
> .BTF.base_ref sections (BTF_BASE_REF_ELF_SEC) also.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/lib/bpf/btf.c      | 24 ++++++++++++++++++------
>  tools/lib/bpf/btf.h      | 32 ++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.map |  1 +
>  3 files changed, 51 insertions(+), 6 deletions(-)
>

[...]

> +struct btf *btf__parse_opts(const char *path, struct btf_parse_opts *opts)
> +{
> +       if (!OPTS_VALID(opts, btf_parse_opts))
> +               return libbpf_err_ptr(-EINVAL);
> +       if (opts && !opts->btf_sec)

please use OPTS_GET() macro to access members of opts structs

> +               return btf__parse_raw_split(path, opts ? opts->base_btf : NULL);

this should still work for ELF parsing, so use btf_parse() which can
deal both with raw BTF and parse ELF (defaulting to .BTF section)

> +       return libbpf_ptr(btf_parse_elf(path,
> +                                       opts ? opts->btf_sec : BTF_ELF_SEC,
> +                                       opts ? opts->base_btf : NULL,
> +                                       opts ? &opts->btf_ext : NULL));

ditto

> +}
> +
>  static struct btf *btf_parse(const char *path, struct btf *base_btf, struct btf_ext **btf_ext)
>  {
>         struct btf *btf;
> @@ -1307,7 +1319,7 @@ static struct btf *btf_parse(const char *path, struct btf *base_btf, struct btf_
>                 return btf;
>         if (err != -EPROTO)
>                 return ERR_PTR(err);
> -       return btf_parse_elf(path, base_btf, btf_ext);
> +       return btf_parse_elf(path, BTF_ELF_SEC, base_btf, btf_ext);
>  }
>
>  struct btf *btf__parse(const char *path, struct btf_ext **btf_ext)
> diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> index f2a853d6fa55..c63043520149 100644
> --- a/tools/lib/bpf/btf.h
> +++ b/tools/lib/bpf/btf.h
> @@ -18,6 +18,7 @@ extern "C" {
>
>  #define BTF_ELF_SEC ".BTF"
>  #define BTF_EXT_ELF_SEC ".BTF.ext"
> +#define BTF_BASE_REF_ELF_SEC ".BTF.base_ref"
>  #define MAPS_ELF_SEC ".maps"
>
>  struct btf;
> @@ -129,6 +130,37 @@ LIBBPF_API struct btf *btf__parse_elf_split(const char *path, struct btf *base_b
>  LIBBPF_API struct btf *btf__parse_raw(const char *path);
>  LIBBPF_API struct btf *btf__parse_raw_split(const char *path, struct btf *base_btf);
>
> +struct btf_parse_opts {
> +       size_t sz;
> +       /* use base BTF to parse split BTF */
> +       struct btf *base_btf;
> +       /* retrieve optional .BTF.ext info */
> +       struct btf_ext *btf_ext;

shouldn't this be a pointer to a pointer, it's an output parameter,
right? This will give control to users to define if they are
interested in .BTF.ext content or not (NULL means "don't bother
parsing")

> +       /* BTF section name */
> +       const char *btf_sec;
> +        size_t:0;
> +};
> +
> +#define btf_parse_opts__last_field btf_sec
> +
> +/* @brief **btf__parse_opts()** parses BTF information from either a
> + * raw BTF file (*btf_sec* is NULL) or from the specified BTF section,
> + * also retrieving  .BTF.ext info if *btf_ext* is non-NULL.  If
> + * *base_btf* is specified, use it to parse split BTF from the
> + * specified location.
> + *
> + * @return new BTF object instance which has to be eventually freed with
> + * **btf__free()**
> + *
> + * On error, error-code-encoded-as-pointer is returned, not a NULL. To extract
> + * error code from such a pointer `libbpf_get_error()` should be used. If
> + * `libbpf_set_strict_mode(LIBBPF_STRICT_CLEAN_PTRS)` is enabled, NULL is
> + * returned on error instead. In both cases thread-local `errno` variable is
> + * always set to error code as well.
> + */
> +
> +LIBBPF_API struct btf *btf__parse_opts(const char *path, struct btf_parse_opts *opts);
> +
>  LIBBPF_API struct btf *btf__load_vmlinux_btf(void);
>  LIBBPF_API struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_btf);
>
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 3d53b1781af1..e9a7cb9c3c5b 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -415,5 +415,6 @@ LIBBPF_1.4.0 {
>                 bpf_token_create;
>                 btf__new_split;
>                 btf__new_split_base_ref;
> +               btf__parse_opts;

for the next revision, please put it into 1.5.0 section, we'll have
1.4 release soon, so let's avoid accidentally adding this back to 1.4

>                 btf_ext__raw_data;
>  } LIBBPF_1.3.0;

> --
> 2.39.3
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 06/13] kbuild,bpf: switch to using --btf_features for pahole v1.26 and later
  2024-03-22 10:24 ` [RFC bpf-next 06/13] kbuild,bpf: switch to using --btf_features for pahole v1.26 and later Alan Maguire
@ 2024-03-29 22:01   ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2024-03-29 22:01 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, Mar 22, 2024 at 3:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> The btf_features list can be used for pahole v1.26 and later -
> it is useful because if a feature is not yet implemented it will
> not exit with a failure message.  This will allow us to add feature
> requests to the pahole options without having to check pahole versions
> in future; if the version of pahole supports the feature it will be
> added.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  scripts/Makefile.btf | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
> index 82377e470aed..9694ca3c5252 100644
> --- a/scripts/Makefile.btf
> +++ b/scripts/Makefile.btf
> @@ -12,8 +12,11 @@ pahole-flags-$(call test-ge, $(pahole-ver), 121)     += --btf_gen_floats
>
>  pahole-flags-$(call test-ge, $(pahole-ver), 122)       += -j
>
> -pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)         += --lang_exclude=rust
> -
>  pahole-flags-$(call test-ge, $(pahole-ver), 125)       += --skip_encoding_btf_inconsistent_proto --btf_gen_optimized
>
> +# Switch to using --btf_features for v1.26 and later.
> +pahole-flags-$(call test-ge, $(pahole-ver), 126)       = -j --btf_features=encode_force,var,float,emum64,decl_tag,type_tag,optimized_func,consistent_func

typo: enum64


> +
> +pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)         += --lang_exclude=rust
> +
>  export PAHOLE_FLAGS := $(pahole-flags-y)
> --
> 2.39.3
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 09/13] libbpf: split BTF reconciliation
  2024-03-22 10:24 ` [RFC bpf-next 09/13] libbpf: split BTF reconciliation Alan Maguire
@ 2024-03-29 22:01   ` Andrii Nakryiko
  2024-04-05 10:06     ` Alan Maguire
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2024-03-29 22:01 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, Mar 22, 2024 at 3:26 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Map base reference BTF type ids referenced in split BTF and their
> references to the base BTF passed in, and if the mapping succeeds,
> reparent the split BTF to the base BTF.
>
> Reconciliation rules are
>
> - base types must match exactly
> - enum[64] types should match all value name/value pairs, but the
>   to-be-reconciled enum[64] can also define additional name/value pairs
> - named fwds match to the correspondingly-named struct/union/enum/enum64

yeah, but what about their size if they are embedded? Using FWD was a
nice trick, but it's not flexible enough for recording (optionally)
size... Probably emitting an empty (but named) struct/union/enum would
be a bit better (and actually make split BTF using base ref more valid
even without pre-processing).

> - anon struct/unions must have field names/offsets specified in base
>   reference BTF matched by those in base BTF we are matching with
>
> Reconciliation can not recurse, since it will be used in-kernel also and
> we do not want to blow up the kernel stack when carrying out type
> compatibility checks.  Hence we use a stack for reference type
> reconciliation rather then recursive function calls.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/lib/bpf/Build             |   2 +-
>  tools/lib/bpf/btf.c             |  58 ++++
>  tools/lib/bpf/btf.h             |   8 +
>  tools/lib/bpf/btf_reconcile.c   | 590 ++++++++++++++++++++++++++++++++

how wrong would it be to call this process "relocate" rather than "reconcile"?

>  tools/lib/bpf/libbpf.map        |   1 +
>  tools/lib/bpf/libbpf_internal.h |   2 +
>  6 files changed, 660 insertions(+), 1 deletion(-)
>  create mode 100644 tools/lib/bpf/btf_reconcile.c
>

[...]

> +/* Find next type after *id in base BTF that matches kind of type t passed in
> + * and name (if it is specified).  Match fwd kinds to appropriate kind also.
> + */
> +static int btf_reconcile_find_next(struct btf_reconcile *r, const struct btf_type *t,
> +                                  __u32 *id, const struct btf_type **tp)

I haven't grokked the whole patch logic just yet, doing a first pass,
so I might be asking stupid questions, sorry.

But it looks like we have these linear searches here to find matching
types, is that right? Wouldn't it be better to build an index first to
speed up search?

> +{
> +       const struct btf_type *nt;
> +       int kind, tkind = btf_kind(t);
> +       int tkflag = btf_kflag(t);
> +       __u32 i;
> +

[...]

> +static int btf_reconcile_int(struct btf_reconcile *r, const char *name,
> +                            const struct btf_type *t, const struct btf_type *bt)
> +{
> +       __u32 *info = (__u32 *)(t + 1);
> +       __u32 *binfo = (__u32 *)(bt + 1);
> +
> +       if (t->size != bt->size) {
> +               pr_warn("INT types '%s' disagree on size; reference base BTF says %d; base BTF says %d\n",
> +                       name, t->size, bt->size);
> +               return -EINVAL;
> +       }
> +       if (BTF_INT_ENCODING(*info) != BTF_INT_ENCODING(*binfo)) {
> +               pr_warn("INT types '%s' disagree on encoding; reference base BTF says '(%s/%s/%s); base BTF says '(%s/%s/%s)'\n",
> +                       name,
> +                       BTF_INT_ENCODING(*info) & BTF_INT_SIGNED ? "signed" : "unsigned",
> +                       BTF_INT_ENCODING(*info) & BTF_INT_CHAR ? "char" : "nonchar",
> +                       BTF_INT_ENCODING(*info) & BTF_INT_BOOL ? "bool" : "nonbool",
> +                       BTF_INT_ENCODING(*binfo) & BTF_INT_SIGNED ? "signed" : "unsigned",
> +                       BTF_INT_ENCODING(*binfo) & BTF_INT_CHAR ? "char" : "nonchar",
> +                       BTF_INT_ENCODING(*binfo) & BTF_INT_BOOL ? "bool" : "nonbool");

there is btf_int_encoding() helper, please use it

> +               return -EINVAL;
> +       }
> +       if (BTF_INT_BITS(*info) != BTF_INT_BITS(*binfo)) {

btf_int_bits()

> +               pr_warn("INT types '%s' disagree on bit size; reference base BTF says %d; base BTF says %d\n",
> +                       name, BTF_INT_BITS(*info), BTF_INT_BITS(*binfo));
> +               return -EINVAL;
> +       }
> +       return 0;
> +}
> +
> +static int btf_reconcile_float(struct btf_reconcile *r, const char *name,
> +                              const struct btf_type *t, const struct btf_type *bt)
> +{
> +
> +       if (t->size != bt->size) {
> +               pr_warn("float types '%s' disagree on size; reference base BTF says %d; base BTF says %d\n",
> +                       name, t->size, bt->size);
> +               return -EINVAL;
> +       }
> +       return 0;
> +}
> +
> +/* Ensure each enum value in type t has equivalent in base BTF and that values (if any) match. */
> +static int btf_reconcile_enum(struct btf_reconcile *r, const char *name,
> +                             const struct btf_type *t, const struct btf_type *bt)
> +{

should we care about compatibility between ENUM and ENUM64, they can
both represent the same values of the same size?

> +       struct btf_enum *v = (struct btf_enum *)(t + 1);
> +       struct btf_enum *bv = (struct btf_enum *)(bt + 1);
> +       bool found, match;
> +       int i, j;
> +
> +       for (i = 0; i < btf_vlen(t); i++, v++) {
> +               found = match = false;
> +
> +               if (!v->name_off)
> +                       continue;
> +               for (j = 0; j < btf_vlen(bt); j++, bv++) {
> +                       if (!bv->name_off)
> +                               continue;
> +                       if (strcmp(btf__name_by_offset(r->base_ref_btf, v->name_off),
> +                                  btf__name_by_offset(r->base_btf, bv->name_off)) != 0)

nit: please use local vars to shorten these long if conditions, it
will be easier to read and follow

> +                               continue;
> +                       found = true;
> +                       match = (v->val == bv->val);
> +                       break;
> +               }

[...]

> +       while ((err = btf_reconcile_find_next(r, t, &base_id, &bt)) != -ENOENT) {
> +               bt = btf_type_by_id(r->base_btf, base_id);
> +               switch (btf_kind(t)) {
> +               case BTF_KIND_INT:
> +                       err = btf_reconcile_int(r, name, t, bt);
> +                       break;
> +               case BTF_KIND_ENUM:
> +                       err = btf_reconcile_enum(r, name, t, bt);
> +                       break;
> +               case BTF_KIND_FLOAT:
> +                       err = btf_reconcile_float(r, name, t, bt);
> +                       break;
> +               case BTF_KIND_ENUM64:
> +                       err = btf_reconcile_enum64(r, name, t, bt);
> +                       break;
> +               case BTF_KIND_FWD:

well, FWD can be for enum vs struct, gotta check that

> +                       err = 0;
> +                       break;
> +               default:
> +                       return 0;
> +               }
> +               if (!err) {
> +                       r->map[id] = base_id;
> +                       return 0;
> +               }
> +       }
> +       return err;
> +}

[...]

> +                               if (err) {
> +                                       pr_warn("could not find base BTF type for base reference type[%u]\n",
> +                                               id);
> +                                       return err;
> +                               }
> +                       } else {
> +                               if (btf_reconcile_push(r, id) < 0 ||
> +                                   btf_reconcile_push(r, t->type) < 0)
> +                                       return -ENOSPC;

I'm missing something, please help me understand. I don't get why we
need a recursive algorithm at all.

In my mind, we have this small "base ref" set of types referenced from
module's BTF (split BTF part), right? So all we should need is to map
every type from base ref set to vmlinux BTF.

What I don't yet fully get is why CONST/VOLATILE or PTR need to
postpone reconciliation via a queue. By the time we get to types in
split BTF all base ref types should be mapped, so all you need is to
remap t->type to resolved vmlinux BTF, no?

I suspect the answer might have something to do with those anonymous
structs/unions which you copy verbatim into base ref BTF?

But on the latter topic, I wonder if we at all need this? Why not keep
all those anon struct/union/enum in module's part of BTF? If they are
unnamed, I doubt they will ever be referenced from kfuncs or anything
like that, so their BTF ID isn't that important.

If base BTF is all named types, it would simplify the reconciliation
process significantly, I think.

But again, I only skimmed the overall algorithm, sorry for my
laziness, but I figured it would be good to discuss the above first
anyways.

> +                       }
> +                       break;

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 11/13] libbpf,bpf: share BTF reconcile-related code with kernel
  2024-03-22 10:24 ` [RFC bpf-next 11/13] libbpf,bpf: share BTF reconcile-related code with kernel Alan Maguire
@ 2024-03-29 22:04   ` Andrii Nakryiko
  2024-04-01 15:58     ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2024-03-29 22:04 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, Mar 22, 2024 at 3:26 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Share reconciliation implementation with the kernel.  As part of this,
> we also need the type/string visitation functions so add them to a
> btf_common.c file that also gets shared with the kernel. Reconciliation
> code between kernel and userspace is identical save for the
> implementation of the reparenting of split BTF to the reconciled base
> BTF; this depends on struct btf internals so is different in kernel and
> userspace.
>
> One other wrinkle on the kernel side is we have to map .BTF.ids in
> modules as they were generated with the type ids used at BTF encoding
> time. btf_reconcile() optionally returns an array mapping from old BTF
> ids to reconciled ids, so we use that to fix up these references where
> needed for kfuncs.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  include/linux/btf.h           |  29 +++++
>  kernel/bpf/Makefile           |   8 ++
>  kernel/bpf/btf.c              | 197 +++++++++++++++++++++++++++++-----
>  tools/lib/bpf/Build           |   2 +-
>  tools/lib/bpf/btf.c           | 130 ----------------------
>  tools/lib/bpf/btf_common.c    | 146 +++++++++++++++++++++++++
>  tools/lib/bpf/btf_reconcile.c |  24 +++++
>  7 files changed, 376 insertions(+), 160 deletions(-)
>  create mode 100644 tools/lib/bpf/btf_common.c
>

[...]

> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 09a11954cad8..e034f0c26c96 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -5026,136 +5026,6 @@ struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_bt
>         return btf__parse_split(path, vmlinux_btf);
>  }
>
> -int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx)

This and btf_type_visit_str_offs are heavily recursive functions not
appropriate for kernel code, so you can't just share the code with
kernel.

And before you rush adding artificial depth limits for kernel-side,
let's discuss implementing btf_type_visit_type_ids and
btf_type_visit_str_offs through iterator approach, just like we did
for elf symbols iteration.

I think it would be a general improvement to the point where we can
probably think about exposing these BTF type id/string ref iterators
as public API, as that's a very useful functionality. I just never
felt like callback based API is the right abstraction (and still think
that). But iterator sounds like a good idea and is worth doing as a
preparatory series to simplify code in libbpf and preparing everything
for kernel as well.

> -{
> -       int i, n, err;
> -

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 11/13] libbpf,bpf: share BTF reconcile-related code with kernel
  2024-03-29 22:04   ` Andrii Nakryiko
@ 2024-04-01 15:58     ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2024-04-01 15:58 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, Mar 29, 2024 at 3:04 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 3:26 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> > Share reconciliation implementation with the kernel.  As part of this,
> > we also need the type/string visitation functions so add them to a
> > btf_common.c file that also gets shared with the kernel. Reconciliation
> > code between kernel and userspace is identical save for the
> > implementation of the reparenting of split BTF to the reconciled base
> > BTF; this depends on struct btf internals so is different in kernel and
> > userspace.
> >
> > One other wrinkle on the kernel side is we have to map .BTF.ids in
> > modules as they were generated with the type ids used at BTF encoding
> > time. btf_reconcile() optionally returns an array mapping from old BTF
> > ids to reconciled ids, so we use that to fix up these references where
> > needed for kfuncs.
> >
> > Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> > ---
> >  include/linux/btf.h           |  29 +++++
> >  kernel/bpf/Makefile           |   8 ++
> >  kernel/bpf/btf.c              | 197 +++++++++++++++++++++++++++++-----
> >  tools/lib/bpf/Build           |   2 +-
> >  tools/lib/bpf/btf.c           | 130 ----------------------
> >  tools/lib/bpf/btf_common.c    | 146 +++++++++++++++++++++++++
> >  tools/lib/bpf/btf_reconcile.c |  24 +++++
> >  7 files changed, 376 insertions(+), 160 deletions(-)
> >  create mode 100644 tools/lib/bpf/btf_common.c
> >
>
> [...]
>
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index 09a11954cad8..e034f0c26c96 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -5026,136 +5026,6 @@ struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_bt
> >         return btf__parse_split(path, vmlinux_btf);
> >  }
> >
> > -int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx)
>
> This and btf_type_visit_str_offs are heavily recursive functions not
> appropriate for kernel code, so you can't just share the code with
> kernel.

Sorry, a complete brain fart on my part. Of course they are not
recursive, there is no reason for them to be. So this iterator idea I
proposed, while I think is worth doing, strictly speaking isn't
necessary for your work.

>
> And before you rush adding artificial depth limits for kernel-side,
> let's discuss implementing btf_type_visit_type_ids and
> btf_type_visit_str_offs through iterator approach, just like we did
> for elf symbols iteration.
>
> I think it would be a general improvement to the point where we can
> probably think about exposing these BTF type id/string ref iterators
> as public API, as that's a very useful functionality. I just never
> felt like callback based API is the right abstraction (and still think
> that). But iterator sounds like a good idea and is worth doing as a
> preparatory series to simplify code in libbpf and preparing everything
> for kernel as well.
>
> > -{
> > -       int i, n, err;
> > -
>
> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 02/13] libbpf: add btf__new_split_base_ref() creating split BTF with reference base BTF
  2024-03-29 22:00   ` Andrii Nakryiko
@ 2024-04-04 15:21     ` Alan Maguire
  2024-04-04 22:49       ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-04-04 15:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On 29/03/2024 22:00, Andrii Nakryiko wrote:
> On Fri, Mar 22, 2024 at 3:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> To support more robust split BTF, adding supplemental context for the
>> base BTF type ids that split BTF refers to is required.  Without such
>> references, a simple shuffling of base BTF type ids (without any other
>> significant change) invalidates the split BTF.  Here the attempt is made
>> to store additional context to make split BTF more robust.
>>
>> This context comes in the form of "reference" base BTF - this base BTF
>> constitutes the minimal BTF representation needed to disambiguate split BTF
>> references to base BTF.  So for example if a struct is referred to from
>> split -> base BTF, a forward representation of that struct is retained
> 
> so you decided to not bother with recording expected size? Remember we
> were talking about situations where vmlinux struct can be embedded
> inside the kernel module's struct. In this case *exact* struct size is
> very important and can't be changed without breaking compatibility.
> Shouldn't we keep track of this?
>

There are two cases here I think. The embedded case - which is rare, but
needs size tracking - and the non-embedded case, where a struct/union
can potentially change size without invalidating the module's
assumptions. In the former, your suggestion about having an empty
(sized) struct instead of a forward would work well I think. The only
question is around the non-embedded case. We could either use the same
approach (but then ignore the size constraints during reconciliation) or
continue to use FWDs. The advantage of continuing to use FWDs is we
wouldn't have to compute struct embeddedness during reconciliation, and
the BTF itself would be a bit more informative about the constraints
(FWD means no constraints, empty sized struct means we care about size
due to embeddedness). What do you think?

>> in reference base BTF, and split BTF refers to it.  Base types are kept
>> intact in reference base BTF, and reference types like pointers are stored
>> as-is.  Avoiding struct/union/enum/enum64 expansion is important to keep
>> reference base BTF representation to a minimum size; however anonymous
>> struct, union and enum[64] types are represented in full since type details
>> are needed to disambiguate the reference - the name is not enough in those
>> cases since there is no name.  In practice these are rare; in sample
>> cases where reference base BTF was generated for in-tree kernel modules,
>> only a few were needed in base reference BTF.  These represent the
>> anonymous struct/unions that are used by the module but were de-duplicated
>> to use base vmlinux BTF ids instead.
>>
>> When successful, a new representation of the split BTF passed in is
>> returned.  It refers to the reference BTF as its base, and both returned
>> split and base BTF need to be freed by the caller.
>>
>> So to take a simple example, with split BTF with a type referring
>> to "struct sk_buff", we will generate base reference BTF with a
>> FWD struct sk_buff, and the split BTF will refer to it instead.
>>
>> Tools like pahole can utilize such split BTF to popuate the .BTF section
> 
> typo: populate
>

Will fix, thanks.

>> (split BTF) and an additional .BTF.base_ref section (base reference BTF).
>> Then when the split BTF is loaded, the base reference BTF can be used
>> to reconcile split BTF and the current - and possibly changed - base BTF.
>>
>> So for example if "struct sk_buff" was id 502 when the split BTF was
>> originally generated,  we can use the reference base BTF to see that
>> id 502 refers to a "struct sk_buff" and replace instances of id 502
>> with the current (reconciled) base BTF sk_buff type id.
>>
>> Base reference BTF is small; when building a kernel with all modules
>> using base reference BTF as a test, the average size for module
>> base reference BTF is 1555 bytes (standard deviation 1563).  The
>> maximum base reference BTF size across ~2700 modules was 37895 bytes.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>>  tools/lib/bpf/btf.c      | 215 +++++++++++++++++++++++++++++++++++++--
>>  tools/lib/bpf/btf.h      |  15 +++
>>  tools/lib/bpf/libbpf.map |   1 +
>>  3 files changed, 225 insertions(+), 6 deletions(-)
>>
> 
> [...]
> 
>> +static int btf_add_ref_type_ids(__u32 *id, void *ctx)
>> +{
>> +       struct btf_ref *ref = ctx;
>> +       struct btf_type *t = btf_type_by_id(ref->pipe.src, *id);
>> +       int ret;
>> +
>> +       /* split BTF id, not needed */
>> +       if (*id > ref->nr_base_types)
>> +               return 0;
>> +       /* already added ? */
>> +       if (ref->ids[*id] <= BTF_MAX_NR_TYPES)
>> +               return 0;
>> +       ref->ids[*id] = *id;
>> +
>> +       /* struct/union members not needed, except for anonymous structs
>> +        * and unions, which we need since name won't help us determine
>> +        * matches; so if a named struct/union, no need to recurse
>> +        * into members.
>> +        */
>> +       if (btf_is_eligible_named_fwd(t))
>> +               return 0;
> 
> Let's be a bit more robust here. There are three cases for BTF types
> (kinds) in "base reference BTF": forward-declared (non-anonymous
> structs/unions/enums), copied-as-is (anonymous
> struct/union/enum/func_proto, plus INT/FLOAT; maybe something else),
> but also there is a third kind that shouldn't be referenced from split
> BTF (datasec, funcs, vars, etc). I'd like us to be strict in checking
> that we don't accidentally and silently support the latter. It should
> mean a misuse of this approach and we should error out in this case.
> 

Sounds good. I'll add that for v2.

>> +
>> +       /* ensure references in type are added also. */
>> +       ret = btf_type_visit_type_ids(t, btf_add_ref_type_ids, ctx);
>> +       if (ret < 0)
>> +               return ret;
>> +       return 0;
>> +}
>> +
>> +/* All split BTF ids will be shifted downwards since there are less base BTF
>> + * in reference base BTF, and for those that refer to base BTF, we use the
>> + * reference map to map from original base BTF to reference base BTF id.
>> + */
>> +static int btf_update_ref_type_ids(__u32 *id, void *ctx)
>> +{
>> +       struct btf_ref *ref = ctx;
>> +
>> +       if (*id >= ref->nr_base_types)
>> +               *id -= ref->diff_id;
>> +       else
>> +               *id = ref->ids[*id];
>> +       return 0;
>> +}
>> +
>> +/* Create updated split BTF with "reference" base BTF; reference base BTF
>> + * consists of BTF information required to clarify the types that split
>> + * BTF refers to, omitting unneeded details.  Specifically it will contain
>> + * base types and forward declarations of structs, unions and enumerated
>> + * types, along with associated reference types like pointers, arrays etc.
>> + *
>> + * The only case where structs, unions or enumerated types are fully represented
>> + * is when they are anonymous; in such cases, info about type content is needed
>> + * to clarify type references.
>> + *
>> + * We return newly-created split BTF where the split BTf refers to a newly-created
>> + * base reference BTF. Both must be freed separately by the caller.
>> + *
>> + * When creating the BTF representation for a module and provided with the
>> + * base_ref option, pahole will create split BTF using this API, and store
>> + * the base reference BTF in the .BTF.base.reference section.
>> + */
>> +struct btf *btf__new_split_base_ref(struct btf *src_btf)
> 
> I find this name very confusing, for some reason. btf__new*/btf__parse
> is generally used to either create a new empty instance, or just
> parse/index pre-existing BTF information. In this case I feel like
> it's actually a different operation, where we create a derived BTF, so
> I'd name it distinctively. How do you feel about something with
> "distill" terminology as an action performed here. E.g.,
> btf__distill_base() or something like that?
> 
> The API contract is also suboptimal, given back split BTF where base
> BTF is owned (and has to be freed). Let's keep it a bit more explicit,
> something like:
> 
> int btf__distill_base(struct btf *new_base_btf, struct btf *new_split_btf);
> 
> WDYT?
> 

Yeah, any suggestions around naming are very welcome! My first choice
was "minimal base" but that seems to prioritize the minimization, which
is really just a side-effect of the desire to add context to the
references in the split BTF. "base reference" feels clumsy to me too.
Ideally we want to find a name that signals that the base BTF is a
companion to the associated split BTF, adding context for it. Describing
the operation as distilling certainly connotes that we're removing
non-essential info; so we'd end up with

int btf__distill_base(const struct btf *src_btf, struct btf
**new_base_btf, struct btf **new_split_btf);

...and we could call the section .BTF.distilled.base perhaps.



>> +{
>> +       unsigned int n = btf__type_cnt(src_btf);
>> +       const struct btf_type *t;
>> +       struct btf_ref ref = {};
>> +       struct btf *base_btf = NULL, *split_btf = NULL;
>> +       __u32 i, id = 0;
>> +       int ret = 0;
>> +
>> +       /* src BTF must be split BTF. */
>> +       if (!btf__base_btf(src_btf)) {
>> +               errno = -EINVAL;
>> +               return NULL;
>> +       }
>> +       base_btf = btf__new_empty();
> 
> I was confused, it's baseref_btf or something, we already have
> "base_btf", which is coming from btf__base_btf(src_btf), so let's
> distinguish all those "base" BTFs?
>

Will do.

>> +       if (!base_btf)
>> +               return NULL;
>> +       ref.ids = calloc(n, sizeof(*ref.ids));
>> +       if (!ref.ids) {
>> +               ret = -ENOMEM;
>> +               goto err_out;
>> +       }
>> +       for (i = 1; i < n; i++)
>> +               ref.ids[i] = BTF_UNPROCESSED_ID;
> 
> by now we generally expect BTF IDs to be a positive integer, so I'd
> use int for ids and just use explicit -1, and then a bit more natural
> check for >= 0 in btf_add_ref_type_ids
> 

Sure.

>> +       ref.pipe.src = src_btf;
>> +       ref.pipe.dst = base_btf;
>> +       ref.pipe.str_off_map = hashmap__new(btf_dedup_identity_hash_fn, btf_dedup_equal_fn, NULL);
>> +       if (IS_ERR(ref.pipe.str_off_map)) {
>> +               ret = -ENOMEM;
>> +               goto err_out;
>> +       }
> 
> [...]
> 
>> +                       default:
>> +                               pr_warn("unexpected kind [%u] when creating base reference BTF.\n",
>> +                                       btf_kind(t));
>> +                               goto err_out;
>> +                       }
>> +                       ret = btf__add_fwd(base_btf, btf__name_by_offset(src_btf, t->name_off),
>> +                                          fwd_kind);
>> +               } else {
>> +                       ret = btf_add_type(&ref.pipe, t);
> 
> see comment above, we should detect erroneous uses of referencing
> "non-relocatable" kinds like VAR/DATASEC
> 
>> +               }
>> +               if (ret < 0)
>> +                       goto err_out;
>> +               ref.ids[i] = ++id;
>> +       }
>> +       /* now create new split BTF with reference base BTF as its base; we end up with
>> +        * split BTF that has base BTF that represents enough about its base references
>> +        * to allow it to be reconciled with the base BTF available.
>> +        */
>> +       split_btf = btf__new_empty_split(base_btf);
>> +       if (!split_btf)
>> +               goto err_out;
>> +
>> +       ref.pipe.dst = split_btf;
>> +       /* all split BTF ids will be shifted downwards since there are less base BTF ids
>> +        * in reference base BTF.
>> +        */
>> +       ref.diff_id = ref.nr_base_types - btf__type_cnt(base_btf);
>> +
>> +       /* First add all split types */
>> +       for (i = src_btf->start_id; i < n; i++) {
>> +               t = btf_type_by_id(src_btf, i);
>> +               ret = btf_add_type(&ref.pipe, t);
>> +               if (ret < 0)
>> +                       goto err_out;
>> +       }
>> +       /* Now update base/split BTF ids. */
>> +       for (i = 1; i < btf__type_cnt(split_btf); i++) {
> 
> nit: cache this btf__type_cnt() result, no need to keep calling it all the time
>

Will do.

>> +               t = btf_type_by_id(split_btf, i);
>> +
>> +               ret = btf_type_visit_type_ids((struct btf_type *)t,  btf_update_ref_type_ids, &ref);
> 
> nit: it's a bit annoying to cast t here, given btf_type_by_id()
> returns non-const type as you need it. Just mark t as non-const and
> use btf_type_by_id() everywhere?
>

Sure.

> 
> 
>> +               if (ret < 0)
>> +                       goto err_out;
>> +       }
>> +       free(ref.ids);
>> +       hashmap__free(ref.pipe.str_off_map);
>> +       return split_btf;
> 
> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 02/13] libbpf: add btf__new_split_base_ref() creating split BTF with reference base BTF
  2024-04-04 15:21     ` Alan Maguire
@ 2024-04-04 22:49       ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2024-04-04 22:49 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Thu, Apr 4, 2024 at 8:22 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 29/03/2024 22:00, Andrii Nakryiko wrote:
> > On Fri, Mar 22, 2024 at 3:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> To support more robust split BTF, adding supplemental context for the
> >> base BTF type ids that split BTF refers to is required.  Without such
> >> references, a simple shuffling of base BTF type ids (without any other
> >> significant change) invalidates the split BTF.  Here the attempt is made
> >> to store additional context to make split BTF more robust.
> >>
> >> This context comes in the form of "reference" base BTF - this base BTF
> >> constitutes the minimal BTF representation needed to disambiguate split BTF
> >> references to base BTF.  So for example if a struct is referred to from
> >> split -> base BTF, a forward representation of that struct is retained
> >
> > so you decided to not bother with recording expected size? Remember we
> > were talking about situations where vmlinux struct can be embedded
> > inside the kernel module's struct. In this case *exact* struct size is
> > very important and can't be changed without breaking compatibility.
> > Shouldn't we keep track of this?
> >
>
> There are two cases here I think. The embedded case - which is rare, but
> needs size tracking - and the non-embedded case, where a struct/union
> can potentially change size without invalidating the module's
> assumptions. In the former, your suggestion about having an empty
> (sized) struct instead of a forward would work well I think. The only
> question is around the non-embedded case. We could either use the same
> approach (but then ignore the size constraints during reconciliation) or
> continue to use FWDs. The advantage of continuing to use FWDs is we
> wouldn't have to compute struct embeddedness during reconciliation, and
> the BTF itself would be a bit more informative about the constraints
> (FWD means no constraints, empty sized struct means we care about size
> due to embeddedness). What do you think?

Sure. I was going to propose using size zero, but FWD works just as
fine and avoids the ambiguity of embedding zero-sized structs. The
important part is to check and enforce size where it's important.

>
> >> in reference base BTF, and split BTF refers to it.  Base types are kept
> >> intact in reference base BTF, and reference types like pointers are stored
> >> as-is.  Avoiding struct/union/enum/enum64 expansion is important to keep
> >> reference base BTF representation to a minimum size; however anonymous
> >> struct, union and enum[64] types are represented in full since type details
> >> are needed to disambiguate the reference - the name is not enough in those
> >> cases since there is no name.  In practice these are rare; in sample
> >> cases where reference base BTF was generated for in-tree kernel modules,
> >> only a few were needed in base reference BTF.  These represent the
> >> anonymous struct/unions that are used by the module but were de-duplicated
> >> to use base vmlinux BTF ids instead.
> >>

[...]

> >> +/* Create updated split BTF with "reference" base BTF; reference base BTF
> >> + * consists of BTF information required to clarify the types that split
> >> + * BTF refers to, omitting unneeded details.  Specifically it will contain
> >> + * base types and forward declarations of structs, unions and enumerated
> >> + * types, along with associated reference types like pointers, arrays etc.
> >> + *
> >> + * The only case where structs, unions or enumerated types are fully represented
> >> + * is when they are anonymous; in such cases, info about type content is needed
> >> + * to clarify type references.
> >> + *
> >> + * We return newly-created split BTF where the split BTf refers to a newly-created
> >> + * base reference BTF. Both must be freed separately by the caller.
> >> + *
> >> + * When creating the BTF representation for a module and provided with the
> >> + * base_ref option, pahole will create split BTF using this API, and store
> >> + * the base reference BTF in the .BTF.base.reference section.
> >> + */
> >> +struct btf *btf__new_split_base_ref(struct btf *src_btf)
> >
> > I find this name very confusing, for some reason. btf__new*/btf__parse
> > is generally used to either create a new empty instance, or just
> > parse/index pre-existing BTF information. In this case I feel like
> > it's actually a different operation, where we create a derived BTF, so
> > I'd name it distinctively. How do you feel about something with
> > "distill" terminology as an action performed here. E.g.,
> > btf__distill_base() or something like that?
> >
> > The API contract is also suboptimal, given back split BTF where base
> > BTF is owned (and has to be freed). Let's keep it a bit more explicit,
> > something like:
> >
> > int btf__distill_base(struct btf *new_base_btf, struct btf *new_split_btf);
> >
> > WDYT?
> >
>
> Yeah, any suggestions around naming are very welcome! My first choice
> was "minimal base" but that seems to prioritize the minimization, which
> is really just a side-effect of the desire to add context to the
> references in the split BTF. "base reference" feels clumsy to me too.
> Ideally we want to find a name that signals that the base BTF is a
> companion to the associated split BTF, adding context for it. Describing
> the operation as distilling certainly connotes that we're removing
> non-essential info; so we'd end up with
>
> int btf__distill_base(const struct btf *src_btf, struct btf
> **new_base_btf, struct btf **new_split_btf);
>
> ...and we could call the section .BTF.distilled.base perhaps.
>

We can get cute with analogies, but I'm sure people will hate it. :)

What do you get after you remove all the non-essential cruft
(distilling the essence)? A "nugget" of goodness... ;)

I'd keep it short as .BTF.base (and know that it's this "distilled"
base BTF, not a real one), but I don't feel very strongly about this.

>
>
> >> +{
> >> +       unsigned int n = btf__type_cnt(src_btf);
> >> +       const struct btf_type *t;
> >> +       struct btf_ref ref = {};
> >> +       struct btf *base_btf = NULL, *split_btf = NULL;
> >> +       __u32 i, id = 0;
> >> +       int ret = 0;
> >> +
> >> +       /* src BTF must be split BTF. */
> >> +       if (!btf__base_btf(src_btf)) {
> >> +               errno = -EINVAL;
> >> +               return NULL;
> >> +       }
> >> +       base_btf = btf__new_empty();
> >

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 09/13] libbpf: split BTF reconciliation
  2024-03-29 22:01   ` Andrii Nakryiko
@ 2024-04-05 10:06     ` Alan Maguire
  2024-04-05 19:58       ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-04-05 10:06 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On 29/03/2024 22:01, Andrii Nakryiko wrote:
> On Fri, Mar 22, 2024 at 3:26 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> Map base reference BTF type ids referenced in split BTF and their
>> references to the base BTF passed in, and if the mapping succeeds,
>> reparent the split BTF to the base BTF.
>>
>> Reconciliation rules are
>>
>> - base types must match exactly
>> - enum[64] types should match all value name/value pairs, but the
>>   to-be-reconciled enum[64] can also define additional name/value pairs
>> - named fwds match to the correspondingly-named struct/union/enum/enum64
> 
> yeah, but what about their size if they are embedded? Using FWD was a
> nice trick, but it's not flexible enough for recording (optionally)
> size... Probably emitting an empty (but named) struct/union/enum would
> be a bit better (and actually make split BTF using base ref more valid
> even without pre-processing).
> 
>> - anon struct/unions must have field names/offsets specified in base
>>   reference BTF matched by those in base BTF we are matching with
>>
>> Reconciliation can not recurse, since it will be used in-kernel also and
>> we do not want to blow up the kernel stack when carrying out type
>> compatibility checks.  Hence we use a stack for reference type
>> reconciliation rather then recursive function calls.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>>  tools/lib/bpf/Build             |   2 +-
>>  tools/lib/bpf/btf.c             |  58 ++++
>>  tools/lib/bpf/btf.h             |   8 +
>>  tools/lib/bpf/btf_reconcile.c   | 590 ++++++++++++++++++++++++++++++++
> 
> how wrong would it be to call this process "relocate" rather than "reconcile"?
>

seems fine to me.

>>  tools/lib/bpf/libbpf.map        |   1 +
>>  tools/lib/bpf/libbpf_internal.h |   2 +
>>  6 files changed, 660 insertions(+), 1 deletion(-)
>>  create mode 100644 tools/lib/bpf/btf_reconcile.c
>>
> 
> [...]
> 
>> +/* Find next type after *id in base BTF that matches kind of type t passed in
>> + * and name (if it is specified).  Match fwd kinds to appropriate kind also.
>> + */
>> +static int btf_reconcile_find_next(struct btf_reconcile *r, const struct btf_type *t,
>> +                                  __u32 *id, const struct btf_type **tp)
> 
> I haven't grokked the whole patch logic just yet, doing a first pass,
> so I might be asking stupid questions, sorry.
> 
> But it looks like we have these linear searches here to find matching
> types, is that right? Wouldn't it be better to build an index first to
> speed up search?
> 

It would, but the aim here was to keep things simple with an eye to
sharing the code with the kernel. A lot of the libbpf hash stuff would
be handy but then we'd have to have something on the kernel side. Given
that the size of the base BTF is so small, the linear searches aren't
much of a cost.

>> +{
>> +       const struct btf_type *nt;
>> +       int kind, tkind = btf_kind(t);
>> +       int tkflag = btf_kflag(t);
>> +       __u32 i;
>> +
> 
> [...]
> 
>> +static int btf_reconcile_int(struct btf_reconcile *r, const char *name,
>> +                            const struct btf_type *t, const struct btf_type *bt)
>> +{
>> +       __u32 *info = (__u32 *)(t + 1);
>> +       __u32 *binfo = (__u32 *)(bt + 1);
>> +
>> +       if (t->size != bt->size) {
>> +               pr_warn("INT types '%s' disagree on size; reference base BTF says %d; base BTF says %d\n",
>> +                       name, t->size, bt->size);
>> +               return -EINVAL;
>> +       }
>> +       if (BTF_INT_ENCODING(*info) != BTF_INT_ENCODING(*binfo)) {
>> +               pr_warn("INT types '%s' disagree on encoding; reference base BTF says '(%s/%s/%s); base BTF says '(%s/%s/%s)'\n",
>> +                       name,
>> +                       BTF_INT_ENCODING(*info) & BTF_INT_SIGNED ? "signed" : "unsigned",
>> +                       BTF_INT_ENCODING(*info) & BTF_INT_CHAR ? "char" : "nonchar",
>> +                       BTF_INT_ENCODING(*info) & BTF_INT_BOOL ? "bool" : "nonbool",
>> +                       BTF_INT_ENCODING(*binfo) & BTF_INT_SIGNED ? "signed" : "unsigned",
>> +                       BTF_INT_ENCODING(*binfo) & BTF_INT_CHAR ? "char" : "nonchar",
>> +                       BTF_INT_ENCODING(*binfo) & BTF_INT_BOOL ? "bool" : "nonbool");
> 
> there is btf_int_encoding() helper, please use it
>
will do, and for others below too.

>> +               return -EINVAL;
>> +       }
>> +       if (BTF_INT_BITS(*info) != BTF_INT_BITS(*binfo)) {
> 
> btf_int_bits()
> 
>> +               pr_warn("INT types '%s' disagree on bit size; reference base BTF says %d; base BTF says %d\n",
>> +                       name, BTF_INT_BITS(*info), BTF_INT_BITS(*binfo));
>> +               return -EINVAL;
>> +       }
>> +       return 0;
>> +}
>> +
>> +static int btf_reconcile_float(struct btf_reconcile *r, const char *name,
>> +                              const struct btf_type *t, const struct btf_type *bt)
>> +{
>> +
>> +       if (t->size != bt->size) {
>> +               pr_warn("float types '%s' disagree on size; reference base BTF says %d; base BTF says %d\n",
>> +                       name, t->size, bt->size);
>> +               return -EINVAL;
>> +       }
>> +       return 0;
>> +}
>> +
>> +/* Ensure each enum value in type t has equivalent in base BTF and that values (if any) match. */
>> +static int btf_reconcile_enum(struct btf_reconcile *r, const char *name,
>> +                             const struct btf_type *t, const struct btf_type *bt)
>> +{
> 
> should we care about compatibility between ENUM and ENUM64, they can
> both represent the same values of the same size?
> 

so do you mean if one representation uses an enum, another an enum64?
Yep that might well be the case, I'll add that too.

>> +       struct btf_enum *v = (struct btf_enum *)(t + 1);
>> +       struct btf_enum *bv = (struct btf_enum *)(bt + 1);
>> +       bool found, match;
>> +       int i, j;
>> +
>> +       for (i = 0; i < btf_vlen(t); i++, v++) {
>> +               found = match = false;
>> +
>> +               if (!v->name_off)
>> +                       continue;
>> +               for (j = 0; j < btf_vlen(bt); j++, bv++) {
>> +                       if (!bv->name_off)
>> +                               continue;
>> +                       if (strcmp(btf__name_by_offset(r->base_ref_btf, v->name_off),
>> +                                  btf__name_by_offset(r->base_btf, bv->name_off)) != 0)
> 
> nit: please use local vars to shorten these long if conditions, it
> will be easier to read and follow
>

will do.


>> +                               continue;
>> +                       found = true;
>> +                       match = (v->val == bv->val);
>> +                       break;
>> +               }
> 
> [...]
> 
>> +       while ((err = btf_reconcile_find_next(r, t, &base_id, &bt)) != -ENOENT) {
>> +               bt = btf_type_by_id(r->base_btf, base_id);
>> +               switch (btf_kind(t)) {
>> +               case BTF_KIND_INT:
>> +                       err = btf_reconcile_int(r, name, t, bt);
>> +                       break;
>> +               case BTF_KIND_ENUM:
>> +                       err = btf_reconcile_enum(r, name, t, bt);
>> +                       break;
>> +               case BTF_KIND_FLOAT:
>> +                       err = btf_reconcile_float(r, name, t, bt);
>> +                       break;
>> +               case BTF_KIND_ENUM64:
>> +                       err = btf_reconcile_enum64(r, name, t, bt);
>> +                       break;
>> +               case BTF_KIND_FWD:
> 
> well, FWD can be for enum vs struct, gotta check that
> 
>> +                       err = 0;
>> +                       break;
>> +               default:
>> +                       return 0;
>> +               }
>> +               if (!err) {
>> +                       r->map[id] = base_id;
>> +                       return 0;
>> +               }
>> +       }
>> +       return err;
>> +}
> 
> [...]
> 
>> +                               if (err) {
>> +                                       pr_warn("could not find base BTF type for base reference type[%u]\n",
>> +                                               id);
>> +                                       return err;
>> +                               }
>> +                       } else {
>> +                               if (btf_reconcile_push(r, id) < 0 ||
>> +                                   btf_reconcile_push(r, t->type) < 0)
>> +                                       return -ENOSPC;
> 
> I'm missing something, please help me understand. I don't get why we
> need a recursive algorithm at all.
> 
> In my mind, we have this small "base ref" set of types referenced from
> module's BTF (split BTF part), right? So all we should need is to map
> every type from base ref set to vmlinux BTF.
> 
> What I don't yet fully get is why CONST/VOLATILE or PTR need to
> postpone reconciliation via a queue. By the time we get to types in
> split BTF all base ref types should be mapped, so all you need is to
> remap t->type to resolved vmlinux BTF, no?
> 

It's possible to have multiple layers of reference in the distilled base
BTF though; for example here's a case from a module's distilled base BTF:

[41] PTR '(anon)' type_id=42
[42] CONST '(anon)' type_id=0

To resolve type id 41 we need to resolve type id 42, and since type id 0
already has a mapping, at that point we can look for CONSTs that refer
to type id 0 and then once we've established that mapping we can find
PTRs that have a t->type that refers to the const. So as described below
we keep pushing type ids onto the stack until we find one with a t->type
mapping to base BTF; once we hit that we can start looking in base BTF
for types that have the mapped t->type value.

> I suspect the answer might have something to do with those anonymous
> structs/unions which you copy verbatim into base ref BTF?
>

They do add a few more types, but we can get base BTF references that
don't come from that source too. The above case PTR CONST void for
example wasn't referred to via any other distilled base types.

> But on the latter topic, I wonder if we at all need this? Why not keep
> all those anon struct/union/enum in module's part of BTF? If they are
> unnamed, I doubt they will ever be referenced from kfuncs or anything
> like that, so their BTF ID isn't that important.
> 

Changing the module BTF would make things a bit more complicated.
Currently we just update type id references and string offsets. The anon
structs we end up with tend to be very small; from the same distilled
base BTF used above here are the instances of struct/union:

[35] STRUCT '(anon)' size=4 vlen=1
        'counter' type_id=6 bits_offset=0
[62] STRUCT '(anon)' size=8 vlen=1
        'raw_lock' type_id=48 bits_offset=0
[113] UNION '(anon)' size=8 vlen=2
        'kernel' type_id=13 bits_offset=0
        'user' type_id=13 bits_offset=0
[114] STRUCT '(anon)' size=16 vlen=2
        '(anon)' type_id=113 bits_offset=0
        'is_kernel' type_id=11 bits_offset=64 bitfield_size=1
[119] STRUCT '(anon)' size=8 vlen=1
        'net' type_id=85 bits_offset=0

These only added one type that wasn't referenced elsewhere - typedef
arch_rwlock_t.

> If base BTF is all named types, it would simplify the reconciliation
> process significantly, I think.
> 
> But again, I only skimmed the overall algorithm, sorry for my
> laziness, but I figured it would be good to discuss the above first
> anyways.
>

I'll try and walk through an example of how the algorithm proceeds; that
might help make the approach concrete and we can see if it can be
simplified.

Consider split BTF that uses the base BTF type "int *", i.e. a PTR to an
"int". It could do so in an anonymous struct/union, but also as an array
member type or a FUNC_PROTO or VAR. For such a reference type, we first
encounter the outer PTR. If its t->type has a mapping to base BTF "int"
(it will), we look through the base BTF for reference types that match
the kind (here a PTR) _and_ have the required t->type (int). Once we
find the reference type in base BTF, we can add the mapping for PTR to
int from distilled->base BTF id. So the reference type with one layer of
reference is pretty straightforward.

In the case where we don't have a mapping for a t->type - let's say PTR
to PTR to int (int **) - we push the type id for PTR to PTR to int onto
the stack along with the type id for t->type (PTR to int) after. So we
will first pop PTR to int and go through the above-described type
resolution. Next we will pop the PTR to PTR to int and because we've
resolved PTR to int now, it's t->type will have a mapping and we can go
through the search process to find a PTR that refers to a PTR to int
in base BTF, and we then add that mapping too.

For an array we push the member and index types, a func proto the return
type and parameter types, etc.

So once multiple layers of reference are part of the picture I _think_
we need something like this approach.

Alan

>> +                       }
>> +                       break;
> 
> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next 09/13] libbpf: split BTF reconciliation
  2024-04-05 10:06     ` Alan Maguire
@ 2024-04-05 19:58       ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2024-04-05 19:58 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, Apr 5, 2024 at 3:06 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 29/03/2024 22:01, Andrii Nakryiko wrote:
> > On Fri, Mar 22, 2024 at 3:26 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> Map base reference BTF type ids referenced in split BTF and their
> >> references to the base BTF passed in, and if the mapping succeeds,
> >> reparent the split BTF to the base BTF.
> >>
> >> Reconciliation rules are
> >>
> >> - base types must match exactly
> >> - enum[64] types should match all value name/value pairs, but the
> >>   to-be-reconciled enum[64] can also define additional name/value pairs
> >> - named fwds match to the correspondingly-named struct/union/enum/enum64
> >
> > yeah, but what about their size if they are embedded? Using FWD was a
> > nice trick, but it's not flexible enough for recording (optionally)
> > size... Probably emitting an empty (but named) struct/union/enum would
> > be a bit better (and actually make split BTF using base ref more valid
> > even without pre-processing).
> >
> >> - anon struct/unions must have field names/offsets specified in base
> >>   reference BTF matched by those in base BTF we are matching with
> >>
> >> Reconciliation can not recurse, since it will be used in-kernel also and
> >> we do not want to blow up the kernel stack when carrying out type
> >> compatibility checks.  Hence we use a stack for reference type
> >> reconciliation rather then recursive function calls.
> >>
> >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >> ---
> >>  tools/lib/bpf/Build             |   2 +-
> >>  tools/lib/bpf/btf.c             |  58 ++++
> >>  tools/lib/bpf/btf.h             |   8 +
> >>  tools/lib/bpf/btf_reconcile.c   | 590 ++++++++++++++++++++++++++++++++
> >
> > how wrong would it be to call this process "relocate" rather than "reconcile"?
> >
>
> seems fine to me.
>
> >>  tools/lib/bpf/libbpf.map        |   1 +
> >>  tools/lib/bpf/libbpf_internal.h |   2 +
> >>  6 files changed, 660 insertions(+), 1 deletion(-)
> >>  create mode 100644 tools/lib/bpf/btf_reconcile.c
> >>
> >
> > [...]
> >
> >> +/* Find next type after *id in base BTF that matches kind of type t passed in
> >> + * and name (if it is specified).  Match fwd kinds to appropriate kind also.
> >> + */
> >> +static int btf_reconcile_find_next(struct btf_reconcile *r, const struct btf_type *t,
> >> +                                  __u32 *id, const struct btf_type **tp)
> >
> > I haven't grokked the whole patch logic just yet, doing a first pass,
> > so I might be asking stupid questions, sorry.
> >
> > But it looks like we have these linear searches here to find matching
> > types, is that right? Wouldn't it be better to build an index first to
> > speed up search?
> >
>
> It would, but the aim here was to keep things simple with an eye to
> sharing the code with the kernel. A lot of the libbpf hash stuff would
> be handy but then we'd have to have something on the kernel side. Given
> that the size of the base BTF is so small, the linear searches aren't
> much of a cost.

I didn't mean hashmap, I was thinking sort+binary search, but ok, we
can do that later as an optimization.

[...]

> >> +
> >> +/* Ensure each enum value in type t has equivalent in base BTF and that values (if any) match. */
> >> +static int btf_reconcile_enum(struct btf_reconcile *r, const char *name,
> >> +                             const struct btf_type *t, const struct btf_type *bt)
> >> +{
> >
> > should we care about compatibility between ENUM and ENUM64, they can
> > both represent the same values of the same size?
> >
>
> so do you mean if one representation uses an enum, another an enum64?
> Yep that might well be the case, I'll add that too.

yep, they are two different kinds, but they overlap in what they can
represent, so we try to support them interchangeably, if possible

[...]

> >> +                               if (err) {
> >> +                                       pr_warn("could not find base BTF type for base reference type[%u]\n",
> >> +                                               id);
> >> +                                       return err;
> >> +                               }
> >> +                       } else {
> >> +                               if (btf_reconcile_push(r, id) < 0 ||
> >> +                                   btf_reconcile_push(r, t->type) < 0)
> >> +                                       return -ENOSPC;
> >
> > I'm missing something, please help me understand. I don't get why we
> > need a recursive algorithm at all.
> >
> > In my mind, we have this small "base ref" set of types referenced from
> > module's BTF (split BTF part), right? So all we should need is to map
> > every type from base ref set to vmlinux BTF.
> >
> > What I don't yet fully get is why CONST/VOLATILE or PTR need to
> > postpone reconciliation via a queue. By the time we get to types in
> > split BTF all base ref types should be mapped, so all you need is to
> > remap t->type to resolved vmlinux BTF, no?
> >
>
> It's possible to have multiple layers of reference in the distilled base
> BTF though; for example here's a case from a module's distilled base BTF:
>
> [41] PTR '(anon)' type_id=42
> [42] CONST '(anon)' type_id=0
>
> To resolve type id 41 we need to resolve type id 42, and since type id 0
> already has a mapping, at that point we can look for CONSTs that refer
> to type id 0 and then once we've established that mapping we can find
> PTRs that have a t->type that refers to the const. So as described below
> we keep pushing type ids onto the stack until we find one with a t->type
> mapping to base BTF; once we hit that we can start looking in base BTF
> for types that have the mapped t->type value.
>

see below

> > I suspect the answer might have something to do with those anonymous
> > structs/unions which you copy verbatim into base ref BTF?
> >
>
> They do add a few more types, but we can get base BTF references that
> don't come from that source too. The above case PTR CONST void for
> example wasn't referred to via any other distilled base types.

I think it's too problematic to allow unnamed types in base reference
BTF. If we keep the rule that only named
structs/unions/typedefs/int/float kinds can be in base reference, then
everything becomes much simpler and faster, without breaking any of
kfunc/PTR_TO_BTF_ID usage.

The biggest problem is probably TYPEDEF pointing to anonymous
struct/union/func proto, so we might want to still record the shape of
expected underlying type, maybe, but we still go off of just a name in
the first place. Maybe initially we should just say that any ambiguity
would be rejected and keep only TYPEDEF with name?

Let me know if this is unrealistic, though.

>
> > But on the latter topic, I wonder if we at all need this? Why not keep
> > all those anon struct/union/enum in module's part of BTF? If they are
> > unnamed, I doubt they will ever be referenced from kfuncs or anything
> > like that, so their BTF ID isn't that important.
> >
>
> Changing the module BTF would make things a bit more complicated.

you mean appending those anonymous types from base BTF to module BTF?
It shouldn't be too hard, we just do it recursively and keep track of
ID remapping (so we don't add the same type twice). Or is there
something more?


> Currently we just update type id references and string offsets. The anon
> structs we end up with tend to be very small; from the same distilled
> base BTF used above here are the instances of struct/union:
>
> [35] STRUCT '(anon)' size=4 vlen=1
>         'counter' type_id=6 bits_offset=0
> [62] STRUCT '(anon)' size=8 vlen=1
>         'raw_lock' type_id=48 bits_offset=0
> [113] UNION '(anon)' size=8 vlen=2
>         'kernel' type_id=13 bits_offset=0
>         'user' type_id=13 bits_offset=0
> [114] STRUCT '(anon)' size=16 vlen=2
>         '(anon)' type_id=113 bits_offset=0
>         'is_kernel' type_id=11 bits_offset=64 bitfield_size=1
> [119] STRUCT '(anon)' size=8 vlen=1
>         'net' type_id=85 bits_offset=0
>
> These only added one type that wasn't referenced elsewhere - typedef
> arch_rwlock_t.
>
> > If base BTF is all named types, it would simplify the reconciliation
> > process significantly, I think.
> >
> > But again, I only skimmed the overall algorithm, sorry for my
> > laziness, but I figured it would be good to discuss the above first
> > anyways.
> >
>
> I'll try and walk through an example of how the algorithm proceeds; that
> might help make the approach concrete and we can see if it can be
> simplified.
>
> Consider split BTF that uses the base BTF type "int *", i.e. a PTR to an
> "int". It could do so in an anonymous struct/union, but also as an array
> member type or a FUNC_PROTO or VAR. For such a reference type, we first
> encounter the outer PTR. If its t->type has a mapping to base BTF "int"
> (it will), we look through the base BTF for reference types that match
> the kind (here a PTR) _and_ have the required t->type (int). Once we
> find the reference type in base BTF, we can add the mapping for PTR to
> int from distilled->base BTF id. So the reference type with one layer of
> reference is pretty straightforward.
>
> In the case where we don't have a mapping for a t->type - let's say PTR
> to PTR to int (int **) - we push the type id for PTR to PTR to int onto
> the stack along with the type id for t->type (PTR to int) after. So we
> will first pop PTR to int and go through the above-described type
> resolution. Next we will pop the PTR to PTR to int and because we've
> resolved PTR to int now, it's t->type will have a mapping and we can go
> through the search process to find a PTR that refers to a PTR to int
> in base BTF, and we then add that mapping too.

sure, pretty standard dfs with memoization, makes sense

what makes me a bit uncomfortable is that if you have PTR -> STRUCT,
once you found match for STRUCT, you'll go do a linear search for any
PTR  to that STRUCT. Which starts to sound like BTF deduplication
(though admittedly matching STRUCT by name removes half of BTF dedup
complexity) and the way you are doing it right now with linear search
over vmlinux BTF is going to be slow (which is why I was proposing an
index).

Ok, anyways, it's probably going to work. If we can simplify further
with keeping just named types, it would be great. If not, so be it, I
suppose.

>
> For an array we push the member and index types, a func proto the return
> type and parameter types, etc.
>
> So once multiple layers of reference are part of the picture I _think_
> we need something like this approach.
>
> Alan
>
> >> +                       }
> >> +                       break;
> >
> > [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2024-04-05 19:58 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-22 10:24 [RFC bpf-next 00/13] bpf: support resilient split BTF Alan Maguire
2024-03-22 10:24 ` [RFC dwarves] btf_encoder: add base_ref BTF feature to generate split BTF with base refs Alan Maguire
2024-03-22 10:24 ` [RFC bpf-next 01/13] libbpf: add support to btf__add_fwd() for ENUM64 Alan Maguire
2024-03-29 21:59   ` Andrii Nakryiko
2024-03-22 10:24 ` [RFC bpf-next 02/13] libbpf: add btf__new_split_base_ref() creating split BTF with reference base BTF Alan Maguire
2024-03-29 22:00   ` Andrii Nakryiko
2024-04-04 15:21     ` Alan Maguire
2024-04-04 22:49       ` Andrii Nakryiko
2024-03-22 10:24 ` [RFC bpf-next 03/13] selftests/bpf: test split base reference BTF generation Alan Maguire
2024-03-22 10:24 ` [RFC bpf-next 04/13] libbpf: add btf__parse_opts() API for flexible BTF parsing Alan Maguire
2024-03-29 22:00   ` Andrii Nakryiko
2024-03-22 10:24 ` [RFC bpf-next 05/13] bpftool: support displaying raw split BTF using base reference BTF as base Alan Maguire
2024-03-22 10:24 ` [RFC bpf-next 06/13] kbuild,bpf: switch to using --btf_features for pahole v1.26 and later Alan Maguire
2024-03-29 22:01   ` Andrii Nakryiko
2024-03-22 10:24 ` [RFC bpf-next 07/13] resolve_btfids: use .BTF.base_ref BTF as base BTF if -r option is used Alan Maguire
2024-03-22 10:24 ` [RFC bpf-next 08/13] kbuild, bpf: add module-specific pahole/resolve_btfids flags for base reference BTF Alan Maguire
2024-03-23  2:50   ` Alexei Starovoitov
2024-03-25  9:51     ` Alan Maguire
2024-03-25 16:41       ` Alexei Starovoitov
2024-03-22 10:24 ` [RFC bpf-next 09/13] libbpf: split BTF reconciliation Alan Maguire
2024-03-29 22:01   ` Andrii Nakryiko
2024-04-05 10:06     ` Alan Maguire
2024-04-05 19:58       ` Andrii Nakryiko
2024-03-22 10:24 ` [RFC bpf-next 10/13] module, bpf: store BTF base reference pointer in struct module Alan Maguire
2024-03-22 10:24 ` [RFC bpf-next 11/13] libbpf,bpf: share BTF reconcile-related code with kernel Alan Maguire
2024-03-29 22:04   ` Andrii Nakryiko
2024-04-01 15:58     ` Andrii Nakryiko
2024-03-22 10:24 ` [RFC bpf-next 12/13] selftests/bpf: extend base reference tests cover BTF reconciliation Alan Maguire
2024-03-22 10:24 ` [RFC bpf-next 13/13] bpftool: support displaying reconciled-with-base split BTF Alan Maguire

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.