* [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
From: Mauricio Vásquez @ 2021-10-27 20:37 UTC
  To: netdev, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Rafael David Tinoco

CO-RE requires BTF information describing the kernel types in order to
perform the relocations. This is usually provided by the kernel itself
when it's configured with CONFIG_DEBUG_INFO_BTF. However, this option
is not enabled in all distributions and is not available on older
kernels.

It's possible to use CO-RE in kernels without CONFIG_DEBUG_INFO_BTF
support by providing the BTF information from an external source.
BTFHub[0] contains BTF files for each released kernel that doesn't
support BTF, for the most popular distributions.

Providing this BTF file for a given kernel has some challenges:
1. Each BTF file is a few MBs in size, so it's not possible to ship the
eBPF program with all the BTF files needed to run on different kernels.
(The BTF files would be in the order of GBs if you wanted to support a
high number of kernels.)
2. Downloading the BTF file for the current kernel at runtime delays the
start of the program, and it's not always possible to reach an external
host to download such a file.

Providing a BTF file with information about all the kernel's data types
just to run an eBPF program is overkill in many cases. Usually, eBPF
programs access only a few kernel fields.

This set of commits extends libbpf with an API to generate a BTF file
containing only the types that are needed by an eBPF object. These
generated files are very small compared to the ones that contain all the
kernel types (around 4kB for a program like execsnoop). This makes it
possible to ship an eBPF program together with the BTF information it
needs to run on many different kernels.

This idea was discussed during the "Towards truly portable eBPF"[1]
presentation at Linux Plumbers 2021.

We prepared a BTFGen repository[2] with an example of how this API can
be used. Our plan is to include this support in bpftool once it's merged
in libbpf.
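
As a rough sketch (file names are made up and error handling is
elided), the intended flow with this API looks like:

```
struct btf_reloc_info *info;
struct bpf_object *obj;
struct btf *btf;

/* full BTF of the target kernel, e.g. a file from BTFHub */
info = bpf_reloc_info__new("5.4.0-87-generic.btf");

/* run the CO-RE relocation logic and record the types the object needs */
obj = bpf_object__open("execsnoop.bpf.o");
bpf_object__reloc_info_gen(info, obj);

/* build the trimmed-down BTF and save it (btf__save_to_file() is patch 1) */
btf = bpf_reloc_info__get_btf(info);
btf__save_to_file(btf, "execsnoop-5.4.0-87-generic.btf");

btf__free(btf);
bpf_object__close(obj);
bpf_reloc_info__free(info);
```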

There is also a good example[3] of how to use BTFGen and BTFHub together
to generate multiple BTF files, one for each existing/supported kernel,
tailored to one application. For example, a complex bpf object can
support nearly 400 kernels with BTF files totaling only 1.5 MB.

[0]: https://github.com/aquasecurity/btfhub/
[1]: https://www.youtube.com/watch?v=igJLKyP1lFk&t=2418s
[2]: https://github.com/kinvolk/btfgen
[3]: https://github.com/aquasecurity/btfhub/tree/main/tools

Mauricio Vásquez (2):
  libbpf: Implement btf__save_to_file()
  libbpf: Implement API for generating BTF for ebpf objects

 tools/lib/bpf/Makefile    |   2 +-
 tools/lib/bpf/btf.c       |  22 ++
 tools/lib/bpf/btf.h       |   2 +
 tools/lib/bpf/libbpf.c    |  28 ++-
 tools/lib/bpf/libbpf.h    |   4 +
 tools/lib/bpf/libbpf.map  |   6 +
 tools/lib/bpf/relo_core.c | 514 +++++++++++++++++++++++++++++++++++++-
 tools/lib/bpf/relo_core.h |  11 +-
 8 files changed, 579 insertions(+), 10 deletions(-)

-- 
2.25.1


* [PATCH bpf-next 1/2] libbpf: Implement btf__save_to_file()
From: Mauricio Vásquez @ 2021-10-27 20:37 UTC
  To: netdev, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Rafael David Tinoco, Rafael David Tinoco, Lorenzo Fontana

Implement helper function to save the contents of a BTF object to a
file.
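
A possible usage, as a sketch (the output path is arbitrary):

```
struct btf *btf = btf__load_vmlinux_btf();

if (!libbpf_get_error(btf)) {
	btf__save_to_file(btf, "/tmp/vmlinux.btf");
	btf__free(btf);
}
```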

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com>
Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
---
 tools/lib/bpf/btf.c      | 22 ++++++++++++++++++++++
 tools/lib/bpf/btf.h      |  2 ++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 25 insertions(+)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 0c628c33e23b..087035574dba 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -4773,3 +4773,25 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
 
 	return 0;
 }
+
+int btf__save_to_file(struct btf *btf, const char *path)
+{
+	const void *data;
+	__u32 data_sz;
+	FILE *file;
+
+	data = btf_get_raw_data(btf, &data_sz, btf->swapped_endian);
+	if (!data)
+		return -ENOMEM;
+
+	file = fopen(path, "wb");
+	if (!file)
+		return -errno;
+
+	if (fwrite(data, 1, data_sz, file) != data_sz) {
+		fclose(file);
+		return -EIO;
+	}
+
+	return fclose(file);
+}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index bc005ba3ceec..300ad498c615 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -114,6 +114,8 @@ LIBBPF_API struct btf *btf__parse_elf_split(const char *path, struct btf *base_b
 LIBBPF_API struct btf *btf__parse_raw(const char *path);
 LIBBPF_API struct btf *btf__parse_raw_split(const char *path, struct btf *base_btf);
 
+LIBBPF_API int btf__save_to_file(struct btf *btf, const char *path);
+
 LIBBPF_API struct btf *btf__load_vmlinux_btf(void);
 LIBBPF_API struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_btf);
 LIBBPF_API struct btf *libbpf_find_kernel_btf(void);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 15239c05659c..0e9bed7c9b9e 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -399,4 +399,5 @@ LIBBPF_0.6.0 {
 		btf__add_decl_tag;
 		btf__raw_data;
 		btf__type_cnt;
+		btf__save_to_file;
 } LIBBPF_0.5.0;
-- 
2.25.1


* [PATCH bpf-next 2/2] libbpf: Implement API for generating BTF for ebpf objects
From: Mauricio Vásquez @ 2021-10-27 20:37 UTC
  To: netdev, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Rafael David Tinoco, Rafael David Tinoco, Lorenzo Fontana

This commit implements a new set of "bpf_reloc_info__" functions for
generating a BTF file with the types a given set of eBPF objects need
for their CO-RE relocations. This code reuses all the existing CO-RE
logic (candidate lookup, matching, etc). The workflow is the same as
when an eBPF program is being loaded, but instead of patching the eBPF
instruction, we save the type involved in the relocation.

A new struct btf_reloc_info is defined to save the BTF types needed by a
set of eBPF objects. It is created with the bpf_reloc_info__new()
function and populated with bpf_object__reloc_info_gen() for each eBPF
object; finally, the BTF file is generated with
bpf_reloc_info__get_btf(). Please take a look at BTFGen[0] for a
complete example of how this API can be used.

bpf_object__reloc_info_gen() ends up calling btf_reloc_info_gen_field(),
which uses the access spec to add all the types needed by a given
relocation. The root type is added and, if it is a complex type like a
struct or union, the members involved in the relocation are added as
well. References are resolved and all referenced types are added. This
function can be called multiple times to add the types needed by
different objects into the same struct btf_reloc_info; this allows the
user to create a single BTF file that contains the BTF information for
multiple eBPF objects.
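
For example, the types needed by several objects can be accumulated
into one btf_reloc_info and emitted as a single file (a sketch; object
and file names are made up, error handling elided):

```
const char *objs[] = { "execsnoop.bpf.o", "opensnoop.bpf.o" };
struct btf_reloc_info *info;
struct btf *btf;
int i;

info = bpf_reloc_info__new("/sys/kernel/btf/vmlinux");
for (i = 0; i < 2; i++) {
	struct bpf_object *obj = bpf_object__open(objs[i]);

	bpf_object__reloc_info_gen(info, obj);
	bpf_object__close(obj);
}

btf = bpf_reloc_info__get_btf(info);
btf__save_to_file(btf, "combined.btf");
btf__free(btf);
bpf_reloc_info__free(info);
```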

bpf_reloc_info__get_btf() generates the BTF file from a given struct
btf_reloc_info. This function first creates a new BTF object and copies
all the types saved in the struct btf_reloc_info there. For structures
and unions, only the members involved in a relocation are copied. While
adding the types to the new BTF object, a map is filled with the type
IDs in the old and new BTF structures. This map is then used to fix up
all the IDs in the new BTF object.

Right now only field-based CO-RE relocations are supported.

[0]: https://github.com/kinvolk/btfgen

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com>
Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
---
 tools/lib/bpf/Makefile    |   2 +-
 tools/lib/bpf/libbpf.c    |  28 ++-
 tools/lib/bpf/libbpf.h    |   4 +
 tools/lib/bpf/libbpf.map  |   5 +
 tools/lib/bpf/relo_core.c | 514 +++++++++++++++++++++++++++++++++++++-
 tools/lib/bpf/relo_core.h |  11 +-
 6 files changed, 554 insertions(+), 10 deletions(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index b393b5e82380..b01a1ece2cff 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -237,7 +237,7 @@ install_lib: all_cmd
 
 SRC_HDRS := bpf.h libbpf.h btf.h libbpf_common.h libbpf_legacy.h xsk.h	     \
 	    bpf_helpers.h bpf_tracing.h bpf_endian.h bpf_core_read.h	     \
-	    skel_internal.h libbpf_version.h
+	    skel_internal.h libbpf_version.h relo_core.h
 GEN_HDRS := $(BPF_GENERATED)
 
 INSTALL_PFX := $(DESTDIR)$(prefix)/include/bpf
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2fbed2d4a645..51522b60edfa 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -545,6 +545,9 @@ struct bpf_object {
 	size_t btf_module_cnt;
 	size_t btf_module_cap;
 
+	/* Relocation info when using bpf_object__reloc_info_gen() */
+	struct btf_reloc_info *reloc_info;
+
 	void *priv;
 	bpf_object_clear_priv_t clear_priv;
 
@@ -5386,7 +5389,8 @@ static int bpf_core_apply_relo(struct bpf_program *prog,
 			       const struct bpf_core_relo *relo,
 			       int relo_idx,
 			       const struct btf *local_btf,
-			       struct hashmap *cand_cache)
+			       struct hashmap *cand_cache,
+			       struct btf_reloc_info *reloc_info)
 {
 	const void *type_key = u32_as_hash_key(relo->type_id);
 	struct bpf_core_cand_list *cands = NULL;
@@ -5440,7 +5444,8 @@ static int bpf_core_apply_relo(struct bpf_program *prog,
 		}
 	}
 
-	return bpf_core_apply_relo_insn(prog_name, insn, insn_idx, relo, relo_idx, local_btf, cands);
+	return bpf_core_apply_relo_insn(prog_name, insn, insn_idx, relo, relo_idx, local_btf, cands,
+					reloc_info);
 }
 
 static int
@@ -5465,6 +5470,8 @@ bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path)
 			pr_warn("failed to parse target BTF: %d\n", err);
 			return err;
 		}
+	} else if (obj->reloc_info && bpf_reloc_info_get_src_btf(obj->reloc_info)) {
+		obj->btf_vmlinux_override = bpf_reloc_info_get_src_btf(obj->reloc_info);
 	}
 
 	cand_cache = hashmap__new(bpf_core_hash_fn, bpf_core_equal_fn, NULL);
@@ -5516,7 +5523,8 @@ bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path)
 			if (!prog->load)
 				continue;
 
-			err = bpf_core_apply_relo(prog, rec, i, obj->btf, cand_cache);
+			err = bpf_core_apply_relo(prog, rec, i, obj->btf, cand_cache,
+						  obj->reloc_info);
 			if (err) {
 				pr_warn("prog '%s': relo #%d: failed to relocate: %d\n",
 					prog->name, i, err);
@@ -5526,9 +5534,11 @@ bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path)
 	}
 
 out:
-	/* obj->btf_vmlinux and module BTFs are freed after object load */
-	btf__free(obj->btf_vmlinux_override);
-	obj->btf_vmlinux_override = NULL;
+	if (!obj->reloc_info) {
+		/* obj->btf_vmlinux and module BTFs are freed after object load */
+		btf__free(obj->btf_vmlinux_override);
+		obj->btf_vmlinux_override = NULL;
+	}
 
 	if (!IS_ERR_OR_NULL(cand_cache)) {
 		hashmap__for_each_entry(cand_cache, entry, i) {
@@ -7227,6 +7237,12 @@ static int bpf_object__resolve_externs(struct bpf_object *obj,
 	return 0;
 }
 
+int bpf_object__reloc_info_gen(struct btf_reloc_info *info, struct bpf_object *obj)
+{
+	obj->reloc_info = info;
+	return bpf_object__relocate_core(obj, NULL);
+}
+
 int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
 {
 	struct bpf_object *obj;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e1900819bfab..6fba2a4c9018 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -19,6 +19,7 @@
 
 #include "libbpf_common.h"
 #include "libbpf_legacy.h"
+#include "relo_core.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -1016,6 +1017,9 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker,
 LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
 LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
 
+LIBBPF_API int bpf_object__reloc_info_gen(struct btf_reloc_info *info,
+					  struct bpf_object *obj);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 0e9bed7c9b9e..067f8b20c894 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -400,4 +400,9 @@ LIBBPF_0.6.0 {
 		btf__raw_data;
 		btf__type_cnt;
 		btf__save_to_file;
+		bpf_reloc_info__new;
+		bpf_reloc_info__free;
+		bpf_object__reloc_info_gen;
+		bpf_reloc_info__get_btf;
+
 } LIBBPF_0.5.0;
diff --git a/tools/lib/bpf/relo_core.c b/tools/lib/bpf/relo_core.c
index b5b8956a1be8..c345826e92fb 100644
--- a/tools/lib/bpf/relo_core.c
+++ b/tools/lib/bpf/relo_core.c
@@ -10,6 +10,7 @@
 #include "libbpf.h"
 #include "bpf.h"
 #include "btf.h"
+#include "hashmap.h"
 #include "str_error.h"
 #include "libbpf_internal.h"
 
@@ -763,6 +764,8 @@ struct bpf_core_relo_res
 	__u32 orig_type_id;
 	__u32 new_sz;
 	__u32 new_type_id;
+
+	const struct bpf_core_spec *targ_spec;
 };
 
 /* Calculate original and target relocation values, given local and target
@@ -854,6 +857,8 @@ static int bpf_core_calc_relo(const char *prog_name,
 			relo->kind, relo->insn_off / 8);
 	}
 
+	res->targ_spec = targ_spec;
+
 	return err;
 }
 
@@ -1092,6 +1097,502 @@ static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec)
 	}
 }
 
+struct btf_reloc_member {
+	struct btf_member *member;
+	int idx;
+};
+
+struct btf_reloc_type {
+	struct btf_type *type;
+	unsigned int id;
+
+	struct hashmap *members;
+};
+
+struct btf_reloc_info {
+	struct hashmap *types;
+	struct hashmap *ids_map;
+
+	struct btf *src_btf;
+};
+
+static size_t bpf_reloc_info_hash_fn(const void *key, void *ctx)
+{
+	return (size_t)key;
+}
+
+static bool bpf_reloc_info_equal_fn(const void *k1, const void *k2, void *ctx)
+{
+	return k1 == k2;
+}
+
+static void *uint_as_hash_key(int x)
+{
+	return (void *)(uintptr_t)x;
+}
+
+struct btf_reloc_info *bpf_reloc_info__new(const char *targ_btf_path)
+{
+	struct btf_reloc_info *info;
+	struct btf *src_btf;
+	struct hashmap *ids_map;
+	struct hashmap *types;
+
+	info = calloc(1, sizeof(*info));
+	if (!info)
+		return ERR_PTR(-ENOMEM);
+
+	src_btf = btf__parse(targ_btf_path, NULL);
+	if (libbpf_get_error(src_btf)) {
+		bpf_reloc_info__free(info);
+		return (void *) src_btf;
+	}
+
+	info->src_btf = src_btf;
+
+	ids_map = hashmap__new(bpf_reloc_info_hash_fn, bpf_reloc_info_equal_fn, NULL);
+	if (IS_ERR(ids_map)) {
+		bpf_reloc_info__free(info);
+		return (void *) ids_map;
+	}
+
+	info->ids_map = ids_map;
+
+	types = hashmap__new(bpf_reloc_info_hash_fn, bpf_reloc_info_equal_fn, NULL);
+	if (IS_ERR(types)) {
+		bpf_reloc_info__free(info);
+		return (void *) types;
+	}
+
+	info->types = types;
+
+	return info;
+}
+
+static void bpf_reloc_type_free(struct btf_reloc_type *type)
+{
+	struct hashmap_entry *entry;
+	int i;
+
+	if (IS_ERR_OR_NULL(type))
+		return;
+
+	if (!IS_ERR_OR_NULL(type->members)) {
+		hashmap__for_each_entry(type->members, entry, i) {
+			free(entry->value);
+		}
+		hashmap__free(type->members);
+	}
+
+	free(type);
+}
+
+void bpf_reloc_info__free(struct btf_reloc_info *info)
+{
+	struct hashmap_entry *entry;
+	int i;
+
+	if (!info)
+		return;
+
+	btf__free(info->src_btf);
+
+	hashmap__free(info->ids_map);
+
+	if (!IS_ERR_OR_NULL(info->types)) {
+		hashmap__for_each_entry(info->types, entry, i) {
+			bpf_reloc_type_free(entry->value);
+		}
+		hashmap__free(info->types);
+	}
+
+	free(info);
+}
+
+/* Return id for type in new btf instance */
+static unsigned int btf_reloc_id_get(struct btf_reloc_info *info, unsigned int old)
+{
+	uintptr_t new = 0;
+
+	/* deal with BTF_KIND_VOID */
+	if (old == 0)
+		return 0;
+
+	if (!hashmap__find(info->ids_map, uint_as_hash_key(old), (void **)&new)) {
+		/* return id for void as it's possible that the ID we're looking for is
+		 * the type of a pointer that we're not adding.
+		 */
+		return 0;
+	}
+
+	return (unsigned int)(uintptr_t)new;
+}
+
+/* Add new id map to the list of mappings */
+static int btf_reloc_id_add(struct btf_reloc_info *info, unsigned int old, unsigned int new)
+{
+	return hashmap__add(info->ids_map, uint_as_hash_key(old), uint_as_hash_key(new));
+}
+
+/*
+ * Put type in the list. If the type already exists it's returned, otherwise a
+ * new one is created and added to the list. This is called recursively adding
+ * all the types that are needed for the current one.
+ */
+static struct btf_reloc_type *btf_reloc_put_type(struct btf *btf,
+						 struct btf_reloc_info *info,
+						 struct btf_type *btf_type,
+						 unsigned int id)
+{
+	struct btf_reloc_type *reloc_type, *tmp;
+	struct btf_array *array;
+	unsigned int child_id;
+	int err;
+
+	/* check if we already have this type */
+	if (hashmap__find(info->types, uint_as_hash_key(id), (void **)&reloc_type))
+		return reloc_type;
+
+	/* do nothing. void is implicit in BTF */
+	if (id == 0)
+		return NULL;
+
+	reloc_type = calloc(1, sizeof(*reloc_type));
+	if (!reloc_type)
+		return ERR_PTR(-ENOMEM);
+
+	reloc_type->type = btf_type;
+	reloc_type->id = id;
+
+	/* append this type to the relocation type's list before anything else */
+	err = hashmap__add(info->types, uint_as_hash_key(reloc_type->id), reloc_type);
+	if (err)
+		return ERR_PTR(err);
+
+	/* complex types might need further processing */
+	switch (btf_kind(reloc_type->type)) {
+	/* already processed */
+	case BTF_KIND_UNKN:
+	case BTF_KIND_INT:
+	case BTF_KIND_FLOAT:
+	/* processed by callee */
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION:
+	/* doesn't need resolution. If the data of the pointer is used
+	 * then it'll be added by the caller in another relocation.
+	 */
+	case BTF_KIND_PTR:
+		break;
+	/* needs resolution */
+	case BTF_KIND_CONST:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_TYPEDEF:
+		child_id = btf_type->type;
+		btf_type = (struct btf_type *) btf__type_by_id(btf, child_id);
+		if (!btf_type)
+			return ERR_PTR(-EINVAL);
+
+		tmp = btf_reloc_put_type(btf, info, btf_type, child_id);
+		if (IS_ERR(tmp))
+			return tmp;
+		break;
+	/* needs resolution */
+	case BTF_KIND_ARRAY:
+		array = btf_array(reloc_type->type);
+
+		/* add type for array type */
+		btf_type = (struct btf_type *) btf__type_by_id(btf, array->type);
+		tmp = btf_reloc_put_type(btf, info, btf_type, array->type);
+		if (IS_ERR(tmp))
+			return tmp;
+
+		/* add type for array's index type */
+		btf_type = (struct btf_type *) btf__type_by_id(btf, array->index_type);
+		tmp = btf_reloc_put_type(btf, info, btf_type, array->index_type);
+		if (IS_ERR(tmp))
+			return tmp;
+
+		break;
+	/* warn if some other type needs to be handled */
+	default:
+		pr_warn("unsupported relocation: %s\n", btf_kind_str(reloc_type->type));
+		return ERR_PTR(-EINVAL);
+	}
+
+	return reloc_type;
+}
+
+/* Return pointer to btf_reloc_type by id */
+static struct btf_reloc_type *btf_reloc_get_type(struct btf_reloc_info *info, int id)
+{
+	struct btf_reloc_type *type = NULL;
+
+	if (!hashmap__find(info->types, uint_as_hash_key(id), (void **)&type))
+		return ERR_PTR(-ENOENT);
+
+	return type;
+}
+
+static int bpf_reloc_type_add_member(struct btf_reloc_info *info,
+				     struct btf_reloc_type *reloc_type,
+				     struct btf_member *btf_member, int idx)
+{
+	int err;
+	struct btf_reloc_member *reloc_member;
+
+	/* create new members hashmap for this relocation type if needed */
+	if (reloc_type->members == NULL) {
+		struct hashmap *tmp = hashmap__new(bpf_reloc_info_hash_fn,
+						   bpf_reloc_info_equal_fn,
+						   NULL);
+		if (IS_ERR(tmp))
+			return PTR_ERR(tmp);
+
+		reloc_type->members = tmp;
+	}
+	/* add given btf_member as a member of the parent relocation_type's type */
+	reloc_member = calloc(1, sizeof(*reloc_member));
+	if (!reloc_member)
+		return -ENOMEM;
+	reloc_member->member = btf_member;
+	reloc_member->idx = idx;
+	/* add given btf_member as member to given relocation type */
+	err = hashmap__add(reloc_type->members, uint_as_hash_key(reloc_member->idx), reloc_member);
+	if (err) {
+		free(reloc_member);
+		if (err != -EEXIST)
+			return err;
+	}
+
+	return 0;
+}
+
+struct btf *bpf_reloc_info__get_btf(struct btf_reloc_info *info)
+{
+	struct btf_dedup_opts dedup_opts = {};
+	struct hashmap_entry *entry;
+	struct btf *btf_new;
+	int err, i;
+
+	btf_new = btf__new_empty();
+	if (IS_ERR(btf_new)) {
+		pr_warn("failed to allocate btf structure\n");
+		return btf_new;
+	}
+
+	/* first pass: add all types and add their new ids to the ids map */
+	hashmap__for_each_entry(info->types, entry, i) {
+		struct btf_reloc_type *reloc_type = entry->value;
+		struct btf_type *btf_type = reloc_type->type;
+		int new_id;
+
+		/* add members for struct and union */
+		if (btf_is_struct(btf_type) || btf_is_union(btf_type)) {
+			struct hashmap_entry *member_entry;
+			struct btf_type *btf_type_cpy;
+			int nmembers, bkt, index;
+			size_t new_size;
+
+			nmembers = reloc_type->members ? hashmap__size(reloc_type->members) : 0;
+			new_size = sizeof(struct btf_type) + nmembers * sizeof(struct btf_member);
+
+			btf_type_cpy = malloc(new_size);
+			if (!btf_type_cpy) {
+				err = -ENOMEM;
+				goto out;
+			}
+
+			/* copy header */
+			memcpy(btf_type_cpy, btf_type, sizeof(*btf_type_cpy));
+
+			/* copy only members that are needed */
+			index = 0;
+			if (nmembers > 0) {
+				hashmap__for_each_entry(reloc_type->members, member_entry, bkt) {
+					struct btf_reloc_member *reloc_member;
+					struct btf_member *btf_member;
+
+					reloc_member = member_entry->value;
+					btf_member = btf_members(btf_type) + reloc_member->idx;
+
+					memcpy(btf_members(btf_type_cpy) + index, btf_member,
+					       sizeof(struct btf_member));
+
+					index++;
+				}
+			}
+
+			/* set new vlen */
+			btf_type_cpy->info = btf_type_info(btf_kind(btf_type_cpy), nmembers,
+							   btf_kflag(btf_type_cpy));
+
+			err = btf__add_type(btf_new, info->src_btf, btf_type_cpy);
+			free(btf_type_cpy);
+		} else {
+			err = btf__add_type(btf_new, info->src_btf, btf_type);
+		}
+
+		if (err < 0)
+			goto out;
+
+		new_id = err;
+
+		/* add ID mapping */
+		err = btf_reloc_id_add(info, reloc_type->id, new_id);
+		if (err)
+			goto out;
+	}
+
+	/* second pass: fix up type ids */
+	for (i = 0; i <= btf__get_nr_types(btf_new); i++) {
+		struct btf_member *btf_member;
+		struct btf_type *btf_type;
+		struct btf_param *params;
+		struct btf_array *array;
+
+		btf_type = (struct btf_type *) btf__type_by_id(btf_new, i);
+
+		switch (btf_kind(btf_type)) {
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION:
+			for (int idx = 0; idx < btf_vlen(btf_type); idx++) {
+				btf_member = btf_members(btf_type) + idx;
+				btf_member->type = btf_reloc_id_get(info, btf_member->type);
+			}
+			break;
+		case BTF_KIND_PTR:
+		case BTF_KIND_TYPEDEF:
+		case BTF_KIND_VOLATILE:
+		case BTF_KIND_CONST:
+		case BTF_KIND_RESTRICT:
+		case BTF_KIND_FUNC:
+		case BTF_KIND_VAR:
+			btf_type->type = btf_reloc_id_get(info, btf_type->type);
+			break;
+		case BTF_KIND_ARRAY:
+			array = btf_array(btf_type);
+			array->index_type = btf_reloc_id_get(info, array->index_type);
+			array->type = btf_reloc_id_get(info, array->type);
+			break;
+		case BTF_KIND_FUNC_PROTO:
+			btf_type->type = btf_reloc_id_get(info, btf_type->type);
+			params = btf_params(btf_type);
+			for (i = 0; i < btf_vlen(btf_type); i++)
+				params[i].type = btf_reloc_id_get(info, params[i].type);
+			break;
+		default:
+			break;
+		}
+	}
+
+	/* deduplicate generated BTF */
+	err = btf__dedup(btf_new, NULL, &dedup_opts);
+	if (err) {
+		pr_warn("error calling btf__dedup()\n");
+		goto out;
+	}
+
+	return btf_new;
+
+out:
+	btf__free(btf_new);
+	return ERR_PTR(err);
+}
+
+struct btf *bpf_reloc_info_get_src_btf(struct btf_reloc_info *info)
+{
+	return info->src_btf;
+}
+
+static int btf_reloc_info_gen_field(struct btf_reloc_info *info, struct bpf_core_spec *targ_spec)
+{
+	struct btf *btf = (struct btf *) targ_spec->btf;
+	struct btf_reloc_type *reloc_type;
+	struct btf_member *btf_member;
+	struct btf_type *btf_type;
+	struct btf_array *array;
+	unsigned int id;
+	int idx, err;
+
+	btf_type = btf_type_by_id(btf, targ_spec->root_type_id);
+
+	/* create reloc type for root type */
+	reloc_type = btf_reloc_put_type(btf, info, btf_type, targ_spec->root_type_id);
+	if (IS_ERR(reloc_type))
+		return PTR_ERR(reloc_type);
+
+	/* add types for complex types (arrays, unions, structures) */
+	for (int i = 1; i < targ_spec->raw_len; i++) {
+
+		/* skip typedefs and mods. */
+		while (btf_is_mod(btf_type) || btf_is_typedef(btf_type)) {
+			id = btf_type->type;
+			reloc_type = btf_reloc_get_type(info, id);
+			if (IS_ERR(reloc_type))
+				return PTR_ERR(reloc_type);
+			btf_type = (struct btf_type *) btf__type_by_id(btf, id);
+		}
+
+		switch (btf_kind(btf_type)) {
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION:
+			idx = targ_spec->raw_spec[i];
+			btf_member = btf_members(btf_type) + idx;
+			btf_type = btf_type_by_id(btf, btf_member->type);
+
+			/* add member to relocation type */
+			err = bpf_reloc_type_add_member(info, reloc_type, btf_member, idx);
+			if (err)
+				return err;
+			/* add relocation type */
+			reloc_type = btf_reloc_put_type(btf, info, btf_type, btf_member->type);
+			if (IS_ERR(reloc_type))
+				return PTR_ERR(reloc_type);
+			break;
+		case BTF_KIND_ARRAY:
+			array = btf_array(btf_type);
+			reloc_type = btf_reloc_get_type(info, array->type);
+			if (IS_ERR(reloc_type))
+				return PTR_ERR(reloc_type);
+			btf_type = (struct btf_type *) btf__type_by_id(btf, array->type);
+			break;
+		default:
+			pr_warn("spec type wasn't handled: %s\n", btf_kind_str(btf_type));
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+static int btf_reloc_info_gen_type(struct btf_reloc_info *info, struct bpf_core_spec *targ_spec)
+{
+	pr_warn("unhandled type-based relocation\n");
+	return -EOPNOTSUPP;
+}
+
+static int btf_reloc_info_gen_enumval(struct btf_reloc_info *info, struct bpf_core_spec *targ_spec)
+{
+	pr_warn("unhandled enumval-based relocation\n");
+	return -EOPNOTSUPP;
+}
+
+static int btf_reloc_info_gen(struct btf_reloc_info *info, struct bpf_core_relo_res *res)
+{
+	struct bpf_core_spec *spec = (struct bpf_core_spec *) res->targ_spec;
+
+	if (core_relo_is_type_based(spec->relo_kind))
+		return btf_reloc_info_gen_type(info, spec);
+
+	if (core_relo_is_enumval_based(spec->relo_kind))
+		return btf_reloc_info_gen_enumval(info, spec);
+
+	if (core_relo_is_field_based(spec->relo_kind))
+		return btf_reloc_info_gen_field(info, spec);
+
+	return -EINVAL;
+}
+
 /*
  * CO-RE relocate single instruction.
  *
@@ -1147,10 +1648,11 @@ int bpf_core_apply_relo_insn(const char *prog_name, struct bpf_insn *insn,
 			     const struct bpf_core_relo *relo,
 			     int relo_idx,
 			     const struct btf *local_btf,
-			     struct bpf_core_cand_list *cands)
+			     struct bpf_core_cand_list *cands,
+			     struct btf_reloc_info *reloc_info)
 {
 	struct bpf_core_spec local_spec, cand_spec, targ_spec = {};
-	struct bpf_core_relo_res cand_res, targ_res;
+	struct bpf_core_relo_res cand_res, targ_res = { .targ_spec = NULL };
 	const struct btf_type *local_type;
 	const char *local_name;
 	__u32 local_id;
@@ -1283,6 +1785,14 @@ int bpf_core_apply_relo_insn(const char *prog_name, struct bpf_insn *insn,
 	}
 
 patch_insn:
+	if (reloc_info && targ_res.targ_spec) {
+		err = btf_reloc_info_gen(reloc_info, &targ_res);
+		if (err)
+		pr_warn("failed to generate BTF info\n");
+
+		return err;
+	}
+
 	/* bpf_core_patch_insn() should know how to handle missing targ_spec */
 	err = bpf_core_patch_insn(prog_name, insn, insn_idx, relo, relo_idx, &targ_res);
 	if (err) {
diff --git a/tools/lib/bpf/relo_core.h b/tools/lib/bpf/relo_core.h
index 3b9f8f18346c..ad29150f20f8 100644
--- a/tools/lib/bpf/relo_core.h
+++ b/tools/lib/bpf/relo_core.h
@@ -88,11 +88,20 @@ struct bpf_core_cand_list {
 	int len;
 };
 
+struct btf_reloc_info;
+
+LIBBPF_API struct btf_reloc_info *bpf_reloc_info__new(const char *targ_btf_path);
+LIBBPF_API void bpf_reloc_info__free(struct btf_reloc_info *info);
+LIBBPF_API struct btf *bpf_reloc_info__get_btf(struct btf_reloc_info *info);
+
+struct btf *bpf_reloc_info_get_src_btf(struct btf_reloc_info *info);
+
 int bpf_core_apply_relo_insn(const char *prog_name,
 			     struct bpf_insn *insn, int insn_idx,
 			     const struct bpf_core_relo *relo, int relo_idx,
 			     const struct btf *local_btf,
-			     struct bpf_core_cand_list *cands);
+			     struct bpf_core_cand_list *cands,
+			     struct btf_reloc_info *reloc_info);
 int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id,
 			      const struct btf *targ_btf, __u32 targ_id);
 
-- 
2.25.1


* Re: [PATCH bpf-next 1/2] libbpf: Implement btf__save_to_file()
From: Andrii Nakryiko @ 2021-10-28 18:36 UTC
  To: Mauricio Vásquez
  Cc: Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Rafael David Tinoco, Rafael David Tinoco,
	Lorenzo Fontana

On Wed, Oct 27, 2021 at 1:37 PM Mauricio Vásquez <mauricio@kinvolk.io> wrote:
>
> Implement helper function to save the contents of a BTF object to a
> file.
>
> Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com>
> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
> ---
>  tools/lib/bpf/btf.c      | 22 ++++++++++++++++++++++
>  tools/lib/bpf/btf.h      |  2 ++
>  tools/lib/bpf/libbpf.map |  1 +
>  3 files changed, 25 insertions(+)
>
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 0c628c33e23b..087035574dba 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -4773,3 +4773,25 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
>
>         return 0;
>  }
> +
> +int btf__save_to_file(struct btf *btf, const char *path)

given we have its loading counterpart as btf__parse_raw(), let's call
this one btf__save_raw()?

> +{
> +       const void *data;
> +       __u32 data_sz;
> +       FILE *file;
> +
> +       data = btf_get_raw_data(btf, &data_sz, btf->swapped_endian);

use btf__raw_data() instead? no need to think about btf->swapped_endian here

> +       if (!data)
> +               return -ENOMEM;

please use libbpf_err() helper for returning errors, see other use cases

> +
> +       file = fopen(path, "wb");
> +       if (!file)
> +               return -errno;
> +
> +       if (fwrite(data, 1, data_sz, file) != data_sz) {
> +               fclose(file);
> +               return -EIO;

why not propagate the original errno? make sure you save it before
fclose(), though


> +       }
> +
> +       return fclose(file);

hm... I'd just do fclose(file) separately and return 0 (because
success). If file closing fails, there isn't anything that can be done
(but it also shouldn't fail in any normal situation).
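
I.e., roughly something like this (untested sketch, assuming an int
err local):

```
if (fwrite(data, 1, data_sz, file) != data_sz) {
	err = -errno;	/* save errno before fclose() can clobber it */
	fclose(file);
	return libbpf_err(err);
}

fclose(file);
return 0;
```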

> +}
> diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> index bc005ba3ceec..300ad498c615 100644
> --- a/tools/lib/bpf/btf.h
> +++ b/tools/lib/bpf/btf.h
> @@ -114,6 +114,8 @@ LIBBPF_API struct btf *btf__parse_elf_split(const char *path, struct btf *base_b
>  LIBBPF_API struct btf *btf__parse_raw(const char *path);
>  LIBBPF_API struct btf *btf__parse_raw_split(const char *path, struct btf *base_btf);
>
> +LIBBPF_API int btf__save_to_file(struct btf *btf, const char *path);

const struct btf? btf__raw_data() (even though it internally modifies
btf) accepts `const struct btf*`, because this is conceptually
read-only operation

> +
>  LIBBPF_API struct btf *btf__load_vmlinux_btf(void);
>  LIBBPF_API struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_btf);
>  LIBBPF_API struct btf *libbpf_find_kernel_btf(void);
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 15239c05659c..0e9bed7c9b9e 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -399,4 +399,5 @@ LIBBPF_0.6.0 {
>                 btf__add_decl_tag;
>                 btf__raw_data;
>                 btf__type_cnt;
> +               btf__save_to_file;
>  } LIBBPF_0.5.0;
> --
> 2.25.1
>

* Re: [PATCH bpf-next 2/2] libbpf: Implement API for generating BTF for ebpf objects
From: Andrii Nakryiko @ 2021-10-28 18:45 UTC
  To: Mauricio Vásquez
  Cc: Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Rafael David Tinoco, Rafael David Tinoco,
	Lorenzo Fontana

On Wed, Oct 27, 2021 at 1:37 PM Mauricio Vásquez <mauricio@kinvolk.io> wrote:
>
> This commit implements a new set of "bpf_reloc_info__" functions for
> generating a BTF file with the types a given set of eBPF objects need
> for their CO-RE relocations. This code reuses all the existing CO-RE
> logic (candidate lookup, matching, etc). The workflow is the same as
> when an eBPF program is being loaded, but instead of patching the eBPF
> instruction, we save the type involved in the relocation.
>
> A new struct btf_reloc_info is defined to save the BTF types needed by a
> set of eBPF objects. It is created with the bpf_reloc_info__new()
> function and populated with bpf_object__reloc_info_gen() for each eBPF
> object; finally, the BTF file is generated with
> bpf_reloc_info__get_btf(). Please take a look at BTFGen[0] for a
> complete example of how this API can be used.
>
> bpf_object__reloc_info_gen() ends up calling btf_reloc_info_gen_field(),
> which uses the access spec to add all the types needed by a given
> relocation. The root type is added and, if it is a complex type like a
> struct or union, the members involved in the relocation are added as
> well. References are resolved and all referenced types are added. This
> function can be called multiple times to add the types needed by
> different objects into the same struct btf_reloc_info; this allows the
> user to create a single BTF file that contains the BTF information for
> multiple eBPF objects.
>
> bpf_reloc_info__get_btf() generates the BTF file from a given struct
> btf_reloc_info. This function first creates a new BTF object and copies
> all the types saved in the struct btf_reloc_info there. For structures
> and unions, only the members involved in a relocation are copied. While
> adding the types to the new BTF object, a map is filled with the type
> IDs in the old and new BTF structures. This map is then used to fix up
> all the IDs in the new BTF object.
>
> Right now only field-based CO-RE relocations are supported.
>
> [0]: https://github.com/kinvolk/btfgen
>
> Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com>
> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
> ---

I don't think it's necessary for libbpf to expose all these new APIs.
The format of CO-RE relocations and .BTF.ext is open and fixed. You
don't really need to simulate full CO-RE relocation logic to figure
out which types are necessary. Just go over all .BTF.ext records for
CO-RE relocations, parse spec (simple format as well) and see which
fields are accessed. The only extra bit is ignoring ___<suffix>,
that's it. bpftool (or whatever other tool is going to be used for
this) can do that on its own by using existing libbpf APIs to work
with BTF. A good bit of optimization would be to only emit forward
declarations for structs that are never used directly, but are
referenced by structs that are used.

Either way, this is not libbpf's problem to solve. It's a tooling problem.


>  tools/lib/bpf/Makefile    |   2 +-
>  tools/lib/bpf/libbpf.c    |  28 ++-
>  tools/lib/bpf/libbpf.h    |   4 +
>  tools/lib/bpf/libbpf.map  |   5 +
>  tools/lib/bpf/relo_core.c | 514 +++++++++++++++++++++++++++++++++++++-
>  tools/lib/bpf/relo_core.h |  11 +-
>  6 files changed, 554 insertions(+), 10 deletions(-)
>

[...]

* Re: [PATCH bpf-next 2/2] libbpf: Implement API for generating BTF for ebpf objects
From: Mauricio Vásquez Bernal @ 2021-10-28 22:42 UTC
  To: Andrii Nakryiko
  Cc: Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Rafael David Tinoco, Rafael David Tinoco,
	Lorenzo Fontana

> I don't think it's necessary for libbpf to expose all these new APIs.
> The format of CO-RE relocations and .BTF.ext is open and fixed. You
> don't really need to simulate full CO-RE relocation logic to figure
> out which types are necessary. Just go over all .BTF.ext records for
> CO-RE relocations, parse spec (simple format as well) and see which
> fields are accessed.

How do you suggest matching the types for the target BTF without
simulating the CO-RE relocation? Are you suggesting matching them only
by name? We want to generate the minimal BTF that is needed by a given
object. Consider that we could generate these files for thousands of
kernels; size is very important for us. For this reason we chose to
simulate the relocation, generating only the types (and members) that
are really needed.

> Either way, this is not libbpf's problem to solve. It's a tooling problem.

I agree. When I started working on this I tried to implement it
without using the libbpf relocation logic, but very soon I realized I
was reimplementing the same logic. Another possibility we have
considered is to expose this relocation logic in the libbpf API;
however, I fear that would be too complicated and invasive as well...

* Re: [PATCH bpf-next 2/2] libbpf: Implement API for generating BTF for ebpf objects
From: Andrii Nakryiko @ 2021-10-28 22:48 UTC
  To: Mauricio Vásquez Bernal
  Cc: Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Rafael David Tinoco, Rafael David Tinoco,
	Lorenzo Fontana

On Thu, Oct 28, 2021 at 3:42 PM Mauricio Vásquez Bernal
<mauricio@kinvolk.io> wrote:
>
> > I don't think it's necessary for libbpf to expose all these new APIs.
> > The format of CO-RE relocations and .BTF.ext is open and fixed. You
> > don't really need to simulate full CO-RE relocation logic to figure
> > out which types are necessary. Just go over all .BTF.ext records for
> > CO-RE relocations, parse spec (simple format as well) and see which
> > fields are accessed.
>
> How do you suggest matching the types for the target BTF without
> simulating the CO-RE relocation? Are you suggesting matching them only
> by name? We want to generate the minimal BTF that is needed by a given
> object. Consider that we could generate these files for thousands of
> kernels; size is very important for us. For this reason we chose to
> simulate the relocation, generating only the types (and members) that
> are really needed.
>

How many unnecessary structs match if you match by name only?

Keep in mind, if your kernel BTF has task_struct and task_struct___2,
then CO-RE relocation will keep matching both; and that's not an error
for libbpf if all the field offsets are consistent.

In short, I think simple name matching for trimming down BTF is
completely adequate. CO-RE relocation has to be more strict about
matching, but the subset of types that are used will be the same or
almost the same.


> > Either way, this is not libbpf's problem to solve. It's a tooling problem.
>
> I agree. When I started working on this I tried to implement it
> without using the libbpf relocation logic, but very soon I realized I
> was reimplementing the same logic. Another possibility we have
> considered is to expose this relocation logic in the libbpf API;
> however, I fear that would be too complicated and invasive as well...

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
From: Alexei Starovoitov @ 2021-10-29  2:33 UTC
  To: Mauricio Vásquez
  Cc: Network Development, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Rafael David Tinoco

On Wed, Oct 27, 2021 at 1:37 PM Mauricio Vásquez <mauricio@kinvolk.io> wrote:
> There is also a good example[3] of how to use BTFGen and BTFHub together
> to generate multiple BTF files, one for each existing/supported kernel,
> tailored to one application. For example, a complex bpf object can
> support nearly 400 kernels with BTF files totaling only 1.5 MB.

Could you share more details on what kind of fields and types
were used to achieve this compression?
Tracing progs will be peeking into task_struct.
To describe it in the reduced BTF most of the kernel types would be needed,
so I'm a bit skeptical about the practicality of the algorithm.
I think it may work for sk_buff, since it will pull struct sock,
net_device, rb_tree
and not a ton more.
Have you considered generating kernel BTF with fields that are accessed
by bpf prog only and replacing all other fields with padding?
I think the algo would be quite different from the actual CO-RE logic
you're trying to reuse.
If CO-RE matching style is necessary and it's the best approach then please
add new logic to bpftool. I'm not sure such an API would be
useful beyond this particular case to expose as a stable libbpf API.
Also note that relo_core.c will soon be dual-compiled for kernel and
libbpf needs.

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
From: Rafael David Tinoco @ 2021-10-29  5:41 UTC
  To: Alexei Starovoitov
  Cc: Mauricio Vásquez, Network Development, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko

On Thu, Oct 28, 2021 at 11:34 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Oct 27, 2021 at 1:37 PM Mauricio Vásquez <mauricio@kinvolk.io> wrote:
> > There is also a good example[3] of how to use BTFGen and BTFHub together
> > to generate multiple BTF files, one for each existing/supported kernel,
> > tailored to one application. For example, a complex bpf object can
> > support nearly 400 kernels with BTF files totaling only 1.5 MB.
>
> Could you share more details on what kind of fields and types
> were used to achieve this compression?
> Tracing progs will be peeking into task_struct.
> To describe it in the reduced BTF most of the kernel types would be needed,
> so I'm a bit skeptical about the practicality of the algorithm.

https://github.com/aquasecurity/btfhub/tree/main/tools

has a complete README and, at the end, the example used:

https://github.com/aquasecurity/btfhub/tree/main/tools#time-to-test-btfgen-and-btfhub

We tested btfgen with bpfcc tools and tracee:

https://github.com/aquasecurity/tracee/blob/main/tracee-ebpf/tracee/tracee.bpf.c

and the generated BTF files worked. If you run something like:

./btfgen.sh [.../aquasec-tracee/tracee-ebpf/dist/tracee.bpf.core.o]

it will generate the BTFs tailored to a given eBPF object file: one smaller
BTF file per existing full external raw BTF file (one per kernel version,
basically).

All the ~500 kernels generated the same number of BTF files, ~3MB in
total. We then remove all the BTF files that are identical to the one
for the previous kernel:

https://github.com/aquasecurity/btfhub/blob/main/tools/btfgen.sh#L113

and we are able to reduce that from 3MB to 1.5MB (as identical BTF files
become symlinks to the previous ones).

> I think it may work for sk_buff, since it will pull struct sock,
> net_device, rb_tree, and not a ton more.
> Have you considered generating kernel BTF with fields that are accessed
> by bpf prog only and replacing all other fields with padding?

That is exactly the result of our final BTF file. We only include the
fields and types being used by the given eBPF object:

```
$ bpftool btf dump file ./generated/5.4.0-87-generic.btf format raw
[1] PTR '(anon)' type_id=99
[2] TYPEDEF 'u32' type_id=35
[3] TYPEDEF '__be16' type_id=22
[4] PTR '(anon)' type_id=52
[5] TYPEDEF '__u8' type_id=83
[6] PTR '(anon)' type_id=29
[7] STRUCT 'mnt_namespace' size=120 vlen=1
    'ns' type_id=72 bits_offset=64
[8] TYPEDEF '__kernel_gid32_t' type_id=75
[9] STRUCT 'iovec' size=16 vlen=2
    'iov_base' type_id=16 bits_offset=0
    'iov_len' type_id=85 bits_offset=64
[10] PTR '(anon)' type_id=58
[11] STRUCT '(anon)' size=8 vlen=2
    'skc_daddr' type_id=81 bits_offset=0
    'skc_rcv_saddr' type_id=81 bits_offset=32
[12] TYPEDEF '__u64' type_id=89
...
[120] STRUCT 'task_struct' size=9216 vlen=13
    'thread_info' type_id=105 bits_offset=0
    'real_parent' type_id=30 bits_offset=18048
    'real_cred' type_id=16 bits_offset=21248
    'pid' type_id=14 bits_offset=17920
    'mm' type_id=110 bits_offset=16512
    'thread_pid' type_id=56 bits_offset=18752
    'exit_code' type_id=123 bits_offset=17152
    'group_leader' type_id=30 bits_offset=18432
    'flags' type_id=75 bits_offset=288
    'thread_group' type_id=87 bits_offset=19328
    'tgid' type_id=14 bits_offset=17952
    'nsproxy' type_id=100 bits_offset=22080
    'comm' type_id=96 bits_offset=21440
[121] STRUCT 'pid' size=96 vlen=1
    'numbers' type_id=77 bits_offset=640
[122] STRUCT 'new_utsname' size=390 vlen=1
    'nodename' type_id=48 bits_offset=520
[123] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[124] ARRAY '(anon)' type_id=35 index_type_id=123 nr_elems=2
```

If you dump the generated BTF file with "format c", bpftool renders all
the omitted members as padding:

```
typedef unsigned char __u8;

struct ns_common {
        long: 64;
        long: 64;
        unsigned int inum;
        int: 32;
};

struct mnt_namespace {
        long: 64;
        struct ns_common ns;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
};

typedef unsigned int __kernel_gid32_t;
```

But libbpf is still able to calculate all field relocations.

> I think the algo would be quite different from the actual CO-RE logic
> you're trying to reuse.
> If CO-RE matching style is necessary and it's the best approach then please
> add new logic to bpftool.

Yes, we're heading in that direction I suppose :\ ... Accessing .BTF.ext to
get the "bpf_core_relo" information requires libbpf internals to be exposed:

https://github.com/rafaeldtinoco/btfgen/blob/standalone/btfgen2.c#L119
https://github.com/rafaeldtinoco/btfgen/blob/standalone/include/stolen.h

we would have to check whether we can export what we need for that (instead
of re-declaring internal headers, which obviously looks bad).

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
From: Rafael David Tinoco @ 2021-10-29  5:51 UTC
  To: Alexei Starovoitov
  Cc: Mauricio Vásquez, Network Development, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko

> > Have you considered generating kernel BTF with fields that are accessed
> > by bpf prog only and replacing all other fields with padding?
>
> That is exactly the result of our final BTF file. We only include the
> fields and types being used by the given eBPF object:

A full explanation on this here:

https://github.com/aquasecurity/btfhub/tree/main/tools#how-to-use-libbpf-co-re-relocations-to-build-a-btf-file

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
From: Mauricio Vásquez Bernal @ 2021-10-29 16:12 UTC
  To: Alexei Starovoitov
  Cc: Network Development, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Rafael David Tinoco, Lorenzo Fontana

> Tracing progs will be peeking into task_struct.
> To describe it in the reduced BTF most of the kernel types would be needed,

That's the point of our algorithm. If a program is only accessing
'pid' on 'task_struct', we'll generate a BTF representation of
task_struct that only contains the 'pid' member with the right offset;
other members are not included, and hence we don't need to carry all
those types that are not used by the program.

> Have you considered generating kernel BTF with fields that are accessed
> by bpf prog only and replacing all other fields with padding?

Yeah. We're implicitly doing it as described above.

> I think the algo would be quite different from the actual CO-RE logic
> you're trying to reuse.

I'm not 100% sure it's so easy to do without reimplementing much of
the actual CO-RE logic. We don't want to copy all type members but
only the ones actually used. So I'm not sure if Andrii's idea of
performing the matching based only on the type name will work. I'll
try to get deep into the details and will be back to you soon.

> If CO-RE matching style is necessary and it's the best approach then please
> add new logic to bpftool. I'm not sure such api would be
> useful beyond this particular case to expose as stable libbpf api.

I agree 100%. Our goal is to have this available in bpftool so the
whole community can use it. However, it'd be a shame if we couldn't
reuse some of the existing logic in libbpf.

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
From: Andrii Nakryiko @ 2021-11-02  5:53 UTC
  To: Mauricio Vásquez Bernal
  Cc: Alexei Starovoitov, Network Development, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Rafael David Tinoco,
	Lorenzo Fontana

On Fri, Oct 29, 2021 at 9:12 AM Mauricio Vásquez Bernal
<mauricio@kinvolk.io> wrote:
>
> > Tracing progs will be peeking into task_struct.
> > To describe it in the reduced BTF most of the kernel types would be needed,
>
> That's the point of our algorithm. If a program is only accessing
> 'pid' on 'task_struct', we'll generate a BTF representation of
> task_struct that only contains the 'pid' member with the right offset;
> other members are not included, and hence we don't need to carry all
> those types that are not used by the program.
>
> > Have you considered generating kernel BTF with fields that are accessed
> > by bpf prog only and replacing all other fields with padding?
>
> Yeah. We're implicitly doing it as described above.
>
> > I think the algo would be quite different from the actual CO-RE logic
> > you're trying to reuse.
>
> I'm not 100% sure it's so easy to do without reimplementing much of
> the actual CO-RE logic. We don't want to copy all type members but
> only the ones actually used. So I'm not sure if Andrii's idea of
> performing the matching based only on the type name will work. I'll
> try to get deep into the details and will be back to you soon.

No, now that I understand what exactly you are doing, it won't work.

But ok, I gave it quite a bit of thought and tried to find a good
compromise between completely exposing all the libbpf internals as
public APIs (which is a huge price I'm not willing to accept) and
actually allowing you to achieve your goal (which I think is worth
achieving).

But first. https://github.com/aquasecurity/btfhub/tree/main/tools is
awesome. Great work explaining a lot at exactly the right level of
technical details. It would be great if you published it as a
dedicated blog post, maybe splitting the more general information from
the BTF minimization problem. Both are useful, but it's too long for
one article. Great job, really!

Now, to the problem at hand. And sorry for a long reply, but there is
quite a bit of things to unpack.

I see this overall problem as two distinct problems:
  1. Knowing which fields and types libbpf is using from kernel BTF.
Basically, observing CO-RE logic from the outside.
  2. Given the information from #1, minimizing BTF.

Part #2 absolutely doesn't belong in libbpf. Libbpf exposes enough BTF
constructing APIs to implement this in any application, bpftool or
otherwise. It's also a relatively straightforward problem: mark used
types and fields, create a copy of BTF with only those types and
fields.

So let's talk about #1, because I agree that it's extremely painful to
have to reimplement most of the CO-RE logic just to get the list of
types and fields. Here we have two sub-problems (assuming we let
libbpf do the CO-RE relocation logic for us):
  a) performing CO-RE relocations without creating BPF maps or loading
BPF programs. A dry run of sorts.
  b) exposing the calculated relocation information.

First, 1a. The problem right now is that CO-RE relocations (and
relocations in general) happen in the same bpf_object__load() phase,
and it's not possible to do them without creating maps and
subsequently loading BPF programs. This is very suboptimal. I've
actually been thinking in the background about how to improve that
situation, because even with the recent bpf_program__insns() API,
added a few days ago, you still have to load a BPF program to be
able to clone it, and I've wanted to solve that for a long while now.

So how about we split the bpf_object__load() phase into two:
bpf_object__prepare() and bpf_object__load(). prepare() will do all
the preparations (subprogs, ELF relos, also almost complete BPF map
relos, but not quite; I'll get to this), basically everything that
load() does today short of actually creating maps and progs. load()
will then ensure that bpf_object__prepare() was called (i.e., the user
can do just prepare(), prepare() + load() explicitly, or load(), which
will call prepare() implicitly).

The biggest problem I see right now is what we do about BPF map
relocations. I propose to perform map relocations but substitute the
map's internal index (so that if someone dumps prog instructions after
prepare(), they are still meaningful, even if not yet validatable by
the verifier). After maps are actually created, we can do another quick
pass over just RELO_DATA and replace the map_idx with the map's fd.

It feels like we should split load further into two steps: creating
and pinning maps (+ finalizing FDs in instructions) and actually
loading programs. Again, the prepare and map creation steps would be
called implicitly if the user only calls bpf_object__load(), for ease
of use and backwards compatibility.

So basically, the most granular set of steps would be:
  1. bpf_object__open()
  2. bpf_object__prepare() (or bpf_object__relocate(), naming is hard)
  3. bpf_object__create_maps();
  4. bpf_object__load().

But the old and trusty bpf_object__open() + bpf_object__load() will
work just as well, with load() doing steps #2-#4 automatically, if
necessary.
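
To make the hypothetical flow concrete, user code would look roughly
like this (only open() and load() exist today; prepare() and
create_maps() are the proposed new steps):

```
struct bpf_object *obj = bpf_object__open_file("prog.bpf.o", NULL);

if (libbpf_get_error(obj))
        return -1;

bpf_object__prepare(obj);     /* all relocations, no kernel state touched */
/* ... inspect bpf_program__insns(), CO-RE decisions, etc. here ... */
bpf_object__create_maps(obj); /* maps created (and pinned) */
bpf_object__load(obj);        /* programs loaded into the kernel */
```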

So what does that split give us? A few things:
  - it's possible to "simulate" almost all libbpf logic without
changing the state of the system (no maps or progs created), yet still
validate that kconfig externs, per-cpu externs, CO-RE relos, map
relos, etc. all work.
  - libbpf can store extra information between steps 1, 2, 3, and 4,
but after step #4 all that extra information can be discarded and
cleaned up. So advanced users will get access to stuff like
bpf_program__insns() and CO-RE information, but most users won't have
to pay for that because it will be freed after bpf_object__load(). So
in this case, bpf_program__insns() could (should?) be discarded after
actual load, because if you care about instructions, you can do steps
#1-#3, grab instructions and copy them, if necessary. Then proceed to
#4 (or not) and free the memory.

The last point is important, because to solve the problem 1b (exposing
CO-RE relo info), the best way to minimize public API commitments is
to (optionally, probably) request libbpf to record its CO-RE relo
decisions. Here's what I propose, specifically:
  1. Add something like "bool record_core_relo_info" (awful name,
don't use it) in bpf_object_open_opts.
  2. If it is set to true, libbpf will keep a "log" of CO-RE
relocation decisions, recording stuff like program name, instruction
index, local spec (i.e., root_type_id, spec string, relo kind, maybe
something else), target spec (kernel type_id, kernel spec string, also
module ID, if it's not vmlinux BTF). We can also record relocated
value (i.e., field offset, actual enum value, true/false for
existence, etc). All these are stable concepts, so I'd feel more
comfortable exposing them, compared to stuff like bpf_core_accessor
and other internal details.
  3. The memory for all that will be managed by libbpf for simplicity
of an API, and we'll expose accessors to get those arrays (at object
level or per-program level is TBD).
  4. This info will be available after the prepare() step and will be
discarded either at create_maps() or load().
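
Just to illustrate the shape of the opt-in (again, all names here are
placeholders, nothing like this exists yet):

```
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
        .record_core_relo_info = true,  /* placeholder name */
);
struct bpf_object *obj = bpf_object__open_file("prog.bpf.o", &opts);

/* after the proposed prepare() step the recorded CO-RE decisions
 * would be available through accessors; the memory is freed on load()
 */
```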

I think that solves problem #1 completely (at least for BTFGen case)
and actually provides more useful information. E.g., if someone wants
to track CO-RE resolution logic for some other reason, they should
have pretty full information (BTFGen doesn't need all of that).

It also doesn't feel like too much of a complication to libbpf
internals (even though we'll need to be very careful with BPF map
relos and double-check that we haven't missed any other critical part
of the process). And for most users there is no change in API or
behavior. And given that this gives us a "framework" for more
sustainable libbpf "observability", I think it's a justified price to
pay, overall.

I still need to sleep on this, but this feels like a somewhat cleaner
way forward. Thoughts are welcome.

>
> > If CO-RE matching style is necessary and it's the best approach then please
> > add new logic to bpftool. I'm not sure such api would be
> > useful beyond this particular case to expose as stable libbpf api.
>
> I agree 100%. Our goal is to have this available on bpftool so all the
> community can use it. However it'd be a shame if we can't use some of
> the existing logic in libbpf.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
  2021-11-02  5:53     ` Andrii Nakryiko
@ 2021-11-02 10:58       ` Leonardo Di Donato
  2021-11-02 17:12         ` Andrii Nakryiko
  2021-11-02 21:26       ` Mauricio Vásquez Bernal
  2021-11-03 23:40       ` Andrii Nakryiko
  2 siblings, 1 reply; 19+ messages in thread
From: Leonardo Di Donato @ 2021-11-02 10:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Mauricio Vásquez Bernal, Alexei Starovoitov,
	Network Development, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Rafael David Tinoco, Lorenzo Fontana,
	leonardo.didonato

On Tue, Nov 2, 2021 at 6:55 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> [...]
>
> The last point is important, because to solve the problem 1b (exposing
> CO-RE relo info), the best way to minimize public API commitments is
> to (optionally, probably) request libbpf to record its CO-RE relo
> decisions.
>
> [...]

Hello Andrii,

I've been experimenting with implementing this over the last few days
using the preprocessor mechanism built into libbpf.

bpf_object__load() (via bpf_object__load_xattr()) first calls
bpf_object__relocate(), which finds relocation candidates and applies
them, and then calls bpf_object__load_progs(). That in turn calls
bpf_program__load(), which, for each program instance, invokes a
bpf_program_prep_t preprocessor. Such a callback gets passed the
current program, the instance index, its instructions, etc.
But I guess you already know all this.
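
For reference, a rough sketch of the hook I mean (the types and
bpf_program__set_prep() are existing libbpf API; the callback body is
just a hypothetical placeholder):

```
static int my_prep(struct bpf_program *prog, int n,
                   struct bpf_insn *insns, int insns_cnt,
                   struct bpf_prog_prep_result *res)
{
        /* relocation info recorded by libbpf could be read out here */
        res->new_insn_ptr = insns;      /* load the instructions unchanged */
        res->new_insn_cnt = insns_cnt;
        res->pfd = NULL;                /* we don't need the resulting fd */
        return 0;
}

/* single program instance, preprocessed by my_prep() before loading */
bpf_program__set_prep(prog, 1, my_prep);
```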

So my idea was to store the relocation info in the program instances,
exposing it to the userspace implementation of the aforementioned
callback.
At that point, creating a tailored BTF for the program(s) would just
be a matter of implementing the logic for grabbing the info and
saving it to disk.

Do you think this would be feasible? I think this would be a good use
case for the preprocessor.

L.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
  2021-11-02 10:58       ` Leonardo Di Donato
@ 2021-11-02 17:12         ` Andrii Nakryiko
  0 siblings, 0 replies; 19+ messages in thread
From: Andrii Nakryiko @ 2021-11-02 17:12 UTC (permalink / raw)
  To: Leonardo Di Donato
  Cc: Mauricio Vásquez Bernal, Alexei Starovoitov,
	Network Development, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Rafael David Tinoco, Lorenzo Fontana,
	leonardo.didonato

On Tue, Nov 2, 2021 at 3:59 AM Leonardo Di Donato <leodidonato@gmail.com> wrote:
>
> On Tue, Nov 2, 2021 at 6:55 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > [...]
>
> Hello Andrii,
>
> I've been experimenting with implementing this over the last few days
> using the preprocessor mechanism built into libbpf.
>
> bpf_object__load() (via bpf_object__load_xattr()) first calls
> bpf_object__relocate(), which finds relocation candidates and applies
> them, and then calls bpf_object__load_progs(). That in turn calls
> bpf_program__load(), which, for each program instance, invokes a
> bpf_program_prep_t preprocessor. Such a callback gets passed the
> current program, the instance index, its instructions, etc.
> But I guess you already know all this.
>
> So my idea was to store the relocation info in the program instances,
> exposing it to the userspace implementation of the aforementioned
> callback.
> At that point, creating a tailored BTF for the program(s) would just
> be a matter of implementing the logic for grabbing the info and
> saving it to disk.
>
> Do you think this would be feasible? I think this would be a good use
> case for the preprocessor.

This preprocessor API is deprecated and is going to be removed in
libbpf 1.0. See [0].

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20211025224531.1088894-4-andrii@kernel.org/

>
> L.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
  2021-11-02  5:53     ` Andrii Nakryiko
  2021-11-02 10:58       ` Leonardo Di Donato
@ 2021-11-02 21:26       ` Mauricio Vásquez Bernal
  2021-11-03  5:26         ` Andrii Nakryiko
  2021-11-03 23:40       ` Andrii Nakryiko
  2 siblings, 1 reply; 19+ messages in thread
From: Mauricio Vásquez Bernal @ 2021-11-02 21:26 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Network Development, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Rafael David Tinoco,
	Lorenzo Fontana

> Part #2 absolutely doesn't belong in libbpf. Libbpf exposes enough BTF
> constructing APIs to implement this in any application, bpftool or
> otherwise. It's also a relatively straightforward problem: mark used
> types and fields, create a copy of BTF with only those types and
> fields.

Totally agree.

> The last point is important, because to solve the problem 1b (exposing
> CO-RE relo info), the best way to minimize public API commitments is
> to (optionally, probably) request libbpf to record its CO-RE relo
> decisions. Here's what I propose, specifically:
>   1. Add something like "bool record_core_relo_info" (awful name,
> don't use it) in bpf_object_open_opts.
>   2. If it is set to true, libbpf will keep a "log" of CO-RE
> relocation decisions, recording stuff like program name, instruction
> index, local spec (i.e., root_type_id, spec string, relo kind, maybe
> something else), target spec (kernel type_id, kernel spec string, also
> module ID, if it's not vmlinux BTF). We can also record relocated
> value (i.e., field offset, actual enum value, true/false for
> existence, etc). All these are stable concepts, so I'd feel more
> comfortable exposing them, compared to stuff like bpf_core_accessor
> and other internal details.
>   3. The memory for all that will be managed by libbpf for simplicity
> of an API, and we'll expose accessors to get those arrays (at object
> level or per-program level is TBD).
>   4. This info will be available after the prepare() step and will be
> discarded either at create_maps() or load().

I like all of this proposal. It fits very well with the BTFGen use case.

Regarding the information to expose, IIUC that'd be slightly modified
versions of struct bpf_core_relo_res and struct bpf_core_spec. I think
we could expose the following structures and a function to get them
(please ignore the naming for now):

```
/* reduced version of struct bpf_core_spec */
struct bpf_core_spec_pub {
        const struct btf *btf;
        __u32 root_type_id;
        enum bpf_core_relo_kind kind;
        /* raw, low-level spec: 1-to-1 with accessor spec string
         * (we could also use access_str_off and let the user parse it)
         */
        int raw_spec[BPF_CORE_SPEC_MAX_LEN];
        /* raw spec length */
        int raw_len;
};

struct bpf_core_relo_pub {
        /* not needed if we expose this per-program */
        const char *prog_name;
        int insn_idx;

        /* allows the user to tell whether the relocation succeeded or not */
        bool poison;

        /* new field offset for field-based CO-RE relos */
        __u32 new_offset;

        /* TODO: fields for type- and enum-based relos */

        /* BTFGen only needs targ_spec; I suppose local_spec would be
         * useful for other use cases.
         */
        struct bpf_core_spec_pub local_spec, targ_spec;
};

LIBBPF_API struct bpf_core_relo_pub *
bpf_program__core_relos(struct bpf_program *prog);
```
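
As a usage sketch on the BTFGen side (everything below is the proposed
API; mark_used() and the count accessor are hypothetical helpers):

```
const struct bpf_core_relo_pub *relos = bpf_program__core_relos(prog);
size_t i, cnt = bpf_program__core_relo_cnt(prog); /* also hypothetical */

for (i = 0; i < cnt; i++) {
        if (relos[i].poison)
                continue; /* relocation failed, nothing to record */
        /* mark the target root type and accessed members as used, so
         * they get copied into the minimized BTF later
         */
        mark_used(relos[i].targ_spec.btf,
                  relos[i].targ_spec.root_type_id,
                  relos[i].targ_spec.raw_spec,
                  relos[i].targ_spec.raw_len);
}
```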

I don't have strong opinions about exposing it by object or by
program. Both cases should work the same for BTFGen.

Does it make sense to you?

Btw, I'm probably not the right person to give opinions about this API
split. I'd be happy to hear other opinions here and to make this
change once we agree on a path forward.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
  2021-11-02 21:26       ` Mauricio Vásquez Bernal
@ 2021-11-03  5:26         ` Andrii Nakryiko
  2021-11-04 14:58           ` Mauricio Vásquez Bernal
  0 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2021-11-03  5:26 UTC (permalink / raw)
  To: Mauricio Vásquez Bernal
  Cc: Alexei Starovoitov, Network Development, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Rafael David Tinoco,
	Lorenzo Fontana

On Tue, Nov 2, 2021 at 2:26 PM Mauricio Vásquez Bernal
<mauricio@kinvolk.io> wrote:
>
> > [...]
>
> Totally agree.
>
> > [...]
>
> I like all of this proposal. It fits very well with the BTFGen use case.
>
> Regarding the information to expose, IIUC that'd be slightly modified
> versions of struct bpf_core_relo_res and struct bpf_core_spec. I think
> we could expose the following structures and a function to get them
> (please ignore the naming for now):
>
> ```
> /* reduced version of struct bpf_core_spec */
> struct bpf_core_spec_pub {
>         const struct btf *btf;
>         __u32 root_type_id;
>         enum bpf_core_relo_kind kind;
>         /* raw, low-level spec: 1-to-1 with accessor spec string
>          * (we could also use access_str_off and let the user parse it)
>          */
>         int raw_spec[BPF_CORE_SPEC_MAX_LEN];

string might be a more "extensible" way, but we'll need to construct
that string for each relocation

>         /* raw spec length */
>         int raw_len;

using string would eliminate the need for this

> };
>
> struct bpf_core_relo_pub {
>         /* not needed if we expose this per-program */
>         const char *prog_name;

yep, not sure about per-program yet, but that's minor

>         int insn_idx;
>
>         /* allows the user to tell whether the relocation succeeded or not */
>         bool poison;
>
>         /* new field offset for field-based CO-RE relos */
>         __u32 new_offset;
>
>         /* TODO: fields for type- and enum-based relos */

isn't it always just u64 new_value for all types of relos? We can also
expose old_value just for completeness

>
>         /* BTFGen only needs targ_spec; I suppose local_spec would be
>          * useful for other use cases.
>          */
>         struct bpf_core_spec_pub local_spec, targ_spec;

targ_spec doesn't seem necessary given we have root_type_id, relo
kind, access_string (or raw_spec). What am I missing?


> };
>
> LIBBPF_API struct bpf_core_relo_pub *
> bpf_program__core_relos(struct bpf_program *prog);

We'd also need the size of this array, and it should be const struct *,
but yeah, something like this

> ```
>
> I don't have strong opinions about exposing it by object or by
> program. Both cases should work the same for BTFGen.
>
> Does it make sense to you?

Yeah, more or less.

>
> Btw, I'm probably not the right person to give opinions about this API
> split. I'd be happy to hear other opinions here and to make this
> change once we agree on a path forward.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
  2021-11-02  5:53     ` Andrii Nakryiko
  2021-11-02 10:58       ` Leonardo Di Donato
  2021-11-02 21:26       ` Mauricio Vásquez Bernal
@ 2021-11-03 23:40       ` Andrii Nakryiko
  2 siblings, 0 replies; 19+ messages in thread
From: Andrii Nakryiko @ 2021-11-03 23:40 UTC (permalink / raw)
  To: Mauricio Vásquez Bernal
  Cc: Alexei Starovoitov, Network Development, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Rafael David Tinoco,
	Lorenzo Fontana

On Mon, Nov 1, 2021 at 10:53 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> [...]
>
> So basically, the most granular set of steps would be:
>   1. bpf_object__open()
>   2. bpf_object__prepare() (or bpf_object__relocate(), naming is hard)
>   3. bpf_object__create_maps();
>   4. bpf_object__load().

Talking with Alexei and digging through the libbpf code some more, I
think it might be a bit premature for now to split out create_maps().
It just complicates things and doesn't really add anything to the goal
of observing CO-RE relocation results. So let's keep it under
bpf_object__load() for now.

But that made me question whether we need a separate
bpf_object__prepare() step at all. And I see two ways to go about it.

First approach is as follows.

1. We don't add any new "step", keeping open and load phases only. All
the steps that don't depend on the actual host kernel information
(kallsyms, kconfig, etc.) are moved into the open() step, keeping it
an "unprivileged" and kernel-agnostic step. It checks that the BPF
object and its programs are "structurally sound": all the subprog
calls are relocated, map references are validated (but no map_fd is
substituted; we just record which map has to be relocated), etc. It's
basically a sanity-check step.

2. BTF upload, kallsyms + extern resolution, struct_ops init, map
creation, and prog loading stay in the load() step.

So that leaves the CO-RE relocation step. For BTFGen, CO-RE has to
happen in the open() step. The problem is that CO-RE actually relies
on kernel resources (/sys/kernel/btf/vmlinux plus possible module
BTFs), so it would have to be in the load() step. On the other hand,
bpf_object_open_opts provides a way to specify custom vmlinux BTF, so
it is possible to avoid libbpf complaining about missing BTF by
providing custom BTF (even if it's an empty BTF).

So, say, for `bpftool gen skeleton`, to avoid libbpf erroring out
during skeleton generation, bpftool can substitute an empty BTF as the
custom BTF, and skeleton generation should succeed (all the CO-RE
relocated instructions will be poisoned, but that's irrelevant for the
skeleton).
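
Roughly something like this (a sketch using existing APIs only;
btf_custom_path is the existing open option, and the raw bytes are
written out by hand since a file-saving helper is only being proposed
in this series):

```
struct btf *empty = btf__new_empty();
const void *raw;
__u32 sz;
FILE *f = fopen("/tmp/empty.btf", "wb");

raw = btf__get_raw_data(empty, &sz);
fwrite(raw, 1, sz, f);
fclose(f);
btf__free(empty);

DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
        .btf_custom_path = "/tmp/empty.btf",
);
struct bpf_object *obj = bpf_object__open_file("prog.bpf.o", &opts);
```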

But that does feel a bit hacky. So alternatively, we do add a
bpf_object__prepare() step, which performs all those structural steps
and CO-RE relocation. And all the rest stays in load().

I'm still leaning towards adding bpf_object__prepare(), as it causes
no changes in observable behavior for end users. But I wonder if
someone has strong arguments for the first approach?

In either approach, bpf_object__load() will still do everything that
requires root or CAP_BPF and modifies the state of the kernel (BTFs,
maps, progs).

> [...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
  2021-11-03  5:26         ` Andrii Nakryiko
@ 2021-11-04 14:58           ` Mauricio Vásquez Bernal
  2021-11-04 17:34             ` Andrii Nakryiko
  0 siblings, 1 reply; 19+ messages in thread
From: Mauricio Vásquez Bernal @ 2021-11-04 14:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Network Development, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Rafael David Tinoco,
	Lorenzo Fontana

> > [...]
> >
> >         /* new field offset for field-based CO-RE relos */
> >         __u32 new_offset;
> >
> >         /* TODO: fields for type- and enum-based relos */
>
> isn't it always just u64 new_value for all types of relos? We can also
> expose old_value just for completeness
>

Oh right. We can expose new_val, orig_val and let the user interpret
their meaning based on the relo_kind.
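
I.e., something along these lines (names are still placeholders):

```
        /* interpreted according to the relo kind: field offset, enum
         * value, type id/size, 0/1 for existence checks, etc.
         */
        __u64 orig_val;
        __u64 new_val;
```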

> >
> >         /* BTFGen only needs targ_spec; I suppose local_spec would be
> >          * useful for other use cases.
> >          */
> >         struct bpf_core_spec_pub local_spec, targ_spec;
>
> targ_spec doesn't seem necessary given we have root_type_id, relo
> kind, access_string (or raw_spec). What am I missing?
>

Not sure I follow. root_type_id, relo kind, and access_string are all
part of bpf_core_spec_pub; there are two instances of this structure,
targ_spec and local_spec.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API
  2021-11-04 14:58           ` Mauricio Vásquez Bernal
@ 2021-11-04 17:34             ` Andrii Nakryiko
  0 siblings, 0 replies; 19+ messages in thread
From: Andrii Nakryiko @ 2021-11-04 17:34 UTC (permalink / raw)
  To: Mauricio Vásquez Bernal
  Cc: Alexei Starovoitov, Network Development, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Rafael David Tinoco,
	Lorenzo Fontana

On Thu, Nov 4, 2021 at 7:58 AM Mauricio Vásquez Bernal
<mauricio@kinvolk.io> wrote:
>
> > > [...]
> > >
> > >         /* TODO: fields for type- and enum-based relos */
> >
> > isn't it always just u64 new_value for all types of relos? We can also
> > expose old_value just for completeness
> >
>
> Oh right. We can expose new_val, orig_val and let the user interpret
> their meaning based on the relo_kind.

yep

>
> > >
> > >         /* BTFGen only needs targ_spec; I suppose local_spec would be
> > >          * useful for other use cases.
> > >          */
> > >         struct bpf_core_spec_pub local_spec, targ_spec;
> >
> > targ_spec doesn't seem necessary given we have root_type_id, relo
> > kind, access_string (or raw_spec). What am I missing?
> >
>
> Not sure I follow. root_type_id, relo kind, and access_string are all
> part of bpf_core_spec_pub; there are two instances of this structure,
> targ_spec and local_spec.

Ah, ok, I got a bit confused by the formatting of your response. I had
the impression that we were exposing the same thing twice (and I'm not
talking about local vs target). So never mind.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-11-04 17:35 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-27 20:37 [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API Mauricio Vásquez
2021-10-27 20:37 ` [PATCH bpf-next 1/2] libbpf: Implement btf__save_to_file() Mauricio Vásquez
2021-10-28 18:36   ` Andrii Nakryiko
2021-10-27 20:37 ` [PATCH bpf-next 2/2] libbpf: Implement API for generating BTF for ebpf objects Mauricio Vásquez
2021-10-28 18:45   ` Andrii Nakryiko
2021-10-28 22:42     ` Mauricio Vásquez Bernal
2021-10-28 22:48       ` Andrii Nakryiko
2021-10-29  2:33 ` [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API Alexei Starovoitov
2021-10-29  5:41   ` Rafael David Tinoco
2021-10-29  5:51     ` Rafael David Tinoco
2021-10-29 16:12   ` Mauricio Vásquez Bernal
2021-11-02  5:53     ` Andrii Nakryiko
2021-11-02 10:58       ` Leonardo Di Donato
2021-11-02 17:12         ` Andrii Nakryiko
2021-11-02 21:26       ` Mauricio Vásquez Bernal
2021-11-03  5:26         ` Andrii Nakryiko
2021-11-04 14:58           ` Mauricio Vásquez Bernal
2021-11-04 17:34             ` Andrii Nakryiko
2021-11-03 23:40       ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).