bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 bpf-next 0/5] Integrate kernel module BTF support
@ 2020-11-10  1:19 Andrii Nakryiko
  2020-11-10  1:19 ` [PATCH v4 bpf-next 1/5] bpf: add in-kernel split " Andrii Nakryiko
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-10  1:19 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: kernel-team, linux-kernel, rafael, jeyu, Andrii Nakryiko,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman

This patch set adds BTF generation for kernel modules using a compact split
BTF approach. Respective patches have all the details.

Kernel module BTFs rely on pahole's split BTF support, which is added in [0]
and will be available starting from v1.19. Support for it is detected
automatically during kernel build time.

This patch set implements in-kernel support for split BTF loading and
validation. It also extends GET_OBJ_INFO API for BTFs to return BTF's module
name and a flag whether BTF itself is in-kernel or user-provided. vmlinux BTF
is also exposed to user-space through the same BTF object iteration APIs.

Follow up patch set will utilize the fact that vmlinux and module BTFs now
have associated ID to provide ability to attach BPF fentry/fexit/etc programs
to functions defined in kernel modules.

bpftool is also extended to show module/vmlinux BTF's name.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=378699&state=*

v3->v4:
  - copy_to_user() on ENOSPC in btf_get_info_by_fd() (Martin);
v2->v3:
  - get rid of unnecessary gotos (Song);
v2->v1:
  - drop WARNs, add fewer pr_warn()'s instead (Greg);
  - properly initialize sysfs binary attribute structure (Greg);
  - add __maybe_unused to any_section_objs, used conditionally by module BTF;
rfc->v1:
  - CONFIG_DEBUG_INFO_BTF_MODULES is derived automatically (Alexei);
  - vmlinux BTF now has explicit "vmlinux" name (Alexei);
  - added sysfs ABI documentation for /sys/kernel/btf/<module> (Greg).

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Andrii Nakryiko (5):
  bpf: add in-kernel split BTF support
  bpf: assign ID to vmlinux BTF and return extra info for BTF in
    GET_OBJ_INFO
  kbuild: build kernel module BTFs if BTF is enabled and pahole supports
    it
  bpf: load and verify kernel module BTFs
  tools/bpftool: add support for in-kernel and named BTF in `btf show`

 Documentation/ABI/testing/sysfs-kernel-btf |   8 +
 include/linux/bpf.h                        |   2 +
 include/linux/module.h                     |   4 +
 include/uapi/linux/bpf.h                   |   3 +
 kernel/bpf/btf.c                           | 406 ++++++++++++++++++---
 kernel/bpf/sysfs_btf.c                     |   2 +-
 kernel/module.c                            |  32 ++
 lib/Kconfig.debug                          |   9 +
 scripts/Makefile.modfinal                  |  20 +-
 tools/bpf/bpftool/btf.c                    |  28 +-
 tools/include/uapi/linux/bpf.h             |   3 +
 11 files changed, 459 insertions(+), 58 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4 bpf-next 1/5] bpf: add in-kernel split BTF support
  2020-11-10  1:19 [PATCH v4 bpf-next 0/5] Integrate kernel module BTF support Andrii Nakryiko
@ 2020-11-10  1:19 ` Andrii Nakryiko
  2020-11-10 17:49   ` Song Liu
  2020-11-10  1:19 ` [PATCH v4 bpf-next 2/5] bpf: assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO Andrii Nakryiko
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-10  1:19 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: kernel-team, linux-kernel, rafael, jeyu, Andrii Nakryiko,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman

Adjust in-kernel BTF implementation to support a split BTF mode of operation.
Changes are mostly mirroring libbpf split BTF changes, with the exception of
start_id being 0 for in-kernel implementation due to simpler read-only mode.

Otherwise, for split BTF logic, most of the logic of jumping to base BTF,
where necessary, is encapsulated in few helper functions. Type numbering and
string offset in a split BTF are logically continuing where base BTF ends, so
most of the high-level logic is kept without changes.

Type verification and size resolution is only doing an added resolution of new
split BTF types and relies on already cached size and type resolution results
in the base BTF.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/btf.c | 171 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 119 insertions(+), 52 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 6324de8c59f7..727c1c27053f 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -203,12 +203,17 @@ struct btf {
 	const char *strings;
 	void *nohdr_data;
 	struct btf_header hdr;
-	u32 nr_types;
+	u32 nr_types; /* includes VOID for base BTF */
 	u32 types_size;
 	u32 data_size;
 	refcount_t refcnt;
 	u32 id;
 	struct rcu_head rcu;
+
+	/* split BTF support */
+	struct btf *base_btf;
+	u32 start_id; /* first type ID in this BTF (0 for base BTF) */
+	u32 start_str_off; /* first string offset (0 for base BTF) */
 };
 
 enum verifier_phase {
@@ -449,14 +454,27 @@ static bool btf_type_is_datasec(const struct btf_type *t)
 	return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
 }
 
+static u32 btf_nr_types_total(const struct btf *btf)
+{
+	u32 total = 0;
+
+	while (btf) {
+		total += btf->nr_types;
+		btf = btf->base_btf;
+	}
+
+	return total;
+}
+
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
 {
 	const struct btf_type *t;
 	const char *tname;
-	u32 i;
+	u32 i, total;
 
-	for (i = 1; i <= btf->nr_types; i++) {
-		t = btf->types[i];
+	total = btf_nr_types_total(btf);
+	for (i = 1; i < total; i++) {
+		t = btf_type_by_id(btf, i);
 		if (BTF_INFO_KIND(t->info) != kind)
 			continue;
 
@@ -599,8 +617,14 @@ static const struct btf_kind_operations *btf_type_ops(const struct btf_type *t)
 
 static bool btf_name_offset_valid(const struct btf *btf, u32 offset)
 {
-	return BTF_STR_OFFSET_VALID(offset) &&
-		offset < btf->hdr.str_len;
+	if (!BTF_STR_OFFSET_VALID(offset))
+		return false;
+
+	while (offset < btf->start_str_off)
+		btf = btf->base_btf;
+
+	offset -= btf->start_str_off;
+	return offset < btf->hdr.str_len;
 }
 
 static bool __btf_name_char_ok(char c, bool first, bool dot_ok)
@@ -614,10 +638,22 @@ static bool __btf_name_char_ok(char c, bool first, bool dot_ok)
 	return true;
 }
 
+static const char *btf_str_by_offset(const struct btf *btf, u32 offset)
+{
+	while (offset < btf->start_str_off)
+		btf = btf->base_btf;
+
+	offset -= btf->start_str_off;
+	if (offset < btf->hdr.str_len)
+		return &btf->strings[offset];
+
+	return NULL;
+}
+
 static bool __btf_name_valid(const struct btf *btf, u32 offset, bool dot_ok)
 {
 	/* offset must be valid */
-	const char *src = &btf->strings[offset];
+	const char *src = btf_str_by_offset(btf, offset);
 	const char *src_limit;
 
 	if (!__btf_name_char_ok(*src, true, dot_ok))
@@ -650,27 +686,28 @@ static bool btf_name_valid_section(const struct btf *btf, u32 offset)
 
 static const char *__btf_name_by_offset(const struct btf *btf, u32 offset)
 {
+	const char *name;
+
 	if (!offset)
 		return "(anon)";
-	else if (offset < btf->hdr.str_len)
-		return &btf->strings[offset];
-	else
-		return "(invalid-name-offset)";
+
+	name = btf_str_by_offset(btf, offset);
+	return name ?: "(invalid-name-offset)";
 }
 
 const char *btf_name_by_offset(const struct btf *btf, u32 offset)
 {
-	if (offset < btf->hdr.str_len)
-		return &btf->strings[offset];
-
-	return NULL;
+	return btf_str_by_offset(btf, offset);
 }
 
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id)
 {
-	if (type_id > btf->nr_types)
-		return NULL;
+	while (type_id < btf->start_id)
+		btf = btf->base_btf;
 
+	type_id -= btf->start_id;
+	if (type_id >= btf->nr_types)
+		return NULL;
 	return btf->types[type_id];
 }
 
@@ -1390,17 +1427,13 @@ static int btf_add_type(struct btf_verifier_env *env, struct btf_type *t)
 {
 	struct btf *btf = env->btf;
 
-	/* < 2 because +1 for btf_void which is always in btf->types[0].
-	 * btf_void is not accounted in btf->nr_types because btf_void
-	 * does not come from the BTF file.
-	 */
-	if (btf->types_size - btf->nr_types < 2) {
+	if (btf->types_size == btf->nr_types) {
 		/* Expand 'types' array */
 
 		struct btf_type **new_types;
 		u32 expand_by, new_size;
 
-		if (btf->types_size == BTF_MAX_TYPE) {
+		if (btf->start_id + btf->types_size == BTF_MAX_TYPE) {
 			btf_verifier_log(env, "Exceeded max num of types");
 			return -E2BIG;
 		}
@@ -1414,18 +1447,23 @@ static int btf_add_type(struct btf_verifier_env *env, struct btf_type *t)
 		if (!new_types)
 			return -ENOMEM;
 
-		if (btf->nr_types == 0)
-			new_types[0] = &btf_void;
-		else
+		if (btf->nr_types == 0) {
+			if (!btf->base_btf) {
+				/* lazily init VOID type */
+				new_types[0] = &btf_void;
+				btf->nr_types++;
+			}
+		} else {
 			memcpy(new_types, btf->types,
-			       sizeof(*btf->types) * (btf->nr_types + 1));
+			       sizeof(*btf->types) * btf->nr_types);
+		}
 
 		kvfree(btf->types);
 		btf->types = new_types;
 		btf->types_size = new_size;
 	}
 
-	btf->types[++(btf->nr_types)] = t;
+	btf->types[btf->nr_types++] = t;
 
 	return 0;
 }
@@ -1498,18 +1536,17 @@ static int env_resolve_init(struct btf_verifier_env *env)
 	u32 *resolved_ids = NULL;
 	u8 *visit_states = NULL;
 
-	/* +1 for btf_void */
-	resolved_sizes = kvcalloc(nr_types + 1, sizeof(*resolved_sizes),
+	resolved_sizes = kvcalloc(nr_types, sizeof(*resolved_sizes),
 				  GFP_KERNEL | __GFP_NOWARN);
 	if (!resolved_sizes)
 		goto nomem;
 
-	resolved_ids = kvcalloc(nr_types + 1, sizeof(*resolved_ids),
+	resolved_ids = kvcalloc(nr_types, sizeof(*resolved_ids),
 				GFP_KERNEL | __GFP_NOWARN);
 	if (!resolved_ids)
 		goto nomem;
 
-	visit_states = kvcalloc(nr_types + 1, sizeof(*visit_states),
+	visit_states = kvcalloc(nr_types, sizeof(*visit_states),
 				GFP_KERNEL | __GFP_NOWARN);
 	if (!visit_states)
 		goto nomem;
@@ -1561,21 +1598,27 @@ static bool env_type_is_resolve_sink(const struct btf_verifier_env *env,
 static bool env_type_is_resolved(const struct btf_verifier_env *env,
 				 u32 type_id)
 {
-	return env->visit_states[type_id] == RESOLVED;
+	/* base BTF types should be resolved by now */
+	if (type_id < env->btf->start_id)
+		return true;
+
+	return env->visit_states[type_id - env->btf->start_id] == RESOLVED;
 }
 
 static int env_stack_push(struct btf_verifier_env *env,
 			  const struct btf_type *t, u32 type_id)
 {
+	const struct btf *btf = env->btf;
 	struct resolve_vertex *v;
 
 	if (env->top_stack == MAX_RESOLVE_DEPTH)
 		return -E2BIG;
 
-	if (env->visit_states[type_id] != NOT_VISITED)
+	if (type_id < btf->start_id
+	    || env->visit_states[type_id - btf->start_id] != NOT_VISITED)
 		return -EEXIST;
 
-	env->visit_states[type_id] = VISITED;
+	env->visit_states[type_id - btf->start_id] = VISITED;
 
 	v = &env->stack[env->top_stack++];
 	v->t = t;
@@ -1605,6 +1648,7 @@ static void env_stack_pop_resolved(struct btf_verifier_env *env,
 	u32 type_id = env->stack[--(env->top_stack)].type_id;
 	struct btf *btf = env->btf;
 
+	type_id -= btf->start_id; /* adjust to local type id */
 	btf->resolved_sizes[type_id] = resolved_size;
 	btf->resolved_ids[type_id] = resolved_type_id;
 	env->visit_states[type_id] = RESOLVED;
@@ -1709,14 +1753,30 @@ btf_resolve_size(const struct btf *btf, const struct btf_type *type,
 	return __btf_resolve_size(btf, type, type_size, NULL, NULL, NULL, NULL);
 }
 
+static u32 btf_resolved_type_id(const struct btf *btf, u32 type_id)
+{
+	while (type_id < btf->start_id)
+		btf = btf->base_btf;
+
+	return btf->resolved_ids[type_id - btf->start_id];
+}
+
 /* The input param "type_id" must point to a needs_resolve type */
 static const struct btf_type *btf_type_id_resolve(const struct btf *btf,
 						  u32 *type_id)
 {
-	*type_id = btf->resolved_ids[*type_id];
+	*type_id = btf_resolved_type_id(btf, *type_id);
 	return btf_type_by_id(btf, *type_id);
 }
 
+static u32 btf_resolved_type_size(const struct btf *btf, u32 type_id)
+{
+	while (type_id < btf->start_id)
+		btf = btf->base_btf;
+
+	return btf->resolved_sizes[type_id - btf->start_id];
+}
+
 const struct btf_type *btf_type_id_size(const struct btf *btf,
 					u32 *type_id, u32 *ret_size)
 {
@@ -1731,7 +1791,7 @@ const struct btf_type *btf_type_id_size(const struct btf *btf,
 	if (btf_type_has_size(size_type)) {
 		size = size_type->size;
 	} else if (btf_type_is_array(size_type)) {
-		size = btf->resolved_sizes[size_type_id];
+		size = btf_resolved_type_size(btf, size_type_id);
 	} else if (btf_type_is_ptr(size_type)) {
 		size = sizeof(void *);
 	} else {
@@ -1739,14 +1799,14 @@ const struct btf_type *btf_type_id_size(const struct btf *btf,
 				 !btf_type_is_var(size_type)))
 			return NULL;
 
-		size_type_id = btf->resolved_ids[size_type_id];
+		size_type_id = btf_resolved_type_id(btf, size_type_id);
 		size_type = btf_type_by_id(btf, size_type_id);
 		if (btf_type_nosize_or_null(size_type))
 			return NULL;
 		else if (btf_type_has_size(size_type))
 			size = size_type->size;
 		else if (btf_type_is_array(size_type))
-			size = btf->resolved_sizes[size_type_id];
+			size = btf_resolved_type_size(btf, size_type_id);
 		else if (btf_type_is_ptr(size_type))
 			size = sizeof(void *);
 		else
@@ -3798,7 +3858,7 @@ static int btf_check_all_metas(struct btf_verifier_env *env)
 	cur = btf->nohdr_data + hdr->type_off;
 	end = cur + hdr->type_len;
 
-	env->log_type_id = 1;
+	env->log_type_id = btf->base_btf ? btf->start_id : 1;
 	while (cur < end) {
 		struct btf_type *t = cur;
 		s32 meta_size;
@@ -3825,8 +3885,8 @@ static bool btf_resolve_valid(struct btf_verifier_env *env,
 		return false;
 
 	if (btf_type_is_struct(t) || btf_type_is_datasec(t))
-		return !btf->resolved_ids[type_id] &&
-		       !btf->resolved_sizes[type_id];
+		return !btf_resolved_type_id(btf, type_id) &&
+		       !btf_resolved_type_size(btf, type_id);
 
 	if (btf_type_is_modifier(t) || btf_type_is_ptr(t) ||
 	    btf_type_is_var(t)) {
@@ -3846,7 +3906,7 @@ static bool btf_resolve_valid(struct btf_verifier_env *env,
 		elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
 		return elem_type && !btf_type_is_modifier(elem_type) &&
 			(array->nelems * elem_size ==
-			 btf->resolved_sizes[type_id]);
+			 btf_resolved_type_size(btf, type_id));
 	}
 
 	return false;
@@ -3888,7 +3948,8 @@ static int btf_resolve(struct btf_verifier_env *env,
 static int btf_check_all_types(struct btf_verifier_env *env)
 {
 	struct btf *btf = env->btf;
-	u32 type_id;
+	const struct btf_type *t;
+	u32 type_id, i;
 	int err;
 
 	err = env_resolve_init(env);
@@ -3896,8 +3957,9 @@ static int btf_check_all_types(struct btf_verifier_env *env)
 		return err;
 
 	env->phase++;
-	for (type_id = 1; type_id <= btf->nr_types; type_id++) {
-		const struct btf_type *t = btf_type_by_id(btf, type_id);
+	for (i = btf->base_btf ? 0 : 1; i < btf->nr_types; i++) {
+		type_id = btf->start_id + i;
+		t = btf_type_by_id(btf, type_id);
 
 		env->log_type_id = type_id;
 		if (btf_type_needs_resolve(t) &&
@@ -3934,7 +3996,7 @@ static int btf_parse_type_sec(struct btf_verifier_env *env)
 		return -EINVAL;
 	}
 
-	if (!hdr->type_len) {
+	if (!env->btf->base_btf && !hdr->type_len) {
 		btf_verifier_log(env, "No type found");
 		return -EINVAL;
 	}
@@ -3961,13 +4023,18 @@ static int btf_parse_str_sec(struct btf_verifier_env *env)
 		return -EINVAL;
 	}
 
-	if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_NAME_OFFSET ||
-	    start[0] || end[-1]) {
+	btf->strings = start;
+
+	if (btf->base_btf && !hdr->str_len)
+		return 0;
+	if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_NAME_OFFSET || end[-1]) {
+		btf_verifier_log(env, "Invalid string section");
+		return -EINVAL;
+	}
+	if (!btf->base_btf && start[0]) {
 		btf_verifier_log(env, "Invalid string section");
 		return -EINVAL;
 	}
-
-	btf->strings = start;
 
 	return 0;
 }
@@ -4908,7 +4975,7 @@ static int __get_type_size(struct btf *btf, u32 btf_id,
 	while (t && btf_type_is_modifier(t))
 		t = btf_type_by_id(btf, t->type);
 	if (!t) {
-		*bad_type = btf->types[0];
+		*bad_type = btf_type_by_id(btf, 0);
 		return -EINVAL;
 	}
 	if (btf_type_is_ptr(t))
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 bpf-next 2/5] bpf: assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO
  2020-11-10  1:19 [PATCH v4 bpf-next 0/5] Integrate kernel module BTF support Andrii Nakryiko
  2020-11-10  1:19 ` [PATCH v4 bpf-next 1/5] bpf: add in-kernel split " Andrii Nakryiko
@ 2020-11-10  1:19 ` Andrii Nakryiko
  2020-11-10 20:24   ` Song Liu
  2020-11-10  1:19 ` [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it Andrii Nakryiko
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-10  1:19 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: kernel-team, linux-kernel, rafael, jeyu, Andrii Nakryiko,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman

Allocate ID for vmlinux BTF. This makes it visible when iterating over all BTF
objects in the system. To allow distinguishing vmlinux BTF (and later kernel
module BTF) from user-provided BTFs, expose extra kernel_btf flag, as well as
BTF name ("vmlinux" for vmlinux BTF, will equal to module's name for module
BTF).  We might want to later allow specifying BTF name for user-provided BTFs
as well, if that makes sense. But currently this is reserved only for
in-kernel BTFs.

Having in-kernel BTFs exposed IDs will allow to extend BPF APIs that require
in-kernel BTF type with ability to specify BTF types from kernel modules, not
just vmlinux BTF. This will be implemented in a follow up patch set for
fentry/fexit/fmod_ret/lsm/etc.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/uapi/linux/bpf.h       |  3 +++
 kernel/bpf/btf.c               | 43 +++++++++++++++++++++++++++++++---
 tools/include/uapi/linux/bpf.h |  3 +++
 3 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9879d6793e90..162999b12790 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4466,6 +4466,9 @@ struct bpf_btf_info {
 	__aligned_u64 btf;
 	__u32 btf_size;
 	__u32 id;
+	__aligned_u64 name;
+	__u32 name_len;
+	__u32 kernel_btf;
 } __attribute__((aligned(8)));
 
 struct bpf_link_info {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 727c1c27053f..856585db7aa7 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -214,6 +214,8 @@ struct btf {
 	struct btf *base_btf;
 	u32 start_id; /* first type ID in this BTF (0 for base BTF) */
 	u32 start_str_off; /* first string offset (0 for base BTF) */
+	char name[MODULE_NAME_LEN];
+	bool kernel_btf;
 };
 
 enum verifier_phase {
@@ -4429,6 +4431,8 @@ struct btf *btf_parse_vmlinux(void)
 
 	btf->data = __start_BTF;
 	btf->data_size = __stop_BTF - __start_BTF;
+	btf->kernel_btf = true;
+	snprintf(btf->name, sizeof(btf->name), "vmlinux");
 
 	err = btf_parse_hdr(env);
 	if (err)
@@ -4454,8 +4458,13 @@ struct btf *btf_parse_vmlinux(void)
 
 	bpf_struct_ops_init(btf, log);
 
-	btf_verifier_env_free(env);
 	refcount_set(&btf->refcnt, 1);
+
+	err = btf_alloc_id(btf);
+	if (err)
+		goto errout;
+
+	btf_verifier_env_free(env);
 	return btf;
 
 errout:
@@ -5553,7 +5562,9 @@ int btf_get_info_by_fd(const struct btf *btf,
 	struct bpf_btf_info info;
 	u32 info_copy, btf_copy;
 	void __user *ubtf;
-	u32 uinfo_len;
+	char __user *uname;
+	u32 uinfo_len, uname_len, name_len;
+	int ret = 0;
 
 	uinfo = u64_to_user_ptr(attr->info.info);
 	uinfo_len = attr->info.info_len;
@@ -5570,11 +5581,37 @@ int btf_get_info_by_fd(const struct btf *btf,
 		return -EFAULT;
 	info.btf_size = btf->data_size;
 
+	info.kernel_btf = btf->kernel_btf;
+
+	uname = u64_to_user_ptr(info.name);
+	uname_len = info.name_len;
+	if (!uname ^ !uname_len)
+		return -EINVAL;
+
+	name_len = strlen(btf->name);
+	info.name_len = name_len;
+
+	if (uname) {
+		if (uname_len >= name_len + 1) {
+			if (copy_to_user(uname, btf->name, name_len + 1))
+				return -EFAULT;
+		} else {
+			char zero = '\0';
+
+			if (copy_to_user(uname, btf->name, uname_len - 1))
+				return -EFAULT;
+			if (put_user(zero, uname + uname_len - 1))
+				return -EFAULT;
+			/* let user-space know about too short buffer */
+			ret = -ENOSPC;
+		}
+	}
+
 	if (copy_to_user(uinfo, &info, info_copy) ||
 	    put_user(info_copy, &uattr->info.info_len))
 		return -EFAULT;
 
-	return 0;
+	return ret;
 }
 
 int btf_get_fd_by_id(u32 id)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9879d6793e90..162999b12790 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4466,6 +4466,9 @@ struct bpf_btf_info {
 	__aligned_u64 btf;
 	__u32 btf_size;
 	__u32 id;
+	__aligned_u64 name;
+	__u32 name_len;
+	__u32 kernel_btf;
 } __attribute__((aligned(8)));
 
 struct bpf_link_info {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it
  2020-11-10  1:19 [PATCH v4 bpf-next 0/5] Integrate kernel module BTF support Andrii Nakryiko
  2020-11-10  1:19 ` [PATCH v4 bpf-next 1/5] bpf: add in-kernel split " Andrii Nakryiko
  2020-11-10  1:19 ` [PATCH v4 bpf-next 2/5] bpf: assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO Andrii Nakryiko
@ 2020-11-10  1:19 ` Andrii Nakryiko
  2020-11-11  1:04   ` Song Liu
  2020-11-17  3:44   ` David Ahern
  2020-11-10  1:19 ` [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs Andrii Nakryiko
  2020-11-10  1:19 ` [PATCH v4 bpf-next 5/5] tools/bpftool: add support for in-kernel and named BTF in `btf show` Andrii Nakryiko
  4 siblings, 2 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-10  1:19 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: kernel-team, linux-kernel, rafael, jeyu, Andrii Nakryiko,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman, Masahiro Yamada

Detect if pahole supports split BTF generation, and generate BTF for each
selected kernel module, if it does. This is exposed to Makefiles and C code as
CONFIG_DEBUG_INFO_BTF_MODULES flag.

Kernel module BTF has to be re-generated if either vmlinux's BTF changes or
module's .ko changes. To achieve that, I needed a helper similar to
if_changed, but that would allow to filter out vmlinux from the list of
updated dependencies for .ko building. I've put it next to the only place that
uses and needs it, but it might be a better idea to just add it along the
other if_changed variants into scripts/Kbuild.include.

Each kernel module's BTF deduplication is pretty fast, as it does only
incremental BTF deduplication on top of already deduplicated vmlinux BTF. To
show the added build time, I've first ran make only just built kernel (to
establish the baseline) and then forced only BTF re-generation, without
regenerating .ko files. The build was performed with -j60 parallelization on
56-core machine. The final time also includes bzImage building, so it's not
a pure BTF overhead.

$ time make -j60
...
make -j60  27.65s user 10.96s system 782% cpu 4.933 total
$ touch ~/linux-build/default/vmlinux && time make -j60
...
make -j60  123.69s user 27.85s system 1566% cpu 9.675 total

So 4.6 seconds real time, with noticeable part spent in compressed vmlinux and
bzImage building.

To show size savings, I've built my kernel configuration with about 700 kernel
modules with full BTF per each kernel module (without deduplicating against
vmlinux) and with split BTF against deduplicated vmlinux (approach in this
patch). Below are top 10 modules with biggest BTF sizes. And total size of BTF
data across all kernel modules.

It shows that split BTF "compresses" 115MB down to 5MB total. And the biggest
kernel modules get a downsize from 500-570KB down to 200-300KB.

FULL BTF
========

$ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}'; done | awk '{ s += $1 } END { print s }'
115710691

$ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
./drivers/gpu/drm/i915/i915.ko 570570
./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 520240
./drivers/gpu/drm/radeon/radeon.ko 503849
./drivers/infiniband/hw/mlx5/mlx5_ib.ko 491777
./fs/xfs/xfs.ko 411544
./drivers/net/ethernet/intel/i40e/i40e.ko 403904
./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 398754
./drivers/infiniband/core/ib_core.ko 397224
./fs/cifs/cifs.ko 386249
./fs/nfsd/nfsd.ko 379738

SPLIT BTF
=========

$ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}'; done | awk '{ s += $1 } END { print s }'
5194047

$ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
./drivers/gpu/drm/i915/i915.ko 293206
./drivers/gpu/drm/radeon/radeon.ko 282103
./fs/xfs/xfs.ko 222150
./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 198503
./drivers/infiniband/hw/mlx5/mlx5_ib.ko 198356
./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 113444
./fs/cifs/cifs.ko 109379
./arch/x86/kvm/kvm.ko 100225
./drivers/gpu/drm/drm.ko 94827
./drivers/infiniband/core/ib_core.ko 91188

Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 lib/Kconfig.debug         |  9 +++++++++
 scripts/Makefile.modfinal | 20 ++++++++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d7a7bc3b6098..1e78faaf20a5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -274,6 +274,15 @@ config DEBUG_INFO_BTF
 	  Turning this on expects presence of pahole tool, which will convert
 	  DWARF type info into equivalent deduplicated BTF type info.
 
+config PAHOLE_HAS_SPLIT_BTF
+	def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "119")
+
+config DEBUG_INFO_BTF_MODULES
+	def_bool y
+	depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
+	help
+	  Generate compact split BTF type information for kernel modules.
+
 config GDB_SCRIPTS
 	bool "Provide GDB scripts for kernel debugging"
 	help
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index ae01baf96f4e..02b892421f7a 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -6,6 +6,7 @@
 PHONY := __modfinal
 __modfinal:
 
+include include/config/auto.conf
 include $(srctree)/scripts/Kbuild.include
 
 # for c_flags
@@ -36,8 +37,23 @@ quiet_cmd_ld_ko_o = LD [M]  $@
 		-T scripts/module.lds -o $@ $(filter %.o, $^);		\
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
 
-$(modules): %.ko: %.o %.mod.o scripts/module.lds FORCE
-	+$(call if_changed,ld_ko_o)
+quiet_cmd_btf_ko = BTF [M] $@
+      cmd_btf_ko = LLVM_OBJCOPY=$(OBJCOPY) $(PAHOLE) -J --btf_base vmlinux $@
+
+# Same as newer-prereqs, but allows to exclude specified extra dependencies
+newer_prereqs_except = $(filter-out $(PHONY) $(1),$?)
+
+# Same as if_changed, but allows to exclude specified extra dependencies
+if_changed_except = $(if $(call newer_prereqs_except,$(2))$(cmd-check),      \
+	$(cmd);                                                              \
+	printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd, @:)
+
+# Re-generate module BTFs if either module's .ko or vmlinux changed
+$(modules): %.ko: %.o %.mod.o scripts/module.lds vmlinux FORCE
+	+$(call if_changed_except,ld_ko_o,vmlinux)
+ifdef CONFIG_DEBUG_INFO_BTF_MODULES
+	+$(if $(newer-prereqs),$(call cmd,btf_ko))
+endif
 
 targets += $(modules) $(modules:.ko=.mod.o)
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs
  2020-11-10  1:19 [PATCH v4 bpf-next 0/5] Integrate kernel module BTF support Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2020-11-10  1:19 ` [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it Andrii Nakryiko
@ 2020-11-10  1:19 ` Andrii Nakryiko
  2020-11-11  7:11   ` kernel test robot
  2020-11-11 10:13   ` Jessica Yu
  2020-11-10  1:19 ` [PATCH v4 bpf-next 5/5] tools/bpftool: add support for in-kernel and named BTF in `btf show` Andrii Nakryiko
  4 siblings, 2 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-10  1:19 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: kernel-team, linux-kernel, rafael, jeyu, Andrii Nakryiko,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman

Add kernel module listener that will load/validate and unload module BTF.
Module BTFs gets ID generated for them, which makes it possible to iterate
them with existing BTF iteration API. They are given their respective module's
names, which will get reported through GET_OBJ_INFO API. They are also marked
as in-kernel BTFs for tooling to distinguish them from user-provided BTFs.

Also, similarly to vmlinux BTF, kernel module BTFs are exposed through
sysfs as /sys/kernel/btf/<module-name>. This is convenient for user-space
tools to inspect module BTF contents and dump their types with existing tools:

[vmuser@archvm bpf]$ ls -la /sys/kernel/btf
total 0
drwxr-xr-x  2 root root       0 Nov  4 19:46 .
drwxr-xr-x 13 root root       0 Nov  4 19:46 ..

...

-r--r--r--  1 root root     888 Nov  4 19:46 irqbypass
-r--r--r--  1 root root  100225 Nov  4 19:46 kvm
-r--r--r--  1 root root   35401 Nov  4 19:46 kvm_intel
-r--r--r--  1 root root     120 Nov  4 19:46 pcspkr
-r--r--r--  1 root root     399 Nov  4 19:46 serio_raw
-r--r--r--  1 root root 4094095 Nov  4 19:46 vmlinux

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 Documentation/ABI/testing/sysfs-kernel-btf |   8 +
 include/linux/bpf.h                        |   2 +
 include/linux/module.h                     |   4 +
 kernel/bpf/btf.c                           | 194 +++++++++++++++++++++
 kernel/bpf/sysfs_btf.c                     |   2 +-
 kernel/module.c                            |  32 ++++
 6 files changed, 241 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-btf b/Documentation/ABI/testing/sysfs-kernel-btf
index 2c9744b2cd59..fe96efdc9b6c 100644
--- a/Documentation/ABI/testing/sysfs-kernel-btf
+++ b/Documentation/ABI/testing/sysfs-kernel-btf
@@ -15,3 +15,11 @@ Description:
 		information with description of all internal kernel types. See
 		Documentation/bpf/btf.rst for detailed description of format
 		itself.
+
+What:		/sys/kernel/btf/<module-name>
+Date:		Nov 2020
+KernelVersion:	5.11
+Contact:	bpf@vger.kernel.org
+Description:
+		Read-only binary attribute exposing kernel module's BTF type
+		information as an add-on to the kernel's BTF (/sys/kernel/btf/vmlinux).
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 73d5381a5d5c..581b2a2e78eb 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -36,9 +36,11 @@ struct seq_operations;
 struct bpf_iter_aux_info;
 struct bpf_local_storage;
 struct bpf_local_storage_map;
+struct kobject;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
+extern struct kobject *btf_kobj;
 
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
 					struct bpf_iter_aux_info *aux);
diff --git a/include/linux/module.h b/include/linux/module.h
index a29187f7c360..20fce258ffba 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -475,6 +475,10 @@ struct module {
 	unsigned int num_bpf_raw_events;
 	struct bpf_raw_event_map *bpf_raw_events;
 #endif
+#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
+	unsigned int btf_data_size;
+	void *btf_data;
+#endif
 #ifdef CONFIG_JUMP_LABEL
 	struct jump_entry *jump_entries;
 	unsigned int num_jump_entries;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 856585db7aa7..0f1fd2669d69 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -22,6 +22,8 @@
 #include <linux/skmsg.h>
 #include <linux/perf_event.h>
 #include <linux/bsearch.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
 #include <net/sock.h>
 
 /* BTF (BPF Type Format) is the meta data format which describes
@@ -4476,6 +4478,75 @@ struct btf *btf_parse_vmlinux(void)
 	return ERR_PTR(err);
 }
 
+static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
+{
+	struct btf_verifier_env *env = NULL;
+	struct bpf_verifier_log *log;
+	struct btf *btf = NULL, *base_btf;
+	int err;
+
+	base_btf = bpf_get_btf_vmlinux();
+	if (IS_ERR(base_btf))
+		return base_btf;
+	if (!base_btf)
+		return ERR_PTR(-EINVAL);
+
+	env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
+	if (!env)
+		return ERR_PTR(-ENOMEM);
+
+	log = &env->log;
+	log->level = BPF_LOG_KERNEL;
+
+	btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
+	if (!btf) {
+		err = -ENOMEM;
+		goto errout;
+	}
+	env->btf = btf;
+
+	btf->base_btf = base_btf;
+	btf->start_id = base_btf->nr_types;
+	btf->start_str_off = base_btf->hdr.str_len;
+	btf->kernel_btf = true;
+	snprintf(btf->name, sizeof(btf->name), "%s", module_name);
+
+	btf->data = kvmalloc(data_size, GFP_KERNEL | __GFP_NOWARN);
+	if (!btf->data) {
+		err = -ENOMEM;
+		goto errout;
+	}
+	memcpy(btf->data, data, data_size);
+	btf->data_size = data_size;
+
+	err = btf_parse_hdr(env);
+	if (err)
+		goto errout;
+
+	btf->nohdr_data = btf->data + btf->hdr.hdr_len;
+
+	err = btf_parse_str_sec(env);
+	if (err)
+		goto errout;
+
+	err = btf_check_all_metas(env);
+	if (err)
+		goto errout;
+
+	btf_verifier_env_free(env);
+	refcount_set(&btf->refcnt, 1);
+	return btf;
+
+errout:
+	btf_verifier_env_free(env);
+	if (btf) {
+		kvfree(btf->data);
+		kvfree(btf->types);
+		kfree(btf);
+	}
+	return ERR_PTR(err);
+}
+
 struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog)
 {
 	struct bpf_prog *tgt_prog = prog->aux->dst_prog;
@@ -5651,3 +5722,126 @@ bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
 {
 	return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
 }
+
+#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
+struct btf_module {
+	struct list_head list;
+	struct module *module;
+	struct btf *btf;
+	struct bin_attribute *sysfs_attr;
+};
+
+static LIST_HEAD(btf_modules);
+static DEFINE_MUTEX(btf_module_mutex);
+
+static ssize_t
+btf_module_read(struct file *file, struct kobject *kobj,
+		struct bin_attribute *bin_attr,
+		char *buf, loff_t off, size_t len)
+{
+	const struct btf *btf = bin_attr->private;
+
+	memcpy(buf, btf->data + off, len);
+	return len;
+}
+
+static int btf_module_notify(struct notifier_block *nb, unsigned long op,
+			     void *module)
+{
+	struct btf_module *btf_mod, *tmp;
+	struct module *mod = module;
+	struct btf *btf;
+	int err = 0;
+
+	if (mod->btf_data_size == 0 ||
+	    (op != MODULE_STATE_COMING && op != MODULE_STATE_GOING))
+		goto out;
+
+	switch (op) {
+	case MODULE_STATE_COMING:
+		btf_mod = kzalloc(sizeof(*btf_mod), GFP_KERNEL);
+		if (!btf_mod) {
+			err = -ENOMEM;
+			goto out;
+		}
+		btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
+		if (IS_ERR(btf)) {
+			pr_warn("failed to validate module [%s] BTF: %ld\n",
+				mod->name, PTR_ERR(btf));
+			kfree(btf_mod);
+			err = PTR_ERR(btf);
+			goto out;
+		}
+		err = btf_alloc_id(btf);
+		if (err) {
+			btf_free(btf);
+			kfree(btf_mod);
+			goto out;
+		}
+
+		mutex_lock(&btf_module_mutex);
+		btf_mod->module = module;
+		btf_mod->btf = btf;
+		list_add(&btf_mod->list, &btf_modules);
+		mutex_unlock(&btf_module_mutex);
+
+		if (IS_ENABLED(CONFIG_SYSFS)) {
+			struct bin_attribute *attr;
+
+			attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+			if (!attr)
+				goto out;
+
+			sysfs_bin_attr_init(attr);
+			attr->attr.name = btf->name;
+			attr->attr.mode = 0444;
+			attr->size = btf->data_size;
+			attr->private = btf;
+			attr->read = btf_module_read;
+
+			err = sysfs_create_bin_file(btf_kobj, attr);
+			if (err) {
+				pr_warn("failed to register module [%s] BTF in sysfs: %d\n",
+					mod->name, err);
+				kfree(attr);
+				err = 0;
+				goto out;
+			}
+
+			btf_mod->sysfs_attr = attr;
+		}
+
+		break;
+	case MODULE_STATE_GOING:
+		mutex_lock(&btf_module_mutex);
+		list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
+			if (btf_mod->module != module)
+				continue;
+
+			list_del(&btf_mod->list);
+			if (btf_mod->sysfs_attr)
+				sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
+			btf_put(btf_mod->btf);
+			kfree(btf_mod->sysfs_attr);
+			kfree(btf_mod);
+			break;
+		}
+		mutex_unlock(&btf_module_mutex);
+		break;
+	}
+out:
+	return notifier_from_errno(err);
+}
+
+static struct notifier_block btf_module_nb = {
+	.notifier_call = btf_module_notify,
+};
+
+static int __init btf_module_init(void)
+{
+	register_module_notifier(&btf_module_nb);
+	return 0;
+}
+
+fs_initcall(btf_module_init);
+#endif /* CONFIG_DEBUG_INFO_BTF_MODULES */
diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
index 11b3380887fa..ef6911aee3bb 100644
--- a/kernel/bpf/sysfs_btf.c
+++ b/kernel/bpf/sysfs_btf.c
@@ -26,7 +26,7 @@ static struct bin_attribute bin_attr_btf_vmlinux __ro_after_init = {
 	.read = btf_vmlinux_read,
 };
 
-static struct kobject *btf_kobj;
+struct kobject *btf_kobj;
 
 static int __init btf_vmlinux_init(void)
 {
diff --git a/kernel/module.c b/kernel/module.c
index a4fa44a652a7..f2996b02ab2e 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -380,6 +380,35 @@ static void *section_objs(const struct load_info *info,
 	return (void *)info->sechdrs[sec].sh_addr;
 }
 
+/* Find a module section: 0 means not found. Ignores SHF_ALLOC flag. */
+static unsigned int find_any_sec(const struct load_info *info, const char *name)
+{
+	unsigned int i;
+
+	for (i = 1; i < info->hdr->e_shnum; i++) {
+		Elf_Shdr *shdr = &info->sechdrs[i];
+		if (strcmp(info->secstrings + shdr->sh_name, name) == 0)
+			return i;
+	}
+	return 0;
+}
+
+/*
+ * Find a module section, or NULL. Fill in number of "objects" in section.
+ * Ignores SHF_ALLOC flag.
+ */
+static __maybe_unused void *any_section_objs(const struct load_info *info,
+					     const char *name,
+					     size_t object_size,
+					     unsigned int *num)
+{
+	unsigned int sec = find_any_sec(info, name);
+
+	/* Section 0 has sh_addr 0 and sh_size 0. */
+	*num = info->sechdrs[sec].sh_size / object_size;
+	return (void *)info->sechdrs[sec].sh_addr;
+}
+
 /* Provided by the linker */
 extern const struct kernel_symbol __start___ksymtab[];
 extern const struct kernel_symbol __stop___ksymtab[];
@@ -3250,6 +3279,9 @@ static int find_module_sections(struct module *mod, struct load_info *info)
 					   sizeof(*mod->bpf_raw_events),
 					   &mod->num_bpf_raw_events);
 #endif
+#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
+	mod->btf_data = any_section_objs(info, ".BTF", 1, &mod->btf_data_size);
+#endif
 #ifdef CONFIG_JUMP_LABEL
 	mod->jump_entries = section_objs(info, "__jump_table",
 					sizeof(*mod->jump_entries),
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 bpf-next 5/5] tools/bpftool: add support for in-kernel and named BTF in `btf show`
  2020-11-10  1:19 [PATCH v4 bpf-next 0/5] Integrate kernel module BTF support Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2020-11-10  1:19 ` [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs Andrii Nakryiko
@ 2020-11-10  1:19 ` Andrii Nakryiko
  2020-11-11  1:15   ` Song Liu
  4 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-10  1:19 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: kernel-team, linux-kernel, rafael, jeyu, Andrii Nakryiko,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman, Alan Maguire

Display vmlinux BTF name and kernel module names when listing available BTFs
on the system.

In human-readable output mode, module BTFs are reported with "name
[module-name]", while vmlinux BTF will be reported as "name [vmlinux]".
Square brackets are added by bpftool and follow kernel convention when
displaying modules in human-readable text outputs.

[vmuser@archvm bpf]$ sudo ../../../bpf/bpftool/bpftool btf s
1: name [vmlinux]  size 4082281B
6: size 2365B  prog_ids 8,6  map_ids 3
7: name [button]  size 46895B
8: name [pcspkr]  size 42328B
9: name [serio_raw]  size 39375B
10: name [floppy]  size 57185B
11: name [i2c_core]  size 76186B
12: name [crc32c_intel]  size 16036B
13: name [i2c_piix4]  size 50497B
14: name [irqbypass]  size 14124B
15: name [kvm]  size 197985B
16: name [kvm_intel]  size 123564B
17: name [cryptd]  size 42466B
18: name [crypto_simd]  size 17187B
19: name [glue_helper]  size 39205B
20: name [aesni_intel]  size 41034B
25: size 36150B
        pids bpftool(2519)

In JSON mode, two fields (boolean "kernel" and string "name") are reported for
each BTF object. vmlinux BTF is reported with name "vmlinux" (kernel itself
returns and empty name for vmlinux BTF).

[vmuser@archvm bpf]$ sudo ../../../bpf/bpftool/bpftool btf s -jp
[{
        "id": 1,
        "size": 4082281,
        "prog_ids": [],
        "map_ids": [],
        "kernel": true,
        "name": "vmlinux"
    },{
        "id": 6,
        "size": 2365,
        "prog_ids": [8,6
        ],
        "map_ids": [3
        ],
        "kernel": false
    },{
        "id": 7,
        "size": 46895,
        "prog_ids": [],
        "map_ids": [],
        "kernel": true,
        "name": "button"
    },{

...

Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/bpf/bpftool/btf.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index c96b56e8e3a4..ed5e97157241 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -742,9 +742,14 @@ show_btf_plain(struct bpf_btf_info *info, int fd,
 	       struct btf_attach_table *btf_map_table)
 {
 	struct btf_attach_point *obj;
+	const char *name = u64_to_ptr(info->name);
 	int n;
 
 	printf("%u: ", info->id);
+	if (info->kernel_btf)
+		printf("name [%s]  ", name);
+	else if (name && name[0])
+		printf("name %s  ", name);
 	printf("size %uB", info->btf_size);
 
 	n = 0;
@@ -771,6 +776,7 @@ show_btf_json(struct bpf_btf_info *info, int fd,
 	      struct btf_attach_table *btf_map_table)
 {
 	struct btf_attach_point *obj;
+	const char *name = u64_to_ptr(info->name);
 
 	jsonw_start_object(json_wtr);	/* btf object */
 	jsonw_uint_field(json_wtr, "id", info->id);
@@ -796,6 +802,11 @@ show_btf_json(struct bpf_btf_info *info, int fd,
 
 	emit_obj_refs_json(&refs_table, info->id, json_wtr); /* pids */
 
+	jsonw_bool_field(json_wtr, "kernel", info->kernel_btf);
+
+	if (name && name[0])
+		jsonw_string_field(json_wtr, "name", name);
+
 	jsonw_end_object(json_wtr);	/* btf object */
 }
 
@@ -803,15 +814,30 @@ static int
 show_btf(int fd, struct btf_attach_table *btf_prog_table,
 	 struct btf_attach_table *btf_map_table)
 {
-	struct bpf_btf_info info = {};
+	struct bpf_btf_info info;
 	__u32 len = sizeof(info);
+	char name[64];
 	int err;
 
+	memset(&info, 0, sizeof(info));
 	err = bpf_obj_get_info_by_fd(fd, &info, &len);
 	if (err) {
 		p_err("can't get BTF object info: %s", strerror(errno));
 		return -1;
 	}
+	/* if kernel support emitting BTF object name, pass name pointer */
+	if (info.name_len) {
+		memset(&info, 0, sizeof(info));
+		info.name_len = sizeof(name);
+		info.name = ptr_to_u64(name);
+		len = sizeof(info);
+
+		err = bpf_obj_get_info_by_fd(fd, &info, &len);
+		if (err) {
+			p_err("can't get BTF object info: %s", strerror(errno));
+			return -1;
+		}
+	}
 
 	if (json_output)
 		show_btf_json(&info, fd, btf_prog_table, btf_map_table);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 1/5] bpf: add in-kernel split BTF support
  2020-11-10  1:19 ` [PATCH v4 bpf-next 1/5] bpf: add in-kernel split " Andrii Nakryiko
@ 2020-11-10 17:49   ` Song Liu
  2020-11-10 18:31     ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2020-11-10 17:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, daniel, Kernel Team,
	linux-kernel, rafael, jeyu, Arnaldo Carvalho de Melo,
	Greg Kroah-Hartman



> On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Adjust in-kernel BTF implementation to support a split BTF mode of operation.
> Changes are mostly mirroring libbpf split BTF changes, with the exception of
> start_id being 0 for in-kernel implementation due to simpler read-only mode.
> 
> Otherwise, for split BTF logic, most of the logic of jumping to base BTF,
> where necessary, is encapsulated in few helper functions. Type numbering and
> string offset in a split BTF are logically continuing where base BTF ends, so
> most of the high-level logic is kept without changes.
> 
> Type verification and size resolution is only doing an added resolution of new
> split BTF types and relies on already cached size and type resolution results
> in the base BTF.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
> kernel/bpf/btf.c | 171 +++++++++++++++++++++++++++++++++--------------
> 1 file changed, 119 insertions(+), 52 deletions(-)
> 
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 6324de8c59f7..727c1c27053f 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -203,12 +203,17 @@ struct btf {
> 	const char *strings;
> 	void *nohdr_data;
> 	struct btf_header hdr;
> -	u32 nr_types;
> +	u32 nr_types; /* includes VOID for base BTF */
> 	u32 types_size;
> 	u32 data_size;
> 	refcount_t refcnt;
> 	u32 id;
> 	struct rcu_head rcu;
> +
> +	/* split BTF support */
> +	struct btf *base_btf;
> +	u32 start_id; /* first type ID in this BTF (0 for base BTF) */
> +	u32 start_str_off; /* first string offset (0 for base BTF) */
> };
> 
> enum verifier_phase {
> @@ -449,14 +454,27 @@ static bool btf_type_is_datasec(const struct btf_type *t)
> 	return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
> }
> 
> +static u32 btf_nr_types_total(const struct btf *btf)
> +{
> +	u32 total = 0;
> +
> +	while (btf) {
> +		total += btf->nr_types;
> +		btf = btf->base_btf;
> +	}
> +
> +	return total;
> +}
> +
> s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
> {
> 	const struct btf_type *t;
> 	const char *tname;
> -	u32 i;
> +	u32 i, total;
> 
> -	for (i = 1; i <= btf->nr_types; i++) {
> -		t = btf->types[i];
> +	total = btf_nr_types_total(btf);
> +	for (i = 1; i < total; i++) {
> +		t = btf_type_by_id(btf, i);
> 		if (BTF_INFO_KIND(t->info) != kind)
> 			continue;
> 
> @@ -599,8 +617,14 @@ static const struct btf_kind_operations *btf_type_ops(const struct btf_type *t)
> 
> static bool btf_name_offset_valid(const struct btf *btf, u32 offset)
> {
> -	return BTF_STR_OFFSET_VALID(offset) &&
> -		offset < btf->hdr.str_len;
> +	if (!BTF_STR_OFFSET_VALID(offset))
> +		return false;
> +
> +	while (offset < btf->start_str_off)
> +		btf = btf->base_btf;

Do we need "if (!btf) return false;" in the while loop? (and some other loops below)

[...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 1/5] bpf: add in-kernel split BTF support
  2020-11-10 17:49   ` Song Liu
@ 2020-11-10 18:31     ` Andrii Nakryiko
  2020-11-10 20:14       ` Song Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-10 18:31 UTC (permalink / raw)
  To: Song Liu
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov, daniel,
	Kernel Team, linux-kernel, rafael, jeyu,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman

On Tue, Nov 10, 2020 at 9:50 AM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Adjust in-kernel BTF implementation to support a split BTF mode of operation.
> > Changes are mostly mirroring libbpf split BTF changes, with the exception of
> > start_id being 0 for in-kernel implementation due to simpler read-only mode.
> >
> > Otherwise, for split BTF logic, most of the logic of jumping to base BTF,
> > where necessary, is encapsulated in few helper functions. Type numbering and
> > string offset in a split BTF are logically continuing where base BTF ends, so
> > most of the high-level logic is kept without changes.
> >
> > Type verification and size resolution is only doing an added resolution of new
> > split BTF types and relies on already cached size and type resolution results
> > in the base BTF.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> > kernel/bpf/btf.c | 171 +++++++++++++++++++++++++++++++++--------------
> > 1 file changed, 119 insertions(+), 52 deletions(-)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 6324de8c59f7..727c1c27053f 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -203,12 +203,17 @@ struct btf {
> >       const char *strings;
> >       void *nohdr_data;
> >       struct btf_header hdr;
> > -     u32 nr_types;
> > +     u32 nr_types; /* includes VOID for base BTF */
> >       u32 types_size;
> >       u32 data_size;
> >       refcount_t refcnt;
> >       u32 id;
> >       struct rcu_head rcu;
> > +
> > +     /* split BTF support */
> > +     struct btf *base_btf;
> > +     u32 start_id; /* first type ID in this BTF (0 for base BTF) */
> > +     u32 start_str_off; /* first string offset (0 for base BTF) */
> > };
> >
> > enum verifier_phase {
> > @@ -449,14 +454,27 @@ static bool btf_type_is_datasec(const struct btf_type *t)
> >       return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
> > }
> >
> > +static u32 btf_nr_types_total(const struct btf *btf)
> > +{
> > +     u32 total = 0;
> > +
> > +     while (btf) {
> > +             total += btf->nr_types;
> > +             btf = btf->base_btf;
> > +     }
> > +
> > +     return total;
> > +}
> > +
> > s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
> > {
> >       const struct btf_type *t;
> >       const char *tname;
> > -     u32 i;
> > +     u32 i, total;
> >
> > -     for (i = 1; i <= btf->nr_types; i++) {
> > -             t = btf->types[i];
> > +     total = btf_nr_types_total(btf);
> > +     for (i = 1; i < total; i++) {
> > +             t = btf_type_by_id(btf, i);
> >               if (BTF_INFO_KIND(t->info) != kind)
> >                       continue;
> >
> > @@ -599,8 +617,14 @@ static const struct btf_kind_operations *btf_type_ops(const struct btf_type *t)
> >
> > static bool btf_name_offset_valid(const struct btf *btf, u32 offset)
> > {
> > -     return BTF_STR_OFFSET_VALID(offset) &&
> > -             offset < btf->hdr.str_len;
> > +     if (!BTF_STR_OFFSET_VALID(offset))
> > +             return false;
> > +
> > +     while (offset < btf->start_str_off)
> > +             btf = btf->base_btf;
>
> Do we need "if (!btf) return false;" in the while loop? (and some other loops below)

No, because for base btf start_str_off and start_type_id are always
zero, so loop condition is always false.

>
> [...]
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 1/5] bpf: add in-kernel split BTF support
  2020-11-10 18:31     ` Andrii Nakryiko
@ 2020-11-10 20:14       ` Song Liu
  0 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2020-11-10 20:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov, daniel,
	Kernel Team, linux-kernel, rafael, jeyu,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman



> On Nov 10, 2020, at 10:31 AM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Tue, Nov 10, 2020 at 9:50 AM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
>>> 
>>> Adjust in-kernel BTF implementation to support a split BTF mode of operation.
>>> Changes are mostly mirroring libbpf split BTF changes, with the exception of
>>> start_id being 0 for in-kernel implementation due to simpler read-only mode.
>>> 
>>> Otherwise, for split BTF logic, most of the logic of jumping to base BTF,
>>> where necessary, is encapsulated in few helper functions. Type numbering and
>>> string offset in a split BTF are logically continuing where base BTF ends, so
>>> most of the high-level logic is kept without changes.
>>> 
>>> Type verification and size resolution is only doing an added resolution of new
>>> split BTF types and relies on already cached size and type resolution results
>>> in the base BTF.
>>> 
>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>>> ---
>>> kernel/bpf/btf.c | 171 +++++++++++++++++++++++++++++++++--------------
>>> 1 file changed, 119 insertions(+), 52 deletions(-)
>>> 
>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>> index 6324de8c59f7..727c1c27053f 100644
>>> --- a/kernel/bpf/btf.c
>>> +++ b/kernel/bpf/btf.c
>>> @@ -203,12 +203,17 @@ struct btf {
>>>      const char *strings;
>>>      void *nohdr_data;
>>>      struct btf_header hdr;
>>> -     u32 nr_types;
>>> +     u32 nr_types; /* includes VOID for base BTF */
>>>      u32 types_size;
>>>      u32 data_size;
>>>      refcount_t refcnt;
>>>      u32 id;
>>>      struct rcu_head rcu;
>>> +
>>> +     /* split BTF support */
>>> +     struct btf *base_btf;
>>> +     u32 start_id; /* first type ID in this BTF (0 for base BTF) */
>>> +     u32 start_str_off; /* first string offset (0 for base BTF) */
>>> };
>>> 
>>> enum verifier_phase {
>>> @@ -449,14 +454,27 @@ static bool btf_type_is_datasec(const struct btf_type *t)
>>>      return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
>>> }
>>> 
>>> +static u32 btf_nr_types_total(const struct btf *btf)
>>> +{
>>> +     u32 total = 0;
>>> +
>>> +     while (btf) {
>>> +             total += btf->nr_types;
>>> +             btf = btf->base_btf;
>>> +     }
>>> +
>>> +     return total;
>>> +}
>>> +
>>> s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
>>> {
>>>      const struct btf_type *t;
>>>      const char *tname;
>>> -     u32 i;
>>> +     u32 i, total;
>>> 
>>> -     for (i = 1; i <= btf->nr_types; i++) {
>>> -             t = btf->types[i];
>>> +     total = btf_nr_types_total(btf);
>>> +     for (i = 1; i < total; i++) {
>>> +             t = btf_type_by_id(btf, i);
>>>              if (BTF_INFO_KIND(t->info) != kind)
>>>                      continue;
>>> 
>>> @@ -599,8 +617,14 @@ static const struct btf_kind_operations *btf_type_ops(const struct btf_type *t)
>>> 
>>> static bool btf_name_offset_valid(const struct btf *btf, u32 offset)
>>> {
>>> -     return BTF_STR_OFFSET_VALID(offset) &&
>>> -             offset < btf->hdr.str_len;
>>> +     if (!BTF_STR_OFFSET_VALID(offset))
>>> +             return false;
>>> +
>>> +     while (offset < btf->start_str_off)
>>> +             btf = btf->base_btf;
>> 
>> Do we need "if (!btf) return false;" in the while loop? (and some other loops below)
> 
> No, because for base btf start_str_off and start_type_id are always
> zero, so loop condition is always false.

Ah, I misread the code. Thanks for the explanation. 

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/5] bpf: assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO
  2020-11-10  1:19 ` [PATCH v4 bpf-next 2/5] bpf: assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO Andrii Nakryiko
@ 2020-11-10 20:24   ` Song Liu
  0 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2020-11-10 20:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, daniel, Kernel Team,
	linux-kernel, rafael, jeyu, Arnaldo Carvalho de Melo,
	Greg Kroah-Hartman



> On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Allocate ID for vmlinux BTF. This makes it visible when iterating over all BTF
> objects in the system. To allow distinguishing vmlinux BTF (and later kernel
> module BTF) from user-provided BTFs, expose extra kernel_btf flag, as well as
> BTF name ("vmlinux" for vmlinux BTF, will equal to module's name for module
> BTF).  We might want to later allow specifying BTF name for user-provided BTFs
> as well, if that makes sense. But currently this is reserved only for
> in-kernel BTFs.
> 
> Having in-kernel BTFs exposed IDs will allow to extend BPF APIs that require
> in-kernel BTF type with ability to specify BTF types from kernel modules, not
> just vmlinux BTF. This will be implemented in a follow up patch set for
> fentry/fexit/fmod_ret/lsm/etc.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Song Liu <songliubraving@fb.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it
  2020-11-10  1:19 ` [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it Andrii Nakryiko
@ 2020-11-11  1:04   ` Song Liu
  2020-11-16 19:55     ` Allan, Bruce W
  2020-11-17  3:44   ` David Ahern
  1 sibling, 1 reply; 26+ messages in thread
From: Song Liu @ 2020-11-11  1:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, open list, rafael, jeyu, Arnaldo Carvalho de Melo,
	Greg Kroah-Hartman, Masahiro Yamada



> On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:

[...]

> SPLIT BTF
> =========
> 
> $ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}'; done | awk '{ s += $1 } END { print s }'
> 5194047
> 
> $ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
> ./drivers/gpu/drm/i915/i915.ko 293206
> ./drivers/gpu/drm/radeon/radeon.ko 282103
> ./fs/xfs/xfs.ko 222150
> ./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 198503
> ./drivers/infiniband/hw/mlx5/mlx5_ib.ko 198356
> ./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 113444
> ./fs/cifs/cifs.ko 109379
> ./arch/x86/kvm/kvm.ko 100225
> ./drivers/gpu/drm/drm.ko 94827
> ./drivers/infiniband/core/ib_core.ko 91188
> 
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 5/5] tools/bpftool: add support for in-kernel and named BTF in `btf show`
  2020-11-10  1:19 ` [PATCH v4 bpf-next 5/5] tools/bpftool: add support for in-kernel and named BTF in `btf show` Andrii Nakryiko
@ 2020-11-11  1:15   ` Song Liu
  2020-11-11  4:19     ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2020-11-11  1:15 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, daniel, Kernel Team,
	linux-kernel, rafael, jeyu, Arnaldo Carvalho de Melo,
	Greg Kroah-Hartman, Alan Maguire



> On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:

[...]

> ...
> 
> Tested-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Song Liu <songliubraving@fb.com>

With one nit:

> ---
> tools/bpf/bpftool/btf.c | 28 +++++++++++++++++++++++++++-
> 1 file changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
> index c96b56e8e3a4..ed5e97157241 100644
> --- a/tools/bpf/bpftool/btf.c
> +++ b/tools/bpf/bpftool/btf.c
> @@ -742,9 +742,14 @@ show_btf_plain(struct bpf_btf_info *info, int fd,
> 	       struct btf_attach_table *btf_map_table)
> {
> 	struct btf_attach_point *obj;
> +	const char *name = u64_to_ptr(info->name);
> 	int n;
> 
> 	printf("%u: ", info->id);
> +	if (info->kernel_btf)
> +		printf("name [%s]  ", name);
> +	else if (name && name[0])
> +		printf("name %s  ", name);

Maybe explicitly say "name <anonymous>" for btf without a name? I think 
it will benefit plain output.  

> 	printf("size %uB", info->btf_size);
> 
> 	n = 0;

[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 5/5] tools/bpftool: add support for in-kernel and named BTF in `btf show`
  2020-11-11  1:15   ` Song Liu
@ 2020-11-11  4:19     ` Andrii Nakryiko
  2020-11-11  7:44       ` Song Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-11  4:19 UTC (permalink / raw)
  To: Song Liu
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov, daniel,
	Kernel Team, linux-kernel, rafael, jeyu,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman, Alan Maguire

On Tue, Nov 10, 2020 at 5:15 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
>
> [...]
>
> > ...
> >
> > Tested-by: Alan Maguire <alan.maguire@oracle.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Acked-by: Song Liu <songliubraving@fb.com>
>
> With one nit:
>
> > ---
> > tools/bpf/bpftool/btf.c | 28 +++++++++++++++++++++++++++-
> > 1 file changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
> > index c96b56e8e3a4..ed5e97157241 100644
> > --- a/tools/bpf/bpftool/btf.c
> > +++ b/tools/bpf/bpftool/btf.c
> > @@ -742,9 +742,14 @@ show_btf_plain(struct bpf_btf_info *info, int fd,
> >              struct btf_attach_table *btf_map_table)
> > {
> >       struct btf_attach_point *obj;
> > +     const char *name = u64_to_ptr(info->name);
> >       int n;
> >
> >       printf("%u: ", info->id);
> > +     if (info->kernel_btf)
> > +             printf("name [%s]  ", name);
> > +     else if (name && name[0])
> > +             printf("name %s  ", name);
>
> Maybe explicitly say "name <anonymous>" for btf without a name? I think
> it will benefit plain output.

This patch set is already landed. But I can do a follow-up patch to add this.

>
> >       printf("size %uB", info->btf_size);
> >
> >       n = 0;
>
> [...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs
  2020-11-10  1:19 ` [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs Andrii Nakryiko
@ 2020-11-11  7:11   ` kernel test robot
  2020-11-11 10:13   ` Jessica Yu
  1 sibling, 0 replies; 26+ messages in thread
From: kernel test robot @ 2020-11-11  7:11 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: kbuild-all, clang-built-linux, kernel-team, linux-kernel, rafael,
	jeyu, Andrii Nakryiko, Arnaldo Carvalho de Melo

[-- Attachment #1: Type: text/plain, Size: 4563 bytes --]

Hi Andrii,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Andrii-Nakryiko/Integrate-kernel-module-BTF-support/20201110-095309
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: mips-randconfig-r002-20201110 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 4d81c8adb6ed9840257f6cb6b93f60856d422a15)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install mips cross compiling tool for clang build
        # apt-get install binutils-mips-linux-gnu
        # https://github.com/0day-ci/linux/commit/dcd763b7808fdc01ebf70bbe07ba92388df4d20d
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Andrii-Nakryiko/Integrate-kernel-module-BTF-support/20201110-095309
        git checkout dcd763b7808fdc01ebf70bbe07ba92388df4d20d
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> kernel/bpf/btf.c:4481:20: warning: unused function 'btf_parse_module'
   static struct btf char const void unsigned int data_size)
   ^
   fatal error: error in backend: Nested variants found in inline asm string: ' .set push
   .set mips64r2
   .if ( 0x00 ) != -1)) 0x00 ) != -1)) : ($( static struct ftrace_branch_data __attribute__((__aligned__(4))) __attribute__((__section__("_ftrace_branch"))) __if_trace = $( .func = __func__, .file = "arch/mips/include/asm/atomic.h", .line = 154, $); 0x00 ) != -1)) : $))) ) && ( 0 ); .set push; .set mips64r2; .rept 1; sync 0x00; .endr; .set pop; .else; ; .endif
   1: ll $1, $2 # atomic_fetch_sub
   subu $0, $1, $3
   sc $0, $2
   beqz $0, 1b
   .set pop
   move $0, $1
   '
   clang-12: error: clang frontend command failed with exit code 70 (use -v to see invocation)
   clang version 12.0.0 (git://gitmirror/llvm_project 874b0a0b9db93f5d3350ffe6b5efda2d908415d0)
   Target: mipsel-unknown-linux-gnu
   Thread model: posix
   InstalledDir: /opt/cross/clang-874b0a0b9d/bin
   clang-12: note: diagnostic msg:
   Makefile arch drivers include kernel mm scripts source usr

vim +/btf_parse_module +4481 kernel/bpf/btf.c

  4480	
> 4481	static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
  4482	{
  4483		struct btf_verifier_env *env = NULL;
  4484		struct bpf_verifier_log *log;
  4485		struct btf *btf = NULL, *base_btf;
  4486		int err;
  4487	
  4488		base_btf = bpf_get_btf_vmlinux();
  4489		if (IS_ERR(base_btf))
  4490			return base_btf;
  4491		if (!base_btf)
  4492			return ERR_PTR(-EINVAL);
  4493	
  4494		env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
  4495		if (!env)
  4496			return ERR_PTR(-ENOMEM);
  4497	
  4498		log = &env->log;
  4499		log->level = BPF_LOG_KERNEL;
  4500	
  4501		btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
  4502		if (!btf) {
  4503			err = -ENOMEM;
  4504			goto errout;
  4505		}
  4506		env->btf = btf;
  4507	
  4508		btf->base_btf = base_btf;
  4509		btf->start_id = base_btf->nr_types;
  4510		btf->start_str_off = base_btf->hdr.str_len;
  4511		btf->kernel_btf = true;
  4512		snprintf(btf->name, sizeof(btf->name), "%s", module_name);
  4513	
  4514		btf->data = kvmalloc(data_size, GFP_KERNEL | __GFP_NOWARN);
  4515		if (!btf->data) {
  4516			err = -ENOMEM;
  4517			goto errout;
  4518		}
  4519		memcpy(btf->data, data, data_size);
  4520		btf->data_size = data_size;
  4521	
  4522		err = btf_parse_hdr(env);
  4523		if (err)
  4524			goto errout;
  4525	
  4526		btf->nohdr_data = btf->data + btf->hdr.hdr_len;
  4527	
  4528		err = btf_parse_str_sec(env);
  4529		if (err)
  4530			goto errout;
  4531	
  4532		err = btf_check_all_metas(env);
  4533		if (err)
  4534			goto errout;
  4535	
  4536		btf_verifier_env_free(env);
  4537		refcount_set(&btf->refcnt, 1);
  4538		return btf;
  4539	
  4540	errout:
  4541		btf_verifier_env_free(env);
  4542		if (btf) {
  4543			kvfree(btf->data);
  4544			kvfree(btf->types);
  4545			kfree(btf);
  4546		}
  4547		return ERR_PTR(err);
  4548	}
  4549	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30898 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 5/5] tools/bpftool: add support for in-kernel and named BTF in `btf show`
  2020-11-11  4:19     ` Andrii Nakryiko
@ 2020-11-11  7:44       ` Song Liu
  0 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2020-11-11  7:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov, daniel,
	Kernel Team, linux-kernel, rafael, jeyu,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman, Alan Maguire



> On Nov 10, 2020, at 8:19 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Tue, Nov 10, 2020 at 5:15 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
>> 
>> [...]
>> 
>>> ...
>>> 
>>> Tested-by: Alan Maguire <alan.maguire@oracle.com>
>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>> 
>> Acked-by: Song Liu <songliubraving@fb.com>
>> 
>> With one nit:
>> 
>>> ---
>>> tools/bpf/bpftool/btf.c | 28 +++++++++++++++++++++++++++-
>>> 1 file changed, 27 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
>>> index c96b56e8e3a4..ed5e97157241 100644
>>> --- a/tools/bpf/bpftool/btf.c
>>> +++ b/tools/bpf/bpftool/btf.c
>>> @@ -742,9 +742,14 @@ show_btf_plain(struct bpf_btf_info *info, int fd,
>>>             struct btf_attach_table *btf_map_table)
>>> {
>>>      struct btf_attach_point *obj;
>>> +     const char *name = u64_to_ptr(info->name);
>>>      int n;
>>> 
>>>      printf("%u: ", info->id);
>>> +     if (info->kernel_btf)
>>> +             printf("name [%s]  ", name);
>>> +     else if (name && name[0])
>>> +             printf("name %s  ", name);
>> 
>> Maybe explicitly say "name <anonymous>" for btf without a name? I think
>> it will benefit plain output.
> 
> This patch set is already landed. But I can do a follow-up patch to add this.

I realized this was applied soon after sending this. Yeah, a follow-up 
patch would be great. 

Thanks,
Song 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs
  2020-11-10  1:19 ` [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs Andrii Nakryiko
  2020-11-11  7:11   ` kernel test robot
@ 2020-11-11 10:13   ` Jessica Yu
  2020-11-11 20:11     ` Andrii Nakryiko
  1 sibling, 1 reply; 26+ messages in thread
From: Jessica Yu @ 2020-11-11 10:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, ast, daniel, kernel-team, linux-kernel, rafael,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman

+++ Andrii Nakryiko [09/11/20 17:19 -0800]:
[snipped]
>diff --git a/kernel/module.c b/kernel/module.c
>index a4fa44a652a7..f2996b02ab2e 100644
>--- a/kernel/module.c
>+++ b/kernel/module.c
>@@ -380,6 +380,35 @@ static void *section_objs(const struct load_info *info,
> 	return (void *)info->sechdrs[sec].sh_addr;
> }
>
>+/* Find a module section: 0 means not found. Ignores SHF_ALLOC flag. */
>+static unsigned int find_any_sec(const struct load_info *info, const char *name)
>+{
>+	unsigned int i;
>+
>+	for (i = 1; i < info->hdr->e_shnum; i++) {
>+		Elf_Shdr *shdr = &info->sechdrs[i];
>+		if (strcmp(info->secstrings + shdr->sh_name, name) == 0)
>+			return i;
>+	}
>+	return 0;
>+}
>+
>+/*
>+ * Find a module section, or NULL. Fill in number of "objects" in section.
>+ * Ignores SHF_ALLOC flag.
>+ */
>+static __maybe_unused void *any_section_objs(const struct load_info *info,
>+					     const char *name,
>+					     size_t object_size,
>+					     unsigned int *num)
>+{
>+	unsigned int sec = find_any_sec(info, name);
>+
>+	/* Section 0 has sh_addr 0 and sh_size 0. */
>+	*num = info->sechdrs[sec].sh_size / object_size;
>+	return (void *)info->sechdrs[sec].sh_addr;
>+}
>+

Hm, I see this patchset has already been applied to bpf-next, but I
guess that doesn't preclude any follow-up patches :-)

I am not a huge fan of the code duplication here, and also the fact
that they're only called in one place. any_section_objs() and
find_any_sec() are pretty much identical to section_objs() and
find_sec(), other than the fact the former drops the SHF_ALLOC check.

Moreover, since it appears that the ".BTF" section is not marked
SHF_ALLOC, I think this will leave mod->btf_data as a dangling pointer
after the module is done loading and the module's load_info has been
deallocated, since SHF_ALLOC sections are not allocated nor copied to
the module's final location in memory.

Why not simply mark the ".BTF" section in the module SHF_ALLOC? We
already do some sh_flags rewriting in rewrite_section_headers(). Then
the module loader knows to keep the section in memory and you can use
section_objs(). And since the .BTF section stays in module memory,
that might save you the memcpy() to btf->data in btf_parse_module()
(unless that is still needed for some reason).

Thanks,

Jessica

> /* Provided by the linker */
> extern const struct kernel_symbol __start___ksymtab[];
> extern const struct kernel_symbol __stop___ksymtab[];
>@@ -3250,6 +3279,9 @@ static int find_module_sections(struct module *mod, struct load_info *info)
> 					   sizeof(*mod->bpf_raw_events),
> 					   &mod->num_bpf_raw_events);
> #endif
>+#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
>+	mod->btf_data = any_section_objs(info, ".BTF", 1, &mod->btf_data_size);
>+#endif
> #ifdef CONFIG_JUMP_LABEL
> 	mod->jump_entries = section_objs(info, "__jump_table",
> 					sizeof(*mod->jump_entries),
>-- 
>2.24.1
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs
  2020-11-11 10:13   ` Jessica Yu
@ 2020-11-11 20:11     ` Andrii Nakryiko
  2020-11-13 10:31       ` Jessica Yu
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-11 20:11 UTC (permalink / raw)
  To: Jessica Yu
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, open list, Rafael J. Wysocki,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman

On Wed, Nov 11, 2020 at 2:13 AM Jessica Yu <jeyu@kernel.org> wrote:
>
> +++ Andrii Nakryiko [09/11/20 17:19 -0800]:
> [snipped]
> >diff --git a/kernel/module.c b/kernel/module.c
> >index a4fa44a652a7..f2996b02ab2e 100644
> >--- a/kernel/module.c
> >+++ b/kernel/module.c
> >@@ -380,6 +380,35 @@ static void *section_objs(const struct load_info *info,
> >       return (void *)info->sechdrs[sec].sh_addr;
> > }
> >
> >+/* Find a module section: 0 means not found. Ignores SHF_ALLOC flag. */
> >+static unsigned int find_any_sec(const struct load_info *info, const char *name)
> >+{
> >+      unsigned int i;
> >+
> >+      for (i = 1; i < info->hdr->e_shnum; i++) {
> >+              Elf_Shdr *shdr = &info->sechdrs[i];
> >+              if (strcmp(info->secstrings + shdr->sh_name, name) == 0)
> >+                      return i;
> >+      }
> >+      return 0;
> >+}
> >+
> >+/*
> >+ * Find a module section, or NULL. Fill in number of "objects" in section.
> >+ * Ignores SHF_ALLOC flag.
> >+ */
> >+static __maybe_unused void *any_section_objs(const struct load_info *info,
> >+                                           const char *name,
> >+                                           size_t object_size,
> >+                                           unsigned int *num)
> >+{
> >+      unsigned int sec = find_any_sec(info, name);
> >+
> >+      /* Section 0 has sh_addr 0 and sh_size 0. */
> >+      *num = info->sechdrs[sec].sh_size / object_size;
> >+      return (void *)info->sechdrs[sec].sh_addr;
> >+}
> >+
>
> Hm, I see this patchset has already been applied to bpf-next, but I
> guess that doesn't preclude any follow-up patches :-)

Of course!

>
> I am not a huge fan of the code duplication here, and also the fact
> that they're only called in one place. any_section_objs() and
> find_any_sec() are pretty much identical to section_objs() and
> find_sec(), other than the fact the former drops the SHF_ALLOC check.

Right, but the alternative was to add a new flag to existing
section_objs() and find_sec() functions, which would cause much more
code churn for no good reason (besides saving some trivial code
duplication). And those true/false flags are harder to read in code
anyways.

>
> Moreover, since it appears that the ".BTF" section is not marked
> SHF_ALLOC, I think this will leave mod->btf_data as a dangling pointer
> after the module is done loading and the module's load_info has been
> deallocated, since SHF_ALLOC sections are not allocated nor copied to
> the module's final location in memory.

I can make sure that we also reset the btf_data pointer back to NULL,
if that's a big concern.

>
> Why not simply mark the ".BTF" section in the module SHF_ALLOC? We
> already do some sh_flags rewriting in rewrite_section_headers(). Then
> the module loader knows to keep the section in memory and you can use
> section_objs(). And since the .BTF section stays in module memory,
> that might save you the memcpy() to btf->data in btf_parse_module()
> (unless that is still needed for some reason).

Wasn't aware about rewrite_section_headers() manipulations. Are you
suggesting to just add SHF_ALLOC there for the .BTF section from the
kernel side? I guess that would work, but won't avoid memory copy (so
actually would waste kernel memory, if I understand correctly). The
reason being that the module's BTF is registered as an independently
ref-counted BTF object, which could be held past the kernel module
being unloaded. So I can't directly reference module's .BTF data
anyways.

Also, marking .BTF with SHF_ALLOC with pahole or objcopy tool actually
might generate warnings because SHF_ALLOC sections need to be
allocated to data segments, which neither of those tools know how to
do, it requires a linker support. We do that for vmlinux with extra
linker script logic, but for kernel modules we don't have and probably
don't want to do that.

So in the end, the cleanest approach still seems like not doing
SHF_ALLOC but allowing "capturing" .BTF data with an extra helper.

>
> Thanks,
>
> Jessica
>
> > /* Provided by the linker */
> > extern const struct kernel_symbol __start___ksymtab[];
> > extern const struct kernel_symbol __stop___ksymtab[];
> >@@ -3250,6 +3279,9 @@ static int find_module_sections(struct module *mod, struct load_info *info)
> >                                          sizeof(*mod->bpf_raw_events),
> >                                          &mod->num_bpf_raw_events);
> > #endif
> >+#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
> >+      mod->btf_data = any_section_objs(info, ".BTF", 1, &mod->btf_data_size);
> >+#endif
> > #ifdef CONFIG_JUMP_LABEL
> >       mod->jump_entries = section_objs(info, "__jump_table",
> >                                       sizeof(*mod->jump_entries),
> >--
> >2.24.1
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs
  2020-11-11 20:11     ` Andrii Nakryiko
@ 2020-11-13 10:31       ` Jessica Yu
  2020-11-13 18:54         ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Jessica Yu @ 2020-11-13 10:31 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, open list, Rafael J. Wysocki,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman

+++ Andrii Nakryiko [11/11/20 12:11 -0800]:
>On Wed, Nov 11, 2020 at 2:13 AM Jessica Yu <jeyu@kernel.org> wrote:
>>
>> +++ Andrii Nakryiko [09/11/20 17:19 -0800]:
>> [snipped]
>> >diff --git a/kernel/module.c b/kernel/module.c
>> >index a4fa44a652a7..f2996b02ab2e 100644
>> >--- a/kernel/module.c
>> >+++ b/kernel/module.c
>> >@@ -380,6 +380,35 @@ static void *section_objs(const struct load_info *info,
>> >       return (void *)info->sechdrs[sec].sh_addr;
>> > }
>> >
>> >+/* Find a module section: 0 means not found. Ignores SHF_ALLOC flag. */
>> >+static unsigned int find_any_sec(const struct load_info *info, const char *name)
>> >+{
>> >+      unsigned int i;
>> >+
>> >+      for (i = 1; i < info->hdr->e_shnum; i++) {
>> >+              Elf_Shdr *shdr = &info->sechdrs[i];
>> >+              if (strcmp(info->secstrings + shdr->sh_name, name) == 0)
>> >+                      return i;
>> >+      }
>> >+      return 0;
>> >+}
>> >+
>> >+/*
>> >+ * Find a module section, or NULL. Fill in number of "objects" in section.
>> >+ * Ignores SHF_ALLOC flag.
>> >+ */
>> >+static __maybe_unused void *any_section_objs(const struct load_info *info,
>> >+                                           const char *name,
>> >+                                           size_t object_size,
>> >+                                           unsigned int *num)
>> >+{
>> >+      unsigned int sec = find_any_sec(info, name);
>> >+
>> >+      /* Section 0 has sh_addr 0 and sh_size 0. */
>> >+      *num = info->sechdrs[sec].sh_size / object_size;
>> >+      return (void *)info->sechdrs[sec].sh_addr;
>> >+}
>> >+
>>
>> Hm, I see this patchset has already been applied to bpf-next, but I
>> guess that doesn't preclude any follow-up patches :-)
>
>Of course!
>
>>
>> I am not a huge fan of the code duplication here, and also the fact
>> that they're only called in one place. any_section_objs() and
>> find_any_sec() are pretty much identical to section_objs() and
>> find_sec(), other than the fact the former drops the SHF_ALLOC check.
>
>Right, but the alternative was to add a new flag to existing
>section_objs() and find_sec() functions, which would cause much more
>code churn for no good reason (besides saving some trivial code
>duplication). And those true/false flags are harder to read in code
>anyways.

That's true, all fair points. I thought there was the possibility to
avoid the code duplication if .BTF were also set to SHF_ALLOC, but I
see for reasons you explained below it is more trouble than it's worth.

>>
>> Moreover, since it appears that the ".BTF" section is not marked
>> SHF_ALLOC, I think this will leave mod->btf_data as a dangling pointer
>> after the module is done loading and the module's load_info has been
>> deallocated, since SHF_ALLOC sections are not allocated nor copied to
>> the module's final location in memory.
>
>I can make sure that we also reset the btf_data pointer back to NULL,
>if that's a big concern.

It's not a terribly huge concern, since mod->btf_data is only accessed
in the btf coming notifier at the moment, but it's probably best to at
least not advertise it as a valid pointer anymore after the module is
done loading. We do some pointer and section size cleanup at the end
of do_init_module() for sections that are deallocated at the end of
module load (starting where init_layout.base is reset to NULL),
we could just tack on mod->btf_data = NULL there as well.

>>
>> Why not simply mark the ".BTF" section in the module SHF_ALLOC? We
>> already do some sh_flags rewriting in rewrite_section_headers(). Then
>> the module loader knows to keep the section in memory and you can use
>> section_objs(). And since the .BTF section stays in module memory,
>> that might save you the memcpy() to btf->data in btf_parse_module()
>> (unless that is still needed for some reason).
>
>Wasn't aware about rewrite_section_headers() manipulations. Are you
>suggesting to just add SHF_ALLOC there for the .BTF section from the
>kernel side? I guess that would work, but won't avoid memory copy (so
>actually would waste kernel memory, if I understand correctly). The
>reason being that the module's BTF is registered as an independently
>ref-counted BTF object, which could be held past the kernel module
>being unloaded. So I can't directly reference module's .BTF data
>anyways.

Ah OK, I was not aware that the section could be held past the module
being unloaded. Then yeah, it would be a memory waste to keep them in
memory if they are being memcpy'd anyway. Thanks for clarifying!

Jessica

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs
  2020-11-13 10:31       ` Jessica Yu
@ 2020-11-13 18:54         ` Andrii Nakryiko
  0 siblings, 0 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-13 18:54 UTC (permalink / raw)
  To: Jessica Yu
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, open list, Rafael J. Wysocki,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman

On Fri, Nov 13, 2020 at 2:32 AM Jessica Yu <jeyu@kernel.org> wrote:
>
> +++ Andrii Nakryiko [11/11/20 12:11 -0800]:
> >On Wed, Nov 11, 2020 at 2:13 AM Jessica Yu <jeyu@kernel.org> wrote:
> >>
> >> +++ Andrii Nakryiko [09/11/20 17:19 -0800]:
> >> [snipped]
> >> >diff --git a/kernel/module.c b/kernel/module.c
> >> >index a4fa44a652a7..f2996b02ab2e 100644
> >> >--- a/kernel/module.c
> >> >+++ b/kernel/module.c
> >> >@@ -380,6 +380,35 @@ static void *section_objs(const struct load_info *info,
> >> >       return (void *)info->sechdrs[sec].sh_addr;
> >> > }
> >> >
> >> >+/* Find a module section: 0 means not found. Ignores SHF_ALLOC flag. */
> >> >+static unsigned int find_any_sec(const struct load_info *info, const char *name)
> >> >+{
> >> >+      unsigned int i;
> >> >+
> >> >+      for (i = 1; i < info->hdr->e_shnum; i++) {
> >> >+              Elf_Shdr *shdr = &info->sechdrs[i];
> >> >+              if (strcmp(info->secstrings + shdr->sh_name, name) == 0)
> >> >+                      return i;
> >> >+      }
> >> >+      return 0;
> >> >+}
> >> >+
> >> >+/*
> >> >+ * Find a module section, or NULL. Fill in number of "objects" in section.
> >> >+ * Ignores SHF_ALLOC flag.
> >> >+ */
> >> >+static __maybe_unused void *any_section_objs(const struct load_info *info,
> >> >+                                           const char *name,
> >> >+                                           size_t object_size,
> >> >+                                           unsigned int *num)
> >> >+{
> >> >+      unsigned int sec = find_any_sec(info, name);
> >> >+
> >> >+      /* Section 0 has sh_addr 0 and sh_size 0. */
> >> >+      *num = info->sechdrs[sec].sh_size / object_size;
> >> >+      return (void *)info->sechdrs[sec].sh_addr;
> >> >+}
> >> >+
> >>
> >> Hm, I see this patchset has already been applied to bpf-next, but I
> >> guess that doesn't preclude any follow-up patches :-)
> >
> >Of course!
> >
> >>
> >> I am not a huge fan of the code duplication here, and also the fact
> >> that they're only called in one place. any_section_objs() and
> >> find_any_sec() are pretty much identical to section_objs() and
> >> find_sec(), other than the fact the former drops the SHF_ALLOC check.
> >
> >Right, but the alternative was to add a new flag to existing
> >section_objs() and find_sec() functions, which would cause much more
> >code churn for no good reason (besides saving some trivial code
> >duplication). And those true/false flags are harder to read in code
> >anyways.
>
> That's true, all fair points. I thought there was the possibility to
> avoid the code duplication if .BTF were also set to SHF_ALLOC, but I
> see for reasons you explained below it is more trouble than it's worth.
>
> >>
> >> Moreover, since it appears that the ".BTF" section is not marked
> >> SHF_ALLOC, I think this will leave mod->btf_data as a dangling pointer
> >> after the module is done loading and the module's load_info has been
> >> deallocated, since SHF_ALLOC sections are not allocated nor copied to
> >> the module's final location in memory.
> >
> >I can make sure that we also reset the btf_data pointer back to NULL,
> >if that's a big concern.
>
> It's not a terribly huge concern, since mod->btf_data is only accessed
> in the btf coming notifier at the moment, but it's probably best to at
> least not advertise it as a valid pointer anymore after the module is
> done loading. We do some pointer and section size cleanup at the end
> of do_init_module() for sections that are deallocated at the end of
> module load (starting where init_layout.base is reset to NULL),
> we could just tack on mod->btf_data = NULL there as well.

Sounds good, I'll send a follow up patch. Thanks!

>
> >>
> >> Why not simply mark the ".BTF" section in the module SHF_ALLOC? We
> >> already do some sh_flags rewriting in rewrite_section_headers(). Then
> >> the module loader knows to keep the section in memory and you can use
> >> section_objs(). And since the .BTF section stays in module memory,
> >> that might save you the memcpy() to btf->data in btf_parse_module()
> >> (unless that is still needed for some reason).
> >
> >Wasn't aware about rewrite_section_headers() manipulations. Are you
> >suggesting to just add SHF_ALLOC there for the .BTF section from the
> >kernel side? I guess that would work, but won't avoid memory copy (so
> >actually would waste kernel memory, if I understand correctly). The
> >reason being that the module's BTF is registered as an independently
> >ref-counted BTF object, which could be held past the kernel module
> >being unloaded. So I can't directly reference module's .BTF data
> >anyways.
>
> Ah OK, I was not aware that the section could be held past the module
> being unloaded. Then yeah, it would be a memory waste to keep them in
> memory if they are being memcpy'd anyway. Thanks for clarifying!
>
> Jessica

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it
  2020-11-11  1:04   ` Song Liu
@ 2020-11-16 19:55     ` Allan, Bruce W
  2020-11-16 20:34       ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Allan, Bruce W @ 2020-11-16 19:55 UTC (permalink / raw)
  To: Song Liu, Andrii Nakryiko
  Cc: bpf, Networking, Starovoitov, Alexei, Daniel Borkmann,
	Kernel Team, open list, rafael, jeyu, Arnaldo Carvalho de Melo,
	Greg Kroah-Hartman, Masahiro Yamada

> -----Original Message-----
> From: Song Liu <songliubraving@fb.com>
> Sent: Tuesday, November 10, 2020 5:05 PM
> To: Andrii Nakryiko <andrii@kernel.org>
> Cc: bpf <bpf@vger.kernel.org>; Networking <netdev@vger.kernel.org>;
> Starovoitov, Alexei <ast@fb.com>; Daniel Borkmann <daniel@iogearbox.net>;
> Kernel Team <Kernel-team@fb.com>; open list <linux-
> kernel@vger.kernel.org>; rafael@kernel.org; jeyu@kernel.org; Arnaldo
> Carvalho de Melo <acme@redhat.com>; Greg Kroah-Hartman
> <gregkh@linuxfoundation.org>; Masahiro Yamada
> <yamada.masahiro@socionext.com>
> Subject: Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is
> enabled and pahole supports it
> 
> 
> 
> > On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> [...]
> 
> > SPLIT BTF
> > =========
> >
> > $ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}';
> done | awk '{ s += $1 } END { print s }'
> > 5194047
> >
> > $ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep
> BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
> > ./drivers/gpu/drm/i915/i915.ko 293206
> > ./drivers/gpu/drm/radeon/radeon.ko 282103
> > ./fs/xfs/xfs.ko 222150
> > ./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 198503
> > ./drivers/infiniband/hw/mlx5/mlx5_ib.ko 198356
> > ./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 113444
> > ./fs/cifs/cifs.ko 109379
> > ./arch/x86/kvm/kvm.ko 100225
> > ./drivers/gpu/drm/drm.ko 94827
> > ./drivers/infiniband/core/ib_core.ko 91188
> >
> > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> 
> Acked-by: Song Liu <songliubraving@fb.com>

This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
"Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."

Is that intentional?

Thanks,
Bruce.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it
  2020-11-16 19:55     ` Allan, Bruce W
@ 2020-11-16 20:34       ` Andrii Nakryiko
  2020-11-16 21:24         ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-16 20:34 UTC (permalink / raw)
  To: Allan, Bruce W
  Cc: Song Liu, Andrii Nakryiko, bpf, Networking, Starovoitov, Alexei,
	Daniel Borkmann, Kernel Team, open list, rafael, jeyu,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman, Masahiro Yamada

On Mon, Nov 16, 2020 at 11:55 AM Allan, Bruce W <bruce.w.allan@intel.com> wrote:
>
> > -----Original Message-----
> > From: Song Liu <songliubraving@fb.com>
> > Sent: Tuesday, November 10, 2020 5:05 PM
> > To: Andrii Nakryiko <andrii@kernel.org>
> > Cc: bpf <bpf@vger.kernel.org>; Networking <netdev@vger.kernel.org>;
> > Starovoitov, Alexei <ast@fb.com>; Daniel Borkmann <daniel@iogearbox.net>;
> > Kernel Team <Kernel-team@fb.com>; open list <linux-
> > kernel@vger.kernel.org>; rafael@kernel.org; jeyu@kernel.org; Arnaldo
> > Carvalho de Melo <acme@redhat.com>; Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org>; Masahiro Yamada
> > <yamada.masahiro@socionext.com>
> > Subject: Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is
> > enabled and pahole supports it
> >
> >
> >
> > > On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > [...]
> >
> > > SPLIT BTF
> > > =========
> > >
> > > $ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}';
> > done | awk '{ s += $1 } END { print s }'
> > > 5194047
> > >
> > > $ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep
> > BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
> > > ./drivers/gpu/drm/i915/i915.ko 293206
> > > ./drivers/gpu/drm/radeon/radeon.ko 282103
> > > ./fs/xfs/xfs.ko 222150
> > > ./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 198503
> > > ./drivers/infiniband/hw/mlx5/mlx5_ib.ko 198356
> > > ./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 113444
> > > ./fs/cifs/cifs.ko 109379
> > > ./arch/x86/kvm/kvm.ko 100225
> > > ./drivers/gpu/drm/drm.ko 94827
> > > ./drivers/infiniband/core/ib_core.ko 91188
> > >
> > > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >
> > Acked-by: Song Liu <songliubraving@fb.com>
>
> This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
> supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
> kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
> first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
> "Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
> errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."
>
> Is that intentional?

I wasn't aware of such a use pattern, so definitely not intentional.
But vmlinux is absolutely necessary to generate the module BTF. So I'm
wondering what's the proper fix here? Leave it as is (that error
message is actually surprisingly descriptive, btw)? Force vmlinux
build? Or skip BTF generation for that module?

>
> Thanks,
> Bruce.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it
  2020-11-16 20:34       ` Andrii Nakryiko
@ 2020-11-16 21:24         ` Jakub Kicinski
  2020-11-16 22:35           ` Andrii Nakryiko
  2021-11-26 15:16           ` Fabian Grünbichler
  0 siblings, 2 replies; 26+ messages in thread
From: Jakub Kicinski @ 2020-11-16 21:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Allan, Bruce W, Song Liu, Andrii Nakryiko, bpf, Networking,
	Starovoitov, Alexei, Daniel Borkmann, Kernel Team, open list,
	rafael, jeyu, Arnaldo Carvalho de Melo, Greg Kroah-Hartman,
	Masahiro Yamada

On Mon, 16 Nov 2020 12:34:17 -0800 Andrii Nakryiko wrote:
> > This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
> > supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
> > kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
> > first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
> > "Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
> > errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."
> >
> > Is that intentional?  
> 
> I wasn't aware of such a use pattern, so definitely not intentional.
> But vmlinux is absolutely necessary to generate the module BTF. So I'm
> wondering what's the proper fix here? Leave it as is (that error
> message is actually surprisingly descriptive, btw)? Force vmlinux
> build? Or skip BTF generation for that module?

For an external out-of-tree module there is no guarantee that vmlinux
will even be on the system, no? So only the last option can work in
that case.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it
  2020-11-16 21:24         ` Jakub Kicinski
@ 2020-11-16 22:35           ` Andrii Nakryiko
  2021-11-26 15:16           ` Fabian Grünbichler
  1 sibling, 0 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2020-11-16 22:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Allan, Bruce W, Song Liu, Andrii Nakryiko, bpf, Networking,
	Starovoitov, Alexei, Daniel Borkmann, Kernel Team, open list,
	rafael, jeyu, Arnaldo Carvalho de Melo, Greg Kroah-Hartman,
	Masahiro Yamada

On Mon, Nov 16, 2020 at 1:24 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 16 Nov 2020 12:34:17 -0800 Andrii Nakryiko wrote:
> > > This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
> > > supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
> > > kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
> > > first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
> > > "Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
> > > errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."
> > >
> > > Is that intentional?
> >
> > I wasn't aware of such a use pattern, so definitely not intentional.
> > But vmlinux is absolutely necessary to generate the module BTF. So I'm
> > wondering what's the proper fix here? Leave it as is (that error
> > message is actually surprisingly descriptive, btw)? Force vmlinux
> > build? Or skip BTF generation for that module?
>
> For an external out-of-tree module there is no guarantee that vmlinux
> will even be on the system, no? So only the last option can work in
> that case.


Ok, how about something like the patch below. With that I seem to be
getting the desired behavior:

$ make clean
$ touch drivers/acpi/button.c
$ make M=drivers/acpi
make[1]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
  CC [M]  drivers/acpi/button.o
  MODPOST drivers/acpi/Module.symvers
  LD [M]  drivers/acpi/button.ko
  BTF [M] drivers/acpi/button.ko
Skipping BTF generation for drivers/acpi/button.ko due to
unavailability of vmlinux
make[1]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
$ readelf -S ~/linux-build/default/drivers/acpi/button.ko | grep BTF -A1
... empty ...

Now with normal build:

$ make all
...
LD [M]  drivers/acpi/button.ko
BTF [M] drivers/acpi/button.ko
...
$ readelf -S ~/linux-build/default/drivers/acpi/button.ko | grep BTF -A1
  [60] .BTF              PROGBITS         0000000000000000  00029310
       000000000000ab3f  0000000000000000           0     0     1



diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 02b892421f7a..d49ec001825d 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -38,7 +38,12 @@ quiet_cmd_ld_ko_o = LD [M]  $@
     $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)

 quiet_cmd_btf_ko = BTF [M] $@
-      cmd_btf_ko = LLVM_OBJCOPY=$(OBJCOPY) $(PAHOLE) -J --btf_base vmlinux $@
+      cmd_btf_ko =                             \
+    if [ -f vmlinux ]; then                        \
+        LLVM_OBJCOPY=$(OBJCOPY) $(PAHOLE) -J --btf_base vmlinux $@; \
+    else                                \
+        printf "Skipping BTF generation for %s due to unavailability
of vmlinux\n" $@ 1>&2; \
+    fi;

 # Same as newer-prereqs, but allows to exclude specified extra dependencies
 newer_prereqs_except = $(filter-out $(PHONY) $(1),$?)
@@ -49,7 +54,7 @@ if_changed_except = $(if $(call
newer_prereqs_except,$(2))$(cmd-check),      \
     printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd, @:)

 # Re-generate module BTFs if either module's .ko or vmlinux changed
-$(modules): %.ko: %.o %.mod.o scripts/module.lds vmlinux FORCE
+$(modules): %.ko: %.o %.mod.o scripts/module.lds $(if
$(KBUILD_BUILTIN),vmlinux) FORCE
     +$(call if_changed_except,ld_ko_o,vmlinux)
 ifdef CONFIG_DEBUG_INFO_BTF_MODULES
     +$(if $(newer-prereqs),$(call cmd,btf_ko))

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it
  2020-11-10  1:19 ` [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it Andrii Nakryiko
  2020-11-11  1:04   ` Song Liu
@ 2020-11-17  3:44   ` David Ahern
  1 sibling, 0 replies; 26+ messages in thread
From: David Ahern @ 2020-11-17  3:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: kernel-team, linux-kernel, rafael, jeyu,
	Arnaldo Carvalho de Melo, Greg Kroah-Hartman, Masahiro Yamada,
	bpf, netdev, ast, daniel

On 11/9/20 6:19 PM, Andrii Nakryiko wrote:
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index d7a7bc3b6098..1e78faaf20a5 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -274,6 +274,15 @@ config DEBUG_INFO_BTF
>  	  Turning this on expects presence of pahole tool, which will convert
>  	  DWARF type info into equivalent deduplicated BTF type info.
>  
> +config PAHOLE_HAS_SPLIT_BTF
> +	def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "119")
> +
> +config DEBUG_INFO_BTF_MODULES
> +	def_bool y
> +	depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
> +	help
> +	  Generate compact split BTF type information for kernel modules.
> +
>  config GDB_SCRIPTS
>  	bool "Provide GDB scripts for kernel debugging"
>  	help

Thank you for adding a config option for this feature vs bumping the
required pahole version in scripts/link-vmlinux.sh. This is a much more
friendly way of handling kernel features that require support from build
tools.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it
  2020-11-16 21:24         ` Jakub Kicinski
  2020-11-16 22:35           ` Andrii Nakryiko
@ 2021-11-26 15:16           ` Fabian Grünbichler
  2021-12-08  0:08             ` Andrii Nakryiko
  1 sibling, 1 reply; 26+ messages in thread
From: Fabian Grünbichler @ 2021-11-26 15:16 UTC (permalink / raw)
  To: Andrii Nakryiko, Jakub Kicinski
  Cc: Arnaldo Carvalho de Melo, Andrii Nakryiko, Starovoitov, Alexei,
	bpf, Allan, Bruce W, Daniel Borkmann, Greg Kroah-Hartman, jeyu,
	Kernel Team, open list, Networking, rafael, Song Liu,
	Masahiro Yamada

On November 16, 2020 10:24 pm, Jakub Kicinski wrote:
> On Mon, 16 Nov 2020 12:34:17 -0800 Andrii Nakryiko wrote:
>> > This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
>> > supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
>> > kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
>> > first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
>> > "Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
>> > errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."
>> >
>> > Is that intentional?  
>> 
>> I wasn't aware of such a use pattern, so definitely not intentional.
>> But vmlinux is absolutely necessary to generate the module BTF. So I'm
>> wondering what's the proper fix here? Leave it as is (that error
>> message is actually surprisingly descriptive, btw)? Force vmlinux
>> build? Or skip BTF generation for that module?
> 
> For an external out-of-tree module there is no guarantee that vmlinux
> will even be on the system, no? So only the last option can work in
> that case.

a year late to the party, but it seems to me that this patch 
series/features also missed another, not yet fixed scenario. I have to 
admit I am not very well-versed in BTF/BPF matters though, so please 
take the analysis below with a grain of salt or two ;)

(am subscribed to LKML/netdev, but not the bpf list, so please keep me 
CCed if discussion moves there! apologies if too many people are CCed 
here, feel free to trim down to relevant people/lists)

many distros do their own tracking of kernel <-> module ABI (usually 
these checks use Module.symvers and some combination of lists/symbols/.. 
to skip/ignore[0]).

depending on detected changes, a kernel update can either
- bump ABI, resulting in a new kernel/modules package that is installed 
  next to the current one
- keep ABI, resulting in an updated kernel/modules package that is 
  installed over/instead of the current one

the former case is obviously not an issue, since the modules and vmlinux 
image shipped in that (set of) package(s) match. but in the later case 
of updated, compatible kernel image + modules with unchanged ABI
(which is important, as it allows shipping fixed modules that are 
loadable for a compatible, older, booted kernel image), the following is 
possible:
- install kernel+modules with ABI 1
- boot kernel with ABI 1
- install ABI-compatible upgrade (e.g. a security fix)
- load module
- BTF validation fails, because the base_btf (loaded at boot time) and 
  the offsets in the module's .BTF section (loaded at module load time) 
  aren't matching

of course the validation might also not fail but the parsed BTF info 
might be bogus, or the base_btf might be similar enough that validation 
passes and the parsed BTF data is correct.

in our case the symptoms look like this (exact details vary with kernel 
builds/modules, but likely not relevant):

Nov 24 11:39:11 host kernel: BPF:         type_id=7 bits_offset=0
Nov 24 11:39:11 host kernel: BPF:
Nov 24 11:39:11 host kernel: BPF:Invalid name
Nov 24 11:39:11 host kernel: BPF:
Nov 24 11:39:11 host kernel: failed to validate module [overlay] BTF: -22

where the booted kernel and the (attempted to get) loaded module are not 
from the same build, but the Module.symvers is matching and loading 
should thus work. adding some more debug logging reveals that the root 
cause is the module's BTF start_str_off being, well, off, since it's derived 
from vmlinux' BTF/base_btf. if it is too big, the name/type lookups will 
wrongly look in the base_btf, if it's too small, the name/type lookups 
will be offset within the module or wrongly look inside the module when 
they should look inside base_btf/vmlinux. in any case, random garbage is 
the result, usually tripping up some validation check (e.g. the first 
byte not being 0 when checking a name). but even if it's correct (old 
and new vmlinux image have the same nr_types/hdr.str_len), there is no 
guarantuee that the offsets into base_btf are pointing at the right 
stuff.

example with debug logging patched in, note the garbled names, and 
offset slightly below the (wrong) start_str_off:

----8<----

BPF:magic: 0xeb9f

BPF:version: 1

BPF:flags: 0x0

BPF:hdr_len: 24

BPF:type_off: 0

BPF:type_len: 9264

BPF:str_off: 9264

BPF:str_len: 5511

BPF:btf_total_size: 14799

BPF:[106314] STRUCT rimary_device
BPF:size=56 vlen=14
BPF:

BPF:offset at call: 1915394
BPF:offset too small, choosing base_btf: 1915397

BPF:offset after base_btf: 1915394

BPF:     ce type_id=49 bits_offset=0
BPF:

BPF:offset at call: 1915403
BPF:offset after base_btf: 6

BPF:     nfig type_id=49 bits_offset=64
BPF:

BPF:offset at call: 1915412
BPF:offset after base_btf: 15

BPF:     rdir type_id=49 bits_offset=128
BPF:

BPF:offset at call: 768428
BPF:offset too small, choosing base_btf: 1915397

BPF:offset after base_btf: 768428

BPF:     _dio type_id=56 bits_offset=192
BPF:

BPF:offset at call: 1915420
BPF:offset after base_btf: 23

BPF:     erdir type_id=56 bits_offset=200
BPF:

BPF:offset at call: 1915433
BPF:offset after base_btf: 36

BPF:first char wrong - 0

BPF:      type_id=56 bits_offset=208
BPF:
BPF:Invalid name STRUCT MEMBER (name offset 1915433)
BPF:

failed to validate module [overlay] BTF: -22

---->8----

also note how it's only after a few botched entries that a check 
actually trips up - not sure what the impliciations for crafted BTF info 
are, but might be worthy a closer look by someone more knowledgable as 
well..

it seems to me this can be solved on the distro/user side by tracking 
vmlinux BTF infos as part of the ABI tracking (how stable are those 
compared to the existing interfaces making up the kernel <-> module 
ABI/Module.symvers? does this effectively mean bumping ABI for any 
change anyway?) or by disabling CONFIG_DEBUG_INFO_BTF_MODULES.

on the kernel/libbpf side it could maybe be solved by storing a hash of 
the base_btf data used to generate the split BTF-sections inside the 
modules, and skip BTF loading/validating if another base_btf is 
currently loaded (so BTF is best-effort, if the booted kernel and the 
module are matching it works, if not module loading works but no BTF 
support). this might be a good safe-guard for split-BTF in general?

I'd appreciate input on how to proceed (we were recently hit by this in 
a downstream Debian derivative, and will disable BTF info for modules as 
an interim measure).

thanks!

0: e.g., Debian's: https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/bin/abiupdate.py


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it
  2021-11-26 15:16           ` Fabian Grünbichler
@ 2021-12-08  0:08             ` Andrii Nakryiko
  0 siblings, 0 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2021-12-08  0:08 UTC (permalink / raw)
  To: Fabian Grünbichler
  Cc: Jakub Kicinski, Arnaldo Carvalho de Melo, Andrii Nakryiko,
	Starovoitov, Alexei, bpf, Allan, Bruce W, Daniel Borkmann,
	Greg Kroah-Hartman, jeyu, Kernel Team, open list, Networking,
	rafael, Song Liu, Masahiro Yamada

On Fri, Nov 26, 2021 at 7:16 AM Fabian Grünbichler
<f.gruenbichler@proxmox.com> wrote:
>
> On November 16, 2020 10:24 pm, Jakub Kicinski wrote:
> > On Mon, 16 Nov 2020 12:34:17 -0800 Andrii Nakryiko wrote:
> >> > This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
> >> > supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
> >> > kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
> >> > first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
> >> > "Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
> >> > errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."
> >> >
> >> > Is that intentional?
> >>
> >> I wasn't aware of such a use pattern, so definitely not intentional.
> >> But vmlinux is absolutely necessary to generate the module BTF. So I'm
> >> wondering what's the proper fix here? Leave it as is (that error
> >> message is actually surprisingly descriptive, btw)? Force vmlinux
> >> build? Or skip BTF generation for that module?
> >
> > For an external out-of-tree module there is no guarantee that vmlinux
> > will even be on the system, no? So only the last option can work in
> > that case.
>
> a year late to the party, but it seems to me that this patch

hi, sorry, you email slipped through the cracks initially, just
getting back to it...

> series/features also missed another, not yet fixed scenario. I have to
> admit I am not very well-versed in BTF/BPF matters though, so please
> take the analysis below with a grain of salt or two ;)
>
> (am subscribed to LKML/netdev, but not the bpf list, so please keep me
> CCed if discussion moves there! apologies if too many people are CCed
> here, feel free to trim down to relevant people/lists)
>
> many distros do their own tracking of kernel <-> module ABI (usually
> these checks use Module.symvers and some combination of lists/symbols/..
> to skip/ignore[0]).
>
> depending on detected changes, a kernel update can either
> - bump ABI, resulting in a new kernel/modules package that is installed
>   next to the current one
> - keep ABI, resulting in an updated kernel/modules package that is
>   installed over/instead of the current one
>
> the former case is obviously not an issue, since the modules and vmlinux
> image shipped in that (set of) package(s) match. but in the later case
> of updated, compatible kernel image + modules with unchanged ABI
> (which is important, as it allows shipping fixed modules that are
> loadable for a compatible, older, booted kernel image), the following is
> possible:
> - install kernel+modules with ABI 1
> - boot kernel with ABI 1
> - install ABI-compatible upgrade (e.g. a security fix)
> - load module
> - BTF validation fails, because the base_btf (loaded at boot time) and
>   the offsets in the module's .BTF section (loaded at module load time)
>   aren't matching
>
> of course the validation might also not fail but the parsed BTF info
> might be bogus, or the base_btf might be similar enough that validation
> passes and the parsed BTF data is correct.
>
> in our case the symptoms look like this (exact details vary with kernel
> builds/modules, but likely not relevant):
>
> Nov 24 11:39:11 host kernel: BPF:         type_id=7 bits_offset=0
> Nov 24 11:39:11 host kernel: BPF:
> Nov 24 11:39:11 host kernel: BPF:Invalid name
> Nov 24 11:39:11 host kernel: BPF:
> Nov 24 11:39:11 host kernel: failed to validate module [overlay] BTF: -22
>
> where the booted kernel and the (attempted to get) loaded module are not
> from the same build, but the Module.symvers is matching and loading
> should thus work. adding some more debug logging reveals that the root
> cause is the module's BTF start_str_off being, well, off, since it's derived
> from vmlinux' BTF/base_btf. if it is too big, the name/type lookups will
> wrongly look in the base_btf, if it's too small, the name/type lookups
> will be offset within the module or wrongly look inside the module when
> they should look inside base_btf/vmlinux. in any case, random garbage is
> the result, usually tripping up some validation check (e.g. the first
> byte not being 0 when checking a name). but even if it's correct (old
> and new vmlinux image have the same nr_types/hdr.str_len), there is no
> guarantuee that the offsets into base_btf are pointing at the right
> stuff.
>
> example with debug logging patched in, note the garbled names, and
> offset slightly below the (wrong) start_str_off:
>
> ----8<----
>
> BPF:magic: 0xeb9f
>
> BPF:version: 1
>
> BPF:flags: 0x0
>
> BPF:hdr_len: 24
>
> BPF:type_off: 0
>
> BPF:type_len: 9264
>
> BPF:str_off: 9264
>
> BPF:str_len: 5511
>
> BPF:btf_total_size: 14799
>
> BPF:[106314] STRUCT rimary_device
> BPF:size=56 vlen=14
> BPF:
>
> BPF:offset at call: 1915394
> BPF:offset too small, choosing base_btf: 1915397
>
> BPF:offset after base_btf: 1915394
>
> BPF:     ce type_id=49 bits_offset=0
> BPF:
>
> BPF:offset at call: 1915403
> BPF:offset after base_btf: 6
>
> BPF:     nfig type_id=49 bits_offset=64
> BPF:
>
> BPF:offset at call: 1915412
> BPF:offset after base_btf: 15
>
> BPF:     rdir type_id=49 bits_offset=128
> BPF:
>
> BPF:offset at call: 768428
> BPF:offset too small, choosing base_btf: 1915397
>
> BPF:offset after base_btf: 768428
>
> BPF:     _dio type_id=56 bits_offset=192
> BPF:
>
> BPF:offset at call: 1915420
> BPF:offset after base_btf: 23
>
> BPF:     erdir type_id=56 bits_offset=200
> BPF:
>
> BPF:offset at call: 1915433
> BPF:offset after base_btf: 36
>
> BPF:first char wrong - 0
>
> BPF:      type_id=56 bits_offset=208
> BPF:
> BPF:Invalid name STRUCT MEMBER (name offset 1915433)
> BPF:
>
> failed to validate module [overlay] BTF: -22
>
> ---->8----
>
> also note how it's only after a few botched entries that a check
> actually trips up - not sure what the impliciations for crafted BTF info
> are, but might be worthy a closer look by someone more knowledgable as
> well..
>
> it seems to me this can be solved on the distro/user side by tracking
> vmlinux BTF infos as part of the ABI tracking (how stable are those
> compared to the existing interfaces making up the kernel <-> module
> ABI/Module.symvers? does this effectively mean bumping ABI for any

Yes, I think so, unfortunately. Any change in order of types emitted
by DWARF or any extra string might cause a big change in string
offsets or type IDs.


> change anyway?) or by disabling CONFIG_DEBUG_INFO_BTF_MODULES.
>
> on the kernel/libbpf side it could maybe be solved by storing a hash of
> the base_btf data used to generate the split BTF-sections inside the
> modules, and skip BTF loading/validating if another base_btf is
> currently loaded (so BTF is best-effort, if the booted kernel and the
> module are matching it works, if not module loading works but no BTF
> support). this might be a good safe-guard for split-BTF in general?
>
> I'd appreciate input on how to proceed (we were recently hit by this in
> a downstream Debian derivative, and will disable BTF info for modules as
> an interim measure).
>

Yeah, I think "BTF is best-effort" is a necessity for the complicated
setups you described above. It probably is good to keep strict
behavior (BTF fails to load -> fail loading the module), but allow you
to tune it with the Kconfig option. Most modules probably would be
just fine without its BTF loaded successfully.

As for the check sums, this would add another layer of protection, so
I think storing base vmlinux BTF checksum in some special ELF section
in module' ELF (e.g. ".BTF.base_checksum" or something along those
lines) is also a good idea and would be easy to implement. Insmod and
related tooling might even provide a validation check that vmlinux BTF
matches module's expectation, if necessary.

> thanks!
>
> 0: e.g., Debian's: https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/bin/abiupdate.py
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-12-08  0:08 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-10  1:19 [PATCH v4 bpf-next 0/5] Integrate kernel module BTF support Andrii Nakryiko
2020-11-10  1:19 ` [PATCH v4 bpf-next 1/5] bpf: add in-kernel split " Andrii Nakryiko
2020-11-10 17:49   ` Song Liu
2020-11-10 18:31     ` Andrii Nakryiko
2020-11-10 20:14       ` Song Liu
2020-11-10  1:19 ` [PATCH v4 bpf-next 2/5] bpf: assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO Andrii Nakryiko
2020-11-10 20:24   ` Song Liu
2020-11-10  1:19 ` [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it Andrii Nakryiko
2020-11-11  1:04   ` Song Liu
2020-11-16 19:55     ` Allan, Bruce W
2020-11-16 20:34       ` Andrii Nakryiko
2020-11-16 21:24         ` Jakub Kicinski
2020-11-16 22:35           ` Andrii Nakryiko
2021-11-26 15:16           ` Fabian Grünbichler
2021-12-08  0:08             ` Andrii Nakryiko
2020-11-17  3:44   ` David Ahern
2020-11-10  1:19 ` [PATCH v4 bpf-next 4/5] bpf: load and verify kernel module BTFs Andrii Nakryiko
2020-11-11  7:11   ` kernel test robot
2020-11-11 10:13   ` Jessica Yu
2020-11-11 20:11     ` Andrii Nakryiko
2020-11-13 10:31       ` Jessica Yu
2020-11-13 18:54         ` Andrii Nakryiko
2020-11-10  1:19 ` [PATCH v4 bpf-next 5/5] tools/bpftool: add support for in-kernel and named BTF in `btf show` Andrii Nakryiko
2020-11-11  1:15   ` Song Liu
2020-11-11  4:19     ` Andrii Nakryiko
2020-11-11  7:44       ` Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).