* [PATCH RFC bpf-next  0/4] Support kernel module ksym variables
@ 2020-12-11  4:27 Andrii Nakryiko
  2020-12-11  4:27 ` [PATCH RFC bpf-next 1/4] selftests/bpf: sync RCU before unloading bpf_testmod Andrii Nakryiko
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2020-12-11  4:27 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team

This RFC is sent to show how ksym variable access in BPF programs can be
supported for kernel modules and to gather some feedback while the necessary
pahole fixes ([0]) are reviewed and, hopefully, make it into the 1.20 release.

This work builds on all the previous kernel and libbpf changes to support
kernel module BTFs. On top of that, the BPF verifier now supports ldimm64 with
src_reg=BPF_PSEUDO_BTF_ID and a non-zero insn[1].imm field, which contains the
BTF FD of a kernel module. In that case, the used module BTF, similarly to a
used BPF map, is recorded and a refcount is taken on both the BTF object
itself and its kernel module. This makes sure the kernel module won't be
unloaded from under an actively attached BPF program.

A new selftest validates that all of this works as intended. bpf_testmod.ko is
extended with a per-CPU variable.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=400229&state=*

Andrii Nakryiko (4):
  selftests/bpf: sync RCU before unloading bpf_testmod
  bpf: support BPF ksym variables in kernel modules
  libbpf: support kernel module ksym externs
  selftests/bpf: test kernel module ksym externs

 include/linux/bpf.h                           |   9 ++
 include/linux/bpf_verifier.h                  |   3 +
 include/linux/btf.h                           |   3 +
 kernel/bpf/btf.c                              |  31 +++-
 kernel/bpf/core.c                             |  23 +++
 kernel/bpf/verifier.c                         | 149 ++++++++++++++----
 tools/lib/bpf/libbpf.c                        |  47 ++++--
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |   3 +
 .../selftests/bpf/prog_tests/btf_map_in_map.c |  33 ----
 .../selftests/bpf/prog_tests/ksyms_module.c   |  33 ++++
 .../selftests/bpf/progs/test_ksyms_module.c   |  26 +++
 tools/testing/selftests/bpf/test_progs.c      |  40 +++++
 tools/testing/selftests/bpf/test_progs.h      |   1 +
 13 files changed, 321 insertions(+), 80 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_module.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_module.c

-- 
2.24.1



* [PATCH RFC bpf-next  1/4] selftests/bpf: sync RCU before unloading bpf_testmod
  2020-12-11  4:27 [PATCH RFC bpf-next 0/4] Support kernel module ksym variables Andrii Nakryiko
@ 2020-12-11  4:27 ` Andrii Nakryiko
  2020-12-11  4:27 ` [PATCH RFC bpf-next 2/4] bpf: support BPF ksym variables in kernel modules Andrii Nakryiko
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2020-12-11  4:27 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team

If some of the subtests use module BTFs through ksyms, they will cause
bpf_prog to take a refcount on the bpf_testmod module, which will prevent it
from unloading successfully. The module's refcnt is decremented when the
bpf_prog is freed, which generally happens in an RCU callback. So reuse
kern_sync_rcu(), so far utilized only by the map-in-map selftest, to wait for
an RCU grace period before attempting to unload bpf_testmod.

Fixes: 9f7fa225894c ("selftests/bpf: Add bpf_testmod kernel module for testing")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/btf_map_in_map.c | 33 ---------------
 tools/testing/selftests/bpf/test_progs.c      | 40 +++++++++++++++++++
 tools/testing/selftests/bpf/test_progs.h      |  1 +
 3 files changed, 41 insertions(+), 33 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
index 76ebe4c250f1..eb90a6b8850d 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
@@ -20,39 +20,6 @@ static __u32 bpf_map_id(struct bpf_map *map)
 	return info.id;
 }
 
-/*
- * Trigger synchronize_rcu() in kernel.
- *
- * ARRAY_OF_MAPS/HASH_OF_MAPS lookup/update operations trigger synchronize_rcu()
- * if looking up an existing non-NULL element or updating the map with a valid
- * inner map FD. Use this fact to trigger synchronize_rcu(): create map-in-map,
- * create a trivial ARRAY map, update map-in-map with ARRAY inner map. Then
- * cleanup. At the end, at least one synchronize_rcu() would be called.
- */
-static int kern_sync_rcu(void)
-{
-	int inner_map_fd, outer_map_fd, err, zero = 0;
-
-	inner_map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 4, 1, 0);
-	if (CHECK(inner_map_fd < 0, "inner_map_create", "failed %d\n", -errno))
-		return -1;
-
-	outer_map_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS, NULL,
-					     sizeof(int), inner_map_fd, 1, 0);
-	if (CHECK(outer_map_fd < 0, "outer_map_create", "failed %d\n", -errno)) {
-		close(inner_map_fd);
-		return -1;
-	}
-
-	err = bpf_map_update_elem(outer_map_fd, &zero, &inner_map_fd, 0);
-	if (err)
-		err = -errno;
-	CHECK(err, "outer_map_update", "failed %d\n", err);
-	close(inner_map_fd);
-	close(outer_map_fd);
-	return err;
-}
-
 static void test_lookup_update(void)
 {
 	int map1_fd, map2_fd, map3_fd, map4_fd, map5_fd, map1_id, map2_id;
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 5ef081bdae4e..6e59e6bcb5d4 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -370,8 +370,48 @@ static int delete_module(const char *name, int flags)
 	return syscall(__NR_delete_module, name, flags);
 }
 
+/*
+ * Trigger synchronize_rcu() in kernel.
+ *
+ * ARRAY_OF_MAPS/HASH_OF_MAPS lookup/update operations trigger synchronize_rcu()
+ * if looking up an existing non-NULL element or updating the map with a valid
+ * inner map FD. Use this fact to trigger synchronize_rcu(): create map-in-map,
+ * create a trivial ARRAY map, update map-in-map with ARRAY inner map. Then
+ * cleanup. At the end, at least one synchronize_rcu() would be called.
+ */
+int kern_sync_rcu(void)
+{
+	int inner_map_fd, outer_map_fd, err, zero = 0;
+
+	inner_map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 4, 1, 0);
+	if (inner_map_fd < 0) {
+		fprintf(env.stderr,
+			"sync_rcu: failed to create inner map: %d\n", -errno);
+		return -1;
+	}
+
+	outer_map_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS, NULL,
+					     sizeof(int), inner_map_fd, 1, 0);
+	if (outer_map_fd < 0) {
+		fprintf(env.stderr,
+			"sync_rcu: failed to create outer map: %d\n", -errno);
+		close(inner_map_fd);
+		return -1;
+	}
+
+	err = bpf_map_update_elem(outer_map_fd, &zero, &inner_map_fd, 0);
+	if (err)
+		err = -errno;
+
+	close(inner_map_fd);
+	close(outer_map_fd);
+	return err;
+}
+
 static void unload_bpf_testmod(void)
 {
+	if (kern_sync_rcu())
+		fprintf(env.stderr, "Failed to trigger kernel-side RCU sync!\n");
 	if (delete_module("bpf_testmod", 0)) {
 		if (errno == ENOENT) {
 			if (env.verbosity > VERBOSE_NONE)
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 115953243f62..e49e2fdde942 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -219,6 +219,7 @@ int bpf_find_map(const char *test, struct bpf_object *obj, const char *name);
 int compare_map_keys(int map1_fd, int map2_fd);
 int compare_stack_ips(int smap_fd, int amap_fd, int stack_trace_len);
 int extract_build_id(char *build_id, size_t size);
+int kern_sync_rcu(void);
 
 #ifdef __x86_64__
 #define SYS_NANOSLEEP_KPROBE_NAME "__x64_sys_nanosleep"
-- 
2.24.1



* [PATCH RFC bpf-next  2/4] bpf: support BPF ksym variables in kernel modules
  2020-12-11  4:27 [PATCH RFC bpf-next 0/4] Support kernel module ksym variables Andrii Nakryiko
  2020-12-11  4:27 ` [PATCH RFC bpf-next 1/4] selftests/bpf: sync RCU before unloading bpf_testmod Andrii Nakryiko
@ 2020-12-11  4:27 ` Andrii Nakryiko
  2020-12-11 11:55   ` kernel test robot
  2020-12-11 21:27   ` Alexei Starovoitov
  2020-12-11  4:27 ` [PATCH RFC bpf-next 3/4] libbpf: support kernel module ksym externs Andrii Nakryiko
  2020-12-11  4:27 ` [PATCH RFC bpf-next 4/4] selftests/bpf: test " Andrii Nakryiko
  3 siblings, 2 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2020-12-11  4:27 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team

Add support for directly accessing kernel module variables from BPF programs
using special ldimm64 instructions. This functionality builds upon vmlinux
ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow
specifying the kernel module BTF's FD in the insn[1].imm field.

During BPF program load, the verifier will resolve the FD to a BTF object and
take a reference on the BTF object itself and, for module BTFs, on the
corresponding module as well, to make sure it won't be unloaded from under a
running BPF program. The mechanism used is similar to how bpf_prog keeps
track of used bpf_maps.

Better naming suggestions for struct btf_mod_pair are greatly appreciated.

One interesting change is also in how a per-CPU variable is detected. The
logic is to find the .data..percpu data section in the provided BTF, but
vmlinux and each module have their own .data..percpu DATASEC entries in BTF.
So for the module case, the search needs to look only at the module's own
added BTF types. This is implemented with a custom search function.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h          |   9 +++
 include/linux/bpf_verifier.h |   3 +
 include/linux/btf.h          |   3 +
 kernel/bpf/btf.c             |  31 +++++++-
 kernel/bpf/core.c            |  23 ++++++
 kernel/bpf/verifier.c        | 149 ++++++++++++++++++++++++++++-------
 6 files changed, 188 insertions(+), 30 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 07cb5d15e743..408db1122e9a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -761,9 +761,15 @@ struct bpf_ctx_arg_aux {
 	u32 btf_id;
 };
 
+struct btf_mod_pair {
+	struct btf *btf;
+	struct module *module;
+};
+
 struct bpf_prog_aux {
 	atomic64_t refcnt;
 	u32 used_map_cnt;
+	u32 used_btf_cnt;
 	u32 max_ctx_offset;
 	u32 max_pkt_offset;
 	u32 max_tp_access;
@@ -802,6 +808,7 @@ struct bpf_prog_aux {
 	const struct bpf_prog_ops *ops;
 	struct bpf_map **used_maps;
 	struct mutex used_maps_mutex; /* mutex for used_maps and used_map_cnt */
+	struct btf_mod_pair *used_btfs;
 	struct bpf_prog *prog;
 	struct user_struct *user;
 	u64 load_time; /* ns since boottime */
@@ -1208,6 +1215,8 @@ struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
 void bpf_prog_put(struct bpf_prog *prog);
 void __bpf_free_used_maps(struct bpf_prog_aux *aux,
 			  struct bpf_map **used_maps, u32 len);
+void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
+			  struct btf_mod_pair *used_btfs, u32 len);
 
 void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
 void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index e941fe1484e5..dfe6f85d97dd 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -340,6 +340,7 @@ struct bpf_insn_aux_data {
 };
 
 #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
+#define MAX_USED_BTFS 64 /* max number of BTFs accessed by one BPF program */
 
 #define BPF_VERIFIER_TMP_LOG_SIZE	1024
 
@@ -398,7 +399,9 @@ struct bpf_verifier_env {
 	struct bpf_verifier_state_list **explored_states; /* search pruning optimization */
 	struct bpf_verifier_state_list *free_list;
 	struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
+	struct btf_mod_pair used_btfs[MAX_USED_BTFS]; /* array of BTF's used by BPF program */
 	u32 used_map_cnt;		/* number of used maps */
+	u32 used_btf_cnt;		/* number of used BTF objects */
 	u32 id_gen;			/* used to generate unique reg IDs */
 	bool allow_ptr_leaks;
 	bool allow_ptr_to_map_access;
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 4c200f5d242b..7fabf1428093 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -91,6 +91,9 @@ int btf_type_snprintf_show(const struct btf *btf, u32 type_id, void *obj,
 int btf_get_fd_by_id(u32 id);
 u32 btf_obj_id(const struct btf *btf);
 bool btf_is_kernel(const struct btf *btf);
+bool btf_is_module(const struct btf *btf);
+struct module *btf_try_get_module(const struct btf *btf);
+u32 btf_nr_types(const struct btf *btf);
 bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   const struct btf_member *m,
 			   u32 expected_offset, u32 expected_size);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 8d6bdb4f4d61..7ccc0133723a 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -458,7 +458,7 @@ static bool btf_type_is_datasec(const struct btf_type *t)
 	return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
 }
 
-static u32 btf_nr_types_total(const struct btf *btf)
+u32 btf_nr_types(const struct btf *btf)
 {
 	u32 total = 0;
 
@@ -476,7 +476,7 @@ s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
 	const char *tname;
 	u32 i, total;
 
-	total = btf_nr_types_total(btf);
+	total = btf_nr_types(btf);
 	for (i = 1; i < total; i++) {
 		t = btf_type_by_id(btf, i);
 		if (BTF_INFO_KIND(t->info) != kind)
@@ -5743,6 +5743,11 @@ bool btf_is_kernel(const struct btf *btf)
 	return btf->kernel_btf;
 }
 
+bool btf_is_module(const struct btf *btf)
+{
+	return btf->kernel_btf && strcmp(btf->name, "vmlinux") != 0;
+}
+
 static int btf_id_cmp_func(const void *a, const void *b)
 {
 	const int *pa = a, *pb = b;
@@ -5877,3 +5882,25 @@ static int __init btf_module_init(void)
 
 fs_initcall(btf_module_init);
 #endif /* CONFIG_DEBUG_INFO_BTF_MODULES */
+
+struct module *btf_try_get_module(const struct btf *btf)
+{
+	struct module *res = NULL;
+#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
+	struct btf_module *btf_mod, *tmp;
+
+	mutex_lock(&btf_module_mutex);
+	list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
+		if (btf_mod->btf != btf)
+			continue;
+
+		if (try_module_get(btf_mod->module))
+			res = btf_mod->module;
+
+		break;
+	}
+	mutex_unlock(&btf_module_mutex);
+#endif
+
+	return res;
+}
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 261f8692d0d2..69c3c308de5e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2119,6 +2119,28 @@ static void bpf_free_used_maps(struct bpf_prog_aux *aux)
 	kfree(aux->used_maps);
 }
 
+void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
+			  struct btf_mod_pair *used_btfs, u32 len)
+{
+#ifdef CONFIG_BPF_SYSCALL
+	struct btf_mod_pair *btf_mod;
+	u32 i;
+
+	for (i = 0; i < len; i++) {
+		btf_mod = &used_btfs[i];
+		if (btf_mod->module)
+			module_put(btf_mod->module);
+		btf_put(btf_mod->btf);
+	}
+#endif
+}
+
+static void bpf_free_used_btfs(struct bpf_prog_aux *aux)
+{
+	__bpf_free_used_btfs(aux, aux->used_btfs, aux->used_btf_cnt);
+	kfree(aux->used_btfs);
+}
+
 static void bpf_prog_free_deferred(struct work_struct *work)
 {
 	struct bpf_prog_aux *aux;
@@ -2126,6 +2148,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 
 	aux = container_of(work, struct bpf_prog_aux, work);
 	bpf_free_used_maps(aux);
+	bpf_free_used_btfs(aux);
 	if (bpf_prog_is_dev_bound(aux))
 		bpf_prog_offload_destroy(aux->prog);
 #ifdef CONFIG_PERF_EVENTS
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 93def76cf32b..ac0cf84a2d67 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9702,6 +9702,31 @@ static int do_check(struct bpf_verifier_env *env)
 	return 0;
 }
 
+static int find_btf_percpu_datasec(struct btf *btf)
+{
+	const struct btf_type *t;
+	const char *tname;
+	int i, n;
+
+	n = btf_nr_types(btf);
+	if (btf_is_module(btf))
+		i = btf_nr_types(btf_vmlinux);
+	else
+		i = 1;
+
+	for (; i < n; i++) {
+		t = btf_type_by_id(btf, i);
+		if (BTF_INFO_KIND(t->info) != BTF_KIND_DATASEC)
+			continue;
+
+		tname = btf_name_by_offset(btf, t->name_off);
+		if (!strcmp(tname, ".data..percpu"))
+			return i;
+	}
+
+	return -ENOENT;
+}
+
 /* replace pseudo btf_id with kernel symbol address */
 static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 			       struct bpf_insn *insn,
@@ -9709,48 +9734,57 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 {
 	const struct btf_var_secinfo *vsi;
 	const struct btf_type *datasec;
+	struct btf_mod_pair *btf_mod;
 	const struct btf_type *t;
 	const char *sym_name;
 	bool percpu = false;
 	u32 type, id = insn->imm;
+	struct btf *btf;
 	s32 datasec_id;
 	u64 addr;
-	int i;
+	int i, btf_fd, err;
 
-	if (!btf_vmlinux) {
-		verbose(env, "kernel is missing BTF, make sure CONFIG_DEBUG_INFO_BTF=y is specified in Kconfig.\n");
-		return -EINVAL;
-	}
-
-	if (insn[1].imm != 0) {
-		verbose(env, "reserved field (insn[1].imm) is used in pseudo_btf_id ldimm64 insn.\n");
-		return -EINVAL;
+	btf_fd = insn[1].imm;
+	if (btf_fd) {
+		btf = btf_get_by_fd(btf_fd);
+		if (IS_ERR(btf)) {
+			verbose(env, "invalid module BTF object FD specified.\n");
+			return -EINVAL;
+		}
+	} else {
+		if (!btf_vmlinux) {
+			verbose(env, "kernel is missing BTF, make sure CONFIG_DEBUG_INFO_BTF=y is specified in Kconfig.\n");
+			return -EINVAL;
+		}
+		btf = btf_vmlinux;
+		btf_get(btf);
 	}
 
-	t = btf_type_by_id(btf_vmlinux, id);
+	t = btf_type_by_id(btf, id);
 	if (!t) {
 		verbose(env, "ldimm64 insn specifies invalid btf_id %d.\n", id);
-		return -ENOENT;
+		err = -ENOENT;
+		goto err_put;
 	}
 
 	if (!btf_type_is_var(t)) {
-		verbose(env, "pseudo btf_id %d in ldimm64 isn't KIND_VAR.\n",
-			id);
-		return -EINVAL;
+		verbose(env, "pseudo btf_id %d in ldimm64 isn't KIND_VAR.\n", id);
+		err = -EINVAL;
+		goto err_put;
 	}
 
-	sym_name = btf_name_by_offset(btf_vmlinux, t->name_off);
+	sym_name = btf_name_by_offset(btf, t->name_off);
 	addr = kallsyms_lookup_name(sym_name);
 	if (!addr) {
 		verbose(env, "ldimm64 failed to find the address for kernel symbol '%s'.\n",
 			sym_name);
-		return -ENOENT;
+		err = -ENOENT;
+		goto err_put;
 	}
 
-	datasec_id = btf_find_by_name_kind(btf_vmlinux, ".data..percpu",
-					   BTF_KIND_DATASEC);
+	datasec_id = find_btf_percpu_datasec(btf);
 	if (datasec_id > 0) {
-		datasec = btf_type_by_id(btf_vmlinux, datasec_id);
+		datasec = btf_type_by_id(btf, datasec_id);
 		for_each_vsi(i, datasec, vsi) {
 			if (vsi->type == id) {
 				percpu = true;
@@ -9763,10 +9797,10 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 	insn[1].imm = addr >> 32;
 
 	type = t->type;
-	t = btf_type_skip_modifiers(btf_vmlinux, type, NULL);
+	t = btf_type_skip_modifiers(btf, type, NULL);
 	if (percpu) {
 		aux->btf_var.reg_type = PTR_TO_PERCPU_BTF_ID;
-		aux->btf_var.btf = btf_vmlinux;
+		aux->btf_var.btf = btf;
 		aux->btf_var.btf_id = type;
 	} else if (!btf_type_is_struct(t)) {
 		const struct btf_type *ret;
@@ -9774,21 +9808,54 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 		u32 tsize;
 
 		/* resolve the type size of ksym. */
-		ret = btf_resolve_size(btf_vmlinux, t, &tsize);
+		ret = btf_resolve_size(btf, t, &tsize);
 		if (IS_ERR(ret)) {
-			tname = btf_name_by_offset(btf_vmlinux, t->name_off);
+			tname = btf_name_by_offset(btf, t->name_off);
 			verbose(env, "ldimm64 unable to resolve the size of type '%s': %ld\n",
 				tname, PTR_ERR(ret));
-			return -EINVAL;
+			err = -EINVAL;
+			goto err_put;
 		}
 		aux->btf_var.reg_type = PTR_TO_MEM;
 		aux->btf_var.mem_size = tsize;
 	} else {
 		aux->btf_var.reg_type = PTR_TO_BTF_ID;
-		aux->btf_var.btf = btf_vmlinux;
+		aux->btf_var.btf = btf;
 		aux->btf_var.btf_id = type;
 	}
+
+	/* check whether we recorded this BTF (and maybe module) already */
+	for (i = 0; i < env->used_btf_cnt; i++) {
+		if (env->used_btfs[i].btf == btf) {
+			btf_put(btf);
+			return 0;
+		}
+	}
+
+	if (env->used_btf_cnt >= MAX_USED_BTFS) {
+		err = -E2BIG;
+		goto err_put;
+	}
+
+	btf_mod = &env->used_btfs[env->used_btf_cnt];
+	btf_mod->btf = btf;
+	btf_mod->module = NULL;
+
+	/* if we reference variables from kernel module, bump its refcount */
+	if (btf_is_module(btf)) {
+		btf_mod->module = btf_try_get_module(btf);
+		if (!btf_mod->module) {
+			err = -ENXIO;
+			goto err_put;
+		}
+	}
+
+	env->used_btf_cnt++;
+
 	return 0;
+err_put:
+	btf_put(btf);
+	return err;
 }
 
 static int check_map_prealloc(struct bpf_map *map)
@@ -10085,6 +10152,13 @@ static void release_maps(struct bpf_verifier_env *env)
 			     env->used_map_cnt);
 }
 
+/* drop refcnt of BTFs used by the rejected program */
+static void release_btfs(struct bpf_verifier_env *env)
+{
+	__bpf_free_used_btfs(env->prog->aux, env->used_btfs,
+			     env->used_btf_cnt);
+}
+
 /* convert pseudo BPF_LD_IMM64 into generic BPF_LD_IMM64 */
 static void convert_pseudo_ld_imm64(struct bpf_verifier_env *env)
 {
@@ -12097,7 +12171,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 		goto err_release_maps;
 	}
 
-	if (ret == 0 && env->used_map_cnt) {
+	if (ret)
+		goto err_release_maps;
+
+	if (env->used_map_cnt) {
 		/* if program passed verifier, update used_maps in bpf_prog_info */
 		env->prog->aux->used_maps = kmalloc_array(env->used_map_cnt,
 							  sizeof(env->used_maps[0]),
@@ -12111,15 +12188,29 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 		memcpy(env->prog->aux->used_maps, env->used_maps,
 		       sizeof(env->used_maps[0]) * env->used_map_cnt);
 		env->prog->aux->used_map_cnt = env->used_map_cnt;
+	}
+	if (env->used_btf_cnt) {
+		/* if program passed verifier, update used_btfs in bpf_prog_aux */
+		env->prog->aux->used_btfs = kmalloc_array(env->used_btf_cnt,
+							  sizeof(env->used_btfs[0]),
+							  GFP_KERNEL);
+		if (!env->prog->aux->used_btfs) {
+			ret = -ENOMEM;
+			goto err_release_maps;
+		}
 
+		memcpy(env->prog->aux->used_btfs, env->used_btfs,
+		       sizeof(env->used_btfs[0]) * env->used_btf_cnt);
+		env->prog->aux->used_btf_cnt = env->used_btf_cnt;
+	}
+	if (env->used_map_cnt || env->used_btf_cnt) {
 		/* program is valid. Convert pseudo bpf_ld_imm64 into generic
 		 * bpf_ld_imm64 instructions
 		 */
 		convert_pseudo_ld_imm64(env);
 	}
 
-	if (ret == 0)
-		adjust_btf_func(env);
+	adjust_btf_func(env);
 
 err_release_maps:
 	if (!env->prog->aux->used_maps)
@@ -12127,6 +12218,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 		 * them now. Otherwise free_used_maps() will release them.
 		 */
 		release_maps(env);
+	if (!env->prog->aux->used_btfs)
+		release_btfs(env);
 
 	/* extension progs temporarily inherit the attach_type of their targets
 	   for verification purposes, so set it back to zero before returning
-- 
2.24.1



* [PATCH RFC bpf-next  3/4] libbpf: support kernel module ksym externs
  2020-12-11  4:27 [PATCH RFC bpf-next 0/4] Support kernel module ksym variables Andrii Nakryiko
  2020-12-11  4:27 ` [PATCH RFC bpf-next 1/4] selftests/bpf: sync RCU before unloading bpf_testmod Andrii Nakryiko
  2020-12-11  4:27 ` [PATCH RFC bpf-next 2/4] bpf: support BPF ksym variables in kernel modules Andrii Nakryiko
@ 2020-12-11  4:27 ` Andrii Nakryiko
  2020-12-11  4:27 ` [PATCH RFC bpf-next 4/4] selftests/bpf: test " Andrii Nakryiko
  3 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2020-12-11  4:27 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team

Add support for searching for ksym externs not just in vmlinux BTF, but
across all module BTFs, similarly to how it's done for CO-RE relocations.
Kernels that expose module BTFs through sysfs are assumed to support the new
ldimm64 instruction extension with the BTF FD provided in the insn[1].imm
field, so no extra feature detection is performed.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 47 +++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9be88a90a4aa..5dd1975cb707 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -395,7 +395,8 @@ struct extern_desc {
 			unsigned long long addr;
 
 			/* target btf_id of the corresponding kernel var. */
-			int vmlinux_btf_id;
+			int kernel_btf_obj_fd;
+			int kernel_btf_id;
 
 			/* local btf_id of the ksym extern's type. */
 			__u32 type_id;
@@ -6156,7 +6157,8 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
 			} else /* EXT_KSYM */ {
 				if (ext->ksym.type_id) { /* typed ksyms */
 					insn[0].src_reg = BPF_PSEUDO_BTF_ID;
-					insn[0].imm = ext->ksym.vmlinux_btf_id;
+					insn[0].imm = ext->ksym.kernel_btf_id;
+					insn[1].imm = ext->ksym.kernel_btf_obj_fd;
 				} else { /* typeless ksyms */
 					insn[0].imm = (__u32)ext->ksym.addr;
 					insn[1].imm = ext->ksym.addr >> 32;
@@ -7313,7 +7315,8 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
 static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
 {
 	struct extern_desc *ext;
-	int i, id;
+	struct btf *btf;
+	int i, j, id, btf_fd, err;
 
 	for (i = 0; i < obj->nr_extern; i++) {
 		const struct btf_type *targ_var, *targ_type;
@@ -7325,8 +7328,22 @@ static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
 		if (ext->type != EXT_KSYM || !ext->ksym.type_id)
 			continue;
 
-		id = btf__find_by_name_kind(obj->btf_vmlinux, ext->name,
-					    BTF_KIND_VAR);
+		btf = obj->btf_vmlinux;
+		btf_fd = 0;
+		id = btf__find_by_name_kind(btf, ext->name, BTF_KIND_VAR);
+		if (id == -ENOENT) {
+			err = load_module_btfs(obj);
+			if (err)
+				return err;
+
+			for (j = 0; j < obj->btf_module_cnt; j++) {
+				btf = obj->btf_modules[j].btf;
+				btf_fd = obj->btf_modules[j].fd;
+				id = btf__find_by_name_kind(btf, ext->name, BTF_KIND_VAR);
+				if (id != -ENOENT)
+					break;
+			}
+		}
 		if (id <= 0) {
 			pr_warn("extern (ksym) '%s': failed to find BTF ID in vmlinux BTF.\n",
 				ext->name);
@@ -7337,24 +7354,19 @@ static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
 		local_type_id = ext->ksym.type_id;
 
 		/* find target type_id */
-		targ_var = btf__type_by_id(obj->btf_vmlinux, id);
-		targ_var_name = btf__name_by_offset(obj->btf_vmlinux,
-						    targ_var->name_off);
-		targ_type = skip_mods_and_typedefs(obj->btf_vmlinux,
-						   targ_var->type,
-						   &targ_type_id);
+		targ_var = btf__type_by_id(btf, id);
+		targ_var_name = btf__name_by_offset(btf, targ_var->name_off);
+		targ_type = skip_mods_and_typedefs(btf, targ_var->type, &targ_type_id);
 
 		ret = bpf_core_types_are_compat(obj->btf, local_type_id,
-						obj->btf_vmlinux, targ_type_id);
+						btf, targ_type_id);
 		if (ret <= 0) {
 			const struct btf_type *local_type;
 			const char *targ_name, *local_name;
 
 			local_type = btf__type_by_id(obj->btf, local_type_id);
-			local_name = btf__name_by_offset(obj->btf,
-							 local_type->name_off);
-			targ_name = btf__name_by_offset(obj->btf_vmlinux,
-							targ_type->name_off);
+			local_name = btf__name_by_offset(obj->btf, local_type->name_off);
+			targ_name = btf__name_by_offset(btf, targ_type->name_off);
 
 			pr_warn("extern (ksym) '%s': incompatible types, expected [%d] %s %s, but kernel has [%d] %s %s\n",
 				ext->name, local_type_id,
@@ -7364,7 +7376,8 @@ static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
 		}
 
 		ext->is_set = true;
-		ext->ksym.vmlinux_btf_id = id;
+		ext->ksym.kernel_btf_obj_fd = btf_fd;
+		ext->ksym.kernel_btf_id = id;
 		pr_debug("extern (ksym) '%s': resolved to [%d] %s %s\n",
 			 ext->name, id, btf_kind_str(targ_var), targ_var_name);
 	}
-- 
2.24.1



* [PATCH RFC bpf-next  4/4] selftests/bpf: test kernel module ksym externs
  2020-12-11  4:27 [PATCH RFC bpf-next 0/4] Support kernel module ksym variables Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2020-12-11  4:27 ` [PATCH RFC bpf-next 3/4] libbpf: support kernel module ksym externs Andrii Nakryiko
@ 2020-12-11  4:27 ` Andrii Nakryiko
  3 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2020-12-11  4:27 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team

Add a per-CPU variable to bpf_testmod.ko and use it from a new selftest to
validate that everything works end-to-end.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  3 ++
 .../selftests/bpf/prog_tests/ksyms_module.c   | 33 +++++++++++++++++++
 .../selftests/bpf/progs/test_ksyms_module.c   | 26 +++++++++++++++
 3 files changed, 62 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_module.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_module.c

diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 2df19d73ca49..0b991e115d1f 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -3,6 +3,7 @@
 #include <linux/error-injection.h>
 #include <linux/init.h>
 #include <linux/module.h>
+#include <linux/percpu-defs.h>
 #include <linux/sysfs.h>
 #include <linux/tracepoint.h>
 #include "bpf_testmod.h"
@@ -10,6 +11,8 @@
 #define CREATE_TRACE_POINTS
 #include "bpf_testmod-events.h"
 
+DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
+
 noinline ssize_t
 bpf_testmod_test_read(struct file *file, struct kobject *kobj,
 		      struct bin_attribute *bin_attr,
diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_module.c b/tools/testing/selftests/bpf/prog_tests/ksyms_module.c
new file mode 100644
index 000000000000..af6234e46cf7
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/ksyms_module.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+
+#include <test_progs.h>
+#include <bpf/libbpf.h>
+#include <bpf/btf.h>
+#include "test_ksyms_module.skel.h"
+
+static int duration;
+
+void test_ksyms_module(void)
+{
+	struct test_ksyms_module *skel;
+	struct test_ksyms_module__bss *bss;
+	int err;
+
+	skel = test_ksyms_module__open_and_load();
+	if (CHECK(!skel, "skel_open", "failed to open skeleton\n"))
+		return;
+	bss = skel->bss;
+
+	err = test_ksyms_module__attach(skel);
+	if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err))
+		goto cleanup;
+
+	usleep(1);
+
+	ASSERT_EQ(bss->triggered, true, "triggered");
+	ASSERT_EQ(bss->out_mod_ksym_global, 123, "global_ksym_val");
+
+cleanup:
+	test_ksyms_module__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_ksyms_module.c b/tools/testing/selftests/bpf/progs/test_ksyms_module.c
new file mode 100644
index 000000000000..ede7602410e1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_ksyms_module.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+
+extern const int bpf_testmod_ksym_percpu __ksym;
+
+int out_mod_ksym_global = 0;
+bool triggered = false;
+
+SEC("raw_tp/sys_enter")
+int handler(const void *ctx)
+{
+	int *val;
+	__u32 cpu;
+
+	val = (int *)bpf_this_cpu_ptr(&bpf_testmod_ksym_percpu);
+	out_mod_ksym_global = *val;
+	triggered = true;
+
+	return 0;
+}
+
+char LICENSE[] SEC("license") = "GPL";
-- 
2.24.1



* Re: [PATCH RFC bpf-next 2/4] bpf: support BPF ksym variables in kernel modules
  2020-12-11  4:27 ` [PATCH RFC bpf-next 2/4] bpf: support BPF ksym variables in kernel modules Andrii Nakryiko
@ 2020-12-11 11:55   ` kernel test robot
  2020-12-11 21:27   ` Alexei Starovoitov
  1 sibling, 0 replies; 10+ messages in thread
From: kernel test robot @ 2020-12-11 11:55 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4504 bytes --]

Hi Andrii,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Andrii-Nakryiko/Support-kernel-module-ksym-variables/20201211-123414
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: i386-randconfig-r036-20201209 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/6612e8dbf119471e023d79a56d7834fceeb3e33b
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Andrii-Nakryiko/Support-kernel-module-ksym-variables/20201211-123414
        git checkout 6612e8dbf119471e023d79a56d7834fceeb3e33b
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/bpf/core.c:1350:12: warning: no previous prototype for 'bpf_probe_read_kernel' [-Wmissing-prototypes]
    1350 | u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
         |            ^~~~~~~~~~~~~~~~~~~~~
   In file included from kernel/bpf/core.c:21:
   kernel/bpf/core.c: In function '___bpf_prog_run':
   include/linux/filter.h:888:3: warning: cast between incompatible function types from 'u64 (*)(u64,  u64,  u64,  u64,  u64)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int)'} to 'u64 (*)(u64,  u64,  u64,  u64,  u64,  const struct bpf_insn *)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  const struct bpf_insn *)'} [-Wcast-function-type]
     888 |  ((u64 (*)(u64, u64, u64, u64, u64, const struct bpf_insn *)) \
         |   ^
   kernel/bpf/core.c:1518:13: note: in expansion of macro '__bpf_call_base_args'
    1518 |   BPF_R0 = (__bpf_call_base_args + insn->imm)(BPF_R1, BPF_R2,
         |             ^~~~~~~~~~~~~~~~~~~~
   kernel/bpf/core.c: At top level:
   kernel/bpf/core.c:1704:6: warning: no previous prototype for 'bpf_patch_call_args' [-Wmissing-prototypes]
    1704 | void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth)
         |      ^~~~~~~~~~~~~~~~~~~
   In file included from kernel/bpf/core.c:21:
   kernel/bpf/core.c: In function 'bpf_patch_call_args':
   include/linux/filter.h:888:3: warning: cast between incompatible function types from 'u64 (*)(u64,  u64,  u64,  u64,  u64)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int)'} to 'u64 (*)(u64,  u64,  u64,  u64,  u64,  const struct bpf_insn *)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  const struct bpf_insn *)'} [-Wcast-function-type]
     888 |  ((u64 (*)(u64, u64, u64, u64, u64, const struct bpf_insn *)) \
         |   ^
   kernel/bpf/core.c:1709:3: note: in expansion of macro '__bpf_call_base_args'
    1709 |   __bpf_call_base_args;
         |   ^~~~~~~~~~~~~~~~~~~~
   kernel/bpf/core.c: At top level:
   kernel/bpf/core.c:2102:6: warning: no previous prototype for '__bpf_free_used_maps' [-Wmissing-prototypes]
    2102 | void __bpf_free_used_maps(struct bpf_prog_aux *aux,
         |      ^~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/core.c:2122:6: warning: no previous prototype for '__bpf_free_used_btfs' [-Wmissing-prototypes]
    2122 | void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
         |      ^~~~~~~~~~~~~~~~~~~~

vim +/__bpf_free_used_btfs +2122 kernel/bpf/core.c

  2121	
> 2122	void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
  2123				  struct btf_mod_pair *used_btfs, u32 len)
  2124	{
  2125	#ifdef CONFIG_BPF_SYSCALL
  2126		struct btf_mod_pair *btf_mod;
  2127		u32 i;
  2128	
  2129		for (i = 0; i < len; i++) {
  2130			btf_mod = &used_btfs[i];
  2131			if (btf_mod->module)
  2132				module_put(btf_mod->module);
  2133			btf_put(btf_mod->btf);
  2134		}
  2135	#endif
  2136	}
  2137	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 35634 bytes --]


* Re: [PATCH RFC bpf-next 2/4] bpf: support BPF ksym variables in kernel modules
  2020-12-11  4:27 ` [PATCH RFC bpf-next 2/4] bpf: support BPF ksym variables in kernel modules Andrii Nakryiko
  2020-12-11 11:55   ` kernel test robot
@ 2020-12-11 21:27   ` Alexei Starovoitov
  2020-12-11 22:15     ` Andrii Nakryiko
  1 sibling, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2020-12-11 21:27 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, netdev, ast, daniel, kernel-team

On Thu, Dec 10, 2020 at 08:27:32PM -0800, Andrii Nakryiko wrote:
> During BPF program load time, verifier will resolve FD to BTF object and will
> take reference on BTF object itself and, for module BTFs, corresponding module
> as well, to make sure it won't be unloaded from under running BPF program. The
> mechanism used is similar to how bpf_prog keeps track of used bpf_maps.
...
> +
> +	/* if we reference variables from kernel module, bump its refcount */
> +	if (btf_is_module(btf)) {
> +		btf_mod->module = btf_try_get_module(btf);

Is it necessary to refcnt the module? Correct me if I'm wrong, but
for module's BTF we register a notifier. Then the module can be rmmod-ed
at any time and we will do btf_put() for corresponding BTF, but that BTF may
stay around because bpftool or something is looking at it.
Similarly when prog is attached to raw_tp in a module we currently do try_module_get(),
but is it really necessary ? When bpf is attached to a netdev the netdev can
be removed and the link will be dangling. Maybe it makes sense to do the same
with modules?  The raw_tp can become dangling after rmmod and the prog won't be
executed anymore. So hard coded address of a per-cpu var in a ksym will
be pointing to freed mod memory after rmmod, but that's ok, since that prog will
never execute.
On the other side if we envision a bpf prog attaching to a vmlinux function
and accessing per-cpu or normal ksym in some module it would need to inc refcnt
of that module, since we won't be able to guarantee that this prog will
not execute any more. So we cannot allow dangling memory addresses.
If latter is what we want to allow then we probably need a test case for it and
document the reasons for keeping modules pinned while progs access their data.
Since such pinning behavior is different from other bpf attaching cases where
underlying objects (like netdev and cgroup) can go away.


* Re: [PATCH RFC bpf-next 2/4] bpf: support BPF ksym variables in kernel modules
  2020-12-11 21:27   ` Alexei Starovoitov
@ 2020-12-11 22:15     ` Andrii Nakryiko
  2020-12-12  1:52       ` Alexei Starovoitov
  0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2020-12-11 22:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Fri, Dec 11, 2020 at 1:27 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 08:27:32PM -0800, Andrii Nakryiko wrote:
> > During BPF program load time, verifier will resolve FD to BTF object and will
> > take reference on BTF object itself and, for module BTFs, corresponding module
> > as well, to make sure it won't be unloaded from under running BPF program. The
> > mechanism used is similar to how bpf_prog keeps track of used bpf_maps.
> ...
> > +
> > +     /* if we reference variables from kernel module, bump its refcount */
> > +     if (btf_is_module(btf)) {
> > +             btf_mod->module = btf_try_get_module(btf);
>
> Is it necessary to refcnt the module? Correct me if I'm wrong, but
> for module's BTF we register a notifier. Then the module can be rmmod-ed
> at any time and we will do btf_put() for corresponding BTF, but that BTF may
> stay around because bpftool or something is looking at it.

Correct, BTF object itself doesn't take a refcnt on module.

> Similarly when prog is attached to raw_tp in a module we currently do try_module_get(),
> but is it really necessary ? When bpf is attached to a netdev the netdev can
> be removed and the link will be dangling. May be it makes sense to do the same
> with modules?  The raw_tp can become dangling after rmmod and the prog won't be

So for raw_tp it's not the case today. I tested, I attached raw_tp,
kept triggering it in a loop, and tried to rmmod bpf_testmod. It
failed, because raw tracepoint takes refcnt on module. rmmod -f
bpf_testmod also didn't work, but it's because my kernel wasn't built
with force-unload enabled for modules. But force-unload is an entirely
different matter and it's inherently dangerous to do, it can crash and
corrupt anything in the kernel.

> executed anymore. So hard coded address of a per-cpu var in a ksym will
> be pointing to freed mod memory after rmmod, but that's ok, since that prog will
> never execute.

Not so fast :) Indeed, if somehow module gets unloaded while we keep
BPF program loaded, we'll point to unallocated memory **OR** to a
memory re-used for something else. That's bad. Nothing will crash even
if it's unmapped memory (due to bpf_probe_read semantics), but we will
potentially be reading some garbage (not zeroes), if some other module
re-uses that per-CPU memory.

As for the BPF program won't be triggered. That's not true in general,
as you mention yourself below.

> On the other side if we envision a bpf prog attaching to a vmlinux function
> and accessing per-cpu or normal ksym in some module it would need to inc refcnt
> of that module, since we won't be able to guarantee that this prog will
> not execute any more. So we cannot allow dangling memory addresses.

That's what my new selftest is doing actually. It's a generic
sys_enter raw_tp, which doesn't attach to the module, but it does read
module's per-CPU variable. So I actually ran a test before posting. I
successfully unloaded bpf_testmod, but kept running the prog. And it
kept returning *correct* per-CPU value. Most probably due to per-CPU
memory not unmapped and not yet reused for something else. But it's a
really nasty and surprising situation.

Keep in mind, also, that whenever BPF program declares per-cpu
variable extern, it doesn't know or care whether it will get resolved
to built-in vmlinux per-CPU variable or module per-CPU variable.
Restricting attachment to only module-provided hooks is both tedious
and might be quite surprising sometimes, seems not worth the pain.

> If latter is what we want to allow then we probably need a test case for it and
> document the reasons for keeping modules pinned while progs access their data.
> Since such pinning behavior is different from other bpf attaching cases where
> underlying objects (like netdev and cgroup) can go away.

See above, that's already the case for module tracepoints.

So in summary, I think we should take a refcnt on module, as that's
already the case for stuff like raw_tp. I can add more comments to
make this clear, of course.


* Re: [PATCH RFC bpf-next 2/4] bpf: support BPF ksym variables in kernel modules
  2020-12-11 22:15     ` Andrii Nakryiko
@ 2020-12-12  1:52       ` Alexei Starovoitov
  2020-12-12  5:23         ` Andrii Nakryiko
  0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2020-12-12  1:52 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Fri, Dec 11, 2020 at 02:15:28PM -0800, Andrii Nakryiko wrote:
> On Fri, Dec 11, 2020 at 1:27 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Dec 10, 2020 at 08:27:32PM -0800, Andrii Nakryiko wrote:
> > > During BPF program load time, verifier will resolve FD to BTF object and will
> > > take reference on BTF object itself and, for module BTFs, corresponding module
> > > as well, to make sure it won't be unloaded from under running BPF program. The
> > > mechanism used is similar to how bpf_prog keeps track of used bpf_maps.
> > ...
> > > +
> > > +     /* if we reference variables from kernel module, bump its refcount */
> > > +     if (btf_is_module(btf)) {
> > > +             btf_mod->module = btf_try_get_module(btf);
> >
> > Is it necessary to refcnt the module? Correct me if I'm wrong, but
> > for module's BTF we register a notifier. Then the module can be rmmod-ed
> > at any time and we will do btf_put() for corresponding BTF, but that BTF may
> > stay around because bpftool or something is looking at it.
> 
> Correct, BTF object itself doesn't take a refcnt on module.
> 
> > Similarly when prog is attached to raw_tp in a module we currently do try_module_get(),
> > but is it really necessary ? When bpf is attached to a netdev the netdev can
> be removed and the link will be dangling. Maybe it makes sense to do the same
> > with modules?  The raw_tp can become dangling after rmmod and the prog won't be
> 
> So for raw_tp it's not the case today. I tested, I attached raw_tp,
> kept triggering it in a loop, and tried to rmmod bpf_testmod. It
> failed, because raw tracepoint takes refcnt on module. rmmod -f

Right. I meant that we can change that behavior if it would make sense to do so.

> bpf_testmod also didn't work, but it's because my kernel wasn't built
> with force-unload enabled for modules. But force-unload is an entirely
> different matter and it's inherently dangerous to do, it can crash and
> corrupt anything in the kernel.
> 
> > executed anymore. So hard coded address of a per-cpu var in a ksym will
> > be pointing to freed mod memory after rmmod, but that's ok, since that prog will
> > never execute.
> 
> Not so fast :) Indeed, if somehow module gets unloaded while we keep
> BPF program loaded, we'll point to unallocated memory **OR** to a
> memory re-used for something else. That's bad. Nothing will crash even
> if it's unmapped memory (due to bpf_probe_read semantics), but we will
> potentially be reading some garbage (not zeroes), if some other module
> re-uses that per-CPU memory.
> 
> As for the BPF program won't be triggered. That's not true in general,
> as you mention yourself below.
> 
> > On the other side if we envision a bpf prog attaching to a vmlinux function
> > and accessing per-cpu or normal ksym in some module it would need to inc refcnt
> > of that module, since we won't be able to guarantee that this prog will
> > not execute any more. So we cannot allow dangling memory addresses.
> 
> That's what my new selftest is doing actually. It's a generic
> sys_enter raw_tp, which doesn't attach to the module, but it does read
> module's per-CPU variable. 

Got it. I see that now.

> So I actually ran a test before posting. I
> successfully unloaded bpf_testmod, but kept running the prog. And it
> kept returning *correct* per-CPU value. Most probably due to per-CPU
> memory not unmapped and not yet reused for something else. But it's a
> really nasty and surprising situation.

you mean you managed to unload early during development before
you've introduced refcnting of modules?

> Keep in mind, also, that whenever BPF program declares per-cpu
> variable extern, it doesn't know or care whether it will get resolved
> to built-in vmlinux per-CPU variable or module per-CPU variable.
> Restricting attachment to only module-provided hooks is both tedious
> and might be quite surprising sometimes, seems not worth the pain.
> 
> > If latter is what we want to allow then we probably need a test case for it and
> > document the reasons for keeping modules pinned while progs access their data.
> > Since such pinning behavior is different from other bpf attaching cases where
> > underlying objects (like netdev and cgroup) can go away.
> 
> See above, that's already the case for module tracepoints.
> 
> So in summary, I think we should take a refcnt on module, as that's
> already the case for stuff like raw_tp. I can add more comments to
> make this clear, of course.

ok. agreed.

Regarding fd+id in the upper/lower 32 bits of ld_imm64...
That works for ksyms because at that end the pair is converted to a single
address that fits into ld_imm64. That won't work for Alan's case
where btf_obj pointer and btf_id are two values (64-bit and 32-bit).
So api-wise it's fine here, but cannot adopt the same idea everywhere.

re: patch 4
Please add non-percpu var to the test. Just for completeness.
The pair fd+id should be enough to disambiguate, right?

re: patch 1.
Instead of copy-pasting that hack, please convert it to sys_membarrier(MEMBARRIER_CMD_GLOBAL).

The rest looks good to me.


* Re: [PATCH RFC bpf-next 2/4] bpf: support BPF ksym variables in kernel modules
  2020-12-12  1:52       ` Alexei Starovoitov
@ 2020-12-12  5:23         ` Andrii Nakryiko
  0 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2020-12-12  5:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Fri, Dec 11, 2020 at 5:52 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Dec 11, 2020 at 02:15:28PM -0800, Andrii Nakryiko wrote:
> > On Fri, Dec 11, 2020 at 1:27 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Dec 10, 2020 at 08:27:32PM -0800, Andrii Nakryiko wrote:
> > > > During BPF program load time, verifier will resolve FD to BTF object and will
> > > > take reference on BTF object itself and, for module BTFs, corresponding module
> > > > as well, to make sure it won't be unloaded from under running BPF program. The
> > > > mechanism used is similar to how bpf_prog keeps track of used bpf_maps.
> > > ...
> > > > +
> > > > +     /* if we reference variables from kernel module, bump its refcount */
> > > > +     if (btf_is_module(btf)) {
> > > > +             btf_mod->module = btf_try_get_module(btf);
> > >
> > > Is it necessary to refcnt the module? Correct me if I'm wrong, but
> > > for module's BTF we register a notifier. Then the module can be rmmod-ed
> > > at any time and we will do btf_put() for corresponding BTF, but that BTF may
> > > stay around because bpftool or something is looking at it.
> >
> > Correct, BTF object itself doesn't take a refcnt on module.
> >
> > > Similarly when prog is attached to raw_tp in a module we currently do try_module_get(),
> > > but is it really necessary ? When bpf is attached to a netdev the netdev can
> > be removed and the link will be dangling. Maybe it makes sense to do the same
> > > with modules?  The raw_tp can become dangling after rmmod and the prog won't be
> >
> > So for raw_tp it's not the case today. I tested, I attached raw_tp,
> > kept triggering it in a loop, and tried to rmmod bpf_testmod. It
> > failed, because raw tracepoint takes refcnt on module. rmmod -f
>
> Right. I meant that we can change that behavior if it would make sense to do so.

Oh, ok, yeah, I guess we can. But given it's been like that for a
while and no one complained, might as well leave it as is for now.

>
> > bpf_testmod also didn't work, but it's because my kernel wasn't built
> > with force-unload enabled for modules. But force-unload is an entirely
> > different matter and it's inherently dangerous to do, it can crash and
> > corrupt anything in the kernel.
> >
> > > executed anymore. So hard coded address of a per-cpu var in a ksym will
> > > be pointing to freed mod memory after rmmod, but that's ok, since that prog will
> > > never execute.
> >
> > Not so fast :) Indeed, if somehow module gets unloaded while we keep
> > BPF program loaded, we'll point to unallocated memory **OR** to a
> > memory re-used for something else. That's bad. Nothing will crash even
> > if it's unmapped memory (due to bpf_probe_read semantics), but we will
> > potentially be reading some garbage (not zeroes), if some other module
> > re-uses that per-CPU memory.
> >
> > As for the BPF program won't be triggered. That's not true in general,
> > as you mention yourself below.
> >
> > > On the other side if we envision a bpf prog attaching to a vmlinux function
> > > and accessing per-cpu or normal ksym in some module it would need to inc refcnt
> > > of that module, since we won't be able to guarantee that this prog will
> > > not execute any more. So we cannot allow dangling memory addresses.
> >
> > That's what my new selftest is doing actually. It's a generic
> > sys_enter raw_tp, which doesn't attach to the module, but it does read
> > module's per-CPU variable.
>
> Got it. I see that now.
>
> > So I actually ran a test before posting. I
> > successfully unloaded bpf_testmod, but kept running the prog. And it
> > kept returning *correct* per-CPU value. Most probably due to per-CPU
> > memory not unmapped and not yet reused for something else. But it's a
> > really nasty and surprising situation.
>
> you mean you managed to unload early during development before
> you've introduced refcnting of modules?

Yep, exactly.

>
> > Keep in mind, also, that whenever BPF program declares per-cpu
> > variable extern, it doesn't know or care whether it will get resolved
> > to built-in vmlinux per-CPU variable or module per-CPU variable.
> > Restricting attachment to only module-provided hooks is both tedious
> > and might be quite surprising sometimes, seems not worth the pain.
> >
> > > If latter is what we want to allow then we probably need a test case for it and
> > > document the reasons for keeping modules pinned while progs access their data.
> > > Since such pinning behavior is different from other bpf attaching cases where
> > > underlying objects (like netdev and cgroup) can go away.
> >
> > See above, that's already the case for module tracepoints.
> >
> > So in summary, I think we should take a refcnt on module, as that's
> > already the case for stuff like raw_tp. I can add more comments to
> > make this clear, of course.
>
> ok. agreed.
>
> Regarding fd+id in upper/lower 32-bit of ld_imm64...
> That works for ksyms because at that end the pair is converted to single
> address that fits into ld_imm64. That won't work for Alan's case
> where btf_obj pointer and btf_id are two values (64-bit and 32-bit).
> So api-wise it's fine here, but cannot adopt the same idea everywhere.

Right, won't work for Alan, but not because of ldimm64 not having
space for pointer and u32. There is bpf_insn_aux, which can store
whatever extra needs to be stored, if necessary.

But for Alan's case, because we want runtime lookup of btf obj + btf
id, the approach has to be different. We might do something like
sockmap, but for BPF types. I.e., user-space writes btf fd + btf id,
but internally we take BTF obj refcnt and store struct btf * and btf
type ID. On read we get BTF object ID + BTF type ID, for example.
WDYT?


>
> re: patch 4
> Please add non-percpu var to the test. Just for completeness.
> The pair fd+id should be enough to disambiguate, right?

Right, kernel detects per-CPU vs non-per-CPU on its own from the BTF
info. The problem is that pahole doesn't generate BTF for non-per-CPU
variables, so it's really impossible to test non-per-CPU variables
right now. :(

>
> re: patch 1.
> Instead of copy paste that hack please convert it to sys_membarrier(MEMBARRIER_CMD_GLOBAL).

Oh, cool, didn't know about it, nice.

>
> The rest looks good to me.


end of thread, other threads:[~2020-12-12  5:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-11  4:27 [PATCH RFC bpf-next 0/4] Support kernel module ksym variables Andrii Nakryiko
2020-12-11  4:27 ` [PATCH RFC bpf-next 1/4] selftests/bpf: sync RCU before unloading bpf_testmod Andrii Nakryiko
2020-12-11  4:27 ` [PATCH RFC bpf-next 2/4] bpf: support BPF ksym variables in kernel modules Andrii Nakryiko
2020-12-11 11:55   ` kernel test robot
2020-12-11 21:27   ` Alexei Starovoitov
2020-12-11 22:15     ` Andrii Nakryiko
2020-12-12  1:52       ` Alexei Starovoitov
2020-12-12  5:23         ` Andrii Nakryiko
2020-12-11  4:27 ` [PATCH RFC bpf-next 3/4] libbpf: support kernel module ksym externs Andrii Nakryiko
2020-12-11  4:27 ` [PATCH RFC bpf-next 4/4] selftests/bpf: test " Andrii Nakryiko
