* [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support
@ 2024-01-04  1:38 Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 1/9] libbpf: make uniform use of btf__fd() accessor inside libbpf Andrii Nakryiko
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04  1:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Support __arg_ctx global function argument tag semantics even on older kernels
that don't natively support it through btf_decl_tag("arg:ctx").
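
As a reminder of what this annotation looks like on the BPF program side, here
is a minimal sketch; the __arg_ctx macro simply mirrors the
btf_decl_tag("arg:ctx") definition mentioned above, and struct pt_regs /
handle_event are placeholder names for illustration only:

```c
/* mirrors the btf_decl_tag("arg:ctx") definition; requires clang
 * (gcc merely warns about the unknown attribute and ignores it)
 */
#define __arg_ctx __attribute__((btf_decl_tag("arg:ctx")))

struct pt_regs; /* opaque context type, placeholder for this sketch */

/* a global (non-static, noinline) subprog whose argument is tagged as
 * the program's context, so the verifier treats it as PTR_TO_CTX
 */
__attribute__((noinline))
int handle_event(struct pt_regs *ctx __arg_ctx)
{
	return ctx != 0; /* placeholder body */
}
```

On kernels that understand the decl tag, the verifier handles this natively;
the rest of the series teaches libbpf to rewrite BTF so older kernels see a
proper PTR_TO_CTX type instead.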

Patches #2-#6 are preparatory work that makes it possible to postpone BTF
loading into the kernel until after all the BPF program relocations
(including appending global subprogs to main programs) are done. Patch #4 is
perhaps the most important: it establishes pre-created stable placeholder
FDs, so that relocations can embed valid map FDs into ldimm64 instructions.

Once BTF loading happens after relocation, what's left is to adjust the BTF
information so that each main program's copy of each used global subprog
points to its own adjusted FUNC -> FUNC_PROTO type chain (if it uses
__arg_ctx), in such a way as to satisfy the BPF verifier's type expectations
for PTR_TO_CTX arguments. See patch #7 for details.

Patch #8 adds a few more __arg_ctx use cases (edge cases like multiple
arguments having __arg_ctx, etc.) to test_global_func_ctx_args.c, to make it
simple to validate that this logic indeed works on old kernels. It does. But
just to be 100% sure, patch #9 adds a test validating that libbpf uploads
func_info with properly modified BTF data.

v2->v3:
  - drop renaming patch (Alexei, Eduard);
  - use memfd_create() instead of /dev/null for placeholder FD (Eduard);
  - add one more test for validating BTF rewrite logic (Eduard);
  - fixed wrong -errno usage, reshuffled some BTF rewrite bits (Eduard);
v1->v2:
  - do internal functions renaming in patch #1 (Alexei);
  - extract cloning of FUNC -> FUNC_PROTO information into separate function
    (Alexei);

Andrii Nakryiko (9):
  libbpf: make uniform use of btf__fd() accessor inside libbpf
  libbpf: use explicit map reuse flag to skip map creation steps
  libbpf: don't rely on map->fd as an indicator of map being created
  libbpf: use stable map placeholder FDs
  libbpf: move exception callbacks assignment logic into relocation step
  libbpf: move BTF loading step after relocation step
  libbpf: implement __arg_ctx fallback logic
  selftests/bpf: add arg:ctx cases to test_global_funcs tests
  selftests/bpf: add __arg_ctx BTF rewrite test

 tools/lib/bpf/libbpf.c                        | 570 +++++++++++++-----
 tools/lib/bpf/libbpf_internal.h               |  14 +
 .../bpf/prog_tests/test_global_funcs.c        | 106 ++++
 .../bpf/progs/test_global_func_ctx_args.c     |  49 ++
 4 files changed, 599 insertions(+), 140 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 bpf-next 1/9] libbpf: make uniform use of btf__fd() accessor inside libbpf
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
@ 2024-01-04  1:38 ` Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 2/9] libbpf: use explicit map reuse flag to skip map creation steps Andrii Nakryiko
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04  1:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Jiri Olsa

It makes future grepping and code analysis a bit easier.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ebcfb2147fbd..f1521a400f02 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -7050,7 +7050,7 @@ static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog
 	load_attr.prog_ifindex = prog->prog_ifindex;
 
 	/* specify func_info/line_info only if kernel supports them */
-	btf_fd = bpf_object__btf_fd(obj);
+	btf_fd = btf__fd(obj->btf);
 	if (btf_fd >= 0 && kernel_supports(obj, FEAT_BTF_FUNC)) {
 		load_attr.prog_btf_fd = btf_fd;
 		load_attr.func_info = prog->func_info;
-- 
2.34.1



* [PATCH v3 bpf-next 2/9] libbpf: use explicit map reuse flag to skip map creation steps
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 1/9] libbpf: make uniform use of btf__fd() accessor inside libbpf Andrii Nakryiko
@ 2024-01-04  1:38 ` Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 3/9] libbpf: don't rely on map->fd as an indicator of map being created Andrii Nakryiko
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04  1:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Jiri Olsa

Instead of inferring whether a map already points to a previously
created/pinned BPF map (which the user can specify with the
bpf_map__reuse_fd() API), use the explicit map->reused flag that is set in
such cases.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f1521a400f02..3b678b617213 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -5465,7 +5465,7 @@ bpf_object__create_maps(struct bpf_object *obj)
 			}
 		}
 
-		if (map->fd >= 0) {
+		if (map->reused) {
 			pr_debug("map '%s': skipping creation (preset fd=%d)\n",
 				 map->name, map->fd);
 		} else {
-- 
2.34.1



* [PATCH v3 bpf-next 3/9] libbpf: don't rely on map->fd as an indicator of map being created
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 1/9] libbpf: make uniform use of btf__fd() accessor inside libbpf Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 2/9] libbpf: use explicit map reuse flag to skip map creation steps Andrii Nakryiko
@ 2024-01-04  1:38 ` Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 4/9] libbpf: use stable map placeholder FDs Andrii Nakryiko
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04  1:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Jiri Olsa

With the upcoming switch to preallocated placeholder FDs for maps, switch
various getters/setters away from checking map->fd. Use the map_is_created()
helper, which detects whether a BPF map can be modified based on its
map->obj->loaded state, with a special provision for maps set up with
bpf_map__reuse_fd().

For backwards compatibility, we take map_is_created() into account in
bpf_map__fd() getter as well. This way before bpf_object__load() phase
bpf_map__fd() will always return -1, just as before the changes in
subsequent patches adding stable map->fd placeholders.

We also get rid of all internal uses of the bpf_map__fd() getter, as it is
oriented more toward uses external to libbpf. The above map_is_created()
check actually interferes with some of the internal uses if the map FD is
fetched through bpf_map__fd().
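
The gating semantics this patch establishes are small enough to model in
isolation; a toy sketch with reduced stand-in structs (not the real libbpf
types) follows:

```c
#include <stdbool.h>

/* reduced stand-ins for the real libbpf structs (illustration only) */
struct toy_object { bool loaded; };
struct toy_map {
	struct toy_object *obj;
	bool reused;
	int fd;
};

/* mirrors map_is_created() from this patch */
static bool toy_map_is_created(const struct toy_map *map)
{
	return map->obj->loaded || map->reused;
}

/* mirrors the new bpf_map__fd() behavior: -1 until the map exists */
static int toy_map_fd(const struct toy_map *map)
{
	if (!toy_map_is_created(map))
		return -1;
	return map->fd;
}
```

Before the object is loaded (and absent bpf_map__reuse_fd()), the getter
reports -1 even though a placeholder FD will later occupy map->fd, preserving
the historical external behavior.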

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 42 +++++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 3b678b617213..6b27edd47c84 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -5200,6 +5200,11 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
 
 static void bpf_map__destroy(struct bpf_map *map);
 
+static bool map_is_created(const struct bpf_map *map)
+{
+	return map->obj->loaded || map->reused;
+}
+
 static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, bool is_inner)
 {
 	LIBBPF_OPTS(bpf_map_create_opts, create_attr);
@@ -5231,7 +5236,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 					map->name, err);
 				return err;
 			}
-			map->inner_map_fd = bpf_map__fd(map->inner_map);
+			map->inner_map_fd = map->inner_map->fd;
 		}
 		if (map->inner_map_fd >= 0)
 			create_attr.inner_map_fd = map->inner_map_fd;
@@ -5314,7 +5319,7 @@ static int init_map_in_map_slots(struct bpf_object *obj, struct bpf_map *map)
 			continue;
 
 		targ_map = map->init_slots[i];
-		fd = bpf_map__fd(targ_map);
+		fd = targ_map->fd;
 
 		if (obj->gen_loader) {
 			bpf_gen__populate_outer_map(obj->gen_loader,
@@ -7135,7 +7140,7 @@ static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog
 				if (map->libbpf_type != LIBBPF_MAP_RODATA)
 					continue;
 
-				if (bpf_prog_bind_map(ret, bpf_map__fd(map), NULL)) {
+				if (bpf_prog_bind_map(ret, map->fd, NULL)) {
 					cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
 					pr_warn("prog '%s': failed to bind map '%s': %s\n",
 						prog->name, map->real_name, cp);
@@ -9601,7 +9606,11 @@ int libbpf_attach_type_by_name(const char *name,
 
 int bpf_map__fd(const struct bpf_map *map)
 {
-	return map ? map->fd : libbpf_err(-EINVAL);
+	if (!map)
+		return libbpf_err(-EINVAL);
+	if (!map_is_created(map))
+		return -1;
+	return map->fd;
 }
 
 static bool map_uses_real_name(const struct bpf_map *map)
@@ -9637,7 +9646,7 @@ enum bpf_map_type bpf_map__type(const struct bpf_map *map)
 
 int bpf_map__set_type(struct bpf_map *map, enum bpf_map_type type)
 {
-	if (map->fd >= 0)
+	if (map_is_created(map))
 		return libbpf_err(-EBUSY);
 	map->def.type = type;
 	return 0;
@@ -9650,7 +9659,7 @@ __u32 bpf_map__map_flags(const struct bpf_map *map)
 
 int bpf_map__set_map_flags(struct bpf_map *map, __u32 flags)
 {
-	if (map->fd >= 0)
+	if (map_is_created(map))
 		return libbpf_err(-EBUSY);
 	map->def.map_flags = flags;
 	return 0;
@@ -9663,7 +9672,7 @@ __u64 bpf_map__map_extra(const struct bpf_map *map)
 
 int bpf_map__set_map_extra(struct bpf_map *map, __u64 map_extra)
 {
-	if (map->fd >= 0)
+	if (map_is_created(map))
 		return libbpf_err(-EBUSY);
 	map->map_extra = map_extra;
 	return 0;
@@ -9676,7 +9685,7 @@ __u32 bpf_map__numa_node(const struct bpf_map *map)
 
 int bpf_map__set_numa_node(struct bpf_map *map, __u32 numa_node)
 {
-	if (map->fd >= 0)
+	if (map_is_created(map))
 		return libbpf_err(-EBUSY);
 	map->numa_node = numa_node;
 	return 0;
@@ -9689,7 +9698,7 @@ __u32 bpf_map__key_size(const struct bpf_map *map)
 
 int bpf_map__set_key_size(struct bpf_map *map, __u32 size)
 {
-	if (map->fd >= 0)
+	if (map_is_created(map))
 		return libbpf_err(-EBUSY);
 	map->def.key_size = size;
 	return 0;
@@ -9773,7 +9782,7 @@ static int map_btf_datasec_resize(struct bpf_map *map, __u32 size)
 
 int bpf_map__set_value_size(struct bpf_map *map, __u32 size)
 {
-	if (map->fd >= 0)
+	if (map->obj->loaded || map->reused)
 		return libbpf_err(-EBUSY);
 
 	if (map->mmaped) {
@@ -9814,8 +9823,11 @@ __u32 bpf_map__btf_value_type_id(const struct bpf_map *map)
 int bpf_map__set_initial_value(struct bpf_map *map,
 			       const void *data, size_t size)
 {
+	if (map->obj->loaded || map->reused)
+		return libbpf_err(-EBUSY);
+
 	if (!map->mmaped || map->libbpf_type == LIBBPF_MAP_KCONFIG ||
-	    size != map->def.value_size || map->fd >= 0)
+	    size != map->def.value_size)
 		return libbpf_err(-EINVAL);
 
 	memcpy(map->mmaped, data, size);
@@ -9842,7 +9854,7 @@ __u32 bpf_map__ifindex(const struct bpf_map *map)
 
 int bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
 {
-	if (map->fd >= 0)
+	if (map_is_created(map))
 		return libbpf_err(-EBUSY);
 	map->map_ifindex = ifindex;
 	return 0;
@@ -9947,7 +9959,7 @@ bpf_object__find_map_fd_by_name(const struct bpf_object *obj, const char *name)
 static int validate_map_op(const struct bpf_map *map, size_t key_sz,
 			   size_t value_sz, bool check_value_sz)
 {
-	if (map->fd <= 0)
+	if (!map_is_created(map)) /* map is not yet created */
 		return -ENOENT;
 
 	if (map->def.key_size != key_sz) {
@@ -12400,7 +12412,7 @@ int bpf_link__update_map(struct bpf_link *link, const struct bpf_map *map)
 	__u32 zero = 0;
 	int err;
 
-	if (!bpf_map__is_struct_ops(map) || map->fd < 0)
+	if (!bpf_map__is_struct_ops(map) || !map_is_created(map))
 		return -EINVAL;
 
 	st_ops_link = container_of(link, struct bpf_link_struct_ops, link);
@@ -13304,7 +13316,7 @@ int bpf_object__load_skeleton(struct bpf_object_skeleton *s)
 	for (i = 0; i < s->map_cnt; i++) {
 		struct bpf_map *map = *s->maps[i].map;
 		size_t mmap_sz = bpf_map_mmap_sz(map->def.value_size, map->def.max_entries);
-		int prot, map_fd = bpf_map__fd(map);
+		int prot, map_fd = map->fd;
 		void **mmaped = s->maps[i].mmaped;
 
 		if (!mmaped)
-- 
2.34.1



* [PATCH v3 bpf-next 4/9] libbpf: use stable map placeholder FDs
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2024-01-04  1:38 ` [PATCH v3 bpf-next 3/9] libbpf: don't rely on map->fd as an indicator of map being created Andrii Nakryiko
@ 2024-01-04  1:38 ` Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 5/9] libbpf: move exception callbacks assignment logic into relocation step Andrii Nakryiko
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04  1:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Jiri Olsa

Move map creation later in the BPF object loading process by pre-creating
stable placeholder FDs (utilizing memfd_create()). Then use the dup2()
syscall to atomically make those placeholder FDs point to real kernel BPF
map objects.

This change makes it possible to delay BPF map creation until after all the
BPF program relocations. That, in turn, allows delaying BTF finalization and
loading into the kernel until after all the relocations as well. We'll take
advantage of the latter in subsequent patches to let libbpf adjust BTF in a
way that helps with BPF global function usage.

Clean up a few places where we close map->fd, which now shouldn't happen,
because map->fd should be a valid FD regardless of whether the map was
actually created or not. Surprisingly and nicely, this simplifies a bunch of
error handling code. If this change doesn't backfire, I'm tempted to
pre-create such stable FDs for other entities (progs, maybe even BTF). We
previously did some manipulations to make gen_loader work with fake map FDs;
with stable map FDs this hack is not necessary for maps anymore (we still
have it for BTF, but I left it as is for now).
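
The placeholder mechanism itself is plain Linux FD plumbing and can be
sketched outside of libbpf. In this demo a second memfd stands in for the
real BPF map object, and the repointing step corresponds to the reuse_fd()
helper added in this patch:

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Returns the byte readable through the placeholder FD after it has
 * been atomically repointed at the "real" object, or 0 on error.
 */
static char placeholder_reuse_demo(void)
{
	char buf = 0;
	/* reserve a stable FD number early (libbpf does this per map) */
	int placeholder = memfd_create("libbpf-placeholder-fd", MFD_CLOEXEC);
	/* ... relocations could embed 'placeholder' into ldimm64 here ... */

	/* later: the real object gets created (a BPF map in libbpf; another
	 * memfd here so that the sketch runs without BPF support)
	 */
	int real = memfd_create("real-object", MFD_CLOEXEC);

	if (placeholder < 0 || real < 0)
		return 0;
	if (write(real, "x", 1) != 1)
		return 0;

	/* dup2() atomically makes the placeholder FD number refer to the
	 * real object; anything holding that number stays valid
	 */
	if (dup2(real, placeholder) < 0)
		return 0;
	close(real); /* temporary FD is no longer needed */

	if (pread(placeholder, &buf, 1, 0) != 1)
		buf = 0;
	close(placeholder);
	return buf;
}
```

The key property is that the FD *number* reserved early never changes, so
relocated instructions that captured it remain correct after the real map is
created.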

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c          | 101 ++++++++++++++++++++------------
 tools/lib/bpf/libbpf_internal.h |  14 +++++
 2 files changed, 77 insertions(+), 38 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6b27edd47c84..a58569b7e4bf 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1503,6 +1503,16 @@ static Elf64_Sym *find_elf_var_sym(const struct bpf_object *obj, const char *nam
 	return ERR_PTR(-ENOENT);
 }
 
+static int create_placeholder_fd(void)
+{
+	int fd;
+
+	fd = ensure_good_fd(memfd_create("libbpf-placeholder-fd", MFD_CLOEXEC));
+	if (fd < 0)
+		return -errno;
+	return fd;
+}
+
 static struct bpf_map *bpf_object__add_map(struct bpf_object *obj)
 {
 	struct bpf_map *map;
@@ -1515,7 +1525,21 @@ static struct bpf_map *bpf_object__add_map(struct bpf_object *obj)
 
 	map = &obj->maps[obj->nr_maps++];
 	map->obj = obj;
-	map->fd = -1;
+	/* Preallocate map FD without actually creating BPF map just yet.
+	 * These map FD "placeholders" will be reused later without changing
+	 * FD value when map is actually created in the kernel.
+	 *
+	 * This is useful to be able to perform BPF program relocations
+	 * without having to create BPF maps before that step. This allows us
+	 * to finalize and load BTF very late in BPF object's loading phase,
+	 * right before BPF maps have to be created and BPF programs have to
+	 * be loaded. By having these map FD placeholders we can perform all
+	 * the sanitizations, relocations, and any other adjustments before we
+	 * start creating actual BPF kernel objects (BTF, maps, progs).
+	 */
+	map->fd = create_placeholder_fd();
+	if (map->fd < 0)
+		return ERR_PTR(map->fd);
 	map->inner_map_fd = -1;
 	map->autocreate = true;
 
@@ -2607,7 +2631,9 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj,
 		map->inner_map = calloc(1, sizeof(*map->inner_map));
 		if (!map->inner_map)
 			return -ENOMEM;
-		map->inner_map->fd = -1;
+		map->inner_map->fd = create_placeholder_fd();
+		if (map->inner_map->fd < 0)
+			return map->inner_map->fd;
 		map->inner_map->sec_idx = sec_idx;
 		map->inner_map->name = malloc(strlen(map_name) + sizeof(".inner") + 1);
 		if (!map->inner_map->name)
@@ -4549,14 +4575,12 @@ int bpf_map__reuse_fd(struct bpf_map *map, int fd)
 		goto err_free_new_name;
 	}
 
-	err = zclose(map->fd);
-	if (err) {
-		err = -errno;
-		goto err_close_new_fd;
-	}
+	err = reuse_fd(map->fd, new_fd);
+	if (err)
+		goto err_free_new_name;
+
 	free(map->name);
 
-	map->fd = new_fd;
 	map->name = new_name;
 	map->def.type = info.type;
 	map->def.key_size = info.key_size;
@@ -4570,8 +4594,6 @@ int bpf_map__reuse_fd(struct bpf_map *map, int fd)
 
 	return 0;
 
-err_close_new_fd:
-	close(new_fd);
 err_free_new_name:
 	free(new_name);
 	return libbpf_err(err);
@@ -5210,7 +5232,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 	LIBBPF_OPTS(bpf_map_create_opts, create_attr);
 	struct bpf_map_def *def = &map->def;
 	const char *map_name = NULL;
-	int err = 0;
+	int err = 0, map_fd;
 
 	if (kernel_supports(obj, FEAT_PROG_NAME))
 		map_name = map->name;
@@ -5269,17 +5291,19 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 		bpf_gen__map_create(obj->gen_loader, def->type, map_name,
 				    def->key_size, def->value_size, def->max_entries,
 				    &create_attr, is_inner ? -1 : map - obj->maps);
-		/* Pretend to have valid FD to pass various fd >= 0 checks.
-		 * This fd == 0 will not be used with any syscall and will be reset to -1 eventually.
+		/* We keep pretending we have a valid FD to pass various fd >= 0
+		 * checks by just keeping original placeholder FDs in place.
+		 * See bpf_object__add_map() comment.
+		 * This placeholder fd will not be used with any syscall and
+		 * will be reset to -1 eventually.
 		 */
-		map->fd = 0;
+		map_fd = map->fd;
 	} else {
-		map->fd = bpf_map_create(def->type, map_name,
-					 def->key_size, def->value_size,
-					 def->max_entries, &create_attr);
+		map_fd = bpf_map_create(def->type, map_name,
+					def->key_size, def->value_size,
+					def->max_entries, &create_attr);
 	}
-	if (map->fd < 0 && (create_attr.btf_key_type_id ||
-			    create_attr.btf_value_type_id)) {
+	if (map_fd < 0 && (create_attr.btf_key_type_id || create_attr.btf_value_type_id)) {
 		char *cp, errmsg[STRERR_BUFSIZE];
 
 		err = -errno;
@@ -5291,13 +5315,11 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 		create_attr.btf_value_type_id = 0;
 		map->btf_key_type_id = 0;
 		map->btf_value_type_id = 0;
-		map->fd = bpf_map_create(def->type, map_name,
-					 def->key_size, def->value_size,
-					 def->max_entries, &create_attr);
+		map_fd = bpf_map_create(def->type, map_name,
+					def->key_size, def->value_size,
+					def->max_entries, &create_attr);
 	}
 
-	err = map->fd < 0 ? -errno : 0;
-
 	if (bpf_map_type__is_map_in_map(def->type) && map->inner_map) {
 		if (obj->gen_loader)
 			map->inner_map->fd = -1;
@@ -5305,7 +5327,19 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 		zfree(&map->inner_map);
 	}
 
-	return err;
+	if (map_fd < 0)
+		return map_fd;
+
+	/* obj->gen_loader case, prevent reuse_fd() from closing map_fd */
+	if (map->fd == map_fd)
+		return 0;
+
+	/* Keep placeholder FD value but now point it to the BPF map object.
+	 * This way everything that relied on this map's FD (e.g., relocated
+	 * ldimm64 instructions) will stay valid and won't need adjustments.
+	 * map->fd stays valid but now points to what map_fd points to.
+	 */
+	return reuse_fd(map->fd, map_fd);
 }
 
 static int init_map_in_map_slots(struct bpf_object *obj, struct bpf_map *map)
@@ -5389,10 +5423,8 @@ static int bpf_object_init_prog_arrays(struct bpf_object *obj)
 			continue;
 
 		err = init_prog_array_slots(obj, map);
-		if (err < 0) {
-			zclose(map->fd);
+		if (err < 0)
 			return err;
-		}
 	}
 	return 0;
 }
@@ -5483,25 +5515,20 @@ bpf_object__create_maps(struct bpf_object *obj)
 
 			if (bpf_map__is_internal(map)) {
 				err = bpf_object__populate_internal_map(obj, map);
-				if (err < 0) {
-					zclose(map->fd);
+				if (err < 0)
 					goto err_out;
-				}
 			}
 
 			if (map->init_slots_sz && map->def.type != BPF_MAP_TYPE_PROG_ARRAY) {
 				err = init_map_in_map_slots(obj, map);
-				if (err < 0) {
-					zclose(map->fd);
+				if (err < 0)
 					goto err_out;
-				}
 			}
 		}
 
 		if (map->pin_path && !map->pinned) {
 			err = bpf_map__pin(map, NULL);
 			if (err) {
-				zclose(map->fd);
 				if (!retried && err == -EEXIST) {
 					retried = true;
 					goto retry;
@@ -8075,8 +8102,8 @@ static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const ch
 	err = err ? : bpf_object__sanitize_and_load_btf(obj);
 	err = err ? : bpf_object__sanitize_maps(obj);
 	err = err ? : bpf_object__init_kern_struct_ops_maps(obj);
-	err = err ? : bpf_object__create_maps(obj);
 	err = err ? : bpf_object__relocate(obj, obj->btf_custom_path ? : target_btf_path);
+	err = err ? : bpf_object__create_maps(obj);
 	err = err ? : bpf_object__load_progs(obj, extra_log_level);
 	err = err ? : bpf_object_init_prog_arrays(obj);
 	err = err ? : bpf_object_prepare_struct_ops(obj);
@@ -8085,8 +8112,6 @@ static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const ch
 		/* reset FDs */
 		if (obj->btf)
 			btf__set_fd(obj->btf, -1);
-		for (i = 0; i < obj->nr_maps; i++)
-			obj->maps[i].fd = -1;
 		if (!err)
 			err = bpf_gen__finish(obj->gen_loader, obj->nr_programs, obj->nr_maps);
 	}
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index b5d334754e5d..27e4e320e1a6 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -555,6 +555,20 @@ static inline int ensure_good_fd(int fd)
 	return fd;
 }
 
+/* Point *fixed_fd* to the same file that *tmp_fd* points to.
+ * Regardless of success, *tmp_fd* is closed.
+ * Whatever *fixed_fd* pointed to is closed silently.
+ */
+static inline int reuse_fd(int fixed_fd, int tmp_fd)
+{
+	int err;
+
+	err = dup2(tmp_fd, fixed_fd);
+	err = err < 0 ? -errno : 0;
+	close(tmp_fd); /* clean up temporary FD */
+	return err;
+}
+
 /* The following two functions are exposed to bpftool */
 int bpf_core_add_cands(struct bpf_core_cand *local_cand,
 		       size_t local_essent_len,
-- 
2.34.1



* [PATCH v3 bpf-next 5/9] libbpf: move exception callbacks assignment logic into relocation step
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2024-01-04  1:38 ` [PATCH v3 bpf-next 4/9] libbpf: use stable map placeholder FDs Andrii Nakryiko
@ 2024-01-04  1:38 ` Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 6/9] libbpf: move BTF loading step after " Andrii Nakryiko
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04  1:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Jiri Olsa

Move the logic of finding and assigning exception callback indices from the
BTF sanitization step to the program relocation step, which seems more
logical and will unblock moving BTF loading to after the relocation step.

Exception callback discovery and assignment have no dependency on BTF being
loaded into the kernel; they only use BTF information. They do need to
happen before subprogram relocations, though, which is why the split.

No functional changes.
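
The name parsing at the heart of the extracted helper can be shown
standalone (the real bpf_prog_assign_exc_cb() walks BTF decl tags around
this logic):

```c
#include <string.h>

/* Extract the callback name from an "exception_callback:<value>" decl
 * tag name, mirroring the prefix handling in bpf_prog_assign_exc_cb();
 * returns NULL when the tag doesn't match or the value is empty.
 */
static const char *exc_cb_name(const char *tag_name)
{
	const char *pfx = "exception_callback:";
	size_t pfx_len = strlen(pfx);

	if (strncmp(tag_name, pfx, pfx_len) != 0)
		return NULL;
	tag_name += pfx_len;
	return *tag_name ? tag_name : NULL;
}
```

A non-matching tag (e.g. "arg:ctx") and an empty value both yield NULL,
matching the helper's "skip" and "empty value" error paths respectively.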

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 165 +++++++++++++++++++++--------------------
 1 file changed, 85 insertions(+), 80 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a58569b7e4bf..01d45f0c40d0 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -3192,86 +3192,6 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
 		}
 	}
 
-	if (!kernel_supports(obj, FEAT_BTF_DECL_TAG))
-		goto skip_exception_cb;
-	for (i = 0; i < obj->nr_programs; i++) {
-		struct bpf_program *prog = &obj->programs[i];
-		int j, k, n;
-
-		if (prog_is_subprog(obj, prog))
-			continue;
-		n = btf__type_cnt(obj->btf);
-		for (j = 1; j < n; j++) {
-			const char *str = "exception_callback:", *name;
-			size_t len = strlen(str);
-			struct btf_type *t;
-
-			t = btf_type_by_id(obj->btf, j);
-			if (!btf_is_decl_tag(t) || btf_decl_tag(t)->component_idx != -1)
-				continue;
-
-			name = btf__str_by_offset(obj->btf, t->name_off);
-			if (strncmp(name, str, len))
-				continue;
-
-			t = btf_type_by_id(obj->btf, t->type);
-			if (!btf_is_func(t) || btf_func_linkage(t) != BTF_FUNC_GLOBAL) {
-				pr_warn("prog '%s': exception_callback:<value> decl tag not applied to the main program\n",
-					prog->name);
-				return -EINVAL;
-			}
-			if (strcmp(prog->name, btf__str_by_offset(obj->btf, t->name_off)))
-				continue;
-			/* Multiple callbacks are specified for the same prog,
-			 * the verifier will eventually return an error for this
-			 * case, hence simply skip appending a subprog.
-			 */
-			if (prog->exception_cb_idx >= 0) {
-				prog->exception_cb_idx = -1;
-				break;
-			}
-
-			name += len;
-			if (str_is_empty(name)) {
-				pr_warn("prog '%s': exception_callback:<value> decl tag contains empty value\n",
-					prog->name);
-				return -EINVAL;
-			}
-
-			for (k = 0; k < obj->nr_programs; k++) {
-				struct bpf_program *subprog = &obj->programs[k];
-
-				if (!prog_is_subprog(obj, subprog))
-					continue;
-				if (strcmp(name, subprog->name))
-					continue;
-				/* Enforce non-hidden, as from verifier point of
-				 * view it expects global functions, whereas the
-				 * mark_btf_static fixes up linkage as static.
-				 */
-				if (!subprog->sym_global || subprog->mark_btf_static) {
-					pr_warn("prog '%s': exception callback %s must be a global non-hidden function\n",
-						prog->name, subprog->name);
-					return -EINVAL;
-				}
-				/* Let's see if we already saw a static exception callback with the same name */
-				if (prog->exception_cb_idx >= 0) {
-					pr_warn("prog '%s': multiple subprogs with same name as exception callback '%s'\n",
-					        prog->name, subprog->name);
-					return -EINVAL;
-				}
-				prog->exception_cb_idx = k;
-				break;
-			}
-
-			if (prog->exception_cb_idx >= 0)
-				continue;
-			pr_warn("prog '%s': cannot find exception callback '%s'\n", prog->name, name);
-			return -ENOENT;
-		}
-	}
-skip_exception_cb:
-
 	sanitize = btf_needs_sanitization(obj);
 	if (sanitize) {
 		const void *raw_data;
@@ -6661,6 +6581,88 @@ static void bpf_object__sort_relos(struct bpf_object *obj)
 	}
 }
 
+static int bpf_prog_assign_exc_cb(struct bpf_object *obj, struct bpf_program *prog)
+{
+	const char *str = "exception_callback:";
+	size_t pfx_len = strlen(str);
+	int i, j, n;
+
+	if (!obj->btf || !kernel_supports(obj, FEAT_BTF_DECL_TAG))
+		return 0;
+
+	n = btf__type_cnt(obj->btf);
+	for (i = 1; i < n; i++) {
+		const char *name;
+		struct btf_type *t;
+
+		t = btf_type_by_id(obj->btf, i);
+		if (!btf_is_decl_tag(t) || btf_decl_tag(t)->component_idx != -1)
+			continue;
+
+		name = btf__str_by_offset(obj->btf, t->name_off);
+		if (strncmp(name, str, pfx_len) != 0)
+			continue;
+
+		t = btf_type_by_id(obj->btf, t->type);
+		if (!btf_is_func(t) || btf_func_linkage(t) != BTF_FUNC_GLOBAL) {
+			pr_warn("prog '%s': exception_callback:<value> decl tag not applied to the main program\n",
+				prog->name);
+			return -EINVAL;
+		}
+		if (strcmp(prog->name, btf__str_by_offset(obj->btf, t->name_off)) != 0)
+			continue;
+		/* Multiple callbacks are specified for the same prog,
+		 * the verifier will eventually return an error for this
+		 * case, hence simply skip appending a subprog.
+		 */
+		if (prog->exception_cb_idx >= 0) {
+			prog->exception_cb_idx = -1;
+			break;
+		}
+
+		name += pfx_len;
+		if (str_is_empty(name)) {
+			pr_warn("prog '%s': exception_callback:<value> decl tag contains empty value\n",
+				prog->name);
+			return -EINVAL;
+		}
+
+		for (j = 0; j < obj->nr_programs; j++) {
+			struct bpf_program *subprog = &obj->programs[j];
+
+			if (!prog_is_subprog(obj, subprog))
+				continue;
+			if (strcmp(name, subprog->name) != 0)
+				continue;
+			/* Enforce non-hidden, as from verifier point of
+			 * view it expects global functions, whereas the
+			 * mark_btf_static fixes up linkage as static.
+			 */
+			if (!subprog->sym_global || subprog->mark_btf_static) {
+				pr_warn("prog '%s': exception callback %s must be a global non-hidden function\n",
+					prog->name, subprog->name);
+				return -EINVAL;
+			}
+			/* Let's see if we already saw a static exception callback with the same name */
+			if (prog->exception_cb_idx >= 0) {
+				pr_warn("prog '%s': multiple subprogs with same name as exception callback '%s'\n",
+					prog->name, subprog->name);
+				return -EINVAL;
+			}
+			prog->exception_cb_idx = j;
+			break;
+		}
+
+		if (prog->exception_cb_idx >= 0)
+			continue;
+
+		pr_warn("prog '%s': cannot find exception callback '%s'\n", prog->name, name);
+		return -ENOENT;
+	}
+
+	return 0;
+}
+
 static int
 bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
 {
@@ -6721,6 +6723,9 @@ bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
 			return err;
 		}
 
+		err = bpf_prog_assign_exc_cb(obj, prog);
+		if (err)
+			return err;
 		/* Now, also append exception callback if it has not been done already. */
 		if (prog->exception_cb_idx >= 0) {
 			struct bpf_program *subprog = &obj->programs[prog->exception_cb_idx];
-- 
2.34.1



* [PATCH v3 bpf-next 6/9] libbpf: move BTF loading step after relocation step
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2024-01-04  1:38 ` [PATCH v3 bpf-next 5/9] libbpf: move exception callbacks assignment logic into relocation step Andrii Nakryiko
@ 2024-01-04  1:38 ` Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic Andrii Nakryiko
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04  1:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Jiri Olsa

With all the preparations in previous patches done, we are ready to postpone
the BTF loading and sanitization step until after all the relocations are
performed.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 01d45f0c40d0..836986974de3 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8104,10 +8104,10 @@ static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const ch
 	err = bpf_object__probe_loading(obj);
 	err = err ? : bpf_object__load_vmlinux_btf(obj, false);
 	err = err ? : bpf_object__resolve_externs(obj, obj->kconfig);
-	err = err ? : bpf_object__sanitize_and_load_btf(obj);
 	err = err ? : bpf_object__sanitize_maps(obj);
 	err = err ? : bpf_object__init_kern_struct_ops_maps(obj);
 	err = err ? : bpf_object__relocate(obj, obj->btf_custom_path ? : target_btf_path);
+	err = err ? : bpf_object__sanitize_and_load_btf(obj);
 	err = err ? : bpf_object__create_maps(obj);
 	err = err ? : bpf_object__load_progs(obj, extra_log_level);
 	err = err ? : bpf_object_init_prog_arrays(obj);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2024-01-04  1:38 ` [PATCH v3 bpf-next 6/9] libbpf: move BTF loading step after " Andrii Nakryiko
@ 2024-01-04  1:38 ` Andrii Nakryiko
  2024-01-04  5:39   ` Alexei Starovoitov
  2024-01-04  1:38 ` [PATCH v3 bpf-next 8/9] selftests/bpf: add arg:ctx cases to test_global_funcs tests Andrii Nakryiko
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04  1:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Jiri Olsa

Out of all the special global func arg tag annotations, __arg_ctx is
the most immediately useful and the most critical to have working
across a multitude of kernel versions, if possible. This would allow
end users to write much simpler code if __arg_ctx semantics worked for
older kernels that don't natively understand btf_decl_tag("arg:ctx") in
verifier logic.

Luckily, it is possible to ensure __arg_ctx works on old kernels through
a bit of extra work done by libbpf, at least in a lot of common cases.

To explain the overall idea, we need to go back to how the context
argument was supported in global funcs before __arg_ctx support was
added. This was done based on special struct name checks in the kernel.
E.g., for BPF_PROG_TYPE_PERF_EVENT the expectation is that the argument
type `struct bpf_perf_event_data *` marks that argument as PTR_TO_CTX.
This is all good as long as the global function is only used from a
single BPF program type, which is often not the case. If the same
subprog has to be called from, say, kprobe and perf_event program
types, there is no single definition that would satisfy the BPF
verifier. The subprog will have a context argument either for kprobe
(if using the bpf_user_pt_regs_t struct name) or perf_event (with the
bpf_perf_event_data struct name), but not both.
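For example, before __arg_ctx there was no way to write one definition
that both kprobe and perf_event callers would accept (an illustrative
sketch; the subprog names and bodies are made up):

  /* accepted as PTR_TO_CTX only when called from kprobe programs */
  __noinline int subprog_kprobe(bpf_user_pt_regs_t *ctx) { ... }

  /* accepted as PTR_TO_CTX only when called from perf_event programs */
  __noinline int subprog_perf(struct bpf_perf_event_data *ctx) { ... }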

This limitation was the reason to add btf_decl_tag("arg:ctx"), making
the actual argument type unimportant, so that the user can just define
a "generic" signature:

  __noinline int global_subprog(void *ctx __arg_ctx) { ... }

I won't belabor how libbpf implements subprograms; see the huge comment
next to the bpf_object_relocate_calls() function. The idea is that each
main/entry BPF program gets its own copy of global_subprog's code
appended.
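Schematically, after relocation each entry program ends up with its own
appended copy of the subprog (a simplified picture):

  prog_A insns: [ prog_A's own code ][ copy of global_subprog code ]
  prog_B insns: [ prog_B's own code ][ copy of global_subprog code ]

with each copy covered by that program's own func_info records.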

This per-program copy of the global subprog code *and* the associated
func_info in .BTF.ext, pointing to a FUNC -> FUNC_PROTO BTF type chain,
allow libbpf to simulate __arg_ctx behavior transparently, even if the
kernel doesn't yet support the __arg_ctx annotation natively.

The idea is straightforward: each time we append a global subprog's
code and func_info information, we adjust its FUNC -> FUNC_PROTO type
information, if necessary (that is, libbpf can detect the presence of
btf_decl_tag("arg:ctx") just like the BPF verifier would).

The rest is just mechanical and somewhat painful BTF manipulation code.
It's painful because we need to clone FUNC -> FUNC_PROTO, instead of
reusing it, as the same FUNC -> FUNC_PROTO chain might be used by
another main BPF program within the same BPF object, so we can't just
modify it in-place (and cloning BTF types within the same struct btf
object is painful due to constant memory invalidation, see comments in
the code). The uploaded BPF object's BTF information has to work for
all BPF programs at the same time.

Once we have the FUNC -> FUNC_PROTO clones, we make sure that instead
of some `void *ctx` parameter definition, we have the expected `struct
bpf_perf_event_data *ctx` definition (as far as the BPF verifier and
kernel are concerned), which marks it as context for the BPF verifier.
The same global subprog, relocated and copied into another main BPF
program, will get different type information according to that
program's type. It all works out in the end in a completely transparent
way for the end user.
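Schematically, for a perf_event entry program the fixup rewrites the
uploaded signature of the subprog's copy from

  int global_subprog(void *ctx);

to

  int global_subprog(struct bpf_perf_event_data *ctx);

while the copy appended to a kprobe entry program would instead get
bpf_user_pt_regs_t as the context argument's pointee type.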

Libbpf maintains an internal mapping from program type to the expected
context struct name. Note that not all BPF program types have a named
context struct, so this approach won't work for such programs (just as
it didn't before __arg_ctx). So native __arg_ctx support in the kernel
is still important to have generic context support across all BPF
program types.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 256 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 252 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 836986974de3..c5a42ac309fd 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6181,7 +6181,7 @@ reloc_prog_func_and_line_info(const struct bpf_object *obj,
 	int err;
 
 	/* no .BTF.ext relocation if .BTF.ext is missing or kernel doesn't
-	 * supprot func/line info
+	 * support func/line info
 	 */
 	if (!obj->btf_ext || !kernel_supports(obj, FEAT_BTF_FUNC))
 		return 0;
@@ -6663,8 +6663,247 @@ static int bpf_prog_assign_exc_cb(struct bpf_object *obj, struct bpf_program *pr
 	return 0;
 }
 
-static int
-bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
+static struct {
+	enum bpf_prog_type prog_type;
+	const char *ctx_name;
+} global_ctx_map[] = {
+	{ BPF_PROG_TYPE_CGROUP_DEVICE,           "bpf_cgroup_dev_ctx" },
+	{ BPF_PROG_TYPE_CGROUP_SKB,              "__sk_buff" },
+	{ BPF_PROG_TYPE_CGROUP_SOCK,             "bpf_sock" },
+	{ BPF_PROG_TYPE_CGROUP_SOCK_ADDR,        "bpf_sock_addr" },
+	{ BPF_PROG_TYPE_CGROUP_SOCKOPT,          "bpf_sockopt" },
+	{ BPF_PROG_TYPE_CGROUP_SYSCTL,           "bpf_sysctl" },
+	{ BPF_PROG_TYPE_FLOW_DISSECTOR,          "__sk_buff" },
+	{ BPF_PROG_TYPE_KPROBE,                  "bpf_user_pt_regs_t" },
+	{ BPF_PROG_TYPE_LWT_IN,                  "__sk_buff" },
+	{ BPF_PROG_TYPE_LWT_OUT,                 "__sk_buff" },
+	{ BPF_PROG_TYPE_LWT_SEG6LOCAL,           "__sk_buff" },
+	{ BPF_PROG_TYPE_LWT_XMIT,                "__sk_buff" },
+	{ BPF_PROG_TYPE_NETFILTER,               "bpf_nf_ctx" },
+	{ BPF_PROG_TYPE_PERF_EVENT,              "bpf_perf_event_data" },
+	{ BPF_PROG_TYPE_RAW_TRACEPOINT,          "bpf_raw_tracepoint_args" },
+	{ BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, "bpf_raw_tracepoint_args" },
+	{ BPF_PROG_TYPE_SCHED_ACT,               "__sk_buff" },
+	{ BPF_PROG_TYPE_SCHED_CLS,               "__sk_buff" },
+	{ BPF_PROG_TYPE_SK_LOOKUP,               "bpf_sk_lookup" },
+	{ BPF_PROG_TYPE_SK_MSG,                  "sk_msg_md" },
+	{ BPF_PROG_TYPE_SK_REUSEPORT,            "sk_reuseport_md" },
+	{ BPF_PROG_TYPE_SK_SKB,                  "__sk_buff" },
+	{ BPF_PROG_TYPE_SOCK_OPS,                "bpf_sock_ops" },
+	{ BPF_PROG_TYPE_SOCKET_FILTER,           "__sk_buff" },
+	{ BPF_PROG_TYPE_XDP,                     "xdp_md" },
+	/* all other program types don't have "named" context structs */
+};
+
+static int clone_func_btf_info(struct btf *btf, int orig_fn_id, struct bpf_program *prog)
+{
+	int fn_id, fn_proto_id, ret_type_id, orig_proto_id;
+	int i, err, arg_cnt, fn_name_off, linkage;
+	struct btf_type *fn_t, *fn_proto_t, *t;
+	struct btf_param *p;
+
+	/* caller already validated FUNC -> FUNC_PROTO validity */
+	fn_t = btf_type_by_id(btf, orig_fn_id);
+	fn_proto_t = btf_type_by_id(btf, fn_t->type);
+
+	/* Note that each btf__add_xxx() operation invalidates
+	 * all btf_type and string pointers, so we need to be
+	 * very careful when cloning BTF types. BTF type
+	 * pointers have to be always refetched. And to avoid
+	 * problems with invalidated string pointers, we
+	 * add empty strings initially, then just fix up
+	 * name_off offsets in place. Offsets are stable for
+	 * existing strings, so that works out.
+	 */
+	fn_name_off = fn_t->name_off; /* we are about to invalidate fn_t */
+	linkage = btf_func_linkage(fn_t);
+	orig_proto_id = fn_t->type; /* original FUNC_PROTO ID */
+	ret_type_id = fn_proto_t->type; /* fn_proto_t will be invalidated */
+	arg_cnt = btf_vlen(fn_proto_t);
+
+	/* clone FUNC_PROTO and its params */
+	fn_proto_id = btf__add_func_proto(btf, ret_type_id);
+	if (fn_proto_id < 0)
+		return -EINVAL;
+
+	for (i = 0; i < arg_cnt; i++) {
+		int name_off;
+
+		/* copy original parameter data */
+		t = btf_type_by_id(btf, orig_proto_id);
+		p = &btf_params(t)[i];
+		name_off = p->name_off;
+
+		err = btf__add_func_param(btf, "", p->type);
+		if (err)
+			return err;
+
+		fn_proto_t = btf_type_by_id(btf, fn_proto_id);
+		p = &btf_params(fn_proto_t)[i];
+		p->name_off = name_off; /* use remembered str offset */
+	}
+
+	/* clone FUNC now, btf__add_func() enforces non-empty name, so use
+	 * entry program's name as a placeholder, which we replace immediately
+	 * with original name_off
+	 */
+	fn_id = btf__add_func(btf, prog->name, linkage, fn_proto_id);
+	if (fn_id < 0)
+		return -EINVAL;
+
+	fn_t = btf_type_by_id(btf, fn_id);
+	fn_t->name_off = fn_name_off; /* reuse original string */
+
+	return fn_id;
+}
+
+/* Check if main program or global subprog's function prototype has `arg:ctx`
+ * argument tags, and, if necessary, substitute correct type to match what BPF
+ * verifier would expect, taking into account specific program type. This
+ * allows to support __arg_ctx tag transparently on old kernels that don't yet
+ * have a native support for it in the verifier, making user's life much
+ * easier.
+ */
+static int bpf_program_fixup_func_info(struct bpf_object *obj, struct bpf_program *prog)
+{
+	const char *ctx_name = NULL, *ctx_tag = "arg:ctx";
+	struct bpf_func_info_min *func_rec;
+	struct btf_type *fn_t, *fn_proto_t;
+	struct btf *btf = obj->btf;
+	const struct btf_type *t;
+	struct btf_param *p;
+	int ptr_id = 0, struct_id, tag_id, orig_fn_id;
+	int i, n, arg_idx, arg_cnt, err, rec_idx;
+	int *orig_ids;
+
+	/* no .BTF.ext, no problem */
+	if (!obj->btf_ext || !prog->func_info)
+		return 0;
+
+	/* some BPF program types just don't have named context structs, so
+	 * this fallback mechanism doesn't work for them
+	 */
+	for (i = 0; i < ARRAY_SIZE(global_ctx_map); i++) {
+		if (global_ctx_map[i].prog_type != prog->type)
+			continue;
+		ctx_name = global_ctx_map[i].ctx_name;
+		break;
+	}
+	if (!ctx_name)
+		return 0;
+
+	/* remember original func BTF IDs to detect if we already cloned them */
+	orig_ids = calloc(prog->func_info_cnt, sizeof(*orig_ids));
+	if (!orig_ids)
+		return -ENOMEM;
+	for (i = 0; i < prog->func_info_cnt; i++) {
+		func_rec = prog->func_info + prog->func_info_rec_size * i;
+		orig_ids[i] = func_rec->type_id;
+	}
+
+	/* go through each DECL_TAG with "arg:ctx" and see if it points to one
+	 * of our subprogs; if yes and subprog is global and needs adjustment,
+	 * clone and adjust FUNC -> FUNC_PROTO combo
+	 */
+	for (i = 1, n = btf__type_cnt(btf); i < n; i++) {
+		/* only DECL_TAG with "arg:ctx" value are interesting */
+		t = btf__type_by_id(btf, i);
+		if (!btf_is_decl_tag(t))
+			continue;
+		if (strcmp(btf__str_by_offset(btf, t->name_off), ctx_tag) != 0)
+			continue;
+
+		/* only global funcs need adjustment, if at all */
+		orig_fn_id = t->type;
+		fn_t = btf_type_by_id(btf, orig_fn_id);
+		if (!btf_is_func(fn_t) || btf_func_linkage(fn_t) != BTF_FUNC_GLOBAL)
+			continue;
+
+		/* sanity check FUNC -> FUNC_PROTO chain, just in case */
+		fn_proto_t = btf_type_by_id(btf, fn_t->type);
+		if (!fn_proto_t || !btf_is_func_proto(fn_proto_t))
+			continue;
+
+		/* find corresponding func_info record */
+		func_rec = NULL;
+		for (rec_idx = 0; rec_idx < prog->func_info_cnt; rec_idx++) {
+			if (orig_ids[rec_idx] == t->type) {
+				func_rec = prog->func_info + prog->func_info_rec_size * rec_idx;
+				break;
+			}
+		}
+		/* current main program doesn't call into this subprog */
+		if (!func_rec)
+			continue;
+
+		/* some more sanity checking of DECL_TAG */
+		arg_cnt = btf_vlen(fn_proto_t);
+		arg_idx = btf_decl_tag(t)->component_idx;
+		if (arg_idx < 0 || arg_idx >= arg_cnt)
+			continue;
+
+		/* check if existing parameter already matches verifier expectations */
+		p = &btf_params(fn_proto_t)[arg_idx];
+		t = skip_mods_and_typedefs(btf, p->type, NULL);
+		if (btf_is_ptr(t) &&
+		    (t = skip_mods_and_typedefs(btf, t->type, NULL)) &&
+		    btf_is_struct(t) &&
+		    strcmp(btf__str_by_offset(btf, t->name_off), ctx_name) == 0) {
+			continue; /* no need for fix up */
+		}
+
+		/* clone fn/fn_proto, unless we already did it for another arg */
+		if (func_rec->type_id == orig_fn_id) {
+			int fn_id;
+
+			fn_id = clone_func_btf_info(btf, orig_fn_id, prog);
+			if (fn_id < 0) {
+				err = fn_id;
+				goto err_out;
+			}
+
+			/* point func_info record to a cloned FUNC type */
+			func_rec->type_id = fn_id;
+		}
+
+		/* create PTR -> STRUCT type chain to mark PTR_TO_CTX argument;
+		 * we do it just once per main BPF program, as all global
+		 * funcs share the same program type, so need only PTR ->
+		 * STRUCT type chain
+		 */
+		if (ptr_id == 0) {
+			struct_id = btf__add_struct(btf, ctx_name, 0);
+			ptr_id = btf__add_ptr(btf, struct_id);
+			if (ptr_id < 0 || struct_id < 0) {
+				err = -EINVAL;
+				goto err_out;
+			}
+		}
+
+		/* for completeness, clone DECL_TAG and point it to cloned param */
+		tag_id = btf__add_decl_tag(btf, ctx_tag, func_rec->type_id, arg_idx);
+		if (tag_id < 0) {
+			err = -EINVAL;
+			goto err_out;
+		}
+
+		/* all the BTF manipulations invalidated pointers, refetch them */
+		fn_t = btf_type_by_id(btf, func_rec->type_id);
+		fn_proto_t = btf_type_by_id(btf, fn_t->type);
+
+		/* fix up type ID pointed to by param */
+		p = &btf_params(fn_proto_t)[arg_idx];
+		p->type = ptr_id;
+	}
+
+	free(orig_ids);
+	return 0;
+err_out:
+	free(orig_ids);
+	return err;
+}
+
+static int bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
 {
 	struct bpf_program *prog;
 	size_t i, j;
@@ -6745,19 +6984,28 @@ bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
 			}
 		}
 	}
-	/* Process data relos for main programs */
 	for (i = 0; i < obj->nr_programs; i++) {
 		prog = &obj->programs[i];
 		if (prog_is_subprog(obj, prog))
 			continue;
 		if (!prog->autoload)
 			continue;
+
+		/* Process data relos for main programs */
 		err = bpf_object__relocate_data(obj, prog);
 		if (err) {
 			pr_warn("prog '%s': failed to relocate data references: %d\n",
 				prog->name, err);
 			return err;
 		}
+
+		/* Fix up .BTF.ext information, if necessary */
+		err = bpf_program_fixup_func_info(obj, prog);
+		if (err) {
+			pr_warn("prog '%s': failed to perform .BTF.ext fix ups: %d\n",
+				prog->name, err);
+			return err;
+		}
 	}
 
 	return 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 bpf-next 8/9] selftests/bpf: add arg:ctx cases to test_global_funcs tests
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2024-01-04  1:38 ` [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic Andrii Nakryiko
@ 2024-01-04  1:38 ` Andrii Nakryiko
  2024-01-04  1:38 ` [PATCH v3 bpf-next 9/9] selftests/bpf: add __arg_ctx BTF rewrite test Andrii Nakryiko
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04  1:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Jiri Olsa

Add a few extra cases of global funcs with context arguments. This time
rely on the "arg:ctx" decl_tag (__arg_ctx macro), but put them next to
the "classic" cases where the context argument has to be of the exact
type that the BPF verifier expects (e.g., bpf_user_pt_regs_t for
kprobe/uprobe).

Keeping all these cases separate from the other global func args that
rely on arg:xxx decl tags (in verifier_global_subprogs.c) allows for
simpler backwards compatibility testing on old kernels. All the cases
in test_global_func_ctx_args.c are supposed to work on older kernels,
which was manually validated during development.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../bpf/progs/test_global_func_ctx_args.c     | 49 +++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c b/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
index 7faa8eef0598..9a06e5eb1fbe 100644
--- a/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
+++ b/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
@@ -102,3 +102,52 @@ int perf_event_ctx(void *ctx)
 {
 	return perf_event_ctx_subprog(ctx);
 }
+
+/* this global subprog can be now called from many types of entry progs, each
+ * with different context type
+ */
+__weak int subprog_ctx_tag(void *ctx __arg_ctx)
+{
+	return bpf_get_stack(ctx, stack, sizeof(stack), 0);
+}
+
+struct my_struct { int x; };
+
+__weak int subprog_multi_ctx_tags(void *ctx1 __arg_ctx,
+				  struct my_struct *mem,
+				  void *ctx2 __arg_ctx)
+{
+	if (!mem)
+		return 0;
+
+	return bpf_get_stack(ctx1, stack, sizeof(stack), 0) +
+	       mem->x +
+	       bpf_get_stack(ctx2, stack, sizeof(stack), 0);
+}
+
+SEC("?raw_tp")
+__success __log_level(2)
+int arg_tag_ctx_raw_tp(void *ctx)
+{
+	struct my_struct x = { .x = 123 };
+
+	return subprog_ctx_tag(ctx) + subprog_multi_ctx_tags(ctx, &x, ctx);
+}
+
+SEC("?perf_event")
+__success __log_level(2)
+int arg_tag_ctx_perf(void *ctx)
+{
+	struct my_struct x = { .x = 123 };
+
+	return subprog_ctx_tag(ctx) + subprog_multi_ctx_tags(ctx, &x, ctx);
+}
+
+SEC("?kprobe")
+__success __log_level(2)
+int arg_tag_ctx_kprobe(void *ctx)
+{
+	struct my_struct x = { .x = 123 };
+
+	return subprog_ctx_tag(ctx) + subprog_multi_ctx_tags(ctx, &x, ctx);
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 bpf-next 9/9] selftests/bpf: add __arg_ctx BTF rewrite test
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2024-01-04  1:38 ` [PATCH v3 bpf-next 8/9] selftests/bpf: add arg:ctx cases to test_global_funcs tests Andrii Nakryiko
@ 2024-01-04  1:38 ` Andrii Nakryiko
  2024-01-04  2:33 ` [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Eduard Zingerman
  2024-01-04  8:13 ` patchwork-bot+netdevbpf
  10 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04  1:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Eduard Zingerman

Add a test validating that libbpf uploads BTF and func_info with
rewritten type information for arguments of global subprogs that are
marked with the __arg_ctx tag.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../bpf/prog_tests/test_global_funcs.c        | 106 ++++++++++++++++++
 1 file changed, 106 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
index e0879df38639..67d4ef9e62b3 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
@@ -20,6 +20,109 @@
 #include "test_global_func17.skel.h"
 #include "test_global_func_ctx_args.skel.h"
 
+#include "bpf/libbpf_internal.h"
+#include "btf_helpers.h"
+
+static void check_ctx_arg_type(const struct btf *btf, const struct btf_param *p)
+{
+	const struct btf_type *t;
+	const char *s;
+
+	t = btf__type_by_id(btf, p->type);
+	if (!ASSERT_EQ(btf_kind(t), BTF_KIND_PTR, "ptr_t"))
+		return;
+
+	s = btf_type_raw_dump(btf, t->type);
+	if (!ASSERT_HAS_SUBSTR(s, "STRUCT 'bpf_perf_event_data' size=0 vlen=0",
+			       "ctx_struct_t"))
+		return;
+}
+
+static void subtest_ctx_arg_rewrite(void)
+{
+	struct test_global_func_ctx_args *skel = NULL;
+	struct bpf_prog_info info;
+	char func_info_buf[1024] __attribute__((aligned(8)));
+	struct bpf_func_info_min *rec;
+	struct btf *btf = NULL;
+	__u32 info_len = sizeof(info);
+	int err, fd, i;
+
+	skel = test_global_func_ctx_args__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	bpf_program__set_autoload(skel->progs.arg_tag_ctx_perf, true);
+
+	err = test_global_func_ctx_args__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto out;
+
+	memset(&info, 0, sizeof(info));
+	info.func_info = ptr_to_u64(&func_info_buf);
+	info.nr_func_info = 3;
+	info.func_info_rec_size = sizeof(struct bpf_func_info_min);
+
+	fd = bpf_program__fd(skel->progs.arg_tag_ctx_perf);
+	err = bpf_prog_get_info_by_fd(fd, &info, &info_len);
+	if (!ASSERT_OK(err, "prog_info"))
+		goto out;
+
+	if (!ASSERT_EQ(info.nr_func_info, 3, "nr_func_info"))
+		goto out;
+
+	btf = btf__load_from_kernel_by_id(info.btf_id);
+	if (!ASSERT_OK_PTR(btf, "obj_kern_btf"))
+		goto out;
+
+	rec = (struct bpf_func_info_min *)func_info_buf;
+	for (i = 0; i < info.nr_func_info; i++, rec = (void *)rec + info.func_info_rec_size) {
+		const struct btf_type *fn_t, *proto_t;
+		const char *name;
+
+		if (rec->insn_off == 0)
+			continue; /* main prog, skip */
+
+		fn_t = btf__type_by_id(btf, rec->type_id);
+		if (!ASSERT_OK_PTR(fn_t, "fn_type"))
+			goto out;
+		if (!ASSERT_EQ(btf_kind(fn_t), BTF_KIND_FUNC, "fn_type_kind"))
+			goto out;
+		proto_t = btf__type_by_id(btf, fn_t->type);
+		if (!ASSERT_OK_PTR(proto_t, "proto_type"))
+			goto out;
+
+		name = btf__name_by_offset(btf, fn_t->name_off);
+		if (strcmp(name, "subprog_ctx_tag") == 0) {
+			/* int subprog_ctx_tag(void *ctx __arg_ctx) */
+			if (!ASSERT_EQ(btf_vlen(proto_t), 1, "arg_cnt"))
+				goto out;
+
+			/* arg 0 is PTR -> STRUCT bpf_perf_event_data */
+			check_ctx_arg_type(btf, &btf_params(proto_t)[0]);
+		} else if (strcmp(name, "subprog_multi_ctx_tags") == 0) {
+			/* int subprog_multi_ctx_tags(void *ctx1 __arg_ctx,
+			 *			      struct my_struct *mem,
+			 *			      void *ctx2 __arg_ctx)
+			 */
+			if (!ASSERT_EQ(btf_vlen(proto_t), 3, "arg_cnt"))
+				goto out;
+
+			/* arg 0 is PTR -> STRUCT bpf_perf_event_data */
+			check_ctx_arg_type(btf, &btf_params(proto_t)[0]);
+			/* arg 2 is PTR -> STRUCT bpf_perf_event_data */
+			check_ctx_arg_type(btf, &btf_params(proto_t)[2]);
+		} else {
+			ASSERT_FAIL("unexpected subprog %s", name);
+			goto out;
+		}
+	}
+
+out:
+	btf__free(btf);
+	test_global_func_ctx_args__destroy(skel);
+}
+
 void test_test_global_funcs(void)
 {
 	RUN_TESTS(test_global_func1);
@@ -40,4 +143,7 @@ void test_test_global_funcs(void)
 	RUN_TESTS(test_global_func16);
 	RUN_TESTS(test_global_func17);
 	RUN_TESTS(test_global_func_ctx_args);
+
+	if (test__start_subtest("ctx_arg_rewrite"))
+		subtest_ctx_arg_rewrite();
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2024-01-04  1:38 ` [PATCH v3 bpf-next 9/9] selftests/bpf: add __arg_ctx BTF rewrite test Andrii Nakryiko
@ 2024-01-04  2:33 ` Eduard Zingerman
  2024-01-04  8:13 ` patchwork-bot+netdevbpf
  10 siblings, 0 replies; 24+ messages in thread
From: Eduard Zingerman @ 2024-01-04  2:33 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Wed, 2024-01-03 at 17:38 -0800, Andrii Nakryiko wrote:
> Support __arg_ctx global function argument tag semantics even on older kernels
> that don't natively support it through btf_decl_tag("arg:ctx").
> 
> Patches #2-#6 are preparatory work to allow to postpone BTF loading into the
> kernel until after all the BPF program relocations (including global func
> appending to main programs) are done. Patch #4 is perhaps the most important
> and establishes pre-created stable placeholder FDs, so that relocations can
> embed valid map FDs into ldimm64 instructions.
> 
> Once BTF is done after relocation, what's left is to adjust BTF information to
> have each main program's copy of each used global subprog to point to its own
> adjusted FUNC -> FUNC_PROTO type chain (if they use __arg_ctx) in such a way
> as to satisfy type expectations of BPF verifier regarding the PTR_TO_CTX
> argument definition. See patch #8 for details.
> 
> Patch #8 adds few more __arg_ctx use cases (edge cases like multiple arguments
> having __arg_ctx, etc) to test_global_func_ctx_args.c, to make it simple to
> validate that this logic indeed works on old kernels. It does. But just to be
> 100% sure patch #9 adds a test validating that libbpf uploads func_info with
> properly modified BTF data.
> 
> v2->v3:
>   - drop renaming patch (Alexei, Eduard);
>   - use memfd_create() instead of /dev/null for placeholder FD (Eduard);
>   - add one more test for validating BTF rewrite logic (Eduard);
>   - fixed wrong -errno usage, reshuffled some BTF rewrite bits (Eduard);
> v1->v2:
>   - do internal functions renaming in patch #1 (Alexei);
>   - extract cloning of FUNC -> FUNC_PROTO information into separate function
>     (Alexei);

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

Thank you for adding the test in patch #9.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-04  1:38 ` [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic Andrii Nakryiko
@ 2024-01-04  5:39   ` Alexei Starovoitov
  2024-01-04 18:37     ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2024-01-04  5:39 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kernel Team, Jiri Olsa

On Wed, Jan 3, 2024 at 5:39 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> This limitation was the reason to add btf_decl_tag("arg:ctx"), making
> the actual argument type not important, so that user can just define
> "generic" signature:
>
>   __noinline int global_subprog(void *ctx __arg_ctx) { ... }

I still think that this __arg_ctx only makes sense with 'void *'.
Blind rewrite of ctx is a foot gun.

I've tried the following:

diff --git a/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
b/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
index 9a06e5eb1fbe..0e5f5205d4a8 100644
--- a/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
+++ b/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
@@ -106,9 +106,9 @@ int perf_event_ctx(void *ctx)
 /* this global subprog can be now called from many types of entry progs, each
  * with different context type
  */
-__weak int subprog_ctx_tag(void *ctx __arg_ctx)
+__weak int subprog_ctx_tag(long ctx __arg_ctx)
 {
-       return bpf_get_stack(ctx, stack, sizeof(stack), 0);
+       return bpf_get_stack((void *)ctx, stack, sizeof(stack), 0);
 }

 struct my_struct { int x; };
@@ -131,7 +131,7 @@ int arg_tag_ctx_raw_tp(void *ctx)
 {
        struct my_struct x = { .x = 123 };

-       return subprog_ctx_tag(ctx) + subprog_multi_ctx_tags(ctx, &x, ctx);
+       return subprog_ctx_tag((long)ctx) +
subprog_multi_ctx_tags(ctx, &x, ctx);
 }

and it "works".

Both kernel and libbpf should really limit it to 'void *'.

In the other email I suggested allowing types that match the expected
type based on prog type, but even that is probably a danger zone as
well. The correct type would already be detected by the verifier, so an
extra __arg_ctx is pointless. It makes sense only for such polymorphic
functions, and those are better off using 'void *' and not
dereferencing it.

I think this can be a follow up.

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support
  2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
                   ` (9 preceding siblings ...)
  2024-01-04  2:33 ` [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Eduard Zingerman
@ 2024-01-04  8:13 ` patchwork-bot+netdevbpf
  10 siblings, 0 replies; 24+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-01-04  8:13 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, martin.lau, kernel-team

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Wed, 3 Jan 2024 17:38:38 -0800 you wrote:
> Support __arg_ctx global function argument tag semantics even on older kernels
> that don't natively support it through btf_decl_tag("arg:ctx").
> 
> Patches #2-#6 are preparatory work to allow to postpone BTF loading into the
> kernel until after all the BPF program relocations (including global func
> appending to main programs) are done. Patch #4 is perhaps the most important
> and establishes pre-created stable placeholder FDs, so that relocations can
> embed valid map FDs into ldimm64 instructions.
> 
> [...]

Here is the summary with links:
  - [v3,bpf-next,1/9] libbpf: make uniform use of btf__fd() accessor inside libbpf
    https://git.kernel.org/bpf/bpf-next/c/df7c3f7d3a3d
  - [v3,bpf-next,2/9] libbpf: use explicit map reuse flag to skip map creation steps
    https://git.kernel.org/bpf/bpf-next/c/fa98b54bff39
  - [v3,bpf-next,3/9] libbpf: don't rely on map->fd as an indicator of map being created
    https://git.kernel.org/bpf/bpf-next/c/f08c18e083ad
  - [v3,bpf-next,4/9] libbpf: use stable map placeholder FDs
    https://git.kernel.org/bpf/bpf-next/c/dac645b950ea
  - [v3,bpf-next,5/9] libbpf: move exception callbacks assignment logic into relocation step
    https://git.kernel.org/bpf/bpf-next/c/fb03be7c4a27
  - [v3,bpf-next,6/9] libbpf: move BTF loading step after relocation step
    https://git.kernel.org/bpf/bpf-next/c/1004742d7ff0
  - [v3,bpf-next,7/9] libbpf: implement __arg_ctx fallback logic
    https://git.kernel.org/bpf/bpf-next/c/2f38fe689470
  - [v3,bpf-next,8/9] selftests/bpf: add arg:ctx cases to test_global_funcs tests
    https://git.kernel.org/bpf/bpf-next/c/67fe459144dd
  - [v3,bpf-next,9/9] selftests/bpf: add __arg_ctx BTF rewrite test
    https://git.kernel.org/bpf/bpf-next/c/95226f5a3669

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-04  5:39   ` Alexei Starovoitov
@ 2024-01-04 18:37     ` Andrii Nakryiko
  2024-01-04 18:52       ` Alexei Starovoitov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04 18:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Wed, Jan 3, 2024 at 9:39 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jan 3, 2024 at 5:39 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > This limitation was the reason to add btf_decl_tag("arg:ctx"), making
> > the actual argument type not important, so that user can just define
> > "generic" signature:
> >
> >   __noinline int global_subprog(void *ctx __arg_ctx) { ... }
>
> I still think that this __arg_ctx only makes sense with 'void *'.
> Blind rewrite of ctx is a foot gun.
>
> I've tried the following:
>
> diff --git a/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> b/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> index 9a06e5eb1fbe..0e5f5205d4a8 100644
> --- a/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> +++ b/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> @@ -106,9 +106,9 @@ int perf_event_ctx(void *ctx)
>  /* this global subprog can be now called from many types of entry progs, each
>   * with different context type
>   */
> -__weak int subprog_ctx_tag(void *ctx __arg_ctx)
> +__weak int subprog_ctx_tag(long ctx __arg_ctx)
>  {
> -       return bpf_get_stack(ctx, stack, sizeof(stack), 0);
> +       return bpf_get_stack((void *)ctx, stack, sizeof(stack), 0);
>  }
>
>  struct my_struct { int x; };
> @@ -131,7 +131,7 @@ int arg_tag_ctx_raw_tp(void *ctx)
>  {
>         struct my_struct x = { .x = 123 };
>
> -       return subprog_ctx_tag(ctx) + subprog_multi_ctx_tags(ctx, &x, ctx);
> +       return subprog_ctx_tag((long)ctx) +
> subprog_multi_ctx_tags(ctx, &x, ctx);
>  }
>
> and it "works".

Yeah, but you had to actively force casting everywhere *and* you still
had to consciously add __arg_ctx, right? If a user wants to subvert
the type system, they will do it. It's C, after all. But if they just
accidentally use sk_buff ctx and call it from the XDP program with
xdp_buff/xdp_md, the compiler will call out type mismatch.

>
> Both kernel and libbpf should really limit it to 'void *'.
>
> In the other email I suggested to allow types that match expected
> based on prog type, but even that is probably a danger zone as well.
> The correct type would already be detected by the verifier,
> so extra __arg_ctx is pointless.
> It makes sense only for such polymorphic functions and those
> better use 'void *' and don't dereference it.
>
> I think this can be a follow up.

Not really just polymorphic functions. Think about subprog
specifically for the fentry program, as one example. You *need*
__arg_ctx just to make context passing work, but you also want
non-`void *` type to access arguments.

int subprog(u64 *args __arg_ctx) { ... }

SEC("fentry")
int BPF_PROG(main_prog, ...)
{
    return subprog(ctx);
}

Similarly, tracepoint programs, you'd have:

int subprog(struct syscall_trace_enter* ctx __arg_ctx) { ... }

SEC("tracepoint/syscalls/sys_enter_kill")
int main_prog(struct syscall_trace_enter* ctx)
{
    return subprog(ctx);
}

So that's one group of cases.

Another special case are networking programs, where both "__sk_buff"
and "sk_buff" are allowed, same for "xdp_buff" and "xdp_md".

Also, kprobes are special, both "struct bpf_user_pt_regs_t" and
*typedef* "bpf_user_pt_regs_t" are supported. But in practice users
will often just use `struct pt_regs *ctx`, actually.

There might be some other edges I don't yet realize.

In short, I think any sort of enforcement will just cause unnecessary
pain, while seemingly fixing some problem that doesn't seem to be a
problem in practice.


* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-04 18:37     ` Andrii Nakryiko
@ 2024-01-04 18:52       ` Alexei Starovoitov
  2024-01-04 20:58         ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2024-01-04 18:52 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Thu, Jan 4, 2024 at 10:37 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Jan 3, 2024 at 9:39 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Jan 3, 2024 at 5:39 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > This limitation was the reason to add btf_decl_tag("arg:ctx"), making
> > > the actual argument type not important, so that user can just define
> > > "generic" signature:
> > >
> > >   __noinline int global_subprog(void *ctx __arg_ctx) { ... }
> >
> > I still think that this __arg_ctx only makes sense with 'void *'.
> > Blind rewrite of ctx is a foot gun.
> >
> > I've tried the following:
> >
> > diff --git a/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> > b/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> > index 9a06e5eb1fbe..0e5f5205d4a8 100644
> > --- a/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> > +++ b/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> > @@ -106,9 +106,9 @@ int perf_event_ctx(void *ctx)
> >  /* this global subprog can be now called from many types of entry progs, each
> >   * with different context type
> >   */
> > -__weak int subprog_ctx_tag(void *ctx __arg_ctx)
> > +__weak int subprog_ctx_tag(long ctx __arg_ctx)
> >  {
> > -       return bpf_get_stack(ctx, stack, sizeof(stack), 0);
> > +       return bpf_get_stack((void *)ctx, stack, sizeof(stack), 0);
> >  }
> >
> >  struct my_struct { int x; };
> > @@ -131,7 +131,7 @@ int arg_tag_ctx_raw_tp(void *ctx)
> >  {
> >         struct my_struct x = { .x = 123 };
> >
> > -       return subprog_ctx_tag(ctx) + subprog_multi_ctx_tags(ctx, &x, ctx);
> > +       return subprog_ctx_tag((long)ctx) +
> > subprog_multi_ctx_tags(ctx, &x, ctx);
> >  }
> >
> > and it "works".
>
> Yeah, but you had to actively force casting everywhere *and* you still
> had to consciously add __arg_ctx, right? If a user wants to subvert
> the type system, they will do it. It's C, after all. But if they just
> accidentally use sk_buff ctx and call it from the XDP program with
> xdp_buff/xdp_md, the compiler will call out type mismatch.

I could have used long everywhere and avoided casts.

> >
> > Both kernel and libbpf should really limit it to 'void *'.
> >
> > In the other email I suggested to allow types that match expected
> > based on prog type, but even that is probably a danger zone as well.
> > The correct type would already be detected by the verifier,
> > so extra __arg_ctx is pointless.
> > It makes sense only for such polymorphic functions and those
> > better use 'void *' and don't dereference it.
> >
> > I think this can be a follow up.
>
> Not really just polymorphic functions. Think about subprog
> specifically for the fentry program, as one example. You *need*
> __arg_ctx just to make context passing work, but you also want
> non-`void *` type to access arguments.
>
> int subprog(u64 *args __arg_ctx) { ... }
>
> SEC("fentry")
> int BPF_PROG(main_prog, ...)
> {
>     return subprog(ctx);
> }
>
> Similarly, tracepoint programs, you'd have:
>
> int subprog(struct syscall_trace_enter* ctx __arg_ctx) { ... }
>
> SEC("tracepoint/syscalls/sys_enter_kill")
> int main_prog(struct syscall_trace_enter* ctx)
> {
>     return subprog(ctx);
> }
>
> So that's one group of cases.

But the above two are not supported by libbpf
since it doesn't handle "tracing" and "tracepoint" prog types
in global_ctx_map.
I suspect the kernel sort-of supports above, but in a dangerous
and broken way.

My point is that users must not use __arg_ctx in these two cases.
fentry (tracing prog type) wants 'void *' in the kernel to
match to ctx.
So the existing mechanism (prior to arg_ctx in the kernel)
should already work.

> Another special case are networking programs, where both "__sk_buff"
> and "sk_buff" are allowed, same for "xdp_buff" and "xdp_md".

what do you mean both?
networking bpf prog must only use __sk_buff and that is the one and
only supported ctx.
Using 'struct sk_buff *ctx __arg_ctx' will be a bad bug.
Since offsets will be all wrong while ctx rewrite will apply garbage
and will likely fail.

> Also, kprobes are special, both "struct bpf_user_pt_regs_t" and
> *typedef* "bpf_user_pt_regs_t" are supported. But in practice users
> will often just use `struct pt_regs *ctx`, actually.

Same thing. The global bpf prog has to use bpf_user_pt_regs_t
to be properly recognized as ctx arg type.
Nothing special. Using 'struct pt_regs * ctx __arg_ctx' and blind
rewrite will cause similar hard to debug bugs when
bpf_user_pt_regs_t doesn't match pt_regs that bpf prog sees
at compile time.


* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-04 18:52       ` Alexei Starovoitov
@ 2024-01-04 20:58         ` Andrii Nakryiko
  2024-01-05  1:33           ` Alexei Starovoitov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-04 20:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Thu, Jan 4, 2024 at 10:52 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jan 4, 2024 at 10:37 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Jan 3, 2024 at 9:39 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Jan 3, 2024 at 5:39 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > This limitation was the reason to add btf_decl_tag("arg:ctx"), making
> > > > the actual argument type not important, so that user can just define
> > > > "generic" signature:
> > > >
> > > >   __noinline int global_subprog(void *ctx __arg_ctx) { ... }
> > >
> > > I still think that this __arg_ctx only makes sense with 'void *'.
> > > Blind rewrite of ctx is a foot gun.
> > >
> > > I've tried the following:
> > >
> > > diff --git a/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> > > b/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> > > index 9a06e5eb1fbe..0e5f5205d4a8 100644
> > > --- a/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> > > +++ b/tools/testing/selftests/bpf/progs/test_global_func_ctx_args.c
> > > @@ -106,9 +106,9 @@ int perf_event_ctx(void *ctx)
> > >  /* this global subprog can be now called from many types of entry progs, each
> > >   * with different context type
> > >   */
> > > -__weak int subprog_ctx_tag(void *ctx __arg_ctx)
> > > +__weak int subprog_ctx_tag(long ctx __arg_ctx)
> > >  {
> > > -       return bpf_get_stack(ctx, stack, sizeof(stack), 0);
> > > +       return bpf_get_stack((void *)ctx, stack, sizeof(stack), 0);
> > >  }
> > >
> > >  struct my_struct { int x; };
> > > @@ -131,7 +131,7 @@ int arg_tag_ctx_raw_tp(void *ctx)
> > >  {
> > >         struct my_struct x = { .x = 123 };
> > >
> > > -       return subprog_ctx_tag(ctx) + subprog_multi_ctx_tags(ctx, &x, ctx);
> > > +       return subprog_ctx_tag((long)ctx) +
> > > subprog_multi_ctx_tags(ctx, &x, ctx);
> > >  }
> > >
> > > and it "works".
> >
> > Yeah, but you had to actively force casting everywhere *and* you still
> > had to consciously add __arg_ctx, right? If a user wants to subvert
> > the type system, they will do it. It's C, after all. But if they just
> > accidentally use sk_buff ctx and call it from the XDP program with
> > xdp_buff/xdp_md, the compiler will call out type mismatch.
>
> I could have used long everywhere and avoided casts.
>

My point was that it's hard to accidentally forget to "generalize"
type if you were supporting sk_buff, and suddenly started calling it
with xdp_md.

From my POV, if I'm a user, and I declare an argument as long and
annotate it as __arg_ctx, then I know what I'm doing and I'd hate for
some smart-ass library to double-guess me dictating what exact
incantation I should specify to make it happy.

If I'm clueless and just randomly sprinkling __arg_ctx, then I have
bigger problems than type mismatch.

> > >
> > > Both kernel and libbpf should really limit it to 'void *'.
> > >
> > > In the other email I suggested to allow types that match expected
> > > based on prog type, but even that is probably a danger zone as well.
> > > The correct type would already be detected by the verifier,
> > > so extra __arg_ctx is pointless.
> > > It makes sense only for such polymorphic functions and those
> > > better use 'void *' and don't dereference it.
> > >
> > > I think this can be a follow up.
> >
> > Not really just polymorphic functions. Think about subprog
> > specifically for the fentry program, as one example. You *need*
> > __arg_ctx just to make context passing work, but you also want
> > non-`void *` type to access arguments.
> >
> > int subprog(u64 *args __arg_ctx) { ... }
> >
> > SEC("fentry")
> > int BPF_PROG(main_prog, ...)
> > {
> >     return subprog(ctx);
> > }
> >
> > Similarly, tracepoint programs, you'd have:
> >
> > int subprog(struct syscall_trace_enter* ctx __arg_ctx) { ... }
> >
> > SEC("tracepoint/syscalls/sys_enter_kill")
> > int main_prog(struct syscall_trace_enter* ctx)
> > {
> >     return subprog(ctx);
> > }
> >
> > So that's one group of cases.
>
> But the above two are not supported by libbpf
> since it doesn't handle "tracing" and "tracepoint" prog types
> in global_ctx_map.

Ok, so I'm confused now. I thought we were talking about both
kernel-side and libbpf-side extra checks.

Look, I don't want libbpf to be too smart and actually cause
unnecessary problems for users (pt_regs being one such case, see
below), forcing users to do workarounds just to satisfy libbpf, like
passing `void *ctx __arg_ctx` but then casting to `struct pt_regs`,
for example.

Sure, if someone has no clue what they are doing and specifies a
different type, I think it's acceptable for them to have that bug.
They will debug it, fix it, learn something, and won't do it again.
I'd rather assume users know what they are doing rather than
double-guess what they are doing.

If we are talking about libbpf-only changes just for those types that
libbpf is rewriting, fine (though I'm still not happy about struct
pt_regs case not working), we can add it. If Eduard concurs, I'll add
it, it's not hard. But as I said, I think libbpf would be doing
something that it's not supposed to do here (libbpf is just silently
adding an annotation, effectively, it's not changing how code is
generated or how verifier is interpreting types).

If we are talking about kernel-side extra checks, I propose we do that
on my next patch set adding PTR_TO_BTF_ID, but again, we need to keep
those non-polymorphic valid cases in mind (u64 *ctx for fentry,
tracepoint structs, etc) and not make them unnecessarily painful.

> I suspect the kernel sort-of supports above, but in a dangerous
> and broken way.
>
> My point is that users must not use __arg_ctx in these two cases.
> fentry (tracing prog type) wants 'void *' in the kernel to
> match to ctx.
> So the existing mechanism (prior to arg_ctx in the kernel)
> should already work.

Let's unpack. fentry doesn't "want" `void *`, it just doesn't support
passing context argument to global subprog. So you would have to
specify __arg_ctx, and that will only work on recent enough kernels.

At that point, all of `long ctx __arg_ctx`, `void *ctx __arg_ctx` and
`u64 *ctx __arg_ctx` will work. Yes, `long ctx` out of those 3 is
weird, but the verifier will correctly treat it as PTR_TO_CTX
regardless of the specific type.

More importantly, I'm saying that both `void *ctx __arg_ctx` and `u64
*ctx __arg_ctx` should work for fentry, don't you agree?

>
> > Another special case are networking programs, where both "__sk_buff"
> > and "sk_buff" are allowed, same for "xdp_buff" and "xdp_md".
>
> what do you mean both?
> networking bpf prog must only use __sk_buff and that is the one and
> only supported ctx.
> Using 'struct sk_buff *ctx __arg_ctx' will be a bad bug.
> Since offsets will be all wrong while ctx rewrite will apply garbage
> and will likely fail.

You are right about wrong offsets, but the kernel does allow it. See
[0]. I actually tried, and indeed, it allows sk_buff to denote
"context". Note that I had to comment out skb->len dereference
(otherwise verifier will correctly complain about wrong offset), but
it is recognized as PTR_TO_CTX and I could technically pass it to
another subprog or helpers/kfuncs (and that would work).

  [0] https://lore.kernel.org/all/20230301154953.641654-2-joannelkoong@gmail.com/

diff --git a/tools/testing/selftests/bpf/progs/test_global_func2.c
b/tools/testing/selftests/bpf/progs/test_global_func2.c
index 2beab9c3b68a..29d7f3e78f8e 100644
--- a/tools/testing/selftests/bpf/progs/test_global_func2.c
+++ b/tools/testing/selftests/bpf/progs/test_global_func2.c
@@ -1,16 +1,15 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright (c) 2020 Facebook */
-#include <stddef.h>
-#include <linux/bpf.h>
+#include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include "bpf_misc.h"

 #define MAX_STACK (512 - 3 * 32)

 static __attribute__ ((noinline))
-int f0(int var, struct __sk_buff *skb)
+int f0(int var, struct sk_buff *skb)
 {
-       return skb->len;
+       return 0;
 }

 __attribute__ ((noinline))
@@ -20,7 +19,7 @@ int f1(struct __sk_buff *skb)

        __sink(buf[MAX_STACK - 1]);

-       return f0(0, skb) + skb->len;
+       return f0(0, (void*)skb) + skb->len;
 }

 int f3(int, struct __sk_buff *skb, int);
@@ -45,5 +44,5 @@ SEC("tc")
 __success
 int global_func2(struct __sk_buff *skb)
 {
-       return f0(1, skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
+       return f0(1, (void *)skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
 }


>
> > Also, kprobes are special, both "struct bpf_user_pt_regs_t" and
> > *typedef* "bpf_user_pt_regs_t" are supported. But in practice users
> > will often just use `struct pt_regs *ctx`, actually.
>
> Same thing. The global bpf prog has to use bpf_user_pt_regs_t
> to be properly recognized as ctx arg type.
> Nothing special. Using 'struct pt_regs * ctx __arg_ctx' and blind
> rewrite will cause similar hard to debug bugs when
> bpf_user_pt_regs_t doesn't match pt_regs that bpf prog sees
> at compile time.

So this is not the same thing as skbuff. If BPF program is meant for a
single architecture, like x86-64, it's completely valid (and that's
what people have been doing with static subprogs for ages now) to just
use `struct pt_regs`. They are the same thing on x86.

I'll say even more, with libbpf's PT_REGS_xxx() macros you don't even
need to know about pt_regs vs user_pt_regs difference, as macros
properly force-cast arguments, depending on architecture. So in your
BPF code you can just pass `struct pt_regs *` around just fine across
multiple architectures as long as you only use PT_REGS_xxx() macros
and then pass that context to helpers (to get stack trace,
bpf_perf_event_output, etc).

No one even knows about bpf_user_pt_regs_t, I had to dig it up from
kernel source code and let users know what exact type name to use for
global subprog.


* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-04 20:58         ` Andrii Nakryiko
@ 2024-01-05  1:33           ` Alexei Starovoitov
  2024-01-05  3:57             ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2024-01-05  1:33 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Thu, Jan 4, 2024 at 12:58 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
>
> My point was that it's hard to accidentally forget to "generalize"
> type if you were supporting sk_buff, and suddenly started calling it
> with xdp_md.
>
> From my POV, if I'm a user, and I declare an argument as long and
> annotate it as __arg_ctx, then I know what I'm doing and I'd hate for
> some smart-ass library to double-guess me dictating what exact
> incantation I should specify to make it happy.

But that's exactly what's happening!
The smart-ass libbpf ignores the type in 'struct sk_buff *skb __arg_ctx'
and replaces it with whatever is appropriate for prog type.
More below.

>  static __attribute__ ((noinline))
> -int f0(int var, struct __sk_buff *skb)
> +int f0(int var, struct sk_buff *skb)
>  {
> -       return skb->len;
> +       return 0;
>  }
>
>  __attribute__ ((noinline))
> @@ -20,7 +19,7 @@ int f1(struct __sk_buff *skb)
>
>         __sink(buf[MAX_STACK - 1]);
>
> -       return f0(0, skb) + skb->len;
> +       return f0(0, (void*)skb) + skb->len;

This is static f0. Not sure what you're trying to say.
I don't think btf_get_prog_ctx_type() logic applies here.

> I'll say even more, with libbpf's PT_REGS_xxx() macros you don't even
> need to know about pt_regs vs user_pt_regs difference, as macros
> properly force-cast arguments, depending on architecture. So in your
> BPF code you can just pass `struct pt_regs *` around just fine across
> multiple architectures as long as you only use PT_REGS_xxx() macros
> and then pass that context to helpers (to get stack trace,
> bpf_perf_event_output, etc).

Pretty much. For some time the kernel recognized bpf_user_pt_regs_t
as PTR_TO_CTX for kprobe.
And the users who needed global prog verification with ctx
already used that feature.
We even have helper macros to typeof to correct btf type.

From selftests:

__weak int kprobe_typedef_ctx_subprog(bpf_user_pt_regs_t *ctx)
{
        return bpf_get_stack(ctx, &stack, sizeof(stack), 0);
}

SEC("?kprobe")
__success
int kprobe_typedef_ctx(void *ctx)
{
        return kprobe_typedef_ctx_subprog(ctx);
}

#define pt_regs_struct_t typeof(*(__PT_REGS_CAST((struct pt_regs *)NULL)))

__weak int kprobe_struct_ctx_subprog(pt_regs_struct_t *ctx)
{
        return bpf_get_stack((void *)ctx, &stack, sizeof(stack), 0);
}

SEC("?kprobe")
__success
int kprobe_resolved_ctx(void *ctx)
{
        return kprobe_struct_ctx_subprog(ctx);
}

__PT_REGS_CAST is arch dependent and typeof makes it seen with
correct btf_id and the kernel knows it's PTR_TO_CTX.
All that works. No need for __arg_ctx.
I'm sure you know this.
I'm only explaining for everybody else to follow.

> No one even knows about bpf_user_pt_regs_t, I had to dig it up from
> kernel source code and let users know what exact type name to use for
> global subprog.

Few people know that global subprogs are verified differently than static.
That's true, but I bet people that knew also used the right type for ctx.
If you're saying that __arg_ctx is making it easier for users
to use global subprogs I certainly agree, but it's not
something that was mandatory for uniform global progs.
__arg_ctx main value is for polymorphic subprogs.
An add-on value is ease-of-use for existing non-polymorphic subprogs.

I'm saying that in the above example working code:

__weak int kprobe_typedef_ctx_subprog(bpf_user_pt_regs_t *ctx)

should _not_ be allowed to be replaced with:

__weak int kprobe_typedef_ctx_subprog(struct pt_regs *ctx __arg_ctx)

Unfortunately the newest kernel/libbpf patches allowed it, and
this way both kernel and libbpf are silently breaking C type
matching rules and general expectations of the C language.

Consider these variants:

1.
__weak int kprobe_typedef_ctx_subprog(struct pt_regs *ctx __arg_ctx)
{ PT_REGS_PARM1(ctx); }

2.
__weak int kprobe_typedef_ctx_subprog(void *ctx __arg_ctx)
{ struct pt_regs *regs = ctx; PT_REGS_PARM1(regs); }

3.
__weak int kprobe_typedef_ctx_subprog(bpf_user_pt_regs_t *ctx)
{ PT_REGS_PARM1(ctx); }

In 1 and 3 the caller has to type cast to correct type.
In 2 the caller can pass anything without type cast.

In C when the user writes: void foo(int *p)
it knows that it can access it as pointer to int in the callee
and it's caller's job to pass correct pointer into it.
When caller type casts something else to 'int *' it's caller's fault
if things don't work.
Now when user writes:
void foo(void *p) { int *i = p;

the caller can pass anything into foo() and callee's fault
to assume that 'void *' is 'int *'.
These are the C rules that we're breaking with __arg_ctx.

In 2 it's clear to callee that any ctx argument could have been passed
and type cast to 'struct pt_regs *' it's callee's responsibility.

In 3 the users know that only bpf_user_pt_regs_t will be passed in.

But 1 (the current kernel and libbpf) breaks these C rules.
The C language tells the prog writer to expect that only 'struct pt_regs *'
will be passed, but the kernel/libbpf allows any ctx to be passed in.

Hence 1 should be disallowed.

The 'void *' case 2 we can extend in the future to truly support polymorphism:

__weak int subprog(void *ctx __arg_ctx)
{
  __u32 ctx_btf_id = bpf_core_typeof(*ctx);

  if (ctx_btf_id == bpf_core_type_id_kernel(struct sk_buff)) {
      struct sk_buff *skb = ctx;
      ..
  } else if (ctx_btf_id == bpf_core_type_id_kernel(struct xdp_buff)) {
      struct xdp_buff *xdp = ctx;

and it will conform to C rules. It's on callee side to do the right
thing with 'void *'.


* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-05  1:33           ` Alexei Starovoitov
@ 2024-01-05  3:57             ` Andrii Nakryiko
  2024-01-05  5:42               ` Alexei Starovoitov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-05  3:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Thu, Jan 4, 2024 at 5:34 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jan 4, 2024 at 12:58 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> >
> > My point was that it's hard to accidentally forget to "generalize"
> > type if you were supporting sk_buff, and suddenly started calling it
> > with xdp_md.
> >
> > From my POV, if I'm a user, and I declare an argument as long and
> > annotate it as __arg_ctx, then I know what I'm doing and I'd hate for
> > some smart-ass library to double-guess me dictating what exact
> > incantation I should specify to make it happy.
>
> But that's exactly what's happening!
> The smart-ass libbpf ignores the type in 'struct sk_buff *skb __arg_ctx'
> and replaces it with whatever is appropriate for prog type.

The only thing that libbpf does in this case is it honors __arg_ctx
and makes it work *exactly the same* as __arg_ctx natively works on
the newest kernel. Not more, not less. It doesn't change compilation
or verification rules. At all.

Libbpf is not a compiler. And libbpf is not a verifier. It sees
__arg_ctx, it makes sure this argument is communicated to the verifier
as context. That's all. Again, it has no effect on code generation
*or* verification (compared to a newer kernel with native __arg_ctx
support). If a user has a bug, either the compiler or verifier will
complain (or not), depending on how subtle the bug is.

> More below.
>
> >  static __attribute__ ((noinline))
> > -int f0(int var, struct __sk_buff *skb)
> > +int f0(int var, struct sk_buff *skb)
> >  {
> > -       return skb->len;
> > +       return 0;
> >  }
> >
> >  __attribute__ ((noinline))
> > @@ -20,7 +19,7 @@ int f1(struct __sk_buff *skb)
> >
> >         __sink(buf[MAX_STACK - 1]);
> >
> > -       return f0(0, skb) + skb->len;
> > +       return f0(0, (void*)skb) + skb->len;
>
> This is static f0. Not sure what you're trying to say.

Ok, I brainfarted and converted a static function in the test called
test_global_func2 without even checking if it's global or not, as I
didn't expect that there are any static subprogs at all. My bad. But
the point stands, both sk_buff and __sk_buff are recognized as
PTR_TO_CTX for global subprogs, see below. And that's all I'm trying
to say.

> I don't think btf_get_prog_ctx_type() logic applies here.
>

Did you check the patch I referenced? I'm saying that `struct sk_buff
*ctx` is recognized as a context type by kernel *for global subprog*.
Here we go again, this time on f1(), which is not static:

diff --git a/tools/testing/selftests/bpf/progs/test_global_func2.c
b/tools/testing/selftests/bpf/progs/test_global_func2.c
index 2beab9c3b68a..4a54350f0aa0 100644
--- a/tools/testing/selftests/bpf/progs/test_global_func2.c
+++ b/tools/testing/selftests/bpf/progs/test_global_func2.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright (c) 2020 Facebook */
-#include <stddef.h>
-#include <linux/bpf.h>
+#include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include "bpf_misc.h"

@@ -14,13 +13,13 @@ int f0(int var, struct __sk_buff *skb)
 }

 __attribute__ ((noinline))
-int f1(struct __sk_buff *skb)
+int f1(struct sk_buff *skb)
 {
        volatile char buf[MAX_STACK] = {};

        __sink(buf[MAX_STACK - 1]);

-       return f0(0, skb) + skb->len;
+       return f0(0, (void *)skb);
 }

 int f3(int, struct __sk_buff *skb, int);
@@ -28,7 +27,7 @@ int f3(int, struct __sk_buff *skb, int);
 __attribute__ ((noinline))
 int f2(int val, struct __sk_buff *skb)
 {
-       return f1(skb) + f3(val, skb, 1);
+       return f1((void *)skb) + f3(val, skb, 1);
 }

 __attribute__ ((noinline))
@@ -45,5 +44,5 @@ SEC("tc")
 __success
 int global_func2(struct __sk_buff *skb)
 {
-       return f0(1, skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
+       return f0(1, skb) + f1((void *)skb) + f2(2, skb) + f3(3, skb, 4);
 }


And here's portion of veristat log output to be 100% sure this time:

Validating f1() func#2...
20: R1=ctx() R10=fp0
; int f1(struct sk_buff *skb)

It's a context.

> > I'll say even more, with libbpf's PT_REGS_xxx() macros you don't even
> > need to know about pt_regs vs user_pt_regs difference, as macros
> > properly force-cast arguments, depending on architecture. So in your
> > BPF code you can just pass `struct pt_regs *` around just fine across
> > multiple architectures as long as you only use PT_REGS_xxx() macros
> > and then pass that context to helpers (to get stack trace,
> > bpf_perf_event_output, etc).
>
> Pretty much. For some time the kernel recognized bpf_user_pt_regs_t
> as PTR_TO_CTX for kprobe.
> And the users who needed global prog verification with ctx
> already used that feature.

Not really, see below. For a long time *we thought* that kernel
recognizes bpf_user_pt_regs_t, but in reality it wanted `struct
bpf_user_pt_regs_t` which doesn't even exist in kernel and has nothing
common with either `struct pt_regs` or `struct user_pt_regs`. I fixed
that and now the kernel recognizes *both* typedef and struct
bpf_user_pt_regs_t. And there is no point in using typedef, because
`struct bpf_user_pt_regs_t` is backwards compatible and that's what
users actually use in practice.

> We even have helper macros to typeof to correct btf type.
>
> From selftests:
>
> __weak int kprobe_typedef_ctx_subprog(bpf_user_pt_regs_t *ctx)
> {
>         return bpf_get_stack(ctx, &stack, sizeof(stack), 0);
> }
>
> SEC("?kprobe")
> __success
> int kprobe_typedef_ctx(void *ctx)
> {
>         return kprobe_typedef_ctx_subprog(ctx);
> }
>
> #define pt_regs_struct_t typeof(*(__PT_REGS_CAST((struct pt_regs *)NULL)))
>
> __weak int kprobe_struct_ctx_subprog(pt_regs_struct_t *ctx)
> {
>         return bpf_get_stack((void *)ctx, &stack, sizeof(stack), 0);
> }
>
> SEC("?kprobe")
> __success
> int kprobe_resolved_ctx(void *ctx)
> {
>         return kprobe_struct_ctx_subprog(ctx);
> }
>
> __PT_REGS_CAST is arch dependent and typeof makes it seen with
> correct btf_id and the kernel knows it's PTR_TO_CTX.

TBH, I don't know what btf_id has to do with this, it looks either as
a distraction or subtle point you are making that I'm missing.
__PT_REGS_CAST() just does C language cast, there is no BTF or BTF ID
involved here, so what am I missing?

> All that works. No need for __arg_ctx.
> I'm sure you know this.
> I'm only explaining for everybody else to follow.
>

Ok, though I'm not sure we are actually agreeing that with libbpf's
PT_REGS_xxx() macros it's kind of expected that users will be using
`struct pt_regs *` everywhere (and not user_pt_regs).

And so actual real world code is actually written with explicit
`struct pt_regs *` being passed around. In all static subprogs as
well. And only global subprogs currently force the use of the fake
`struct bpf_user_pt_regs_t` (not typedef, it's so confusing, but I
have to emphasize the big difference, sorry!).

So what I'm saying (and I'm repeating that below) is that it would be
nice to make global subprogs use the same types as static subprogs,
which is just plain `struct pt_regs *ctx __arg_ctx`.

> > No one even knows about bpf_user_pt_regs_t, I had to dig it up from
> > kernel source code and let users know what exact type name to use for
> > global subprog.
>
> Few people know that global subprogs are verified differently than static.
> That's true, but I bet people that knew also used the right type for ctx.
> If you're saying that __arg_ctx is making it easier for users
> to use global subprogs I certainly agree, but it's not
> something that was mandatory for uniform global progs.

What is "uniform global progs"? If you mean those polymorphic global
subprogs, then keep in mind that only one level of global subprogs is
possible without this __arg_ctx approach. It's a big limitation right
now, actually.

> __arg_ctx main value is for polymorphic subprogs.

If you mean this libbpf's type rewriting logic for __arg_ctx, yes, I
agree. If you mean in general, then no, it's not just for polymorphic
subprogs. It's also to allow passing context in program types that
don't support passing context to global subprog (fentry, tracepoint,
etc). But libbpf cannot do anything about the latter case, if kernel
doesn't support __arg_ctx natively.

> An add-on value is ease-of-use for existing non-polymorphic subprogs.
>
> I'm saying that in the above example working code:
>
> __weak int kprobe_typedef_ctx_subprog(bpf_user_pt_regs_t *ctx)
>
> should _not_ be allowed to be replaced with:
>
> __weak int kprobe_typedef_ctx_subprog(struct pt_regs *ctx __arg_ctx)

Why not? This is what I don't get. Here's a real piece of code to
demonstrate what users do in practice:

struct bpf_user_pt_regs_t {};

__hidden int handle_event_user_pt_regs(struct bpf_user_pt_regs_t* ctx) {
  if (pyperf_prog_cfg.sample_interval > 0) {
    if (__sync_fetch_and_add(&total_events_count, 1) %
        pyperf_prog_cfg.sample_interval) {
      return 0;
    }
  }

  return handle_event_helper((struct pt_regs*)ctx, NULL);
}

See that cast to `struct pt_regs *`? It's because all non-global code
is working with struct pt_regs already, and it's fine.

Keep in mind, they can't use the bpf_user_pt_regs_t typedef and avoid the
cast, because older kernels didn't recognize the typedef, so they use an
empty `struct bpf_user_pt_regs_t`, which then has to be cast.

I don't want to get into a debate about whether they should convert
all `struct pt_regs *` to `bpf_user_pt_regs_t *`, that's not the
point. Maybe they could, but their code already is written like that
and works. Using struct pt_regs is not broken for them, both on x86-64
and arm64.

I'm saying that I explicitly do want to be able to declare (in general):

int handle_event_user(struct pt_regs *ctx __arg_ctx) { ...}

And this would work both on old and new kernels, with and without
native __arg_ctx support. And it will be very close to static subprogs
in the existing code base.

Why do you want to disallow this artificially?

>
> Unfortunately in the newest kernel/libbpf patches allowed it and
> this way both kernel and libbpf are silently breaking C type
> matching rules and general expectations of C language.

C types are still checked and enforced by the compiler. And only the
compiler. Verifier doesn't use BTF for verification of PTR_TO_CTX
beyond getting type name for global subprog argument. How is libbpf
breaking anything here?

>
> Consider these variants:
>
> 1.
> __weak int kprobe_typedef_ctx_subprog(struct pt_regs *ctx __arg_ctx)
> { PT_REGS_PARM1(ctx); }
>
> 2.
> __weak int kprobe_typedef_ctx_subprog(void *ctx __arg_ctx)
> { struct pt_regs *regs = ctx; PT_REGS_PARM1(regs); }
>
> 3.
> __weak int kprobe_typedef_ctx_subprog(bpf_user_pt_regs_t *ctx)
> { PT_REGS_PARM1(ctx); }
>
> In 1 and 3 the caller has to type cast to correct type.
> In 2 the caller can pass anything without type cast.
>
> In C when the user writes: void foo(int *p)
> it knows that it can access it as pointer to int in the callee
> and it's caller's job to pass correct pointer into it.
> When caller type casts something else to 'int *' it's caller's fault
> if things don't work.
> Now when user writes:
> void foo(void *p) { int *i = p;
>
> the caller can pass anything into foo() and callee's fault
> to assume that 'void *' is 'int *'.
> These are the C rules that we're breaking with __arg_ctx.
>
> In 2 it's clear to callee that any ctx argument could have been passed
> and type cast to 'struct pt_regs *' it's callee's responsibility.
>
> In 3 the users know that only bpf_user_pt_regs_t will be passed in.
>
> But 1 (the current kernel and libbpf) breaks these C rules.
> The C language tells prog writer to expect that only 'struct pt_regs *'
> will be passed, but the kernel/libbpf allows any ctx to be passed in.
>
> Hence 1 should be disallowed.

All the above is already checked and enforced by the compiler. Libbpf
doesn't subvert it in any way. All that libbpf is doing is saying "ah,
user, you want this argument to be treated as PTR_TO_CTX, right? Too
bad host kernel is a bit too old to understand __arg_ctx natively, but
worry you not, I'll just quickly fix up BTF information that *only
kernel* uses *only to check type name* (nothing else!), and it will
look like kernel actually understood __arg_ctx, that's all, happy
BPF'ing!". If a user is misusing types in their code, that will be
caught by the compiler. If user's code is doing something that the BPF
verifier detects as illegal, regardless of types and whatnot, the
verifier will complain.

I don't want libbpf to perform the functions of both compiler and verifier
in these narrow and unnecessary cases. Especially since there are
specific situations where the user's code is correct and legal, and
yet libbpf will be complaining because... reasons.

>
> The 'void *' case 2 we extend in the future to truly support polymorphism:
>
> __weak int subprog(void *ctx __arg_ctx)
> {
>   __u32 ctx_btf_id = bpf_core_typeof(*ctx);
>
>   if (ctx_btf_id == bpf_core_type_id_kernel(struct sk_buff)) {
>       struct sk_buff *skb = ctx;
>       ..
>   } else if (ctx_btf_id == bpf_core_type_id_kernel(struct xdp_buff)) {
>       struct xdp_buff *xdp = ctx;
>
> and it will conform to C rules. It's on callee side to do the right
> thing with 'void *'.

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-05  3:57             ` Andrii Nakryiko
@ 2024-01-05  5:42               ` Alexei Starovoitov
  2024-01-08 23:45                 ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2024-01-05  5:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Thu, Jan 4, 2024 at 7:58 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Jan 4, 2024 at 5:34 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jan 4, 2024 at 12:58 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > >
> > > My point was that it's hard to accidentally forget to "generalize"
> > > type if you were supporting sk_buff, and suddenly started calling it
> > > with xdp_md.
> > >
> > > From my POV, if I'm a user, and I declare an argument as long and
> > > annotate it as __arg_ctx, then I know what I'm doing and I'd hate for
> > > some smart-ass library to double-guess me dictating what exact
> > > incantation I should specify to make it happy.
> >
> > But that's exactly what's happening!
> > The smart-ass libbpf ignores the type in 'struct sk_buff *skb __arg_ctx'
> > and replaces it with whatever is appropriate for prog type.
>
> The only thing that libbpf does in this case is it honors __arg_ctx
> and makes it work *exactly the same* as __arg_ctx natively works on
> the newest kernel. Not more, not less. It doesn't change compilation
> or verification rules. At all.

Here in all previous emails I was talking about both kernel and libbpf.
Both shouldn't be breaking C rules.
Not singling out libbpf.

> Validating f1() func#2...
> 20: R1=ctx() R10=fp0
> ; int f1(struct sk_buff *skb)
>
> It's a context.

Ohh. Looks like I screwed it up back then.
        /* only compare that prog's ctx type name is the same as
         * kernel expects. No need to compare field by field.
         * It's ok for bpf prog to do:
         * struct __sk_buff {};
         * int socket_filter_bpf_prog(struct __sk_buff *skb)
         * { // no fields of skb are ever used }
         */
        if (strcmp(ctx_tname, "__sk_buff") == 0 &&
            strcmp(tname, "sk_buff") == 0)
                return ctx_type;

See comment. The intent was to allow __sk_buff in prog to
match with __sk_buff in the kernel.
Brainfart.

> Not really, see below. For a long time *we thought* that kernel
> recognizes bpf_user_pt_regs_t, but in reality it wanted `struct
> bpf_user_pt_regs_t` which doesn't even exist in kernel and has nothing
> common with either `struct pt_regs` or `struct user_pt_regs`. I fixed
> that and now the kernel recognizes *both* typedef and struct
> bpf_user_pt_regs_t. And there is no point in using typedef, because
> `struct bpf_user_pt_regs_t` is backwards compatible and that's what
> users actually use in practice.

Hmm.
The test with
__weak int kprobe_typedef_ctx_subprog(bpf_user_pt_regs_t *ctx)

was added back in Feb 2023.
So it was surely working for the last year.

> > __PT_REGS_CAST is arch dependent and typeof makes it seen with
> > correct btf_id and the kernel knows it's PTR_TO_CTX.
>
> TBH, I don't know what btf_id has to do with this, it looks either as
> a distraction or subtle point you are making that I'm missing.
> __PT_REGS_CAST() just does C language cast, there is no BTF or BTF ID
> involved here, so what am I missing?

That was your patch :)
I'm just pointing out the neat trick with typeof to put
the correct type in there,
so it's later seen with proper btf_id and recognized as ctx.
You added it a year ago.

>
> Why not? This is what I don't get. Here's a real piece of code to
> demonstrate what users do in practice:
>
> struct bpf_user_pt_regs_t {}
>
> __hidden int handle_event_user_pt_regs(struct bpf_user_pt_regs_t* ctx) {
>   if (pyperf_prog_cfg.sample_interval > 0) {
>     if (__sync_fetch_and_add(&total_events_count, 1) %
>         pyperf_prog_cfg.sample_interval) {
>       return 0;
>     }
>   }
>
>   return handle_event_helper((struct pt_regs*)ctx, NULL);
> }

I think you're talking about kernels prior to that commit a year ago
that made it possible to drop 'struct'.

> I'm saying that I explicitly do want to be able to declare (in general):
>
> int handle_event_user(struct pt_regs *ctx __arg_ctx) { ...}
>
> And this would work both on old and new kernels, with and without
> native __arg_ctx support. And it will be very close to static subprogs
> in the existing code base.
>
> Why do you want to disallow this artificially?

Not artificially, but only when pt_regs in bpf prog doesn't match
what kernel is passing.
I think allowing only:
  handle_event_user(void *ctx __arg_ctx)
and prog will cast it to pt_regs immediately is less surprising
and proper C code,
but
  handle_event_user(struct pt_regs *ctx __arg_ctx)
is also ok when pt_regs is indeed what is being passed.
Which will be the case for x86.
And will be fine on arm64 too, because
arch/arm64/include/asm/ptrace.h
struct pt_regs {
        union {
                struct user_pt_regs user_regs;

but if arm64 ever changes that layout, we should start failing to load.

> All the above is already checked and enforced by the compiler. Libbpf
> doesn't subvert it in any way. All that libbpf is doing is saying "ah,
> user, you want this argument to be treated as PTR_TO_CTX, right? Too
> bad host kernel is a bit too old to understand __arg_ctx natively, but
> worry you not, I'll just quickly fix up BTF information that *only
> kernel* uses *only to check type name* (nothing else!), and it will
> look like kernel actually understood __arg_ctx, that's all, happy
> BPF'ing!".

and this way libbpf _may_ introduce a hard-to-debug bug.
The same mistake the new kernel _may_ do with __arg_ctx with old libbpf.
Both will do a hidden typecast when the bpf prog is
potentially written with different type.

foo(struct pt_regs *ctx __arg_ctx)
Quick git grep shows that it will probably work on all archs
except 'arc' where
arch/arc/include/uapi/asm/bpf_perf_event.h:typedef struct user_regs_struct bpf_user_pt_regs_t;
and struct pt_regs seems to have a different layout than user_regs_struct.

But when the kernel allows sk_buff to be passed into
foo(struct pt_regs *ctx __arg_ctx)
that is just broken.


* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-05  5:42               ` Alexei Starovoitov
@ 2024-01-08 23:45                 ` Andrii Nakryiko
  2024-01-09  1:49                   ` Alexei Starovoitov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-08 23:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Thu, Jan 4, 2024 at 9:42 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jan 4, 2024 at 7:58 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Jan 4, 2024 at 5:34 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Jan 4, 2024 at 12:58 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > >
> > > > My point was that it's hard to accidentally forget to "generalize"
> > > > type if you were supporting sk_buff, and suddenly started calling it
> > > > with xdp_md.
> > > >
> > > > From my POV, if I'm a user, and I declare an argument as long and
> > > > annotate it as __arg_ctx, then I know what I'm doing and I'd hate for
> > > > some smart-ass library to double-guess me dictating what exact
> > > > incantation I should specify to make it happy.
> > >
> > > But that's exactly what's happening!
> > > The smart-ass libbpf ignores the type in 'struct sk_buff *skb __arg_ctx'
> > > and replaces it with whatever is appropriate for prog type.
> >
> > The only thing that libbpf does in this case is it honors __arg_ctx
> > and makes it work *exactly the same* as __arg_ctx natively works on
> > the newest kernel. Not more, not less. It doesn't change compilation
> > or verification rules. At all.
>
> Here in all previous emails I was talking about both kernel and libbpf.
> Both shouldn't be breaking C rules.
> Not singling out libbpf.
>

Ok, I kept wondering which side (or both) we were talking about.

BTW, just as an aside (and I just realized it), the identical checks
performed on global subprogs by libbpf and the kernel are not really
identical (anymore), because of the kernel's lazy global subprog
validation. Libbpf unfortunately doesn't know which global functions
might be dead code, so we need to be careful about being too eager.


> > Validating f1() func#2...
> > 20: R1=ctx() R10=fp0
> > ; int f1(struct sk_buff *skb)
> >
> > It's a context.
>
> Ohh. Looks like I screwed it up back then.
>         /* only compare that prog's ctx type name is the same as
>          * kernel expects. No need to compare field by field.
>          * It's ok for bpf prog to do:
>          * struct __sk_buff {};
>          * int socket_filter_bpf_prog(struct __sk_buff *skb)
>          * { // no fields of skb are ever used }
>          */
>         if (strcmp(ctx_tname, "__sk_buff") == 0 &&
>             strcmp(tname, "sk_buff") == 0)
>                 return ctx_type;
>
> See comment. The intent was to allow __sk_buff in prog to
> match with __sk_buff in the kernel.
> Brainfart.

Ok, then at least we shouldn't allow sk_buff in new use cases (like
__arg_ctx tag).

>
> > Not really, see below. For a long time *we thought* that kernel
> > recognizes bpf_user_pt_regs_t, but in reality it wanted `struct
> > bpf_user_pt_regs_t` which doesn't even exist in kernel and has nothing
> > common with either `struct pt_regs` or `struct user_pt_regs`. I fixed
> > that and now the kernel recognizes *both* typedef and struct
> > bpf_user_pt_regs_t. And there is no point in using typedef, because
> > `struct bpf_user_pt_regs_t` is backwards compatible and that's what
> > users actually use in practice.
>
> Hmm.
> The test with
> __weak int kprobe_typedef_ctx_subprog(bpf_user_pt_regs_t *ctx)
>
> was added back in Feb 2023.
> So it was surely working for the last year.

right, but I meant even earlier kernels that did support `struct
bpf_user_pt_regs_t *`, but didn't support `bpf_user_pt_regs_t *`.

>
> > > __PT_REGS_CAST is arch dependent and typeof makes it seen with
> > > correct btf_id and the kernel knows it's PTR_TO_CTX.
> >
> > TBH, I don't know what btf_id has to do with this, it looks either as
> > a distraction or subtle point you are making that I'm missing.
> > __PT_REGS_CAST() just does C language cast, there is no BTF or BTF ID
> > involved here, so what am I missing?
>
> That was your patch :)
> I'm just pointing out the neat trick with typeof to put
> the correct type in there,
> so it's later seen with proper btf_id and recognized as ctx.
> You added it a year ago.

ah, ok. TBH, I don't even know (now) why it works; it must be some other
quirk in kernel logic? Either way, this might have been appropriate for
a selftest, but I wouldn't recommend users do this: it relies on the
"internal" macro __PT_REGS_CAST, so it could technically break (and it
is also just pure magic). Anyways, see below.

>
> >
> > Why not? This is what I don't get. Here's a real piece of code to
> > demonstrate what users do in practice:
> >
> > struct bpf_user_pt_regs_t {}
> >
> > __hidden int handle_event_user_pt_regs(struct bpf_user_pt_regs_t* ctx) {
> >   if (pyperf_prog_cfg.sample_interval > 0) {
> >     if (__sync_fetch_and_add(&total_events_count, 1) %
> >         pyperf_prog_cfg.sample_interval) {
> >       return 0;
> >     }
> >   }
> >
> >   return handle_event_helper((struct pt_regs*)ctx, NULL);
> > }
>
> I think you're talking about kernel prior to that commit a year ago
> that made it possible to drop 'struct'.

yes, exactly. Global funcs were supported way earlier than my fix.

>
> > I'm saying that I explicitly do want to be able to declare (in general):
> >
> > int handle_event_user(struct pt_regs *ctx __arg_ctx) { ...}
> >
> > And this would work both on old and new kernels, with and without
> > native __arg_ctx support. And it will be very close to static subprogs
> > in the existing code base.
> >
> > Why do you want to disallow this artificially?
>
> Not artificially, but only when pt_regs in bpf prog doesn't match
> what kernel is passing.
> I think allowing only:
>   handle_event_user(void *ctx __arg_ctx)
> and prog will cast it to pt_regs immediately is less surprising
> and proper C code,
> but
>   handle_event_user(struct pt_regs *ctx __arg_ctx)
> is also ok when pt_regs is indeed what is being passed.
> Which will be the case for x86.
> And will be fine on arm64 too, because
> arch/arm64/include/asm/ptrace.h
> struct pt_regs {
>         union {
>                 struct user_pt_regs user_regs;
>
> but if arm64 ever changes that layout we should start failing to load.

Ok, I'm glad you agreed to allow `struct pt_regs *`. I also will say
that (as it stands right now) passing `struct pt_regs *` is valid on
all architectures, because that's what kernel passes around internally
as context for uprobe, kprobe, and kprobe-multi. See uprobe_prog_run,
kprobe_multi_link_prog_run, and perf_trace_run_bpf_submit, we always
pass real `struct pt_regs *`.

So, I'll add kprobe/multi-kprobe special handling to allow `struct
pt_regs *` then, ok?

>
> > All the above is already checked and enforced by the compiler. Libbpf
> > doesn't subvert it in any way. All that libbpf is doing is saying "ah,
> > user, you want this argument to be treated as PTR_TO_CTX, right? Too
> > bad host kernel is a bit too old to understand __arg_ctx natively, but
> > worry you not, I'll just quickly fix up BTF information that *only
> > kernel* uses *only to check type name* (nothing else!), and it will
> > look like kernel actually understood __arg_ctx, that's all, happy
> > BPF'ing!".
>
> and this way libbpf _may_ introduce a hard to debug bug.
> The same mistake the new kernel _may_ do with __arg_ctx with old libbpf.
> Both will do a hidden typecast when the bpf prog is
> potentially written with different type.
>
> foo(struct pt_regs *ctx __arg_ctx)
> Quick git grep shows that it will probably work on all archs
> except 'arc' where
> arch/arc/include/uapi/asm/bpf_perf_event.h:typedef struct user_regs_struct bpf_user_pt_regs_t;
> and struct pt_regs seems to have a different layout than user_regs_struct.
>
> But when the kernel allows sk_buff to be passed into
> foo(struct pt_regs *ctx __arg_ctx)
> that is just broken.

Yes, of course, sk_buff instead of pt_regs is definitely broken. But
that will be detected even by the compiler.

Anyways, I can add special casing for pt_regs and start enforcing
types. A bit hesitant to do that on libbpf side, still, due to that
eager global func behavior, which deviates from kernel, but if you
insist I'll do it.

(Eduard, I'll add feature detection for the need to rewrite BTF at the
same time, just FYI)

Keep in mind, though, for non-global subprogs kernel doesn't enforce
any types, so you can really pass sk_buff into pt_regs argument, if
you really want to, but kernel will happily still assume PTR_TO_CTX
(and I'm sure you know this as well, so this is mostly for others and
for completeness).


* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-08 23:45                 ` Andrii Nakryiko
@ 2024-01-09  1:49                   ` Alexei Starovoitov
  2024-01-09 17:17                     ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2024-01-09  1:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Mon, Jan 8, 2024 at 3:45 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> >
> > Not artificially, but only when pt_regs in bpf prog doesn't match
> > what kernel is passing.
> > I think allowing only:
> >   handle_event_user(void *ctx __arg_ctx)
> > and prog will cast it to pt_regs immediately is less surprising
> > and proper C code,
> > but
> >   handle_event_user(struct pt_regs *ctx __arg_ctx)
> > is also ok when pt_regs is indeed what is being passed.
> > Which will be the case for x86.
> > And will be fine on arm64 too, because
> > arch/arm64/include/asm/ptrace.h
> > struct pt_regs {
> >         union {
> >                 struct user_pt_regs user_regs;
> >
> > but if arm64 ever changes that layout we should start failing to load.
>
> Ok, I'm glad you agreed to allow `struct pt_regs *`. I also will say
> that (as it stands right now) passing `struct pt_regs *` is valid on
> all architectures, because that's what kernel passes around internally
> as context for uprobe, kprobe, and kprobe-multi. See uprobe_prog_run,
> kprobe_multi_link_prog_run, and perf_trace_run_bpf_submit, we always
> pass real `struct pt_regs *`.

Right, but for perf event progs it's actually bpf_user_pt_regs_t:
ctx.regs = perf_arch_bpf_user_pt_regs(regs);
bpf_prog_run(prog, &ctx);
yet all such progs are written assuming struct pt_regs
which is not correct.
It's a bit of a mess while strict type checking should make it better.

BPF is a strictly typed assembly language and the verifier
should not be violating its own promises of type checking when
it sees arg_ctx.

The reason I was proposing to restrict both kernel and libbpf
to 'void *ctx __arg_ctx' is because it's trivial to implement
in both.
To allow 'struct correct_type *ctx __arg_ctx' generically is much more
work.

> > So, I'll add kprobe/multi-kprobe special handling to allow `struct
> > pt_regs *` then, ok?

If you mean to allow 'void *ctx __arg_ctx' in kernel and libbpf and
in addition allow 'struct pt_reg *ctx __arg_ctx' for kprobe and other
prog types where that's what is being passed then yes.
Totally fine with me.
These two are easy to enforce in kernel and libbpf.

> Yes, of course, sk_buff instead of pt_regs is definitely broken. But
> that will be detected even by the compiler.

Right. C can do casts, but in bpf asm the verifier promises strict type
checking and it goes further and makes safety decisions based on types.

> Anyways, I can add special casing for pt_regs and start enforcing
> types. A bit hesitant to do that on libbpf side, still, due to that
> eager global func behavior, which deviates from kernel, but if you
> insist I'll do it.

I don't understand this part.
Both kernel and libbpf can check
if (btf_type_id(ctx) == 'struct pt_regs'
  && prog_type == kprobe) allow such __arg_ctx.

>
> (Eduard, I'll add feature detection for the need to rewrite BTF at the
> same time, just FYI)
>
> Keep in mind, though, for non-global subprogs kernel doesn't enforce
> any types, so you can really pass sk_buff into pt_regs argument, if
> you really want to, but kernel will happily still assume PTR_TO_CTX
> (and I'm sure you know this as well, so this is mostly for others and
> for completeness).

static functions are very different. They're typeless and will stay
typeless for some time.
Compiler can do whatever it wants with them.
Like in katran case the static function of 6 arguments is optimized
into 5 args. The types are unknown. The compiler can specialize
args with constant, partially inline, etc.
Even if it kept types of args after heavy transformations
the verifier cannot rely on that for safety or enforce strict types (yet).
static foo() is like static inline foo().
kprobe-ing into static func is questionable.
Only in the case of global __weak are the types dependable, and that's why
the verifier treats them differently.
Hopefully the -Overifiable llvm/gcc proposal will keep moving.
Then, one day, we can potentially disable some of the transformations
on static functions that make types useless. Then the verifier
will be able to verify them just like globals and enforce strict types.


* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-09  1:49                   ` Alexei Starovoitov
@ 2024-01-09 17:17                     ` Andrii Nakryiko
  2024-01-10  1:58                       ` Alexei Starovoitov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-09 17:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Mon, Jan 8, 2024 at 5:49 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jan 8, 2024 at 3:45 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > >
> > > Not artificially, but only when pt_regs in bpf prog doesn't match
> > > what kernel is passing.
> > > I think allowing only:
> > >   handle_event_user(void *ctx __arg_ctx)
> > > and prog will cast it to pt_regs immediately is less surprising
> > > and proper C code,
> > > but
> > >   handle_event_user(struct pt_regs *ctx __arg_ctx)
> > > is also ok when pt_regs is indeed what is being passed.
> > > Which will be the case for x86.
> > > And will be fine on arm64 too, because
> > > arch/arm64/include/asm/ptrace.h
> > > struct pt_regs {
> > >         union {
> > >                 struct user_pt_regs user_regs;
> > >
> > > but if arm64 ever changes that layout we should start failing to load.
> >
> > Ok, I'm glad you agreed to allow `struct pt_regs *`. I also will say
> > that (as it stands right now) passing `struct pt_regs *` is valid on
> > all architectures, because that's what kernel passes around internally
> > as context for uprobe, kprobe, and kprobe-multi. See uprobe_prog_run,
> > kprobe_multi_link_prog_run, and perf_trace_run_bpf_submit, we always
> > pass real `struct pt_regs *`.
>
> Right, but for perf event progs it's actually bpf_user_pt_regs_t:
> ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> bpf_prog_run(prog, &ctx);
> yet all such progs are written assuming struct pt_regs
> which is not correct.

Yes, SEC("perf_event") programs have a very different context
(bpf_perf_event_data, where a *pointer to pt_regs* is the first field,
so it's not compatible even memory-layout-wise). So I'm not going to
allow struct pt_regs there.

> It's a bit of a mess while strict type checking should make it better.
>
> BPF is a strictly typed assembly language and the verifier
> should not be violating its own promises of type checking when
> it sees arg_ctx.
>
> The reason I was proposing to restrict both kernel and libbpf
> to 'void *ctx __arg_ctx' is because it's trivial to implement
> in both.
> To allow 'struct correct_type *ctx __arg_ctx' generically is much more
> work.

Yes, it's definitely more complicated (but the kernel has full BTF info,
so maybe not too bad; I need to try). I'll give it a try, and if it's too
bad, we can discuss a fallback plan. But we should at least try:
forcing users to do unnecessary void * casts to u64[] or a tracepoint
struct is suboptimal from a usability POV.

>
> > So, I'll add kprobe/multi-kprobe special handling to allow `struct
> > pt_regs *` then, ok?
>
> If you mean to allow 'void *ctx __arg_ctx' in kernel and libbpf and
> in addition allow 'struct pt_reg *ctx __arg_ctx' for kprobe and other
> prog types where that's what is being passed then yes.
> Totally fine with me.
> These two are easy to enforce in kernel and libbpf.

Ok, great.

>
> > Yes, of course, sk_buff instead of pt_regs is definitely broken. But
> > that will be detected even by the compiler.
>
> Right. C can do casts, but in bpf asm the verifier promises strict type
> checking and it goes further and makes safety decisions based on types.
>

It feels like you are thinking about PTR_TO_BTF_ID only and
extrapolating that behavior to everything else. You know that it's not
like that in general.

> > Anyways, I can add special casing for pt_regs and start enforcing
> > types. A bit hesitant to do that on libbpf side, still, due to that
> > eager global func behavior, which deviates from kernel, but if you
> > insist I'll do it.
>
> I don't understand this part.
> Both kernel and libbpf can check
> if (btf_type_id(ctx) == 'struct pt_regs'
>   && prog_type == kprobe) allow such __arg_ctx.
>

I was thinking about the case where we have __arg_ctx and type doesn't
match expectation. Should libbpf error out? Or emit a warning and
proceed without adjusting types? If we do the warning and let verifier
reject invalid program, I think it will be better and then these
concerns of mine about lazy vs eager global subprog verification
behavior are irrelevant.

> >
> > (Eduard, I'll add feature detection for the need to rewrite BTF at the
> > same time, just FYI)
> >
> > Keep in mind, though, for non-global subprogs kernel doesn't enforce
> > any types, so you can really pass sk_buff into pt_regs argument, if
> > you really want to, but kernel will happily still assume PTR_TO_CTX
> > (and I'm sure you know this as well, so this is mostly for others and
> > for completeness).
>
> static functions are very different. They're typeless and will stay
> typeless for some time.

Right. And they are different in how we verify them and so on. But from
the user's perspective they are still just functions, and (in my mind at
least) the fewer differences there are between static and global
subprogs, the better.

But anyways, I'll do what we agreed above. I'll proceed with libbpf
emitting a warning and not doing anything for __arg_ctx, unless you feel
strongly that libbpf should error out.

> Compiler can do whatever it wants with them.
> Like in katran case the static function of 6 arguments is optimized
> into 5 args. The types are unknown. The compiler can specialize
> args with constant, partially inline, etc.
> Even if it kept types of args after heavy transformations
> the verifier cannot rely on that for safety or enforce strict types (yet).
> static foo() is like static inline foo().
> kprobe-ing into static func is questionable.
> Only in case of global __weak the types are dependable and that's why
> the verifier treats them differently.
> Hopefully the -Overifiable llvm/gcc proposal will keep moving.
> Then, one day, we can potentially disable some of the transformations
> on static functions that makes types useless. Then the verifier
> will be able to verify them just as globals and enforce strict types.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-09 17:17                     ` Andrii Nakryiko
@ 2024-01-10  1:58                       ` Alexei Starovoitov
  2024-01-10 19:02                         ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2024-01-10  1:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Tue, Jan 9, 2024 at 9:17 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jan 8, 2024 at 5:49 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Jan 8, 2024 at 3:45 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > >
> > > > Not artificially, but only when pt_regs in bpf prog doesn't match
> > > > what kernel is passing.
> > > > I think allowing only:
> > > >   handle_event_user(void *ctx __arg_ctx)
> > > > and prog will cast it to pt_regs immediately is less surprising
> > > > and proper C code,
> > > > but
> > > >   handle_event_user(struct pt_regs *ctx __arg_ctx)
> > > > is also ok when pt_regs is indeed what is being passed.
> > > > Which will be the case for x86.
> > > > And will be fine on arm64 too, because
> > > > arch/arm64/include/asm/ptrace.h
> > > > struct pt_regs {
> > > >         union {
> > > >                 struct user_pt_regs user_regs;
> > > >
> > > > but if arm64 ever changes that layout we should start failing to load.
> > >
> > > Ok, I'm glad you agreed to allow `struct pt_regs *`. I also will say
> > > that (as it stands right now) passing `struct pt_regs *` is valid on
> > > all architectures, because that's what kernel passes around internally
> > > as context for uprobe, kprobe, and kprobe-multi. See uprobe_prog_run,
> > > kprobe_multi_link_prog_run, and perf_trace_run_bpf_submit, we always
> > > pass real `struct pt_regs *`.
> >
> > Right, but for perf event progs it's actually bpf_user_pt_regs_t:
> > ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> > bpf_prog_run(prog, &ctx);
> > yet all such progs are written assuming struct pt_regs
> > which is not correct.
>
> Yes, SEC("perf_event") programs have very different context
> (bpf_perf_event_data, where *pointer to pt_regs* is the first field,
> so it's not compatible even memory layout-wise). So I'm not going to
> allow struct pt_regs there.

Not quite. I don't think we're on the same page.

Technically SEC("perf_event") bpf_prog should see a pointer to:
struct bpf_perf_event_data {
        bpf_user_pt_regs_t regs;
        __u64 sample_period;
        __u64 addr;
};

but a lot of them are written as:
SEC("perf_event")
int handle_pe(struct pt_regs *ctx)

and it's working, because of magic (or ugliness, depending on your pov) that
we do in pe_prog_convert_ctx_access() (which inserts an extra load).

That's the part where I'm saying:
"written assuming struct pt_regs which is not correct".

The incorrect part is that the prog has access only to bpf_user_pt_regs_t
and the prog should be written as:
SEC("perf_event")
int pe_prog(bpf_user_pt_regs_t *ctx)
or as
SEC("perf_event")
int pe_prog(struct bpf_perf_event_data *ctx)

but in generic case not as:
SEC("perf_event")
int pe_prog(struct pt_regs *ctx)

because that is valid only on archs where bpf_user_pt_regs_t == pt_regs.

> > It's a bit of a mess while strict type checking should make it better.
> >
> > BPF is a strictly typed assembly language and the verifier
> > should not be violating its own promises of type checking when
> > it sees arg_ctx.
> >
> > The reason I was proposing to restrict both kernel and libbpf
> > to 'void *ctx __arg_ctx' is because it's trivial to implement
> > in both.
> > To allow 'struct correct_type *ctx __arg_ctx' generically is much more
> > work.
>
> Yes, it's definitely more complicated (but kernel has full BTF info,
> so maybe not too bad, I need to try). I'll give it a try, if it's too
> bad, we can discuss a fallback plan.

right. the kernel has btf and everything, but as seen in the perf_event example
above it's not trivial to do the type match... even in the kernel.
Matching the accurate type in libbpf is imo too much complexity.

> But we should try at least,
> forcing users to do unnecessary void * casts to u64[] or tracepoint
> struct is suboptimal from usability POV.

That part of usability concerns I still don't understand.
Where do you see "suboptimal usability" in the code:

SEC("perf_event")
int pe_prog(void *ctx __arg_ctx)
{
  struct pt_regs *regs = ctx;

?
It's a pretty clear interface and the program author knows exactly
what it's doing. It's an explicit cast because the user wrote it.
Clear, unambiguous and no surprises.

Also we shouldn't forget interaction with CO-RE (preserve_access_index)
and upcoming preserve_context_offset.

When people do #include <vmlinux.h> they get unnecessary CO-RE-ed
'struct pt_regs *' and hidden type conversion by libbpf/kernel
is another level of debug complexity.

libbpf doesn't even print what it did in bpf_program_fixup_func_info()
no matter the verbosity flag.

> >
> > > So, I'll add kprobe/multi-kprobe special handling to allow `struct
> > > pt_regs *` then, ok?
> >
> > If you mean to allow 'void *ctx __arg_ctx' in kernel and libbpf and
> > in addition allow 'struct pt_reg *ctx __arg_ctx' for kprobe and other
> > prog types where that's what is being passed then yes.
> > Totally fine with me.
> > These two are easy to enforce in kernel and libbpf.
>
> Ok, great.
>
> >
> > > Yes, of course, sk_buff instead of pt_regs is definitely broken. But
> > > that will be detected even by the compiler.
> >
> > Right. C can do casts, but in bpf asm the verifier promises strict type
> > checking and it goes further and makes safety decisions based on types.
> >
>
> It feels like you are thinking about PTR_TO_BTF_ID only and
> extrapolating that behavior to everything else. You know that it's not
> like that in general.

imo trusted ptr_to_btf_id and ptr_to_ctx are equivalent.
They're fully trusted from the verifier pov and fully correct
from bpf prog pov. So neither should have hidden type casts.

Of course, I remember bpf_rdonly_cast. But this one is untrusted.
It's equivalent to C type cast.

> > > Anyways, I can add special casing for pt_regs and start enforcing
> > > types. A bit hesitant to do that on libbpf side, still, due to that
> > > eager global func behavior, which deviates from kernel, but if you
> > > insist I'll do it.
> >
> > I don't understand this part.
> > Both kernel and libbpf can check
> > if (btf_type_id(ctx) == 'struct pt_regs'
> >   && prog_type == kprobe) allow such __arg_ctx.
> >
>
> I was thinking about the case where we have __arg_ctx and type doesn't
> match expectation. Should libbpf error out? Or emit a warning and
> proceed without adjusting types? If we do the warning and let verifier
> reject invalid program, I think it will be better and then these
> concerns of mine about lazy vs eager global subprog verification
> behavior are irrelevant.

I think a warn in libbpf is fine.
An old kernel will likely fail verification since types won't match,
and the libbpf warn will serve its purpose,
but for a new kernel, won't both libbpf and the kernel print
two similar-looking warns?

>
> > >
> > > (Eduard, I'll add feature detection for the need to rewrite BTF at the
> > > same time, just FYI)
> > >
> > > Keep in mind, though, for non-global subprogs kernel doesn't enforce
> > > any types, so you can really pass sk_buff into pt_regs argument, if
> > > you really want to, but kernel will happily still assume PTR_TO_CTX
> > > (and I'm sure you know this as well, so this is mostly for others and
> > > for completeness).
> >
> > static functions are very different. They're typeless and will stay
> > typeless for some time.
>
> Right. And they are different in how we verify and so on. But from the
> user's perspective they are still just functions and (in my mind at
> least), the less difference between static and global subprogs there
> are, the better.
>
> But anyways, I'll do what we agreed above. I'll proceed with libbpf
> emitting warning and not doing anything for __arg_ctx, unless you feel
> strongly libbpf should error out.

warn is fine. thanks.


* Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic
  2024-01-10  1:58                       ` Alexei Starovoitov
@ 2024-01-10 19:02                         ` Andrii Nakryiko
  0 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2024-01-10 19:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Jiri Olsa

On Tue, Jan 9, 2024 at 5:58 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jan 9, 2024 at 9:17 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jan 8, 2024 at 5:49 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Jan 8, 2024 at 3:45 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > >
> > > > > Not artificially, but only when pt_regs in bpf prog doesn't match
> > > > > what kernel is passing.
> > > > > I think allowing only:
> > > > >   handle_event_user(void *ctx __arg_ctx)
> > > > > and prog will cast it to pt_regs immediately is less surprising
> > > > > and proper C code,
> > > > > but
> > > > >   handle_event_user(struct pt_regs *ctx __arg_ctx)
> > > > > is also ok when pt_regs is indeed what is being passed.
> > > > > Which will be the case for x86.
> > > > > And will be fine on arm64 too, because
> > > > > arch/arm64/include/asm/ptrace.h
> > > > > struct pt_regs {
> > > > >         union {
> > > > >                 struct user_pt_regs user_regs;
> > > > >
> > > > > but if arm64 ever changes that layout we should start failing to load.
> > > >
> > > > Ok, I'm glad you agreed to allow `struct pt_regs *`. I also will say
> > > > that (as it stands right now) passing `struct pt_regs *` is valid on
> > > > all architectures, because that's what kernel passes around internally
> > > > as context for uprobe, kprobe, and kprobe-multi. See uprobe_prog_run,
> > > > kprobe_multi_link_prog_run, and perf_trace_run_bpf_submit, we always
> > > > pass real `struct pt_regs *`.
> > >
> > > Right, but for perf event progs it's actually bpf_user_pt_regs_t:
> > > ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> > > bpf_prog_run(prog, &ctx);
> > > yet all such progs are written assuming struct pt_regs
> > > which is not correct.
> >
> > Yes, SEC("perf_event") programs have very different context
> > (bpf_perf_event_data, where *pointer to pt_regs* is the first field,
> > so it's not compatible even memory layout-wise). So I'm not going to
> > allow struct pt_regs there.
>
> Not quite. I don't think we're on the same page.
>
> Technically SEC("perf_event") bpf_prog should see a pointer to:
> struct bpf_perf_event_data {
>         bpf_user_pt_regs_t regs;
>         __u64 sample_period;
>         __u64 addr;
> };
>
> but a lot of them are written as:
> SEC("perf_event")
> int handle_pe(struct pt_regs *ctx)
>
> and it's working, because of magic (or ugliness, it depends on pov) that
> we do in pe_prog_convert_ctx_access() (that inserts extra load).

Ah, I didn't know about this part. I checked bpf_perf_event_data_kern
and saw a pointer for regs, but didn't realize it's remapped as an
embedded struct for perf_event programs by the verifier.

>
> The part where I'm saying:
> "written assuming struct pt_regs which is not correct".
>
> The incorrect part is that the prog have access only to bpf_user_pt_regs_t
> and the prog should be written as:
> SEC("perf_event")
> int pe_prog(bpf_user_pt_regs_t *ctx)
> or as
> SEC("perf_event")
> int pe_prog(struct bpf_perf_event_data *ctx)
>
> but in generic case not as:
> SEC("perf_event")
> int pe_prog(struct pt_regs *ctx)
>
> because that is valid only on archs where bpf_user_pt_regs_t == pt_regs.

I guess for perf_event I can just add `struct pt_regs * __arg_ctx`
support if (typeof(struct pt_regs) == typeof(bpf_user_pt_regs_t)), or
something along those lines. I'll need to check if that works
correctly.

>
> > > It's a bit of a mess while strict type checking should make it better.
> > >
> > > BPF is a strictly typed assembly language and the verifier
> > > should not be violating its own promises of type checking when
> > > it sees arg_ctx.
> > >
> > > The reason I was proposing to restrict both kernel and libbpf
> > > to 'void *ctx __arg_ctx' is because it's trivial to implement
> > > in both.
> > > To allow 'struct correct_type *ctx __arg_ctx' generically is much more
> > > work.
> >
> > Yes, it's definitely more complicated (but kernel has full BTF info,
> > so maybe not too bad, I need to try). I'll give it a try, if it's too
> > bad, we can discuss a fallback plan.
>
> right. the kernel has btf and everything, but as seen in perf_event example
> above it's not trivial to do the type match... even in the kernel.
> Matching the accurate type in libbpf is imo too much complexity.
>

Heh, we agree on this, but we disagree on solutions :) My proposed
solution was to rely on users and compilers, just like we do with
static subprogs :) But you want to restrict it to `void *`. Alright.

> > But we should try at least,
> > forcing users to do unnecessary void * casts to u64[] or tracepoint
> > struct is suboptimal from usability POV.
>
> That part of usability concerns I still don't understand.
> Where do you see "suboptimal usability" in the code:
>
> SEC("perf_event")
> int pe_prog(void *ctx __arg_ctx)
> {
>   struct pt_regs *regs = ctx;
>
> ?
> It's a pretty clear interface and the program author knows exactly
> what it's doing. It's an explicit cast because the user wrote it.
> Clear, unambiguous and no surprises.

I'm not saying it's disastrous. But my point is that, say, for
tracepoint, it's natural to want to do this, and that's what users
might do first:

int global_subprog(struct syscall_trace_enter* ctx __arg_ctx) {
    return ctx->args[0];
}

SEC("tracepoint/syscalls/sys_enter_kill")
int tracepoint__syscalls__sys_enter_kill(struct syscall_trace_enter* ctx) {
   ...
   global_subprog(ctx);
   ...
}

And it will be rejected. The user will have a WTF moment, and maybe ask online
or maybe figure out by themselves that they need to rewrite
global_subprog as:

int global_subprog(void *ctx __arg_ctx) {
    struct syscall_trace_enter* ctx1 = ctx;

    return ctx1->args[0];
}

It works and it's easy to work around. But it's a stumbling point for
global subprog adoption/conversion, I think. On the other hand,
users having to debug issues due to using `struct pt_regs` instead of
`bpf_user_pt_regs_t` is a quite theoretical concern, tbh (because the
PT_REGS_xxx() macros do the right thing).

>
> Also we shouldn't forget interaction with CO-RE (preserve_access_index)
> and upcoming preserve_context_offset.
>
> When people do #include <vmlinux.h> they get unnecessary CO-RE-ed
> 'struct pt_regs *' and hidden type conversion by libbpf/kernel
> is another level of debug complexity.

This is where I don't get why you keep saying this :) What libbpf is
doing is just rewriting the FUNC->FUNC_PROTO declaration just before
loading BTF into the kernel. It's a sleight of hand just to make the
kernel recognize the argument (declaratively) as PTR_TO_CTX.

All the CO-RE relocations, preserve_context_offset, etc.: all of that is
done *before* all this and is *absolutely irrelevant* to this
whole discussion. The compiler will generate offsets based on the original C
types that the user specified. Libbpf will do CO-RE relocations based on
the original BTF information (with the user's original types).

So all the stuff libbpf is doing here for __arg_ctx has no bearing on the above.

>
> libbpf doesn't even print what it did in bpf_program_fixup_func_info()
> no matter the verbosity flag.
>
> > >
> > > > So, I'll add kprobe/multi-kprobe special handling to allow `struct
> > > > pt_regs *` then, ok?
> > >
> > > If you mean to allow 'void *ctx __arg_ctx' in kernel and libbpf and
> > > in addition allow 'struct pt_reg *ctx __arg_ctx' for kprobe and other
> > > prog types where that's what is being passed then yes.
> > > Totally fine with me.
> > > These two are easy to enforce in kernel and libbpf.
> >
> > Ok, great.
> >
> > >
> > > > Yes, of course, sk_buff instead of pt_regs is definitely broken. But
> > > > that will be detected even by the compiler.
> > >
> > > Right. C can do casts, but in bpf asm the verifier promises strict type
> > > checking and it goes further and makes safety decisions based on types.
> > >
> >
> > It feels like you are thinking about PTR_TO_BTF_ID only and
> > extrapolating that behavior to everything else. You know that it's not
> > like that in general.
>
> imo trusted ptr_to_btf_id and ptr_to_ctx are equivalent.
> They're fully trusted from the verifier pov and fully correct
> from bpf prog pov. So neither should have hidden type casts.

But my point is that the BPF verifier itself doesn't treat them the
same today. For the context argument, the verifier just assumes a valid
PTR_TO_CTX and never looks at BTF information.

But I also think it's ironic that above you suggested this to be totally fine:

  > int pe_prog(void *ctx __arg_ctx)
  > {
  >   struct pt_regs *regs = ctx;

Which is totally a cast, and has no bearing on the verifier's PTR_TO_CTX treatment.

But we are getting a bit philosophical; I propose we put this to rest.

>
> Of course, I remember bpf_rdonly_cast. But this one is untrusted.
> It's equivalent to C type cast.
>
> > > > Anyways, I can add special casing for pt_regs and start enforcing
> > > > types. A bit hesitant to do that on libbpf side, still, due to that
> > > > eager global func behavior, which deviates from kernel, but if you
> > > > insist I'll do it.
> > >
> > > I don't understand this part.
> > > Both kernel and libbpf can check
> > > if (btf_type_id(ctx) == 'struct pt_regs'
> > >   && prog_type == kprobe) allow such __arg_ctx.
> > >
> >
> > I was thinking about the case where we have __arg_ctx and type doesn't
> > match expectation. Should libbpf error out? Or emit a warning and
> > proceed without adjusting types? If we do the warning and let verifier
> > reject invalid program, I think it will be better and then these
> > concerns of mine about lazy vs eager global subprog verification
> > behavior are irrelevant.
>
> I think warn in libbpf is fine.
> Old kernel will likely fail verification since types won't match
> and libbpf warn will serve its purpose,
> but for the new kernel both libbpf and kernel will print
> two similar looking warns?

I'm going to add feature detection and disable this __arg_ctx
treatment on the libbpf side (for new kernels, of course). So yes, on old
kernels the verifier will reject the argument (because it won't be PTR_TO_CTX)
and we'll have the libbpf warning. And on new ones any check will be done
directly by the verifier and reported properly in the verifier log (which I
think is better than libbpf's warnings, especially taking into account
veristat and similar tools that by default emit only the verifier log on
failures).

>
> >
> > > >
> > > > (Eduard, I'll add feature detection for the need to rewrite BTF at the
> > > > same time, just FYI)
> > > >
> > > > Keep in mind, though, for non-global subprogs kernel doesn't enforce
> > > > any types, so you can really pass sk_buff into pt_regs argument, if
> > > > you really want to, but kernel will happily still assume PTR_TO_CTX
> > > > (and I'm sure you know this as well, so this is mostly for others and
> > > > for completeness).
> > >
> > > static functions are very different. They're typeless and will stay
> > > typeless for some time.
> >
> > Right. And they are different in how we verify and so on. But from the
> > user's perspective they are still just functions and (in my mind at
> > least), the less difference between static and global subprogs there
> > are, the better.
> >
> > But anyways, I'll do what we agreed above. I'll proceed with libbpf
> > emitting warning and not doing anything for __arg_ctx, unless you feel
> > strongly libbpf should error out.
>
> warn is fine. thanks.

Alright, cool, will work on this and add it in v2 of [0]. If you get
time, please check that one as well (unless you already did).

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=814516&state=*


end of thread, other threads:[~2024-01-10 19:02 UTC | newest]

Thread overview: 24+ messages
2024-01-04  1:38 [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Andrii Nakryiko
2024-01-04  1:38 ` [PATCH v3 bpf-next 1/9] libbpf: make uniform use of btf__fd() accessor inside libbpf Andrii Nakryiko
2024-01-04  1:38 ` [PATCH v3 bpf-next 2/9] libbpf: use explicit map reuse flag to skip map creation steps Andrii Nakryiko
2024-01-04  1:38 ` [PATCH v3 bpf-next 3/9] libbpf: don't rely on map->fd as an indicator of map being created Andrii Nakryiko
2024-01-04  1:38 ` [PATCH v3 bpf-next 4/9] libbpf: use stable map placeholder FDs Andrii Nakryiko
2024-01-04  1:38 ` [PATCH v3 bpf-next 5/9] libbpf: move exception callbacks assignment logic into relocation step Andrii Nakryiko
2024-01-04  1:38 ` [PATCH v3 bpf-next 6/9] libbpf: move BTF loading step after " Andrii Nakryiko
2024-01-04  1:38 ` [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic Andrii Nakryiko
2024-01-04  5:39   ` Alexei Starovoitov
2024-01-04 18:37     ` Andrii Nakryiko
2024-01-04 18:52       ` Alexei Starovoitov
2024-01-04 20:58         ` Andrii Nakryiko
2024-01-05  1:33           ` Alexei Starovoitov
2024-01-05  3:57             ` Andrii Nakryiko
2024-01-05  5:42               ` Alexei Starovoitov
2024-01-08 23:45                 ` Andrii Nakryiko
2024-01-09  1:49                   ` Alexei Starovoitov
2024-01-09 17:17                     ` Andrii Nakryiko
2024-01-10  1:58                       ` Alexei Starovoitov
2024-01-10 19:02                         ` Andrii Nakryiko
2024-01-04  1:38 ` [PATCH v3 bpf-next 8/9] selftests/bpf: add arg:ctx cases to test_global_funcs tests Andrii Nakryiko
2024-01-04  1:38 ` [PATCH v3 bpf-next 9/9] selftests/bpf: add __arg_ctx BTF rewrite test Andrii Nakryiko
2024-01-04  2:33 ` [PATCH v3 bpf-next 0/9] Libbpf-side __arg_ctx fallback support Eduard Zingerman
2024-01-04  8:13 ` patchwork-bot+netdevbpf
