netdev.vger.kernel.org archive mirror
* [PATCH bpf-next 0/3] Add user-space and non-CO-RE variants of BPF_CORE_READ()
@ 2020-12-18 23:56 Andrii Nakryiko
  2020-12-18 23:56 ` [PATCH bpf-next 1/3] libbpf: add user-space variants of BPF_CORE_READ() family of macros Andrii Nakryiko
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2020-12-18 23:56 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team

Add two sets of BPF_CORE_READ()-like macros. One is for reading kernel data
from the user address space (e.g., UAPI data structs for syscalls). The other
is a set of non-CO-RE variants, which don't emit CO-RE relocations and thus
won't fail on kernels without BTF. The latter still provides a much shorter
way to write reads that need to chase a few pointers.

Andrii Nakryiko (3):
  libbpf: add user-space variants of BPF_CORE_READ() family of macros
  libbpf: add non-CO-RE variants of BPF_CORE_READ() macro family
  selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ()
    variants

 tools/lib/bpf/bpf_core_read.h                 | 136 +++++++++++++-----
 .../bpf/prog_tests/core_read_macros.c         |  64 +++++++++
 .../bpf/progs/test_core_read_macros.c         |  51 +++++++
 3 files changed, 212 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/core_read_macros.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_read_macros.c

-- 
2.24.1



* [PATCH bpf-next 1/3] libbpf: add user-space variants of BPF_CORE_READ() family of macros
  2020-12-18 23:56 [PATCH bpf-next 0/3] Add user-space and non-CO-RE variants of BPF_CORE_READ() Andrii Nakryiko
@ 2020-12-18 23:56 ` Andrii Nakryiko
  2020-12-18 23:56 ` [PATCH bpf-next 2/3] libbpf: add non-CO-RE variants of BPF_CORE_READ() macro family Andrii Nakryiko
  2020-12-18 23:56 ` [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants Andrii Nakryiko
  2 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2020-12-18 23:56 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team, Gilad Reti

Add BPF_CORE_READ_USER(), BPF_CORE_READ_USER_STR() and their _INTO()
variations to allow reading CO-RE-relocatable kernel data structures from the
user address space. One such case is reading input arguments of syscalls,
while reaping the benefits of CO-RE relocations w.r.t. handling 32/64-bit
conversions and missing/new fields in UAPI data structs.
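
As an illustration (a sketch, not part of this patch; the clone3 attach
point, the x86-64-specific entry symbol and the field choices are
assumptions), reading a UAPI struct from a user-space pointer passed into
a syscall could look roughly like this:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>	/* PT_REGS_*_CORE need -D__TARGET_ARCH_x86 */
#include <bpf/bpf_core_read.h>

char _license[] SEC("license") = "GPL";

__u64 last_flags = 0;
__u64 last_cgroup = 0;

SEC("fentry/__x64_sys_clone3")
int BPF_PROG(trace_clone3, struct pt_regs *regs)
{
	/* first syscall argument: user-space pointer to struct clone_args */
	struct clone_args *uargs = (void *)PT_REGS_PARM1_CORE(regs);

	/* CO-RE-relocated read from the user address space */
	last_flags = BPF_CORE_READ_USER(uargs, flags);

	/* tolerate a field that only exists on newer kernels */
	if (bpf_core_field_exists(uargs->cgroup))
		last_cgroup = BPF_CORE_READ_USER(uargs, cgroup);

	return 0;
}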

Suggested-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf_core_read.h | 98 +++++++++++++++++++++--------------
 1 file changed, 59 insertions(+), 39 deletions(-)

diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
index bbcefb3ff5a5..db0c735ceb53 100644
--- a/tools/lib/bpf/bpf_core_read.h
+++ b/tools/lib/bpf/bpf_core_read.h
@@ -195,17 +195,20 @@ enum bpf_enum_value_kind {
  * (local) BTF, used to record relocation.
  */
 #define bpf_core_read(dst, sz, src)					    \
-	bpf_probe_read_kernel(dst, sz,					    \
-			      (const void *)__builtin_preserve_access_index(src))
+	bpf_probe_read_kernel(dst, sz, (const void *)__builtin_preserve_access_index(src))
 
+#define bpf_core_read_user(dst, sz, src)				    \
+	bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src))
 /*
  * bpf_core_read_str() is a thin wrapper around bpf_probe_read_str()
  * additionally emitting BPF CO-RE field relocation for specified source
  * argument.
  */
 #define bpf_core_read_str(dst, sz, src)					    \
-	bpf_probe_read_kernel_str(dst, sz,				    \
-				  (const void *)__builtin_preserve_access_index(src))
+	bpf_probe_read_kernel_str(dst, sz, (const void *)__builtin_preserve_access_index(src))
+
+#define bpf_core_read_user_str(dst, sz, src)				    \
+	bpf_probe_read_user_str(dst, sz, (const void *)__builtin_preserve_access_index(src))
 
 #define ___concat(a, b) a ## b
 #define ___apply(fn, n) ___concat(fn, n)
@@ -264,30 +267,29 @@ enum bpf_enum_value_kind {
 	read_fn((void *)(dst), sizeof(*(dst)), &((src_type)(src))->accessor)
 
 /* "recursively" read a sequence of inner pointers using local __t var */
-#define ___rd_first(src, a) ___read(bpf_core_read, &__t, ___type(src), src, a);
-#define ___rd_last(...)							    \
-	___read(bpf_core_read, &__t,					    \
-		___type(___nolast(__VA_ARGS__)), __t, ___last(__VA_ARGS__));
-#define ___rd_p1(...) const void *__t; ___rd_first(__VA_ARGS__)
-#define ___rd_p2(...) ___rd_p1(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
-#define ___rd_p3(...) ___rd_p2(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
-#define ___rd_p4(...) ___rd_p3(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
-#define ___rd_p5(...) ___rd_p4(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
-#define ___rd_p6(...) ___rd_p5(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
-#define ___rd_p7(...) ___rd_p6(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
-#define ___rd_p8(...) ___rd_p7(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
-#define ___rd_p9(...) ___rd_p8(___nolast(__VA_ARGS__)) ___rd_last(__VA_ARGS__)
-#define ___read_ptrs(src, ...)						    \
-	___apply(___rd_p, ___narg(__VA_ARGS__))(src, __VA_ARGS__)
-
-#define ___core_read0(fn, dst, src, a)					    \
+#define ___rd_first(fn, src, a) ___read(fn, &__t, ___type(src), src, a);
+#define ___rd_last(fn, ...)						    \
+	___read(fn, &__t, ___type(___nolast(__VA_ARGS__)), __t, ___last(__VA_ARGS__));
+#define ___rd_p1(fn, ...) const void *__t; ___rd_first(fn, __VA_ARGS__)
+#define ___rd_p2(fn, ...) ___rd_p1(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
+#define ___rd_p3(fn, ...) ___rd_p2(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
+#define ___rd_p4(fn, ...) ___rd_p3(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
+#define ___rd_p5(fn, ...) ___rd_p4(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
+#define ___rd_p6(fn, ...) ___rd_p5(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
+#define ___rd_p7(fn, ...) ___rd_p6(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
+#define ___rd_p8(fn, ...) ___rd_p7(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
+#define ___rd_p9(fn, ...) ___rd_p8(fn, ___nolast(__VA_ARGS__)) ___rd_last(fn, __VA_ARGS__)
+#define ___read_ptrs(fn, src, ...)					    \
+	___apply(___rd_p, ___narg(__VA_ARGS__))(fn, src, __VA_ARGS__)
+
+#define ___core_read0(fn, fn_ptr, dst, src, a)				    \
 	___read(fn, dst, ___type(src), src, a);
-#define ___core_readN(fn, dst, src, ...)				    \
-	___read_ptrs(src, ___nolast(__VA_ARGS__))			    \
+#define ___core_readN(fn, fn_ptr, dst, src, ...)			    \
+	___read_ptrs(fn_ptr, src, ___nolast(__VA_ARGS__))		    \
 	___read(fn, dst, ___type(src, ___nolast(__VA_ARGS__)), __t,	    \
 		___last(__VA_ARGS__));
-#define ___core_read(fn, dst, src, a, ...)				    \
-	___apply(___core_read, ___empty(__VA_ARGS__))(fn, dst,		    \
+#define ___core_read(fn, fn_ptr, dst, src, a, ...)			    \
+	___apply(___core_read, ___empty(__VA_ARGS__))(fn, fn_ptr, dst,	    \
 						      src, a, ##__VA_ARGS__)
 
 /*
@@ -295,20 +297,32 @@ enum bpf_enum_value_kind {
  * BPF_CORE_READ(), in which final field is read into user-provided storage.
  * See BPF_CORE_READ() below for more details on general usage.
  */
-#define BPF_CORE_READ_INTO(dst, src, a, ...)				    \
-	({								    \
-		___core_read(bpf_core_read, dst, (src), a, ##__VA_ARGS__)   \
-	})
+#define BPF_CORE_READ_INTO(dst, src, a, ...) ({				    \
+	___core_read(bpf_core_read, bpf_core_read,			    \
+		     dst, (src), a, ##__VA_ARGS__)			    \
+})
+
+/* Variant of BPF_CORE_READ_INTO() for reading from user-space memory */
+#define BPF_CORE_READ_USER_INTO(dst, src, a, ...) ({			    \
+	___core_read(bpf_core_read_user, bpf_core_read_user,		    \
+		     dst, (src), a, ##__VA_ARGS__)			    \
+})
 
 /*
  * BPF_CORE_READ_STR_INTO() does same "pointer chasing" as
  * BPF_CORE_READ() for intermediate pointers, but then executes (and returns
  * corresponding error code) bpf_core_read_str() for final string read.
  */
-#define BPF_CORE_READ_STR_INTO(dst, src, a, ...)			    \
-	({								    \
-		___core_read(bpf_core_read_str, dst, (src), a, ##__VA_ARGS__)\
-	})
+#define BPF_CORE_READ_STR_INTO(dst, src, a, ...) ({			    \
+	___core_read(bpf_core_read_str, bpf_core_read,			    \
+		     dst, (src), a, ##__VA_ARGS__)			    \
+})
+
+/* Variant of BPF_CORE_READ_STR_INTO() for reading from user-space memory */
+#define BPF_CORE_READ_USER_STR_INTO(dst, src, a, ...) ({		    \
+	___core_read(bpf_core_read_user_str, bpf_core_read_user,	    \
+		     dst, (src), a, ##__VA_ARGS__)			    \
+})
 
 /*
  * BPF_CORE_READ() is used to simplify BPF CO-RE relocatable read, especially
@@ -334,12 +348,18 @@ enum bpf_enum_value_kind {
  * N.B. Only up to 9 "field accessors" are supported, which should be more
  * than enough for any practical purpose.
  */
-#define BPF_CORE_READ(src, a, ...)					    \
-	({								    \
-		___type((src), a, ##__VA_ARGS__) __r;			    \
-		BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);	    \
-		__r;							    \
-	})
+#define BPF_CORE_READ(src, a, ...) ({					    \
+	___type((src), a, ##__VA_ARGS__) __r;				    \
+	BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);		    \
+	__r;								    \
+})
+
+/* Variant of BPF_CORE_READ() for reading from user-space memory */
+#define BPF_CORE_READ_USER(src, a, ...) ({				    \
+	___type((src), a, ##__VA_ARGS__) __r;				    \
+	BPF_CORE_READ_USER_INTO(&__r, (src), a, ##__VA_ARGS__);		    \
+	__r;								    \
+})
 
 #endif
 
-- 
2.24.1



* [PATCH bpf-next 2/3] libbpf: add non-CO-RE variants of BPF_CORE_READ() macro family
  2020-12-18 23:56 [PATCH bpf-next 0/3] Add user-space and non-CO-RE variants of BPF_CORE_READ() Andrii Nakryiko
  2020-12-18 23:56 ` [PATCH bpf-next 1/3] libbpf: add user-space variants of BPF_CORE_READ() family of macros Andrii Nakryiko
@ 2020-12-18 23:56 ` Andrii Nakryiko
  2020-12-18 23:56 ` [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants Andrii Nakryiko
  2 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2020-12-18 23:56 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team

BPF_CORE_READ(), in addition to handling CO-RE relocations, also provides a
much nicer way to read data structures with nested pointers. Instead of
writing a sequence of bpf_probe_read() calls to follow links, one can just
write BPF_CORE_READ(a, b, c, d) to effectively perform an a->b->c->d read.
This is a welcome ability when porting BCC code, which (in most cases) allows
exactly the intuitive a->b->c->d form.

This patch adds non-CO-RE variants of the BPF_CORE_READ() family of macros
for cases where CO-RE is not supported (e.g., old kernels). In such cases, the
ability to shorten a sequence of bpf_probe_read()s to a simple
BPF_PROBE_READ(a, b, c, d) invocation is still desirable, especially when
porting BCC code to libbpf. Yet no CO-RE relocations are emitted.
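
For illustration (a sketch, not part of the patch; the minimal struct
definitions below are hypothetical stand-ins whose layout would have to
match the running kernel for the reads to make sense), the before/after
looks roughly like this:

#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

/* hypothetical stand-ins for kernel-header types */
struct inode { unsigned long i_ino; };
struct file { struct inode *f_inode; };
struct mm_struct { struct file *exe_file; };
struct task_struct { struct mm_struct *mm; };

static __always_inline unsigned long exe_ino_manual(struct task_struct *task)
{
	struct mm_struct *mm;
	struct file *exe;
	struct inode *inode;
	unsigned long ino = 0;

	/* explicit pointer chasing, one probe read per link */
	bpf_probe_read(&mm, sizeof(mm), &task->mm);
	bpf_probe_read(&exe, sizeof(exe), &mm->exe_file);
	bpf_probe_read(&inode, sizeof(inode), &exe->f_inode);
	bpf_probe_read(&ino, sizeof(ino), &inode->i_ino);
	return ino;
}

static __always_inline unsigned long exe_ino_macro(struct task_struct *task)
{
	/* expands to the same chain of bpf_probe_read()s, but with no
	 * CO-RE relocations emitted */
	return BPF_PROBE_READ(task, mm, exe_file, f_inode, i_ino);
}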

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf_core_read.h | 38 +++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
index db0c735ceb53..9456aabcb03a 100644
--- a/tools/lib/bpf/bpf_core_read.h
+++ b/tools/lib/bpf/bpf_core_read.h
@@ -308,6 +308,18 @@ enum bpf_enum_value_kind {
 		     dst, (src), a, ##__VA_ARGS__)			    \
 })
 
+/* Non-CO-RE variant of BPF_CORE_READ_INTO() */
+#define BPF_PROBE_READ_INTO(dst, src, a, ...) ({			    \
+	___core_read(bpf_probe_read, bpf_probe_read,			    \
+		     dst, (src), a, ##__VA_ARGS__)			    \
+})
+
+/* Non-CO-RE variant of BPF_CORE_READ_USER_INTO() */
+#define BPF_PROBE_READ_USER_INTO(dst, src, a, ...) ({			    \
+	___core_read(bpf_probe_read_user, bpf_probe_read_user,		    \
+		     dst, (src), a, ##__VA_ARGS__)			    \
+})
+
 /*
  * BPF_CORE_READ_STR_INTO() does same "pointer chasing" as
  * BPF_CORE_READ() for intermediate pointers, but then executes (and returns
@@ -324,6 +336,18 @@ enum bpf_enum_value_kind {
 		     dst, (src), a, ##__VA_ARGS__)			    \
 })
 
+/* Non-CO-RE variant of BPF_CORE_READ_STR_INTO() */
+#define BPF_PROBE_READ_STR_INTO(dst, src, a, ...) ({			    \
+	___core_read(bpf_probe_read_str, bpf_probe_read,		    \
+		     dst, (src), a, ##__VA_ARGS__)			    \
+})
+
+/* Non-CO-RE variant of BPF_CORE_READ_USER_STR_INTO() */
+#define BPF_PROBE_READ_USER_STR_INTO(dst, src, a, ...) ({		    \
+	___core_read(bpf_probe_read_user_str, bpf_probe_read_user,	    \
+		     dst, (src), a, ##__VA_ARGS__)			    \
+})
+
 /*
  * BPF_CORE_READ() is used to simplify BPF CO-RE relocatable read, especially
  * when there are few pointer chasing steps.
@@ -361,5 +385,19 @@ enum bpf_enum_value_kind {
 	__r;								    \
 })
 
+/* Non-CO-RE variant of BPF_CORE_READ() */
+#define BPF_PROBE_READ(src, a, ...) ({					    \
+	___type((src), a, ##__VA_ARGS__) __r;				    \
+	BPF_PROBE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);		    \
+	__r;								    \
+})
+
+/* Non-CO-RE variant of BPF_CORE_READ_USER() */
+#define BPF_PROBE_READ_USER(src, a, ...) ({				    \
+	___type((src), a, ##__VA_ARGS__) __r;				    \
+	BPF_PROBE_READ_USER_INTO(&__r, (src), a, ##__VA_ARGS__);	    \
+	__r;								    \
+})
+
 #endif
 
-- 
2.24.1



* [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants
  2020-12-18 23:56 [PATCH bpf-next 0/3] Add user-space and non-CO-RE variants of BPF_CORE_READ() Andrii Nakryiko
  2020-12-18 23:56 ` [PATCH bpf-next 1/3] libbpf: add user-space variants of BPF_CORE_READ() family of macros Andrii Nakryiko
  2020-12-18 23:56 ` [PATCH bpf-next 2/3] libbpf: add non-CO-RE variants of BPF_CORE_READ() macro family Andrii Nakryiko
@ 2020-12-18 23:56 ` Andrii Nakryiko
  2021-01-05  3:46   ` Alexei Starovoitov
  2 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2020-12-18 23:56 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team

Add selftests validating that the newly added variants of BPF_CORE_READ(),
for use with user-space addresses and for non-CO-RE reads, work as expected.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../bpf/prog_tests/core_read_macros.c         | 64 +++++++++++++++++++
 .../bpf/progs/test_core_read_macros.c         | 51 +++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/core_read_macros.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_read_macros.c

diff --git a/tools/testing/selftests/bpf/prog_tests/core_read_macros.c b/tools/testing/selftests/bpf/prog_tests/core_read_macros.c
new file mode 100644
index 000000000000..96f5cf3c6fa2
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/core_read_macros.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+
+#include <test_progs.h>
+
+struct callback_head {
+	struct callback_head *next;
+	void (*func)(struct callback_head *head);
+};
+
+/* The ___shuffled flavor is just an illusion for BPF code; it doesn't really
+ * exist, and user-space needs to provide data in the memory layout that
+ * matches callback_head. The ___shuffled flavor is defined only to make it
+ * easier to work with the skeleton.
+ */
+struct callback_head___shuffled {
+	struct callback_head___shuffled *next;
+	void (*func)(struct callback_head *head);
+};
+
+#include "test_core_read_macros.skel.h"
+
+void test_core_read_macros(void)
+{
+	int duration = 0, err;
+	struct test_core_read_macros* skel;
+	struct test_core_read_macros__bss *bss;
+	struct callback_head u_probe_in;
+	struct callback_head___shuffled u_core_in;
+
+	skel = test_core_read_macros__open_and_load();
+	if (CHECK(!skel, "skel_open", "failed to open skeleton\n"))
+		return;
+	bss = skel->bss;
+	bss->my_pid = getpid();
+
+	/* next pointers have to be set from the kernel side */
+	bss->k_probe_in.func = (void *)(long)0x1234;
+	bss->k_core_in.func = (void *)(long)0xabcd;
+
+	u_probe_in.next = &u_probe_in;
+	u_probe_in.func = (void *)(long)0x5678;
+	bss->u_probe_in = &u_probe_in;
+
+	u_core_in.next = &u_core_in;
+	u_core_in.func = (void *)(long)0xdbca;
+	bss->u_core_in = &u_core_in;
+
+	err = test_core_read_macros__attach(skel);
+	if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err))
+		goto cleanup;
+
+	/* trigger tracepoint */
+	usleep(1);
+
+	ASSERT_EQ(bss->k_probe_out, 0x1234, "k_probe_out");
+	ASSERT_EQ(bss->k_core_out, 0xabcd, "k_core_out");
+
+	ASSERT_EQ(bss->u_probe_out, 0x5678, "u_probe_out");
+	ASSERT_EQ(bss->u_core_out, 0xdbca, "u_core_out");
+
+cleanup:
+	test_core_read_macros__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_core_read_macros.c b/tools/testing/selftests/bpf/progs/test_core_read_macros.c
new file mode 100644
index 000000000000..50054af0a928
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_core_read_macros.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+char _license[] SEC("license") = "GPL";
+
+/* shuffled layout for relocatable (CO-RE) reads */
+struct callback_head___shuffled {
+	void (*func)(struct callback_head___shuffled *head);
+	struct callback_head___shuffled *next;
+};
+
+struct callback_head k_probe_in = {};
+struct callback_head___shuffled k_core_in = {};
+
+struct callback_head *u_probe_in = 0;
+struct callback_head___shuffled *u_core_in = 0;
+
+long k_probe_out = 0;
+long u_probe_out = 0;
+
+long k_core_out = 0;
+long u_core_out = 0;
+
+int my_pid = 0;
+
+SEC("raw_tracepoint/sys_enter")
+int handler(void *ctx)
+{
+	int pid = bpf_get_current_pid_tgid() >> 32;
+
+	if (my_pid != pid)
+		return 0;
+
+	/* next pointers for kernel address space have to be initialized from
+	 * the BPF side; user-space mmaped addresses are still user-space addresses
+	 */
+	k_probe_in.next = &k_probe_in;
+	__builtin_preserve_access_index(({k_core_in.next = &k_core_in;}));
+
+	k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func);
+	k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func);
+	u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func);
+	u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func);
+
+	return 0;
+}
+
-- 
2.24.1



* Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants
  2020-12-18 23:56 ` [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants Andrii Nakryiko
@ 2021-01-05  3:46   ` Alexei Starovoitov
  2021-01-05  5:08     ` Andrii Nakryiko
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2021-01-05  3:46 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, netdev, ast, daniel, kernel-team

On Fri, Dec 18, 2020 at 03:56:14PM -0800, Andrii Nakryiko wrote:
> +
> +/* shuffled layout for relocatable (CO-RE) reads */
> +struct callback_head___shuffled {
> +	void (*func)(struct callback_head___shuffled *head);
> +	struct callback_head___shuffled *next;
> +};
> +
> +struct callback_head k_probe_in = {};
> +struct callback_head___shuffled k_core_in = {};
> +
> +struct callback_head *u_probe_in = 0;
> +struct callback_head___shuffled *u_core_in = 0;
> +
> +long k_probe_out = 0;
> +long u_probe_out = 0;
> +
> +long k_core_out = 0;
> +long u_core_out = 0;
> +
> +int my_pid = 0;
> +
> +SEC("raw_tracepoint/sys_enter")
> +int handler(void *ctx)
> +{
> +	int pid = bpf_get_current_pid_tgid() >> 32;
> +
> +	if (my_pid != pid)
> +		return 0;
> +
> +	/* next pointers for kernel address space have to be initialized from
> +	 * BPF side, user-space mmaped addresses are stil user-space addresses
> +	 */
> +	k_probe_in.next = &k_probe_in;
> +	__builtin_preserve_access_index(({k_core_in.next = &k_core_in;}));
> +
> +	k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func);
> +	k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func);
> +	u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func);
> +	u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func);

I don't understand what the test is supposed to demonstrate.
co-re relocs work for kernel btf only.
Are you saying that 'struct callback_head' happened to be used by a user space
process that allocated it in user memory? And that it is the same struct as
the one being used by the kernel? So co-re relocs that apply against the kernel
will sort-of work against the data of the user space process because
user space is using the same struct? That sounds convoluted.
I struggle to see the point of patch 1:
+#define bpf_core_read_user(dst, sz, src)                                   \
+       bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src))

co-re for user structs? Aren't they uapi? No reloc is needed.


* Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants
  2021-01-05  3:46   ` Alexei Starovoitov
@ 2021-01-05  5:08     ` Andrii Nakryiko
  2021-01-05 19:03       ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2021-01-05  5:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Gilad Reti

On Mon, Jan 4, 2021 at 7:46 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Dec 18, 2020 at 03:56:14PM -0800, Andrii Nakryiko wrote:
> > +
> > +/* shuffled layout for relocatable (CO-RE) reads */
> > +struct callback_head___shuffled {
> > +     void (*func)(struct callback_head___shuffled *head);
> > +     struct callback_head___shuffled *next;
> > +};
> > +
> > +struct callback_head k_probe_in = {};
> > +struct callback_head___shuffled k_core_in = {};
> > +
> > +struct callback_head *u_probe_in = 0;
> > +struct callback_head___shuffled *u_core_in = 0;
> > +
> > +long k_probe_out = 0;
> > +long u_probe_out = 0;
> > +
> > +long k_core_out = 0;
> > +long u_core_out = 0;
> > +
> > +int my_pid = 0;
> > +
> > +SEC("raw_tracepoint/sys_enter")
> > +int handler(void *ctx)
> > +{
> > +     int pid = bpf_get_current_pid_tgid() >> 32;
> > +
> > +     if (my_pid != pid)
> > +             return 0;
> > +
> > +     /* next pointers for kernel address space have to be initialized from
> > +      * BPF side, user-space mmaped addresses are stil user-space addresses
> > +      */
> > +     k_probe_in.next = &k_probe_in;
> > +     __builtin_preserve_access_index(({k_core_in.next = &k_core_in;}));
> > +
> > +     k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func);
> > +     k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func);
> > +     u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func);
> > +     u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func);
>
> I don't understand what the test suppose to demonstrate.
> co-re relocs work for kernel btf only.
> Are you saying that 'struct callback_head' happened to be used by user space
> process that allocated it in user memory. And that is the same struct as
> being used by the kernel? So co-re relocs that apply against the kernel
> will sort-of work against the data of user space process because
> the user space is using the same struct? That sounds convoluted.

The test itself just tests that bpf_probe_read_user() is executed, not
bpf_probe_read_kernel(). But yes, the use case is to read kernel data
structures from the user memory address space. See [0] for the last
time this was requested and justifications. It's not the first time
someone asked about the user-space variant of BPF_CORE_READ(), though
I won't be able to find the reference at this time.

  [0] https://lore.kernel.org/bpf/CANaYP3GetBKUPDfo6PqWnm3xuGs2GZjLF8Ed51Q1=Emv2J-dKg@mail.gmail.com/

> I struggle to see the point of patch 1:
> +#define bpf_core_read_user(dst, sz, src)                                   \
> +       bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src))
>
> co-re for user structs? Aren't they uapi? No reloc is needed.

The use case in [0] above is for reading UAPI structs passed as input
arguments to syscalls. It's a pretty niche use case, but there are at
least two more-or-less valid benefits to using CO-RE with "stable" UAPI
structs:

  - handling 32-bit vs 64-bit UAPI structs uniformly;
  - handling UAPI fields that were added in later kernels, but are
missing on the earlier ones.

For the former, you'd need to compile two variants of the BPF program
(or do convoluted and inconvenient 32-bit UAPI struct re-definition
for 64-bit BPF target). For the latter... I guess you can do if/else
dance based on the kernel version. Which sucks and is inconvenient
(and kernel version checks are discouraged, it's more reliable to
detect availability of specific types and fields).

So all in all, while pretty rare and niche, seemed like a valid use
case. And easy to support while reusing all the macro logic almost
without any changes.


* Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants
  2021-01-05  5:08     ` Andrii Nakryiko
@ 2021-01-05 19:03       ` Alexei Starovoitov
  2021-01-05 21:03         ` Andrii Nakryiko
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2021-01-05 19:03 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Gilad Reti

On Mon, Jan 04, 2021 at 09:08:21PM -0800, Andrii Nakryiko wrote:
> On Mon, Jan 4, 2021 at 7:46 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Dec 18, 2020 at 03:56:14PM -0800, Andrii Nakryiko wrote:
> > > +
> > > +/* shuffled layout for relocatable (CO-RE) reads */
> > > +struct callback_head___shuffled {
> > > +     void (*func)(struct callback_head___shuffled *head);
> > > +     struct callback_head___shuffled *next;
> > > +};
> > > +
> > > +struct callback_head k_probe_in = {};
> > > +struct callback_head___shuffled k_core_in = {};
> > > +
> > > +struct callback_head *u_probe_in = 0;
> > > +struct callback_head___shuffled *u_core_in = 0;
> > > +
> > > +long k_probe_out = 0;
> > > +long u_probe_out = 0;
> > > +
> > > +long k_core_out = 0;
> > > +long u_core_out = 0;
> > > +
> > > +int my_pid = 0;
> > > +
> > > +SEC("raw_tracepoint/sys_enter")
> > > +int handler(void *ctx)
> > > +{
> > > +     int pid = bpf_get_current_pid_tgid() >> 32;
> > > +
> > > +     if (my_pid != pid)
> > > +             return 0;
> > > +
> > > +     /* next pointers for kernel address space have to be initialized from
> > > +      * BPF side, user-space mmaped addresses are stil user-space addresses
> > > +      */
> > > +     k_probe_in.next = &k_probe_in;
> > > +     __builtin_preserve_access_index(({k_core_in.next = &k_core_in;}));
> > > +
> > > +     k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func);
> > > +     k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func);
> > > +     u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func);
> > > +     u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func);
> >
> > I don't understand what the test suppose to demonstrate.
> > co-re relocs work for kernel btf only.
> > Are you saying that 'struct callback_head' happened to be used by user space
> > process that allocated it in user memory. And that is the same struct as
> > being used by the kernel? So co-re relocs that apply against the kernel
> > will sort-of work against the data of user space process because
> > the user space is using the same struct? That sounds convoluted.
> 
> The test itself just tests that bpf_probe_read_user() is executed, not
> bpf_probe_read_kernel(). But yes, the use case is to read kernel data
> structures from the user memory address space. See [0] for the last
> time this was requested and justifications. It's not the first time
> someone asked about the user-space variant of BPF_CORE_READ(), though
> I won't be able to find the reference at this time.
> 
>   [0] https://lore.kernel.org/bpf/CANaYP3GetBKUPDfo6PqWnm3xuGs2GZjLF8Ed51Q1=Emv2J-dKg@mail.gmail.com/

That's quite a confusing thread.

> > I struggle to see the point of patch 1:
> > +#define bpf_core_read_user(dst, sz, src)                                   \
> > +       bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src))
> >
> > co-re for user structs? Aren't they uapi? No reloc is needed.
> 
> The use case in [0] above is for reading UAPI structs, passed as input
> arguments to syscall. It's a pretty niche use case, but there are at
> least two more-or-less valid benefits to use CO-RE with "stable" UAPI
> structs:
> 
>   - handling 32-bit vs 64-bit UAPI structs uniformly;

what do you mean?
32-bit user space running on 64-bit kernel works through 'compat' syscalls.
If bpf progs are accessing 64-bit uapi structs in such a case, they're broken
and no amount of co-re can help.

>   - handling UAPI fields that were added in later kernels, but are
> missing on the earlier ones.
> 
> For the former, you'd need to compile two variants of the BPF program
> (or do convoluted and inconvenient 32-bit UAPI struct re-definition
> for 64-bit BPF target). 

No. 32-bit uapi structs should be used by the bpf prog.
compat stuff is not only casting pointers from 64-bit to 32-bit.

> For the latter... I guess you can do if/else
> dance based on the kernel version. Which sucks and is inconvenient
> (and kernel version checks are discouraged, it's more reliable to
> detect availability of specific types and fields).

Not really. ifdef based on kernel version is not needed.
bpf_core_field_exists() will work just fine.
No need for the bpf_probe_read_user() macros.

> So all in all, while pretty rare and niche, seemed like a valid use
> case. And easy to support while reusing all the macro logic almost
> without any changes.

I think these new macros, added with confusing and unclear goals,
will do more harm than good.


* Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants
  2021-01-05 19:03       ` Alexei Starovoitov
@ 2021-01-05 21:03         ` Andrii Nakryiko
  2021-01-06  6:09           ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2021-01-05 21:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Gilad Reti

On Tue, Jan 5, 2021 at 11:04 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jan 04, 2021 at 09:08:21PM -0800, Andrii Nakryiko wrote:
> > On Mon, Jan 4, 2021 at 7:46 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Dec 18, 2020 at 03:56:14PM -0800, Andrii Nakryiko wrote:
> > > > +
> > > > +/* shuffled layout for relocatable (CO-RE) reads */
> > > > +struct callback_head___shuffled {
> > > > +     void (*func)(struct callback_head___shuffled *head);
> > > > +     struct callback_head___shuffled *next;
> > > > +};
> > > > +
> > > > +struct callback_head k_probe_in = {};
> > > > +struct callback_head___shuffled k_core_in = {};
> > > > +
> > > > +struct callback_head *u_probe_in = 0;
> > > > +struct callback_head___shuffled *u_core_in = 0;
> > > > +
> > > > +long k_probe_out = 0;
> > > > +long u_probe_out = 0;
> > > > +
> > > > +long k_core_out = 0;
> > > > +long u_core_out = 0;
> > > > +
> > > > +int my_pid = 0;
> > > > +
> > > > +SEC("raw_tracepoint/sys_enter")
> > > > +int handler(void *ctx)
> > > > +{
> > > > +     int pid = bpf_get_current_pid_tgid() >> 32;
> > > > +
> > > > +     if (my_pid != pid)
> > > > +             return 0;
> > > > +
> > > > +     /* next pointers for kernel address space have to be initialized from
> > > > +      * BPF side, user-space mmaped addresses are stil user-space addresses
> > > > +      */
> > > > +     k_probe_in.next = &k_probe_in;
> > > > +     __builtin_preserve_access_index(({k_core_in.next = &k_core_in;}));
> > > > +
> > > > +     k_probe_out = (long)BPF_PROBE_READ(&k_probe_in, next, next, func);
> > > > +     k_core_out = (long)BPF_CORE_READ(&k_core_in, next, next, func);
> > > > +     u_probe_out = (long)BPF_PROBE_READ_USER(u_probe_in, next, next, func);
> > > > +     u_core_out = (long)BPF_CORE_READ_USER(u_core_in, next, next, func);
> > >
> > > I don't understand what the test suppose to demonstrate.
> > > co-re relocs work for kernel btf only.
> > > Are you saying that 'struct callback_head' happened to be used by user space
> > > process that allocated it in user memory. And that is the same struct as
> > > being used by the kernel? So co-re relocs that apply against the kernel
> > > will sort-of work against the data of user space process because
> > > the user space is using the same struct? That sounds convoluted.
> >
> > The test itself just tests that bpf_probe_read_user() is executed, not
> > bpf_probe_read_kernel(). But yes, the use case is to read kernel data
> > structures from the user memory address space. See [0] for the last
> > time this was requested and justifications. It's not the first time
> > someone asked about the user-space variant of BPF_CORE_READ(), though
> > I won't be able to find the reference at this time.
> >
> >   [0] https://lore.kernel.org/bpf/CANaYP3GetBKUPDfo6PqWnm3xuGs2GZjLF8Ed51Q1=Emv2J-dKg@mail.gmail.com/
>
> That's quite confusing thread.
>
> > > I struggle to see the point of patch 1:
> > > +#define bpf_core_read_user(dst, sz, src)                                   \
> > > +       bpf_probe_read_user(dst, sz, (const void *)__builtin_preserve_access_index(src))
> > >
> > > co-re for user structs? Aren't they uapi? No reloc is needed.
> >
> > The use case in [0] above is for reading UAPI structs, passed as input
> > arguments to syscall. It's a pretty niche use case, but there are at
> > least two more-or-less valid benefits to use CO-RE with "stable" UAPI
> > structs:
> >
> >   - handling 32-bit vs 64-bit UAPI structs uniformly;
>
> what do you mean?
> 32-bit user space running on 64-bit kernel works through 'compat' syscalls.
> If bpf progs are accessing 64-bit uapi structs in such case they're broken
> and no amount of co-re can help.

I know nothing about compat, so can't comment on that. But the way I
understood the situation was that the BPF program is compiled once (well, at
least from unmodified source code), but runs on ARM32 and (on a
separate physical host) on ARM64. And its task is, say, to read UAPI
kernel structures from syscall arguments.

>
> >   - handling UAPI fields that were added in later kernels, but are
> > missing on the earlier ones.
> >
> > For the former, you'd need to compile two variants of the BPF program
> > (or do convoluted and inconvenient 32-bit UAPI struct re-definition
> > for 64-bit BPF target).
>
> No. 32-bit uapi structs should be used by bpf prog.
> compat stuff is not only casting pointers from 64-bit to 32.
>

See above about compat, that's not what I was thinking about.

One simple example I found in UAPI definitions is struct timespec; it
seems to be defined with `long`:

struct timespec {
        __kernel_old_time_t     tv_sec;         /* seconds */
        long                    tv_nsec;        /* nanoseconds */
};

So if you were to trace clock_gettime(), you'd need to deal with
differently-sized reads of tv_nsec, depending on whether you are
running on the 32-bit or 64-bit host.

There are probably other examples where UAPI structs use long instead
of __u32 or __u64, but I didn't dig too deep.
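
To make that concrete, here is a minimal sketch (assuming a user-space
pointer `uts` to the target kernel's struct timespec, a little-endian
target, and the fact that tv_nsec is non-negative):

static __always_inline __u64 read_tv_nsec(struct timespec *uts)
{
	__u64 nsec = 0;

	/* bpf_core_field_size() relocates to the field's size on the
	 * running kernel (4 bytes on 32-bit, 8 bytes on 64-bit), so the
	 * user-space read pulls in only as many bytes as the field has */
	bpf_core_read_user(&nsec, bpf_core_field_size(uts->tv_nsec),
			   &uts->tv_nsec);
	return nsec;
}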


> > For the latter... I guess you can do if/else
> > dance based on the kernel version. Which sucks and is inconvenient
> > (and kernel version checks are discouraged, it's more reliable to
> > detect availability of specific types and fields).
>
> Not really. ifdef based on kernel version is not needed.
> bpf_core_field_exists() will work just fine.
> No need to bpf_probe_read_user() macros.

Yes, you are right, detection of field/type existence doesn't depend
on kernel- vs user-space, disregard this one.

>
> > So all in all, while pretty rare and niche, seemed like a valid use
> > case. And easy to support while reusing all the macro logic almost
> > without any changes.
>
> I think these new macros added with confusing and unclear goals
> will do more harm than good.

Let's see if Gilad can provide his perspective. I have no strong
feelings about this and can send a patch removing CORE_READ_USER
variants (they didn't make it into libbpf v0.3, so no harm or API
stability concerns). BPF_PROBE_READ() and BPF_PROBE_READ_USER() are
still useful for reading non-relocatable, but nested data structures,
so I'd prefer to keep those.


* Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants
  2021-01-05 21:03         ` Andrii Nakryiko
@ 2021-01-06  6:09           ` Alexei Starovoitov
  2021-01-06 10:10             ` Gilad Reti
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2021-01-06  6:09 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Gilad Reti

On Tue, Jan 05, 2021 at 01:03:47PM -0800, Andrii Nakryiko wrote:
> > >
> > >   - handling 32-bit vs 64-bit UAPI structs uniformly;
> >
> > what do you mean?
> > 32-bit user space running on 64-bit kernel works through 'compat' syscalls.
> > If bpf progs are accessing 64-bit uapi structs in such case they're broken
> > and no amount of co-re can help.
> 
> I know nothing about compat, so can't comment on that. But the way I
> understood the situation was the BPF program compiled once (well, at
> least from the unmodified source code), but runs on ARM32 and (on a
> separate physical host) on ARM64. And it's task is, say, to read UAPI
> kernel structures from syscall arguments.

I'm not sure about arm, but on x86 there should be two different progs
for this to work (if we're talking about 32-bit userspace).
See below.

> One simple example I found in UAPI definitions is struct timespec, it
> seems it's defined with `long`:
> 
> struct timespec {
>         __kernel_old_time_t     tv_sec;         /* seconds */
>         long                    tv_nsec;        /* nanoseconds */
> }
> 
> So if you were to trace clock_gettime(), you'd need to deal with
> differently-sized reads of tv_nsec, depending on whether you are
> running on the 32-bit or 64-bit host.

I believe gettime is vdso, so that's not a syscall and there are no bpf hooks
available to track that.
For normal syscall with timespec argument like sys_recvmmsg and sys_nanosleep
the user needs to load two programs and attach to two different points:
fentry/__x64_sys_nanosleep and fentry/__ia32_sys_nanosleep.
(I'm not an expert on compat and not quite sure how the _time32 suffix is
handled, so maybe 4 different progs are needed).
The point is that there are several different entry points with different args.
ia32 user space will call into the kernel differently.
At the end different pieces of the syscall and compat handling will call
into hrtimer_nanosleep with ktime.
So if one bpf prog needs to work for 32 and 64 bit user space it should be
attached to common piece of code like hrtimer_nanosleep().
If it's attached to syscall entry it needs to attach to at least 2 different
entry points with 2 different progs.
I guess it's possible to load a single prog and kprobe-attach it to
two different kernel functions, but the code is kinda different.
For the simplest things like sys_nanosleep it might work, and timespec
is a simple enough structure to handle with sizeof(long) co-re tricks,
but it's not a generally applicable approach.
So for a tool, tracking both 32-bit and 64-bit user space is quite tricky.
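
Roughly, the two-program shape would look like this (just a sketch; the
exact __ia32_* symbol name is a guess and might need a _time32 variant,
as noted above):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

long nanosleep_calls = 0;

static __always_inline int count_call(void)
{
	__sync_fetch_and_add(&nanosleep_calls, 1);
	return 0;
}

/* 64-bit user space enters here */
SEC("fentry/__x64_sys_nanosleep")
int BPF_PROG(handle_nanosleep_x64, struct pt_regs *regs)
{
	return count_call();
}

/* 32-bit (compat) user space enters here */
SEC("fentry/__ia32_sys_nanosleep")
int BPF_PROG(handle_nanosleep_ia32, struct pt_regs *regs)
{
	return count_call();
}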

If bpf prog doesn't care about user space and only needs to deal
with 64-bit kernel + 64-bit user space and 32-bit kernel + 32-bit user space
then it's a bit simpler, but probably still requires attaching
to two different points. On 32-bit x86 kernel there will be no
"fentry/__x64_sys_nanosleep" function.

> There are probably other examples where UAPI structs use long instead
> of __u32 or __u64, but I didn't dig too deep.

There is also a crazy difference between compat_ioctl and ioctl.

The point I'm trying to get across is that clean 32-bit processing is
a lot more involved than just the CORE_READ_USER macro, and I want to make
sure that the expectations are set correctly.
A BPF CO-RE prog may handle 32-bit and 64-bit at the same time, but
that's probably the exception rather than the rule.

> Let's see if Gilad can provide his perspective. I have no strong
> feelings about this and can send a patch removing CORE_READ_USER
> variants (they didn't make it into libbpf v0.3, so no harm or API
> stability concerns). BPF_PROBE_READ() and BPF_PROBE_READ_USER() are
> still useful for reading non-relocatable, but nested data structures,
> so I'd prefer to keep those.

Let's hear from Gilad.
I'm not against BPF_CORE_READ_USER macro as-is. I was objecting
to the reasons of implementing it.


* Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants
  2021-01-06  6:09           ` Alexei Starovoitov
@ 2021-01-06 10:10             ` Gilad Reti
  2021-01-06 23:25               ` Andrii Nakryiko
  0 siblings, 1 reply; 12+ messages in thread
From: Gilad Reti @ 2021-01-06 10:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrii Nakryiko, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Wed, Jan 6, 2021 at 8:09 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jan 05, 2021 at 01:03:47PM -0800, Andrii Nakryiko wrote:
> > > >
> > > >   - handling 32-bit vs 64-bit UAPI structs uniformly;
> > >
> > > what do you mean?
> > > 32-bit user space running on 64-bit kernel works through 'compat' syscalls.
> > > If bpf progs are accessing 64-bit uapi structs in such case they're broken
> > > and no amount of co-re can help.
> >
> > I know nothing about compat, so can't comment on that. But the way I
> > understood the situation was the BPF program compiled once (well, at
> > least from the unmodified source code), but runs on ARM32 and (on a
> > separate physical host) on ARM64. And it's task is, say, to read UAPI
> > kernel structures from syscall arguments.
>
> I'm not sure about arm, but on x86 there should be two different progs
> for this to work (if we're talking about 32-bit userspace).
> See below.
>
> > One simple example I found in UAPI definitions is struct timespec, it
> > seems it's defined with `long`:
> >
> > struct timespec {
> >         __kernel_old_time_t     tv_sec;         /* seconds */
> >         long                    tv_nsec;        /* nanoseconds */
> > }
> >
> > So if you were to trace clock_gettime(), you'd need to deal with
> > differently-sized reads of tv_nsec, depending on whether you are
> > running on the 32-bit or 64-bit host.
>
> I believe gettime is vdso, so that's not a syscall and there are no bpf hooks
> available to track that.
> For normal syscall with timespec argument like sys_recvmmsg and sys_nanosleep
> the user needs to load two programs and attach to two different points:
> fentry/__x64_sys_nanosleep and fentry/__ia32_sys_nanosleep.
> (I'm not an expert on compat and not quite sure how _time32 suffix is handled,
> so may be 4 different progs are needed).
> The point is that there are several different entry points with different args.
> ia32 user space will call into the kernel differently.
> At the end different pieces of the syscall and compat handling will call
> into hrtimer_nanosleep with ktime.
> So if one bpf prog needs to work for 32 and 64 bit user space it should be
> attached to common piece of code like hrtimer_nanosleep().
> If it's attached to syscall entry it needs to attach to at least 2 different
> entry points with 2 different progs.

I think that while it is true that syscall handlers are different for
32 and 64 bit, it may happen that some syscall handlers will pass
user-space pointers down to shared common functions, which may be hooked
by eBPF probes. I am no expert, but I think that the cp_new_stat
function is such an example; it gets a user struct stat* pointer, whose
definition varies across different architectures (32/64-bit, different
processor archs, etc.).

> I guess it's possible to load single prog and kprobe-attach it to
> two different kernel functions, but code is kinda different.
> For the simplest things like sys_nanosleep it might work and timespec
> is simple enough structure to handle with sizeof(long) co-re tricks,
> but it's not a generally applicable approach.
> So for a tool to track 32-bit and 64-bit user space is quite tricky.
>
> If bpf prog doesn't care about user space and only needs to deal
> with 64-bit kernel + 64-bit user space and 32-bit kernel + 32-bit user space
> then it's a bit simpler, but probably still requires attaching
> to two different points. On 32-bit x86 kernel there will be no
> "fentry/__x64_sys_nanosleep" function.
>
> > There are probably other examples where UAPI structs use long instead
> > of __u32 or __u64, but I didn't dig too deep.
>
> There is also crazy difference between compact_ioctl vs ioctl.
>
> The point I'm trying to get across is that the clean 32-bit processing is
> a lot more involved than just CORE_READ_USER macro and I want to make sure that
> the expections are set correctly.
> BPF CO-RE prog may handle 32-bit and 64-bit at the same time, but
> it's probably exception than the rule.
>
> > Let's see if Gilad can provide his perspective. I have no strong
> > feelings about this and can send a patch removing CORE_READ_USER
> > variants (they didn't make it into libbpf v0.3, so no harm or API
> > stability concerns). BPF_PROBE_READ() and BPF_PROBE_READ_USER() are
> > still useful for reading non-relocatable, but nested data structures,
> > so I'd prefer to keep those.
>
> Let's hear from Gilad.
> I'm not against BPF_CORE_READ_USER macro as-is. I was objecting
> to the reasons of implementing it.

If I am not mistaken (which is completely possible), I think that
providing such a macro will not cause any more confusion than the
bpf_probe_read_{,user} distinction already does, since
BPF_CORE_READ_USER is to BPF_CORE_READ as bpf_probe_read_user is to
bpf_probe_read.


* Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants
  2021-01-06 10:10             ` Gilad Reti
@ 2021-01-06 23:25               ` Andrii Nakryiko
  2021-01-07 20:00                 ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2021-01-06 23:25 UTC (permalink / raw)
  To: Gilad Reti
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Wed, Jan 6, 2021 at 2:10 AM Gilad Reti <gilad.reti@gmail.com> wrote:
>
> On Wed, Jan 6, 2021 at 8:09 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Jan 05, 2021 at 01:03:47PM -0800, Andrii Nakryiko wrote:
> > > > >
> > > > >   - handling 32-bit vs 64-bit UAPI structs uniformly;
> > > >
> > > > what do you mean?
> > > > 32-bit user space running on 64-bit kernel works through 'compat' syscalls.
> > > > If bpf progs are accessing 64-bit uapi structs in such case they're broken
> > > > and no amount of co-re can help.
> > >
> > > I know nothing about compat, so can't comment on that. But the way I
> > > understood the situation was the BPF program compiled once (well, at
> > > least from the unmodified source code), but runs on ARM32 and (on a
> > > separate physical host) on ARM64. And it's task is, say, to read UAPI
> > > kernel structures from syscall arguments.
> >
> > I'm not sure about arm, but on x86 there should be two different progs
> > for this to work (if we're talking about 32-bit userspace).
> > See below.
> >
> > > One simple example I found in UAPI definitions is struct timespec, it
> > > seems it's defined with `long`:
> > >
> > > struct timespec {
> > >         __kernel_old_time_t     tv_sec;         /* seconds */
> > >         long                    tv_nsec;        /* nanoseconds */
> > > }
> > >
> > > So if you were to trace clock_gettime(), you'd need to deal with
> > > differently-sized reads of tv_nsec, depending on whether you are
> > > running on the 32-bit or 64-bit host.
> >
> > I believe gettime is vdso, so that's not a syscall and there are no bpf hooks
> > available to track that.
> > For normal syscall with timespec argument like sys_recvmmsg and sys_nanosleep
> > the user needs to load two programs and attach to two different points:
> > fentry/__x64_sys_nanosleep and fentry/__ia32_sys_nanosleep.
> > (I'm not an expert on compat and not quite sure how _time32 suffix is handled,
> > so may be 4 different progs are needed).
> > The point is that there are several different entry points with different args.
> > ia32 user space will call into the kernel differently.
> > At the end different pieces of the syscall and compat handling will call
> > into hrtimer_nanosleep with ktime.
> > So if one bpf prog needs to work for 32 and 64 bit user space it should be
> > attached to common piece of code like hrtimer_nanosleep().
> > If it's attached to syscall entry it needs to attach to at least 2 different
> > entry points with 2 different progs.
>
> I think that while it is true that syscall handlers are different for
> 32 and 64 bit,
> it may happen that some syscall handlers will pass user-space pointers down to
> shared common functions, which may be hooked by ebpf probes. I am no expert, but
> I think that the cp_new_stat function is such an example; it gets a
> user struct stat* pointer,
> whose definition varies across different architectures (32/64-bit,
> different processor archs etc.)
>
> > I guess it's possible to load single prog and kprobe-attach it to
> > two different kernel functions, but code is kinda different.
> > For the simplest things like sys_nanosleep it might work and timespec
> > is simple enough structure to handle with sizeof(long) co-re tricks,
> > but it's not a generally applicable approach.
> > So for a tool to track 32-bit and 64-bit user space is quite tricky.
> >
> > If bpf prog doesn't care about user space and only needs to deal
> > with 64-bit kernel + 64-bit user space and 32-bit kernel + 32-bit user space
> > then it's a bit simpler, but probably still requires attaching
> > to two different points. On 32-bit x86 kernel there will be no
> > "fentry/__x64_sys_nanosleep" function.
> >
> > > There are probably other examples where UAPI structs use long instead
> > > of __u32 or __u64, but I didn't dig too deep.
> >
> > There is also crazy difference between compact_ioctl vs ioctl.
> >
> > The point I'm trying to get across is that the clean 32-bit processing is
> > a lot more involved than just CORE_READ_USER macro and I want to make sure that
> > the expections are set correctly.
> > BPF CO-RE prog may handle 32-bit and 64-bit at the same time, but
> > it's probably exception than the rule.
> >
> > > Let's see if Gilad can provide his perspective. I have no strong
> > > feelings about this and can send a patch removing CORE_READ_USER
> > > variants (they didn't make it into libbpf v0.3, so no harm or API
> > > stability concerns). BPF_PROBE_READ() and BPF_PROBE_READ_USER() are
> > > still useful for reading non-relocatable, but nested data structures,
> > > so I'd prefer to keep those.
> >
> > Let's hear from Gilad.
> > I'm not against BPF_CORE_READ_USER macro as-is. I was objecting
> > to the reasons of implementing it.
>
> If I am not mistaken (which is completely possible), I think that
> providing such a macro will
> not cause any more confusion than the bpf_probe_read_{,user}
> distinction already does,
> since BPF_CORE_READ_USER to BPF_CORE_READ is the same as bpf_probe_read_user
> to bpf_probe_read.

I think the biggest source of confusion is that the USER part in
BPF_CORE_READ_USER refers to reading data from the user address space, not
really to user structs (which is the natural instinct here). CO-RE
*always* works only with kernel types, which is obvious if you have a
lot of experience with using CO-RE, but not initially, unfortunately.

So in the end, while a narrow use case, it seems like it might be
useful to have BPF_CORE_READ_USER. Maybe emphasizing (in the doc
comment) the fact that all types need to be kernel types would help a
bit? If not, it's not a big deal, because it's pretty easy for users
to just explicitly write bpf_probe_read_user(&dst, sizeof(dst),
__builtin_preserve_access_index(src)). But there is definitely a bit
of mental satisfaction with having all possible combinations of
CORE/non-CORE and USER/KERNEL variants :)

Btw, it seems these patches are already in bpf-next, so, Alexei,
please let me know if you insist on removing the BPF_CORE_READ_USER()
variant and I'll send a follow-up patch.


* Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants
  2021-01-06 23:25               ` Andrii Nakryiko
@ 2021-01-07 20:00                 ` Alexei Starovoitov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2021-01-07 20:00 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Gilad Reti, Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Wed, Jan 06, 2021 at 03:25:30PM -0800, Andrii Nakryiko wrote:
> >
> > If I am not mistaken (which is completely possible), I think that
> > providing such a macro will
> > not cause any more confusion than the bpf_probe_read_{,user}
> > distinction already does,
> > since BPF_CORE_READ_USER to BPF_CORE_READ is the same as bpf_probe_read_user
> > to bpf_probe_read.
> 
> I think the biggest source of confusion is that USER part in
> BPF_CORE_READ_USER refers to reading data from user address space, not
> really user structs (which is kind of natural instinct here). CO-RE
> *always* works only with kernel types, which is obvious if you have a
> lot of experience with using CO-RE, but not initially, unfortunately.

Please send a patch to add such a clarifying comment.
