* [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support
@ 2022-07-07  0:41 Andrii Nakryiko
  2022-07-07  0:41 ` [PATCH RFC bpf-next 1/3] libbpf: improve and rename BPF_KPROBE_SYSCALL Andrii Nakryiko
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-07  0:41 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

This RFC patch set is to gather feedback about the new
SEC("ksyscall") and SEC("kretsyscall") section definitions, meant to simplify
the lives of BPF users who want to trace Linux syscalls without having to know
or care about things like CONFIG_ARCH_HAS_SYSCALL_WRAPPER, the related
arch-specific vs arch-agnostic __<arch>_sys_xxx vs __se_sys_xxx function
names, calling convention woes ("nested" pt_regs), etc. All of this is quite
annoying to remember and get right as a BPF user, especially if the goal is to
write architecture- and kernel-version-agnostic BPF code (e.g., things like
libbpf-tools).

By using SEC("ksyscall/xxx")/SEC("kretsyscall/xxx"), the user clearly
communicates the intent to kprobe/kretprobe the kernel function that
corresponds to the specified syscall. Libbpf takes care of all the details of
determining the correct function name and calling convention.
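
For illustration, here is a minimal BPF-side sketch of the intended usage
(hedged: it assumes vmlinux.h plus the bpf_tracing.h changes from this
series; the program names are made up and it hasn't been run here):

```c
/* Sketch only: trace close() entry and exit without any arch-specific
 * knowledge. BPF_KSYSCALL comes from patch 1; SEC("ksyscall/...") and
 * SEC("kretsyscall/...") come from patch 2.
 */
SEC("ksyscall/close")
int BPF_KSYSCALL(close_enter, int fd)
{
	bpf_printk("close(%d) entered", fd);
	return 0;
}

SEC("kretsyscall/close")
int BPF_KRETPROBE(close_exit, long ret)
{
	bpf_printk("close() returned %ld", ret);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```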

This patch set also improves the BPF_KPROBE_SYSCALL macro (renaming it to
BPF_KSYSCALL to match SEC("ksyscall")) to take CONFIG_ARCH_HAS_SYSCALL_WRAPPER
into account instead of hard-coding whether the host architecture is expected
to use a syscall wrapper, which is less reliable and can change over time.

It would be great to get feedback about the overall feature, and I'd also
appreciate help with testing this, especially on non-x86_64 architectures.

Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Kenta Tada <kenta.tada@sony.com>
Cc: Hengqi Chen <hengqi.chen@gmail.com>

Andrii Nakryiko (3):
  libbpf: improve and rename BPF_KPROBE_SYSCALL
  libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests

 tools/lib/bpf/bpf_tracing.h                   |  44 +++++--
 tools/lib/bpf/libbpf.c                        | 109 ++++++++++++++++++
 tools/lib/bpf/libbpf.h                        |  16 +++
 tools/lib/bpf/libbpf.map                      |   1 +
 tools/lib/bpf/libbpf_internal.h               |   2 +
 .../selftests/bpf/progs/bpf_syscall_macro.c   |   6 +-
 .../selftests/bpf/progs/test_attach_probe.c   |   6 +-
 .../selftests/bpf/progs/test_probe_user.c     |  27 +----
 8 files changed, 172 insertions(+), 39 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH RFC bpf-next 1/3] libbpf: improve and rename BPF_KPROBE_SYSCALL
  2022-07-07  0:41 [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Andrii Nakryiko
@ 2022-07-07  0:41 ` Andrii Nakryiko
  2022-07-07  0:41 ` [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes Andrii Nakryiko
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-07  0:41 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

Improve BPF_KPROBE_SYSCALL (and rename it to the shorter BPF_KSYSCALL to
match libbpf's SEC("ksyscall") section name, added in the next patch) to use
a __kconfig variable to determine how to properly fetch syscall arguments.

Instead of relying on hard-coded knowledge of whether the kernel's
architecture uses a syscall wrapper (which only reflects the latest kernel
versions, is not necessarily true for older kernels, and won't necessarily
hold for future kernel versions on any particular host architecture),
determine this at runtime through the CONFIG_ARCH_HAS_SYSCALL_WRAPPER
__kconfig variable and proceed accordingly.

If the host kernel defines CONFIG_ARCH_HAS_SYSCALL_WRAPPER, the syscall
kernel function's first argument is a pointer to a struct pt_regs that in
turn contains the actual syscall arguments. In that case we need to use
bpf_probe_read_kernel() (which we do through the BPF_CORE_READ() macro) to
fetch the arguments from the inner pt_regs.

But if the kernel doesn't use the syscall wrapper approach, input arguments
can be read from struct pt_regs directly, with no probe reading.
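
To make the wrapper case concrete, this is roughly the boilerplate the macro
hides (a BPF-side sketch mirroring the manual pattern removed from
test_probe_user.c in patch 3; not a complete program, and x86_64-specific):

```c
SEC("kprobe/__x64_sys_connect")
int BPF_KPROBE(handle_connect)
{
	/* with CONFIG_ARCH_HAS_SYSCALL_WRAPPER, ctx's first parameter is a
	 * pointer to the "inner" pt_regs holding the real syscall arguments
	 */
	struct pt_regs *real_regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
	void *uservaddr;

	/* the inner pt_regs is kernel memory, so a probe read is required */
	bpf_probe_read_kernel(&uservaddr, sizeof(uservaddr),
			      &PT_REGS_PARM2(real_regs));
	return 0;
}
```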

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf_tracing.h | 44 +++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
index 01ce121c302d..948bedbfd24e 100644
--- a/tools/lib/bpf/bpf_tracing.h
+++ b/tools/lib/bpf/bpf_tracing.h
@@ -2,6 +2,8 @@
 #ifndef __BPF_TRACING_H__
 #define __BPF_TRACING_H__
 
+#include <bpf/bpf_helpers.h>
+
 /* Scan the ARCH passed in from ARCH env variable (see Makefile) */
 #if defined(__TARGET_ARCH_x86)
 	#define bpf_target_x86
@@ -140,7 +142,7 @@ struct pt_regs___s390 {
 #define __PT_RC_REG gprs[2]
 #define __PT_SP_REG gprs[15]
 #define __PT_IP_REG psw.addr
-#define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error \"use PT_REGS_PARM1_CORE_SYSCALL() instead\""); 0l; })
+#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
 #define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___s390 *)(x), orig_gpr2)
 
 #elif defined(bpf_target_arm)
@@ -174,7 +176,7 @@ struct pt_regs___arm64 {
 #define __PT_RC_REG regs[0]
 #define __PT_SP_REG sp
 #define __PT_IP_REG pc
-#define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error \"use PT_REGS_PARM1_CORE_SYSCALL() instead\""); 0l; })
+#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
 #define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___arm64 *)(x), orig_x0)
 
 #elif defined(bpf_target_mips)
@@ -493,16 +495,26 @@ typeof(name(0)) name(struct pt_regs *ctx)				    \
 }									    \
 static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args)
 
+/* if CONFIG_ARCH_HAS_SYSCALL_WRAPPER IS NOT defined, read pt_regs directly */
 #define ___bpf_syscall_args0()           ctx
-#define ___bpf_syscall_args1(x)          ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args5(x, args...) ___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs)
+#define ___bpf_syscall_args1(x)          ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_SYSCALL(regs)
+#define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_SYSCALL(regs)
+#define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_SYSCALL(regs)
+#define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_SYSCALL(regs)
+#define ___bpf_syscall_args5(x, args...) ___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_SYSCALL(regs)
 #define ___bpf_syscall_args(args...)     ___bpf_apply(___bpf_syscall_args, ___bpf_narg(args))(args)
 
+/* if CONFIG_ARCH_HAS_SYSCALL_WRAPPER IS defined, we have to BPF_CORE_READ from pt_regs */
+#define ___bpf_syswrap_args0()           ctx
+#define ___bpf_syswrap_args1(x)          ___bpf_syswrap_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args2(x, args...) ___bpf_syswrap_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args3(x, args...) ___bpf_syswrap_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args4(x, args...) ___bpf_syswrap_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args5(x, args...) ___bpf_syswrap_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args(args...)     ___bpf_apply(___bpf_syswrap_args, ___bpf_narg(args))(args)
+
 /*
- * BPF_KPROBE_SYSCALL is a variant of BPF_KPROBE, which is intended for
+ * BPF_KSYSCALL is a variant of BPF_KPROBE, which is intended for
  * tracing syscall functions, like __x64_sys_close. It hides the underlying
  * platform-specific low-level way of getting syscall input arguments from
  * struct pt_regs, and provides a familiar typed and named function arguments
@@ -511,21 +523,29 @@ static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args)
  * Original struct pt_regs* context is preserved as 'ctx' argument. This might
  * be necessary when using BPF helpers like bpf_perf_event_output().
  *
- * This macro relies on BPF CO-RE support.
+ * This macro relies on BPF CO-RE support and __kconfig externs.
  */
-#define BPF_KPROBE_SYSCALL(name, args...)				    \
+#define BPF_KSYSCALL(name, args...)					    \
 name(struct pt_regs *ctx);						    \
+extern _Bool CONFIG_ARCH_HAS_SYSCALL_WRAPPER __kconfig __weak;		    \
 static __attribute__((always_inline)) typeof(name(0))			    \
 ____##name(struct pt_regs *ctx, ##args);				    \
 typeof(name(0)) name(struct pt_regs *ctx)				    \
 {									    \
-	struct pt_regs *regs = PT_REGS_SYSCALL_REGS(ctx);		    \
+	struct pt_regs *regs = CONFIG_ARCH_HAS_SYSCALL_WRAPPER		    \
+			       ? (struct pt_regs *)PT_REGS_PARM1(ctx)	    \
+			       : ctx;					    \
 	_Pragma("GCC diagnostic push")					    \
 	_Pragma("GCC diagnostic ignored \"-Wint-conversion\"")		    \
-	return ____##name(___bpf_syscall_args(args));			    \
+	if (CONFIG_ARCH_HAS_SYSCALL_WRAPPER)				    \
+		return ____##name(___bpf_syswrap_args(args));		    \
+	else								    \
+		return ____##name(___bpf_syscall_args(args));		    \
 	_Pragma("GCC diagnostic pop")					    \
 }									    \
 static __attribute__((always_inline)) typeof(name(0))			    \
 ____##name(struct pt_regs *ctx, ##args)
 
+#define BPF_KPROBE_SYSCALL BPF_KSYSCALL
+
 #endif
-- 
2.30.2



* [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-07  0:41 [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Andrii Nakryiko
  2022-07-07  0:41 ` [PATCH RFC bpf-next 1/3] libbpf: improve and rename BPF_KPROBE_SYSCALL Andrii Nakryiko
@ 2022-07-07  0:41 ` Andrii Nakryiko
  2022-07-07 17:23   ` Alexei Starovoitov
  2022-07-08  9:35   ` Jiri Olsa
  2022-07-07  0:41 ` [PATCH RFC bpf-next 3/3] selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests Andrii Nakryiko
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-07  0:41 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and the corresponding
kretsyscall variants (for return kprobes) to allow users to kprobe syscall
functions in the kernel. These special sections let users ignore the
complexities and differences between kernel versions and host architectures
when it comes to the syscall wrapper and the corresponding
__<arch>_sys_<syscall> vs __se_sys_<syscall> naming, which depends on
CONFIG_ARCH_HAS_SYSCALL_WRAPPER.

Combined with the BPF_KSYSCALL() macro, this allows users to just specify
the intended syscall name and expected input arguments and leave dealing
with all the variations to libbpf.

In addition to the SEC("ksyscall+") and SEC("kretsyscall+") section
definitions, add a bpf_program__attach_ksyscall() API which allows
specifying the syscall name at runtime and providing an associated BPF
cookie value.
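
A userspace sketch of the runtime API (hedged: assumes the libbpf headers
from this series; `prog` is an already-loaded struct bpf_program * and the
cookie value is arbitrary):

```c
LIBBPF_OPTS(bpf_ksyscall_opts, opts,
	.bpf_cookie = 0x1234,	/* fetchable later via bpf_get_attach_cookie() */
	.retprobe = false,	/* true would attach a return probe instead */
);
struct bpf_link *link;

link = bpf_program__attach_ksyscall(prog, "connect", &opts);
if (!link)
	return -errno;	/* errno is set on failure */
```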

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c          | 109 ++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h          |  16 +++++
 tools/lib/bpf/libbpf.map        |   1 +
 tools/lib/bpf/libbpf_internal.h |   2 +
 4 files changed, 128 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index cb49408eb298..4749fb84e33d 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4654,6 +4654,65 @@ static int probe_kern_btf_enum64(void)
 					     strs, sizeof(strs)));
 }
 
+static const char *arch_specific_syscall_pfx(void)
+{
+#if defined(__x86_64__)
+	return "x64";
+#elif defined(__i386__)
+	return "ia32";
+#elif defined(__s390x__)
+	return "s390x";
+#elif defined(__s390__)
+	return "s390";
+#elif defined(__arm__)
+	return "arm";
+#elif defined(__aarch64__)
+	return "arm64";
+#elif defined(__mips__)
+	return "mips";
+#elif defined(__riscv)
+	return "riscv";
+#else
+	return NULL;
+#endif
+}
+
+static int probe_kern_syscall_wrapper(void)
+{
+	/* available_filter_functions is a few times smaller than
+	 * /proc/kallsyms and has simpler format, so we use it as a faster way
+	 * to check that __<arch>_sys_bpf symbol exists, which is a sign that
+	 * kernel was built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER and uses
+	 * syscall wrappers
+	 */
+	static const char *kprobes_file = "/sys/kernel/tracing/available_filter_functions";
+	char func_name[128], syscall_name[128];
+	const char *ksys_pfx;
+	FILE *f;
+	int cnt;
+
+	ksys_pfx = arch_specific_syscall_pfx();
+	if (!ksys_pfx)
+		return 0;
+
+	f = fopen(kprobes_file, "r");
+	if (!f)
+		return 0;
+
+	snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx);
+
+	/* check if bpf() syscall wrapper is listed as possible kprobe */
+	while ((cnt = fscanf(f, "%127s%*[^\n]\n", func_name)) == 1) {
+		if (strcmp(func_name, syscall_name) == 0) {
+			fclose(f);
+			return 1;
+		}
+	}
+
+	fclose(f);
+	return 0;
+}
+
 enum kern_feature_result {
 	FEAT_UNKNOWN = 0,
 	FEAT_SUPPORTED = 1,
@@ -4722,6 +4781,9 @@ static struct kern_feature_desc {
 	[FEAT_BTF_ENUM64] = {
 		"BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
 	},
+	[FEAT_SYSCALL_WRAPPER] = {
+		"Kernel using syscall wrapper", probe_kern_syscall_wrapper,
+	},
 };
 
 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
@@ -8381,6 +8443,7 @@ int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log
 
 static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link);
+static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
@@ -8401,6 +8464,8 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("uretprobe.s+",		KPROBE, 0, SEC_SLEEPABLE, attach_uprobe),
 	SEC_DEF("kprobe.multi+",	KPROBE,	BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
 	SEC_DEF("kretprobe.multi+",	KPROBE,	BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
+	SEC_DEF("ksyscall+",		KPROBE,	0, SEC_NONE, attach_ksyscall),
+	SEC_DEF("kretsyscall+",		KPROBE, 0, SEC_NONE, attach_ksyscall),
 	SEC_DEF("usdt+",		KPROBE,	0, SEC_NONE, attach_usdt),
 	SEC_DEF("tc",			SCHED_CLS, 0, SEC_NONE),
 	SEC_DEF("classifier",		SCHED_CLS, 0, SEC_NONE),
@@ -9990,6 +10055,29 @@ struct bpf_link *bpf_program__attach_kprobe(const struct bpf_program *prog,
 	return bpf_program__attach_kprobe_opts(prog, func_name, &opts);
 }
 
+struct bpf_link *bpf_program__attach_ksyscall(const struct bpf_program *prog,
+					      const char *syscall_name,
+					      const struct bpf_ksyscall_opts *opts)
+{
+	LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts);
+	char func_name[128];
+
+	if (!OPTS_VALID(opts, bpf_ksyscall_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	if (kernel_supports(prog->obj, FEAT_SYSCALL_WRAPPER)) {
+		snprintf(func_name, sizeof(func_name), "__%s_sys_%s",
+			 arch_specific_syscall_pfx(), syscall_name);
+	} else {
+		snprintf(func_name, sizeof(func_name), "__se_sys_%s", syscall_name);
+	}
+
+	kprobe_opts.retprobe = OPTS_GET(opts, retprobe, false);
+	kprobe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
+
+	return bpf_program__attach_kprobe_opts(prog, func_name, &kprobe_opts);
+}
+
 /* Adapted from perf/util/string.c */
 static bool glob_match(const char *str, const char *pat)
 {
@@ -10160,6 +10248,27 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf
 	return libbpf_get_error(*link);
 }
 
+static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link)
+{
+	LIBBPF_OPTS(bpf_ksyscall_opts, opts);
+	const char *syscall_name;
+
+	*link = NULL;
+
+	/* no auto-attach for SEC("ksyscall") and SEC("kretsyscall") */
+	if (strcmp(prog->sec_name, "ksyscall") == 0 || strcmp(prog->sec_name, "kretsyscall") == 0)
+		return 0;
+
+	opts.retprobe = str_has_pfx(prog->sec_name, "kretsyscall/");
+	if (opts.retprobe)
+		syscall_name = prog->sec_name + sizeof("kretsyscall/") - 1;
+	else
+		syscall_name = prog->sec_name + sizeof("ksyscall/") - 1;
+
+	*link = bpf_program__attach_ksyscall(prog, syscall_name, &opts);
+	return *link ? 0 : -errno;
+}
+
 static int attach_kprobe_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link)
 {
 	LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e4d5353f757b..90633ada4b40 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -457,6 +457,22 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
 				      const char *pattern,
 				      const struct bpf_kprobe_multi_opts *opts);
 
+struct bpf_ksyscall_opts {
+	/* size of this struct, for forward/backward compatibility */
+	size_t sz;
+	/* custom user-provided value fetchable through bpf_get_attach_cookie() */
+	__u64 bpf_cookie;
+	/* attach as return probe? */
+	bool retprobe;
+	size_t :0;
+};
+#define bpf_ksyscall_opts__last_field retprobe
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_ksyscall(const struct bpf_program *prog,
+			     const char *syscall_name,
+			     const struct bpf_ksyscall_opts *opts);
+
 struct bpf_uprobe_opts {
 	/* size of this struct, for forward/backward compatiblity */
 	size_t sz;
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 94b589ecfeaa..63c059a02a0d 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -356,6 +356,7 @@ LIBBPF_0.8.0 {
 LIBBPF_1.0.0 {
 	global:
 		bpf_prog_query_opts;
+		bpf_program__attach_ksyscall;
 		btf__add_enum64;
 		btf__add_enum64_value;
 		libbpf_bpf_attach_type_str;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 9cd7829cbe41..f01dbab49da9 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -352,6 +352,8 @@ enum kern_feature_id {
 	FEAT_BPF_COOKIE,
 	/* BTF_KIND_ENUM64 support and BTF_KIND_ENUM kflag support */
 	FEAT_BTF_ENUM64,
+	/* Kernel uses syscall wrapper (CONFIG_ARCH_HAS_SYSCALL_WRAPPER) */
+	FEAT_SYSCALL_WRAPPER,
 	__FEAT_CNT,
 };
 
-- 
2.30.2



* [PATCH RFC bpf-next 3/3] selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests
  2022-07-07  0:41 [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Andrii Nakryiko
  2022-07-07  0:41 ` [PATCH RFC bpf-next 1/3] libbpf: improve and rename BPF_KPROBE_SYSCALL Andrii Nakryiko
  2022-07-07  0:41 ` [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes Andrii Nakryiko
@ 2022-07-07  0:41 ` Andrii Nakryiko
  2022-07-07  8:28 ` [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Yaniv Agman
  2022-07-07 15:51 ` Ilya Leoshkevich
  4 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-07  0:41 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

Convert a few selftests that used plain SEC("kprobe") with the arch-specific
syscall wrapper prefix to ksyscall/kretsyscall and the corresponding
BPF_KSYSCALL macro. test_probe_user.c especially benefits from this
simplification.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/progs/bpf_syscall_macro.c   |  6 ++---
 .../selftests/bpf/progs/test_attach_probe.c   |  6 ++---
 .../selftests/bpf/progs/test_probe_user.c     | 27 +++++--------------
 3 files changed, 12 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c b/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
index 05838ed9b89c..e1e11897e99b 100644
--- a/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
+++ b/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
@@ -64,9 +64,9 @@ int BPF_KPROBE(handle_sys_prctl)
 	return 0;
 }
 
-SEC("kprobe/" SYS_PREFIX "sys_prctl")
-int BPF_KPROBE_SYSCALL(prctl_enter, int option, unsigned long arg2,
-		       unsigned long arg3, unsigned long arg4, unsigned long arg5)
+SEC("ksyscall/prctl")
+int BPF_KSYSCALL(prctl_enter, int option, unsigned long arg2,
+		 unsigned long arg3, unsigned long arg4, unsigned long arg5)
 {
 	pid_t pid = bpf_get_current_pid_tgid() >> 32;
 
diff --git a/tools/testing/selftests/bpf/progs/test_attach_probe.c b/tools/testing/selftests/bpf/progs/test_attach_probe.c
index f1c88ad368ef..3f776b2eca1b 100644
--- a/tools/testing/selftests/bpf/progs/test_attach_probe.c
+++ b/tools/testing/selftests/bpf/progs/test_attach_probe.c
@@ -31,8 +31,8 @@ int handle_kprobe(struct pt_regs *ctx)
 	return 0;
 }
 
-SEC("kprobe/" SYS_PREFIX "sys_nanosleep")
-int BPF_KPROBE(handle_kprobe_auto)
+SEC("ksyscall/nanosleep")
+int BPF_KSYSCALL(handle_kprobe_auto)
 {
 	kprobe2_res = 11;
 	return 0;
@@ -56,7 +56,7 @@ int handle_kretprobe(struct pt_regs *ctx)
 	return 0;
 }
 
-SEC("kretprobe/" SYS_PREFIX "sys_nanosleep")
+SEC("kretsyscall/nanosleep")
 int BPF_KRETPROBE(handle_kretprobe_auto)
 {
 	kretprobe2_res = 22;
diff --git a/tools/testing/selftests/bpf/progs/test_probe_user.c b/tools/testing/selftests/bpf/progs/test_probe_user.c
index 702578a5e496..8e1495008e4d 100644
--- a/tools/testing/selftests/bpf/progs/test_probe_user.c
+++ b/tools/testing/selftests/bpf/progs/test_probe_user.c
@@ -1,35 +1,20 @@
 // SPDX-License-Identifier: GPL-2.0
-
-#include <linux/ptrace.h>
-#include <linux/bpf.h>
-
-#include <netinet/in.h>
-
+#include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
 #include "bpf_misc.h"
 
 static struct sockaddr_in old;
 
-SEC("kprobe/" SYS_PREFIX "sys_connect")
-int BPF_KPROBE(handle_sys_connect)
+SEC("ksyscall/connect")
+int BPF_KSYSCALL(handle_sys_connect, int fd, struct sockaddr_in *uservaddr, int addrlen)
 {
-#if SYSCALL_WRAPPER == 1
-	struct pt_regs *real_regs;
-#endif
 	struct sockaddr_in new;
-	void *ptr;
-
-#if SYSCALL_WRAPPER == 0
-	ptr = (void *)PT_REGS_PARM2(ctx);
-#else
-	real_regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
-	bpf_probe_read_kernel(&ptr, sizeof(ptr), &PT_REGS_PARM2(real_regs));
-#endif
 
-	bpf_probe_read_user(&old, sizeof(old), ptr);
+	bpf_probe_read_user(&old, sizeof(old), uservaddr);
 	__builtin_memset(&new, 0xab, sizeof(new));
-	bpf_probe_write_user(ptr, &new, sizeof(new));
+	bpf_probe_write_user(uservaddr, &new, sizeof(new));
 
 	return 0;
 }
-- 
2.30.2



* Re: [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support
  2022-07-07  0:41 [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2022-07-07  0:41 ` [PATCH RFC bpf-next 3/3] selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests Andrii Nakryiko
@ 2022-07-07  8:28 ` Yaniv Agman
  2022-07-07 20:56   ` Andrii Nakryiko
  2022-07-07 15:51 ` Ilya Leoshkevich
  4 siblings, 1 reply; 25+ messages in thread
From: Yaniv Agman @ 2022-07-07  8:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, Daniel Borkmann, kernel-team, Ilya Leoshkevich,
	Kenta Tada, Hengqi Chen

On Thu, Jul 7, 2022 at 3:48 Andrii Nakryiko <andrii@kernel.org> wrote:
>
> This RFC patch set is to gather feedback about new
> SEC("ksyscall") and SEC("kretsyscall") section definitions meant to simplify
> life of BPF users that want to trace Linux syscalls without having to know or
> care about things like CONFIG_ARCH_HAS_SYSCALL_WRAPPER and related arch-specific
> vs arch-agnostic __<arch>_sys_xxx vs __se_sys_xxx function names, calling
> convention woes ("nested" pt_regs), etc. All this is quite annoying to
> remember and care about as BPF user, especially if the goal is to write
> architecture- and kernel version-agnostic BPF code (e.g., things like
> libbpf-tools, etc).
>
> By using SEC("ksyscall/xxx")/SEC("kretsyscall/xxx") user clearly communicates
> the desire to kprobe/kretprobe kernel function that corresponds to the
> specified syscall. Libbpf will take care of all the details of determining
> correct function name and calling conventions.
>
> This patch set also improves BPF_KPROBE_SYSCALL (and renames it to
> BPF_KSYSCALL to match SEC("ksyscall")) macro to take into account
> CONFIG_ARCH_HAS_SYSCALL_WRAPPER instead of hard-coding whether host
> architecture is expected to use syscall wrapper or not (which is less reliable
> and can change over time).
>

Hi Andrii,
I would very much like it if there were such a macro, which would make
things easier for syscall tracing, but I think you can't assume that
libbpf will always have access to kconfig files.
For example, when running from a container that doesn't mount /boot (in
environments where the config file lives in /boot), libbpf will fail to
load the CONFIG_ARCH_HAS_SYSCALL_WRAPPER value and assume it is not
defined.
Then, in any environment with a "new" kernel where the program runs from
a container, it will fetch the wrong argument values.
For this very reason, in [1] we fall back to assuming
CONFIG_ARCH_HAS_SYSCALL_WRAPPER is defined, as it will be in most
environments.

[1] https://github.com/aquasecurity/tracee/blob/0f28a2cc14b851308ebaa380d503dea9eaa67271/pkg/ebpf/initialization/kconfig.go#L37

> It would be great to get feedback about the overall feature, but also I'd
> appreciate help with testing this, especially for non-x86_64 architectures.
>
> Cc: Ilya Leoshkevich <iii@linux.ibm.com>
> Cc: Kenta Tada <kenta.tada@sony.com>
> Cc: Hengqi Chen <hengqi.chen@gmail.com>
>
> Andrii Nakryiko (3):
>   libbpf: improve and rename BPF_KPROBE_SYSCALL
>   libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
>   selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests
>
>  tools/lib/bpf/bpf_tracing.h                   |  44 +++++--
>  tools/lib/bpf/libbpf.c                        | 109 ++++++++++++++++++
>  tools/lib/bpf/libbpf.h                        |  16 +++
>  tools/lib/bpf/libbpf.map                      |   1 +
>  tools/lib/bpf/libbpf_internal.h               |   2 +
>  .../selftests/bpf/progs/bpf_syscall_macro.c   |   6 +-
>  .../selftests/bpf/progs/test_attach_probe.c   |   6 +-
>  .../selftests/bpf/progs/test_probe_user.c     |  27 +----
>  8 files changed, 172 insertions(+), 39 deletions(-)
>
> --
> 2.30.2
>


* Re: [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support
  2022-07-07  0:41 [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2022-07-07  8:28 ` [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Yaniv Agman
@ 2022-07-07 15:51 ` Ilya Leoshkevich
  2022-07-07 20:59   ` Andrii Nakryiko
  4 siblings, 1 reply; 25+ messages in thread
From: Ilya Leoshkevich @ 2022-07-07 15:51 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Kenta Tada, Hengqi Chen

On Wed, 2022-07-06 at 17:41 -0700, Andrii Nakryiko wrote:
> This RFC patch set is to gather feedback about new
> SEC("ksyscall") and SEC("kretsyscall") section definitions meant to
> simplify
> life of BPF users that want to trace Linux syscalls without having to
> know or
> care about things like CONFIG_ARCH_HAS_SYSCALL_WRAPPER and related
> arch-specific
> vs arch-agnostic __<arch>_sys_xxx vs __se_sys_xxx function names,
> calling
> convention woes ("nested" pt_regs), etc. All this is quite annoying
> to
> remember and care about as BPF user, especially if the goal is to
> write
> architecture- and kernel version-agnostic BPF code (e.g., things like
> libbpf-tools, etc).
> 
> By using SEC("ksyscall/xxx")/SEC("kretsyscall/xxx") user clearly
> communicates
> the desire to kprobe/kretprobe kernel function that corresponds to
> the
> specified syscall. Libbpf will take care of all the details of
> determining
> correct function name and calling conventions.
> 
> This patch set also improves BPF_KPROBE_SYSCALL (and renames it to
> BPF_KSYSCALL to match SEC("ksyscall")) macro to take into account
> CONFIG_ARCH_HAS_SYSCALL_WRAPPER instead of hard-coding whether host
> architecture is expected to use syscall wrapper or not (which is less
> reliable
> and can change over time).
> 
> It would be great to get feedback about the overall feature, but also
> I'd
> appreciate help with testing this, especially for non-x86_64
> architectures.
> 
> Cc: Ilya Leoshkevich <iii@linux.ibm.com>
> Cc: Kenta Tada <kenta.tada@sony.com>
> Cc: Hengqi Chen <hengqi.chen@gmail.com>
> 
> Andrii Nakryiko (3):
>   libbpf: improve and rename BPF_KPROBE_SYSCALL
>   libbpf: add ksyscall/kretsyscall sections support for syscall
> kprobes
>   selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests
> 
>  tools/lib/bpf/bpf_tracing.h                   |  44 +++++--
>  tools/lib/bpf/libbpf.c                        | 109
> ++++++++++++++++++
>  tools/lib/bpf/libbpf.h                        |  16 +++
>  tools/lib/bpf/libbpf.map                      |   1 +
>  tools/lib/bpf/libbpf_internal.h               |   2 +
>  .../selftests/bpf/progs/bpf_syscall_macro.c   |   6 +-
>  .../selftests/bpf/progs/test_attach_probe.c   |   6 +-
>  .../selftests/bpf/progs/test_probe_user.c     |  27 +----
>  8 files changed, 172 insertions(+), 39 deletions(-)

Hi Andrii,

Looks interesting, I will give it a try on s390x a bit later.

In the meantime just one remark: if we want to create a truly seamless
solution, we might need to take care of quirks associated with the
following kernel #defines:

* __ARCH_WANT_SYS_OLD_MMAP (real arguments are in memory)
* CONFIG_CLONE_BACKWARDS (child_tidptr/tls swapped)
* CONFIG_CLONE_BACKWARDS2 (newsp/clone_flags swapped)
* CONFIG_CLONE_BACKWARDS3 (extra arg: stack_size)

or at least document that users need to be careful with mmap() and
clone() probes. Also, there might be more of that out there, but that's
what I'm constantly running into on s390x.

Best regards,
Ilya


* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-07  0:41 ` [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes Andrii Nakryiko
@ 2022-07-07 17:23   ` Alexei Starovoitov
  2022-07-07 19:10     ` Andrii Nakryiko
  2022-07-08  9:35   ` Jiri Olsa
  1 sibling, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2022-07-07 17:23 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	Ilya Leoshkevich, Kenta Tada, Hengqi Chen

On Wed, Jul 6, 2022 at 5:41 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding
> kretsyscall variants (for return kprobes) to allow users to kprobe
> syscall functions in kernel. These special sections allow to ignore
> complexities and differences between kernel versions and host
> architectures when it comes to syscall wrapper and corresponding
> __<arch>_sys_<syscall> vs __se_sys_<syscall> differences, depending on
> CONFIG_ARCH_HAS_SYSCALL_WRAPPER.
>
> Combined with the use of BPF_KSYSCALL() macro, this allows to just
> specify intended syscall name and expected input arguments and leave
> dealing with all the variations to libbpf.
>
> In addition to SEC("ksyscall+") and SEC("kretsyscall+") add
> bpf_program__attach_ksyscall() API which allows to specify syscall name
> at runtime and provide associated BPF cookie value.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  tools/lib/bpf/libbpf.c          | 109 ++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.h          |  16 +++++
>  tools/lib/bpf/libbpf.map        |   1 +
>  tools/lib/bpf/libbpf_internal.h |   2 +
>  4 files changed, 128 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index cb49408eb298..4749fb84e33d 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4654,6 +4654,65 @@ static int probe_kern_btf_enum64(void)
>                                              strs, sizeof(strs)));
>  }
>
> +static const char *arch_specific_syscall_pfx(void)
> +{
> +#if defined(__x86_64__)
> +       return "x64";
> +#elif defined(__i386__)
> +       return "ia32";
> +#elif defined(__s390x__)
> +       return "s390x";
> +#elif defined(__s390__)
> +       return "s390";
> +#elif defined(__arm__)
> +       return "arm";
> +#elif defined(__aarch64__)
> +       return "arm64";
> +#elif defined(__mips__)
> +       return "mips";
> +#elif defined(__riscv)
> +       return "riscv";
> +#else
> +       return NULL;
> +#endif
> +}
> +
> +static int probe_kern_syscall_wrapper(void)
> +{
> +       /* available_filter_functions is a few times smaller than
> +        * /proc/kallsyms and has simpler format, so we use it as a faster way
> +        * to check that __<arch>_sys_bpf symbol exists, which is a sign that
> +        * kernel was built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER and uses
> +        * syscall wrappers
> +        */
> +       static const char *kprobes_file = "/sys/kernel/tracing/available_filter_functions";
> +       char func_name[128], syscall_name[128];
> +       const char *ksys_pfx;
> +       FILE *f;
> +       int cnt;
> +
> +       ksys_pfx = arch_specific_syscall_pfx();
> +       if (!ksys_pfx)
> +               return 0;
> +
> +       f = fopen(kprobes_file, "r");
> +       if (!f)
> +               return 0;
> +
> +       snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx);
> +
> +       /* check if bpf() syscall wrapper is listed as possible kprobe */
> +       while ((cnt = fscanf(f, "%127s%*[^\n]\n", func_name)) == 1) {
> +               if (strcmp(func_name, syscall_name) == 0) {
> +                       fclose(f);
> +                       return 1;
> +               }
> +       }

Maybe we should do the other way around ?
cat /proc/kallsyms |grep sys_bpf

and figure out the prefix from there?
Then we won't need to do giant
#if defined(__x86_64__)
...

/proc/kallsyms has world read permissions:
proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
unlike available_filter_functions.

Also tracefs might be mounted in a different dir than
/sys/kernel/tracing/
like
/sys/kernel/debug/tracing/


* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-07 17:23   ` Alexei Starovoitov
@ 2022-07-07 19:10     ` Andrii Nakryiko
  2022-07-07 19:36       ` Alexei Starovoitov
  2022-07-08 11:28       ` Jiri Olsa
  0 siblings, 2 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-07 19:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

On Thu, Jul 7, 2022 at 10:23 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jul 6, 2022 at 5:41 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
SNIP
>
> Maybe we should do the other way around ?
> cat /proc/kallsyms |grep sys_bpf
>
> and figure out the prefix from there?
> Then we won't need to do giant
> #if defined(__x86_64__)
> ...
>

Unfortunately this won't work well due to compat and 32-bit APIs (and
bpf() syscall is particularly bad with also bpf_sys_bpf):

$ sudo cat /proc/kallsyms| rg '_sys_bpf$'
ffffffff811cb100 t __sys_bpf
ffffffff811cd380 T bpf_sys_bpf
ffffffff811cd520 T __x64_sys_bpf
ffffffff811cd540 T __ia32_sys_bpf
ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
ffffffff83abc400 t _eil_addr___ia32_sys_bpf
ffffffff83abc410 t _eil_addr___x64_sys_bpf

$ sudo cat /proc/kallsyms| rg '_sys_mmap$'
ffffffff81024480 T __x64_sys_mmap
ffffffff810244c0 T __ia32_sys_mmap
ffffffff83abae30 t _eil_addr___ia32_sys_mmap
ffffffff83abae40 t _eil_addr___x64_sys_mmap

We have similar arch-specific switches in few other places (USDT and
lib path detection, for example), so it's not a new precedent (for
better or worse).


> /proc/kallsyms has world read permissions:
> proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> unlike available_filter_functions.
>
> Also tracefs might be mounted in a different dir than
> /sys/kernel/tracing/
> like
> /sys/kernel/debug/tracing/

Yeah, good point, was trying to avoid parsing more expensive kallsyms,
but given it's done once, it might not be a big deal.


* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-07 19:10     ` Andrii Nakryiko
@ 2022-07-07 19:36       ` Alexei Starovoitov
  2022-07-07 20:50         ` Andrii Nakryiko
  2022-07-08 11:28       ` Jiri Olsa
  1 sibling, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2022-07-07 19:36 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

On Thu, Jul 7, 2022 at 12:10 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 10:23 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Jul 6, 2022 at 5:41 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
SNIP
> >
> > Maybe we should do the other way around ?
> > cat /proc/kallsyms |grep sys_bpf
> >
> > and figure out the prefix from there?
> > Then we won't need to do giant
> > #if defined(__x86_64__)
> > ...
> >
>
> Unfortunately this won't work well due to compat and 32-bit APIs (and
> bpf() syscall is particularly bad with also bpf_sys_bpf):
>
> $ sudo cat /proc/kallsyms| rg '_sys_bpf$'
> ffffffff811cb100 t __sys_bpf
> ffffffff811cd380 T bpf_sys_bpf
> ffffffff811cd520 T __x64_sys_bpf
> ffffffff811cd540 T __ia32_sys_bpf
> ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> ffffffff83abc410 t _eil_addr___x64_sys_bpf

That actually means that the current and proposed approaches
are both somewhat wrong, since they don't attach to both.
Meaning all syscalls done by 32-bit userspace will not be seen
by bpf prog.

Probably libbpf should attach to both:
__x64_sys_bpf and __ia32_sys_bpf.

__ksym and __kstr are easy to filter out, since they are
standard prefixes.
No idea what eil_addr is.

> $ sudo cat /proc/kallsyms| rg '_sys_mmap$'
> ffffffff81024480 T __x64_sys_mmap
> ffffffff810244c0 T __ia32_sys_mmap
> ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> ffffffff83abae40 t _eil_addr___x64_sys_mmap
>
> We have similar arch-specific switches in few other places (USDT and
> lib path detection, for example), so it's not a new precedent (for
> better or worse).
>
>
> > /proc/kallsyms has world read permissions:
> > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > unlike available_filter_functions.
> >
> > Also tracefs might be mounted in a different dir than
> > /sys/kernel/tracing/
> > like
> > /sys/kernel/debug/tracing/
>
> Yeah, good point, was trying to avoid parsing more expensive kallsyms,
> but given it's done once, it might not be a big deal.

Soon we'll have an iterator for them so doing in-kernel
search of sys_bpf would be fast.


* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-07 19:36       ` Alexei Starovoitov
@ 2022-07-07 20:50         ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-07 20:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

On Thu, Jul 7, 2022 at 12:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 12:10 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Jul 7, 2022 at 10:23 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Jul 6, 2022 at 5:41 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
SNIP
> > >
> > > Maybe we should do the other way around ?
> > > cat /proc/kallsyms |grep sys_bpf
> > >
> > > and figure out the prefix from there?
> > > Then we won't need to do giant
> > > #if defined(__x86_64__)
> > > ...
> > >
> >
> > Unfortunately this won't work well due to compat and 32-bit APIs (and
> > bpf() syscall is particularly bad with also bpf_sys_bpf):
> >
> > $ sudo cat /proc/kallsyms| rg '_sys_bpf$'
> > ffffffff811cb100 t __sys_bpf
> > ffffffff811cd380 T bpf_sys_bpf
> > ffffffff811cd520 T __x64_sys_bpf
> > ffffffff811cd540 T __ia32_sys_bpf
> > ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> > ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> > ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> > ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> > ffffffff83abc410 t _eil_addr___x64_sys_bpf
>
> That actually means that the current and proposed approaches
> are both somewhat wrong, since they don't attach to both.
> Meaning all syscalls done by 32-bit userspace will not be seen
> by bpf prog.
>
> Probably libbpf should attach to both:
> __x64_sys_bpf and __ia32_sys_bpf.
>

I was actually intending to attach to the "native" syscall only; I
haven't seen any application/use case attaching to ia32 variants on
64-bit architectures. Same for compat, if the kernel has it.

My thinking was that compat and 32-bit syscalls are not a typical use
case, and if users care about them, they should be capable of figuring
out the __x64_sys_xxx, __ia32_sys_xxx, and __compat_whatever names as
well.

But that's why I sent it as an RFC, to discuss stuff like this.


> __ksym and __kstr are easy to filter out, since they are
> standard prefixes.
> No idea what eil_addr is.
>
> > $ sudo cat /proc/kallsyms| rg '_sys_mmap$'
> > ffffffff81024480 T __x64_sys_mmap
> > ffffffff810244c0 T __ia32_sys_mmap
> > ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> > ffffffff83abae40 t _eil_addr___x64_sys_mmap
> >
> > We have similar arch-specific switches in few other places (USDT and
> > lib path detection, for example), so it's not a new precedent (for
> > better or worse).
> >
> >
> > > /proc/kallsyms has world read permissions:
> > > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > > unlike available_filter_functions.
> > >
> > > Also tracefs might be mounted in a different dir than
> > > /sys/kernel/tracing/
> > > like
> > > /sys/kernel/debug/tracing/
> >
> > Yeah, good point, was trying to avoid parsing more expensive kallsyms,
> > but given it's done once, it might not be a big deal.
>
> Soon we'll have an iterator for them so doing in-kernel
> search of sys_bpf would be fast.


* Re: [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support
  2022-07-07  8:28 ` [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Yaniv Agman
@ 2022-07-07 20:56   ` Andrii Nakryiko
  2022-07-11 16:23     ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-07 20:56 UTC (permalink / raw)
  To: Yaniv Agman
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

On Thu, Jul 7, 2022 at 1:28 AM Yaniv Agman <yanivagman@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 3:48 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > This RFC patch set is to gather feedback about new
> > SEC("ksyscall") and SEC("kretsyscall") section definitions meant to simplify
> > life of BPF users that want to trace Linux syscalls without having to know or
> > care about things like CONFIG_ARCH_HAS_SYSCALL_WRAPPER and related arch-specific
> > vs arch-agnostic __<arch>_sys_xxx vs __se_sys_xxx function names, calling
> > convention woes ("nested" pt_regs), etc. All this is quite annoying to
> > remember and care about as BPF user, especially if the goal is to write
> > architecture- and kernel version-agnostic BPF code (e.g., things like
> > libbpf-tools, etc).
> >
> > By using SEC("ksyscall/xxx")/SEC("kretsyscall/xxx") user clearly communicates
> > the desire to kprobe/kretprobe kernel function that corresponds to the
> > specified syscall. Libbpf will take care of all the details of determining
> > correct function name and calling conventions.
> >
> > This patch set also improves BPF_KPROBE_SYSCALL (and renames it to
> > BPF_KSYSCALL to match SEC("ksyscall")) macro to take into account
> > CONFIG_ARCH_HAS_SYSCALL_WRAPPER instead of hard-coding whether host
> > architecture is expected to use syscall wrapper or not (which is less reliable
> > and can change over time).
> >
>
> Hi Andrii,
> I would very much like it if there were such a macro, as it would make
> things easier for syscall tracing, but I think you can't assume that
> libbpf will have access to kconfig files all the time.
> For example, if running from a container without mounting /boot (on
> environments where the config file is in /boot), libbpf will fail to
> load the CONFIG_ARCH_HAS_SYSCALL_WRAPPER value and assume it is not
> defined.
> Then, on any environment with a "new" kernel where the program runs
> from a container, it will return the wrong argument values.
> For this very reason we fall back in [1] to assuming
> CONFIG_ARCH_HAS_SYSCALL_WRAPPER is defined, as in most environments it
> will be.
>
> [1] https://github.com/aquasecurity/tracee/blob/0f28a2cc14b851308ebaa380d503dea9eaa67271/pkg/ebpf/initialization/kconfig.go#L37
>

I see. Unfortunately, without relying on CONFIG_ARCH_HAS_SYSCALL_WRAPPER
on the BPF side, it's hard to make this correct across all kernel
versions. One way would be to keep BPF_KPROBE_SYSCALL as is, assuming
the syscall wrapper for x86, s390 and arm64, and add the BPF_KSYSCALL()
macro as I did here, which depends on __kconfig, so in your situation it
won't work. SEC("ksyscall") by itself will still work, though, if you
find it useful.


> > It would be great to get feedback about the overall feature, but also I'd
> > appreciate help with testing this, especially for non-x86_64 architectures.
> >
> > Cc: Ilya Leoshkevich <iii@linux.ibm.com>
> > Cc: Kenta Tada <kenta.tada@sony.com>
> > Cc: Hengqi Chen <hengqi.chen@gmail.com>
> >
> > Andrii Nakryiko (3):
> >   libbpf: improve and rename BPF_KPROBE_SYSCALL
> >   libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
> >   selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests
> >
> >  tools/lib/bpf/bpf_tracing.h                   |  44 +++++--
> >  tools/lib/bpf/libbpf.c                        | 109 ++++++++++++++++++
> >  tools/lib/bpf/libbpf.h                        |  16 +++
> >  tools/lib/bpf/libbpf.map                      |   1 +
> >  tools/lib/bpf/libbpf_internal.h               |   2 +
> >  .../selftests/bpf/progs/bpf_syscall_macro.c   |   6 +-
> >  .../selftests/bpf/progs/test_attach_probe.c   |   6 +-
> >  .../selftests/bpf/progs/test_probe_user.c     |  27 +----
> >  8 files changed, 172 insertions(+), 39 deletions(-)
> >
> > --
> > 2.30.2
> >


* Re: [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support
  2022-07-07 15:51 ` Ilya Leoshkevich
@ 2022-07-07 20:59   ` Andrii Nakryiko
  2022-07-11 18:25     ` Ilya Leoshkevich
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-07 20:59 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Kenta Tada, Hengqi Chen

On Thu, Jul 7, 2022 at 8:51 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> On Wed, 2022-07-06 at 17:41 -0700, Andrii Nakryiko wrote:
SNIP
>
> Hi Andrii,
>
> Looks interesting, I will give it a try on s390x a bit later.
>
> In the meantime just one remark: if we want to create a truly seamless
> solution, we might need to take care of quirks associated with the
> following kernel #defines:
>
> * __ARCH_WANT_SYS_OLD_MMAP (real arguments are in memory)
> * CONFIG_CLONE_BACKWARDS (child_tidptr/tls swapped)
> * CONFIG_CLONE_BACKWARDS2 (newsp/clone_flags swapped)
> * CONFIG_CLONE_BACKWARDS3 (extra arg: stack_size)
>
> or at least document that users need to be careful with mmap() and
> clone() probes. Also, there might be more of that out there, but that's
> what I'm constantly running into on s390x.
>

Tbh, this space seems so messy that I don't think it's realistic to
try to have a completely seamless solution. As I replied to Alexei, I
didn't intend to support compat and 32-bit syscalls, for example.
These also seem like quirks that users will have to discover and
handle on their own. In my mind there is always plain SEC("kprobe") if
SEC("ksyscall") gets in the way of handling compat/32-bit quirks like
the ones you mentioned.

But maybe the right answer is just to not add SEC("ksyscall") at all?


> Best regards,
> Ilya


* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-07  0:41 ` [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes Andrii Nakryiko
  2022-07-07 17:23   ` Alexei Starovoitov
@ 2022-07-08  9:35   ` Jiri Olsa
  2022-07-08 22:04     ` Andrii Nakryiko
  1 sibling, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2022-07-08  9:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, kernel-team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

On Wed, Jul 06, 2022 at 05:41:17PM -0700, Andrii Nakryiko wrote:

SNIP

> +static int probe_kern_syscall_wrapper(void)
> +{
> +	/* available_filter_functions is a few times smaller than
> +	 * /proc/kallsyms and has simpler format, so we use it as a faster way
> +	 * to check that __<arch>_sys_bpf symbol exists, which is a sign that
> +	 * kernel was built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER and uses
> +	 * syscall wrappers
> +	 */
> +	static const char *kprobes_file = "/sys/kernel/tracing/available_filter_functions";
> +	char func_name[128], syscall_name[128];
> +	const char *ksys_pfx;
> +	FILE *f;
> +	int cnt;
> +
> +	ksys_pfx = arch_specific_syscall_pfx();
> +	if (!ksys_pfx)
> +		return 0;
> +
> +	f = fopen(kprobes_file, "r");
> +	if (!f)
> +		return 0;
> +
> +	snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx);
> +
> +	/* check if bpf() syscall wrapper is listed as possible kprobe */
> +	while ((cnt = fscanf(f, "%127s%*[^\n]\n", func_name)) == 1) {

nit: cnt is not used/needed

jirka

> +		if (strcmp(func_name, syscall_name) == 0) {
> +			fclose(f);
> +			return 1;
> +		}
> +	}
> +
> +	fclose(f);
> +	return 0;
> +}
> +
>  enum kern_feature_result {
>  	FEAT_UNKNOWN = 0,
>  	FEAT_SUPPORTED = 1,
> @@ -4722,6 +4781,9 @@ static struct kern_feature_desc {
>  	[FEAT_BTF_ENUM64] = {
>  		"BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
>  	},
> +	[FEAT_SYSCALL_WRAPPER] = {
> +		"Kernel using syscall wrapper", probe_kern_syscall_wrapper,
> +	},
>  };
>  

SNIP

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-07 19:10     ` Andrii Nakryiko
  2022-07-07 19:36       ` Alexei Starovoitov
@ 2022-07-08 11:28       ` Jiri Olsa
  2022-07-08 22:05         ` Andrii Nakryiko
  1 sibling, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2022-07-08 11:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Ilya Leoshkevich, Kenta Tada,
	Hengqi Chen

On Thu, Jul 07, 2022 at 12:10:30PM -0700, Andrii Nakryiko wrote:

SNIP

> > Maybe we should do the other way around ?
> > cat /proc/kallsyms |grep sys_bpf
> >
> > and figure out the prefix from there?
> > Then we won't need to do giant
> > #if defined(__x86_64__)
> > ...
> >
> 
> Unfortunately this won't work well due to compat and 32-bit APIs (and
> bpf() syscall is particularly bad with also bpf_sys_bpf):
> 
> $ sudo cat /proc/kallsyms| rg '_sys_bpf$'
> ffffffff811cb100 t __sys_bpf
> ffffffff811cd380 T bpf_sys_bpf
> ffffffff811cd520 T __x64_sys_bpf
> ffffffff811cd540 T __ia32_sys_bpf
> ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> ffffffff83abc410 t _eil_addr___x64_sys_bpf
> 
> $ sudo cat /proc/kallsyms| rg '_sys_mmap$'
> ffffffff81024480 T __x64_sys_mmap
> ffffffff810244c0 T __ia32_sys_mmap
> ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> ffffffff83abae40 t _eil_addr___x64_sys_mmap
> 
> We have similar arch-specific switches in few other places (USDT and
> lib path detection, for example), so it's not a new precedent (for
> better or worse).
> 
> 
> > /proc/kallsyms has world read permissions:
> > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > unlike available_filter_functions.
> >
> > Also tracefs might be mounted in a different dir than
> > /sys/kernel/tracing/
> > like
> > /sys/kernel/debug/tracing/
> 
> Yeah, good point, was trying to avoid parsing more expensive kallsyms,
> but given it's done once, it might not be a big deal.

we could get that also from BTF?

jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-08  9:35   ` Jiri Olsa
@ 2022-07-08 22:04     ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-08 22:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

On Fri, Jul 8, 2022 at 2:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, Jul 06, 2022 at 05:41:17PM -0700, Andrii Nakryiko wrote:
>
> SNIP
>
> > +static int probe_kern_syscall_wrapper(void)
> > +{
> > +     /* available_filter_functions is a few times smaller than
> > +      * /proc/kallsyms and has simpler format, so we use it as a faster way
> > +      * to check that __<arch>_sys_bpf symbol exists, which is a sign that
> > +      * kernel was built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER and uses
> > +      * syscall wrappers
> > +      */
> > +     static const char *kprobes_file = "/sys/kernel/tracing/available_filter_functions";
> > +     char func_name[128], syscall_name[128];
> > +     const char *ksys_pfx;
> > +     FILE *f;
> > +     int cnt;
> > +
> > +     ksys_pfx = arch_specific_syscall_pfx();
> > +     if (!ksys_pfx)
> > +             return 0;
> > +
> > +     f = fopen(kprobes_file, "r");
> > +     if (!f)
> > +             return 0;
> > +
> > +     snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx);
> > +
> > +     /* check if bpf() syscall wrapper is listed as possible kprobe */
> > +     while ((cnt = fscanf(f, "%127s%*[^\n]\n", func_name)) == 1) {
>
> nit cnt is not used/needed

yep, leftovers, nice catch

>
> jirka
>
> > +             if (strcmp(func_name, syscall_name) == 0) {
> > +                     fclose(f);
> > +                     return 1;
> > +             }
> > +     }
> > +
> > +     fclose(f);
> > +     return 0;
> > +}
> > +
> >  enum kern_feature_result {
> >       FEAT_UNKNOWN = 0,
> >       FEAT_SUPPORTED = 1,
> > @@ -4722,6 +4781,9 @@ static struct kern_feature_desc {
> >       [FEAT_BTF_ENUM64] = {
> >               "BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
> >       },
> > +     [FEAT_SYSCALL_WRAPPER] = {
> > +             "Kernel using syscall wrapper", probe_kern_syscall_wrapper,
> > +     },
> >  };
> >
>
> SNIP

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-08 11:28       ` Jiri Olsa
@ 2022-07-08 22:05         ` Andrii Nakryiko
  2022-07-10  0:38           ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-08 22:05 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Ilya Leoshkevich, Kenta Tada,
	Hengqi Chen

On Fri, Jul 8, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Jul 07, 2022 at 12:10:30PM -0700, Andrii Nakryiko wrote:
>
> SNIP
>
> > > Maybe we should do the other way around ?
> > > cat /proc/kallsyms |grep sys_bpf
> > >
> > > and figure out the prefix from there?
> > > Then we won't need to do giant
> > > #if defined(__x86_64__)
> > > ...
> > >
> >
> > Unfortunately this won't work well due to compat and 32-bit APIs (and
> > bpf() syscall is particularly bad with also bpf_sys_bpf):
> >
> > $ sudo cat /proc/kallsyms| rg '_sys_bpf$'
> > ffffffff811cb100 t __sys_bpf
> > ffffffff811cd380 T bpf_sys_bpf
> > ffffffff811cd520 T __x64_sys_bpf
> > ffffffff811cd540 T __ia32_sys_bpf
> > ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> > ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> > ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> > ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> > ffffffff83abc410 t _eil_addr___x64_sys_bpf
> >
> > $ sudo cat /proc/kallsyms| rg '_sys_mmap$'
> > ffffffff81024480 T __x64_sys_mmap
> > ffffffff810244c0 T __ia32_sys_mmap
> > ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> > ffffffff83abae40 t _eil_addr___x64_sys_mmap
> >
> > We have similar arch-specific switches in few other places (USDT and
> > lib path detection, for example), so it's not a new precedent (for
> > better or worse).
> >
> >
> > > /proc/kallsyms has world read permissions:
> > > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > > unlike available_filter_functions.
> > >
> > > Also tracefs might be mounted in a different dir than
> > > /sys/kernel/tracing/
> > > like
> > > /sys/kernel/debug/tracing/
> >
> > Yeah, good point, was trying to avoid parsing more expensive kallsyms,
> > but given it's done once, it might not be a big deal.
>
> we could get that also from BTF?

I'd rather not add dependency on BTF for this.

>
> jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-08 22:05         ` Andrii Nakryiko
@ 2022-07-10  0:38           ` Alexei Starovoitov
  2022-07-11 16:28             ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2022-07-10  0:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Ilya Leoshkevich, Kenta Tada,
	Hengqi Chen

On Fri, Jul 8, 2022 at 3:05 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Jul 8, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Jul 07, 2022 at 12:10:30PM -0700, Andrii Nakryiko wrote:
> >
> > SNIP
> >
> > > > Maybe we should do the other way around ?
> > > > cat /proc/kallsyms |grep sys_bpf
> > > >
> > > > and figure out the prefix from there?
> > > > Then we won't need to do giant
> > > > #if defined(__x86_64__)
> > > > ...
> > > >
> > >
> > > Unfortunately this won't work well due to compat and 32-bit APIs (and
> > > bpf() syscall is particularly bad with also bpf_sys_bpf):
> > >
> > > $ sudo cat /proc/kallsyms| rg '_sys_bpf$'
> > > ffffffff811cb100 t __sys_bpf
> > > ffffffff811cd380 T bpf_sys_bpf
> > > ffffffff811cd520 T __x64_sys_bpf
> > > ffffffff811cd540 T __ia32_sys_bpf
> > > ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> > > ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> > > ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> > > ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> > > ffffffff83abc410 t _eil_addr___x64_sys_bpf
> > >
> > > $ sudo cat /proc/kallsyms| rg '_sys_mmap$'
> > > ffffffff81024480 T __x64_sys_mmap
> > > ffffffff810244c0 T __ia32_sys_mmap
> > > ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> > > ffffffff83abae40 t _eil_addr___x64_sys_mmap
> > >
> > > We have similar arch-specific switches in few other places (USDT and
> > > lib path detection, for example), so it's not a new precedent (for
> > > better or worse).
> > >
> > >
> > > > /proc/kallsyms has world read permissions:
> > > > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > > > unlike available_filter_functions.
> > > >
> > > > Also tracefs might be mounted in a different dir than
> > > > /sys/kernel/tracing/
> > > > like
> > > > /sys/kernel/debug/tracing/
> > >
> > > Yeah, good point, was trying to avoid parsing more expensive kallsyms,
> > > but given it's done once, it might not be a big deal.
> >
> > we could get that also from BTF?
>
> I'd rather not add dependency on BTF for this.

A weird and non-technical reason.
Care to explain this odd excuse?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support
  2022-07-07 20:56   ` Andrii Nakryiko
@ 2022-07-11 16:23     ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-11 16:23 UTC (permalink / raw)
  To: Yaniv Agman
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Ilya Leoshkevich, Kenta Tada, Hengqi Chen

On Thu, Jul 7, 2022 at 1:56 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 1:28 AM Yaniv Agman <yanivagman@gmail.com> wrote:
> >
> > On Thu, Jul 7, 2022 at 3:48 Andrii Nakryiko
> > <andrii@kernel.org> wrote:
> > >
> > > This RFC patch set is to gather feedback about new
> > > SEC("ksyscall") and SEC("kretsyscall") section definitions meant to simplify
> > > life of BPF users that want to trace Linux syscalls without having to know or
> > > care about things like CONFIG_ARCH_HAS_SYSCALL_WRAPPER and related arch-specific
> > > vs arch-agnostic __<arch>_sys_xxx vs __se_sys_xxx function names, calling
> > > convention woes ("nested" pt_regs), etc. All this is quite annoying to
> > > remember and care about as BPF user, especially if the goal is to write
> > > architecture- and kernel version-agnostic BPF code (e.g., things like
> > > libbpf-tools, etc).
> > >
> > > By using SEC("ksyscall/xxx")/SEC("kretsyscall/xxx") user clearly communicates
> > > the desire to kprobe/kretprobe kernel function that corresponds to the
> > > specified syscall. Libbpf will take care of all the details of determining
> > > correct function name and calling conventions.
> > >
> > > This patch set also improves BPF_KPROBE_SYSCALL (and renames it to
> > > BPF_KSYSCALL to match SEC("ksyscall")) macro to take into account
> > > CONFIG_ARCH_HAS_SYSCALL_WRAPPER instead of hard-coding whether host
> > > architecture is expected to use syscall wrapper or not (which is less reliable
> > > and can change over time).
> > >
> >
> > Hi Andrii,
> > I would very much like it if there were such a macro, which would make
> > things easier for syscall tracing,
> > but I think that you can't assume that libbpf will have access to
> > kconfig files all the time.
> > For example, if running from a container and not mounting /boot (on
> > environments where the config file is in /boot), libbpf will fail to
> > load CONFIG_ARCH_HAS_SYSCALL_WRAPPER value and assume it to be not
> > defined.
> > Then, on any environment with a "new" kernel where the program runs
> > from a container, it will return the wrong argument values.
> > For this very reason we fall-back in [1] to assume
> > CONFIG_ARCH_HAS_SYSCALL_WRAPPER is defined, as in most environments it
> > will be.
> >
> > [1] https://github.com/aquasecurity/tracee/blob/0f28a2cc14b851308ebaa380d503dea9eaa67271/pkg/ebpf/initialization/kconfig.go#L37
> >
>
> I see, unfortunately without relying on
> CONFIG_ARCH_HAS_SYSCALL_WRAPPER on BPF side it's hard to make this
> correct in all kernel versions. One way would be to keep
> BPF_KPROBE_SYSCALL as is assuming syscall wrapper for x86, s390 and
> arm64, and add BPF_KSYSCALL() macro as I did here, which would depend
> on __kconfig, so in your situation it won't work. SEC("ksyscall") by
> itself will still work, though, if you find it useful.
>

I thought some more about this. This is the second such problem (first
being USDT detecting availability of BPF cookie support) where libbpf
on user-space side performs feature detection and BPF-side code has to
use slightly different feature detection for the same feature.

To solve both problems a bit more generically, I'm thinking of adding a
few fake __kconfig variables, just like libbpf does with
LINUX_VERSION_CODE. I'll carve out some "namespace" for
libbpf-provided feature detection (e.g., LINUX_HAS_BPF_COOKIE and
LINUX_HAS_SYSCALL_WRAPPER, or something along those lines), and libbpf
will fill them in just like we do with LINUX_VERSION_CODE, without
actually requiring /proc/config.gz. Thoughts?
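On the BPF side, such a libbpf-provided variable could look something like this (a fragment, not a complete program; the variable name is just the one floated above, nothing final):

```c
/* Filled in by libbpf at load time, like LINUX_VERSION_CODE, without
 * needing /proc/config.gz; __weak so loading still succeeds with an
 * older libbpf that doesn't know the name. */
extern bool LINUX_HAS_SYSCALL_WRAPPER __kconfig __weak;

/* ...inside the BPF_KSYSCALL expansion, argument extraction could then
 * branch on it at runtime: */
if (LINUX_HAS_SYSCALL_WRAPPER) {
	/* real args are in the inner, "nested" pt_regs */
} else {
	/* args are directly in the kprobe's pt_regs */
}
```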

>
> > > It would be great to get feedback about the overall feature, but also I'd
> > > appreciate help with testing this, especially for non-x86_64 architectures.
> > >
> > > Cc: Ilya Leoshkevich <iii@linux.ibm.com>
> > > Cc: Kenta Tada <kenta.tada@sony.com>
> > > Cc: Hengqi Chen <hengqi.chen@gmail.com>
> > >
> > > Andrii Nakryiko (3):
> > >   libbpf: improve and rename BPF_KPROBE_SYSCALL
> > >   libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
> > >   selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests
> > >
> > >  tools/lib/bpf/bpf_tracing.h                   |  44 +++++--
> > >  tools/lib/bpf/libbpf.c                        | 109 ++++++++++++++++++
> > >  tools/lib/bpf/libbpf.h                        |  16 +++
> > >  tools/lib/bpf/libbpf.map                      |   1 +
> > >  tools/lib/bpf/libbpf_internal.h               |   2 +
> > >  .../selftests/bpf/progs/bpf_syscall_macro.c   |   6 +-
> > >  .../selftests/bpf/progs/test_attach_probe.c   |   6 +-
> > >  .../selftests/bpf/progs/test_probe_user.c     |  27 +----
> > >  8 files changed, 172 insertions(+), 39 deletions(-)
> > >
> > > --
> > > 2.30.2
> > >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-10  0:38           ` Alexei Starovoitov
@ 2022-07-11 16:28             ` Andrii Nakryiko
  2022-07-12  4:20               ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-11 16:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Ilya Leoshkevich, Kenta Tada,
	Hengqi Chen

On Sat, Jul 9, 2022 at 5:38 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jul 8, 2022 at 3:05 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Jul 8, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Thu, Jul 07, 2022 at 12:10:30PM -0700, Andrii Nakryiko wrote:
> > >
> > > SNIP
> > >
> > > > > Maybe we should do the other way around ?
> > > > > cat /proc/kallsyms |grep sys_bpf
> > > > >
> > > > > and figure out the prefix from there?
> > > > > Then we won't need to do giant
> > > > > #if defined(__x86_64__)
> > > > > ...
> > > > >
> > > >
> > > > Unfortunately this won't work well due to compat and 32-bit APIs (and
> > > > bpf() syscall is particularly bad with also bpf_sys_bpf):
> > > >
> > > > $ sudo cat /proc/kallsyms| rg '_sys_bpf$'
> > > > ffffffff811cb100 t __sys_bpf
> > > > ffffffff811cd380 T bpf_sys_bpf
> > > > ffffffff811cd520 T __x64_sys_bpf
> > > > ffffffff811cd540 T __ia32_sys_bpf
> > > > ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> > > > ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> > > > ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> > > > ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> > > > ffffffff83abc410 t _eil_addr___x64_sys_bpf
> > > >
> > > > $ sudo cat /proc/kallsyms| rg '_sys_mmap$'
> > > > ffffffff81024480 T __x64_sys_mmap
> > > > ffffffff810244c0 T __ia32_sys_mmap
> > > > ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> > > > ffffffff83abae40 t _eil_addr___x64_sys_mmap
> > > >
> > > > We have similar arch-specific switches in few other places (USDT and
> > > > lib path detection, for example), so it's not a new precedent (for
> > > > better or worse).
> > > >
> > > >
> > > > > /proc/kallsyms has world read permissions:
> > > > > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > > > > unlike available_filter_functions.
> > > > >
> > > > > Also tracefs might be mounted in a different dir than
> > > > > /sys/kernel/tracing/
> > > > > like
> > > > > /sys/kernel/debug/tracing/
> > > >
> > > > Yeah, good point, was trying to avoid parsing more expensive kallsyms,
> > > > but given it's done once, it might not be a big deal.
> > >
> > > we could get that also from BTF?
> >
> > I'd rather not add dependency on BTF for this.
>
> A weird and non technical reason.
> Care to explain this odd excuse?

Quite technical reason: minimizing unrelated dependencies. It's not
necessary to have vmlinux BTF to use kprobes (especially for kprobing
syscalls), so adding a dependency on vmlinux BTF just to use
SEC("ksyscall") seems completely unnecessary, given we have other
alternatives.
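The kallsyms ambiguity discussed above is easy to demonstrate with a small sketch (my code, not libbpf's; the sample symbols are copied from the kallsyms output quoted earlier in this thread). Suffix matching finds several candidates and can't safely infer the prefix, while checking for one exact arch-specific symbol can:

```python
# Sample /proc/kallsyms lines, taken from the output quoted in this thread.
KALLSYMS = """\
ffffffff811cb100 t __sys_bpf
ffffffff811cd380 T bpf_sys_bpf
ffffffff811cd520 T __x64_sys_bpf
ffffffff811cd540 T __ia32_sys_bpf
ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
"""

def syms_ending_in(suffix):
    """Naive approach: any symbol ending in the suffix -> ambiguous."""
    return [line.split()[2] for line in KALLSYMS.splitlines()
            if line.split()[2].endswith(suffix)]

def wrapper_exists(arch_pfx, syscall):
    """What the RFC does instead: look for the one exact symbol."""
    want = f"__{arch_pfx}_sys_{syscall}"
    return any(line.split()[2] == want for line in KALLSYMS.splitlines())

print(syms_ending_in("_sys_bpf"))    # five candidates, incl. compat and ksymtab entries
print(wrapper_exists("x64", "bpf"))  # True: syscall wrapper in use
print(wrapper_exists("arm64", "bpf"))
```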

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support
  2022-07-07 20:59   ` Andrii Nakryiko
@ 2022-07-11 18:25     ` Ilya Leoshkevich
  2022-07-12  4:24       ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Ilya Leoshkevich @ 2022-07-11 18:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Kenta Tada, Hengqi Chen

On Thu, 2022-07-07 at 13:59 -0700, Andrii Nakryiko wrote:
> On Thu, Jul 7, 2022 at 8:51 AM Ilya Leoshkevich <iii@linux.ibm.com>
> wrote:
> > 
> > On Wed, 2022-07-06 at 17:41 -0700, Andrii Nakryiko wrote:
> > > This RFC patch set is to gather feedback about new
> > > SEC("ksyscall") and SEC("kretsyscall") section definitions meant
> > > to
> > > simplify
> > > life of BPF users that want to trace Linux syscalls without
> > > having to
> > > know or
> > > care about things like CONFIG_ARCH_HAS_SYSCALL_WRAPPER and
> > > related
> > > arch-specific
> > > vs arch-agnostic __<arch>_sys_xxx vs __se_sys_xxx function names,
> > > calling
> > > convention woes ("nested" pt_regs), etc. All this is quite
> > > annoying
> > > to
> > > remember and care about as BPF user, especially if the goal is to
> > > write
> > > > architecture- and kernel version-agnostic BPF code (e.g., things
> > > like
> > > libbpf-tools, etc).
> > > 
> > > By using SEC("ksyscall/xxx")/SEC("kretsyscall/xxx") user clearly
> > > communicates
> > > the desire to kprobe/kretprobe kernel function that corresponds
> > > to
> > > the
> > > specified syscall. Libbpf will take care of all the details of
> > > determining
> > > correct function name and calling conventions.
> > > 
> > > This patch set also improves BPF_KPROBE_SYSCALL (and renames it
> > > to
> > > BPF_KSYSCALL to match SEC("ksyscall")) macro to take into account
> > > CONFIG_ARCH_HAS_SYSCALL_WRAPPER instead of hard-coding whether
> > > host
> > > architecture is expected to use syscall wrapper or not (which is
> > > less
> > > reliable
> > > and can change over time).
> > > 
> > > It would be great to get feedback about the overall feature, but
> > > also
> > > I'd
> > > appreciate help with testing this, especially for non-x86_64
> > > architectures.
> > > 
> > > Cc: Ilya Leoshkevich <iii@linux.ibm.com>
> > > Cc: Kenta Tada <kenta.tada@sony.com>
> > > Cc: Hengqi Chen <hengqi.chen@gmail.com>
> > > 
> > > Andrii Nakryiko (3):
> > >   libbpf: improve and rename BPF_KPROBE_SYSCALL
> > >   libbpf: add ksyscall/kretsyscall sections support for syscall
> > > kprobes
> > >   selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in
> > > selftests
> > > 
> > >  tools/lib/bpf/bpf_tracing.h                   |  44 +++++--
> > >  tools/lib/bpf/libbpf.c                        | 109
> > > ++++++++++++++++++
> > >  tools/lib/bpf/libbpf.h                        |  16 +++
> > >  tools/lib/bpf/libbpf.map                      |   1 +
> > >  tools/lib/bpf/libbpf_internal.h               |   2 +
> > >  .../selftests/bpf/progs/bpf_syscall_macro.c   |   6 +-
> > >  .../selftests/bpf/progs/test_attach_probe.c   |   6 +-
> > >  .../selftests/bpf/progs/test_probe_user.c     |  27 +----
> > >  8 files changed, 172 insertions(+), 39 deletions(-)
> > 
> > Hi Andrii,
> > 
> > Looks interesting, I will give it a try on s390x a bit later.
> > 
> > In the meantime just one remark: if we want to create a truly
> > seamless
> > solution, we might need to take care of quirks associated with the
> > following kernel #defines:
> > 
> > * __ARCH_WANT_SYS_OLD_MMAP (real arguments are in memory)
> > * CONFIG_CLONE_BACKWARDS (child_tidptr/tls swapped)
> > * CONFIG_CLONE_BACKWARDS2 (newsp/clone_flags swapped)
> > * CONFIG_CLONE_BACKWARDS3 (extra arg: stack_size)
> > 
> > or at least document that users need to be careful with mmap() and
> > clone() probes. Also, there might be more of that out there, but
> > that's
> > what I'm constantly running into on s390x.
> > 
> 
> Tbh, this space seems so messy, that I don't think it's realistic to
> try to have a completely seamless solution. As I replied to Alexei, I
> didn't have an intention to support compat and 32-bit syscalls, for
> example. This seems to be also a quirk that users will have to
> discover and handle on their own. In my mind there is always plain
> SEC("kprobe") if SEC("ksyscall") gets in a way to handle
> compat/32-bit/quirks like the ones you mentioned.
> 
> But maybe the right answer is just to not add SEC("ksyscall") at all?

I think it's a valuable feature, even if it doesn't handle compat
syscalls and all the other calling convention quirks. IMHO these things
just need to be clearly spelled out in the documentation.

In order to keep the possibility to handle them in the future, I would
write something like:

    At the moment SEC("ksyscall") does not handle all the calling
    convention quirks for mmap(), clone() and compat syscalls. This may
    or may not change in the future. Therefore it is recommended to use
    SEC("kprobe") for these syscalls.

What do you think?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-11 16:28             ` Andrii Nakryiko
@ 2022-07-12  4:20               ` Alexei Starovoitov
  2022-07-12  5:00                 ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2022-07-12  4:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Ilya Leoshkevich, Kenta Tada,
	Hengqi Chen

On Mon, Jul 11, 2022 at 09:28:29AM -0700, Andrii Nakryiko wrote:
> On Sat, Jul 9, 2022 at 5:38 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Jul 8, 2022 at 3:05 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, Jul 8, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > On Thu, Jul 07, 2022 at 12:10:30PM -0700, Andrii Nakryiko wrote:
> > > >
> > > > SNIP
> > > >
> > > > > > Maybe we should do the other way around ?
> > > > > > cat /proc/kallsyms |grep sys_bpf
> > > > > >
> > > > > > and figure out the prefix from there?
> > > > > > Then we won't need to do giant
> > > > > > #if defined(__x86_64__)
> > > > > > ...
> > > > > >
> > > > >
> > > > > Unfortunately this won't work well due to compat and 32-bit APIs (and
> > > > > bpf() syscall is particularly bad with also bpf_sys_bpf):
> > > > >
> > > > > $ sudo cat /proc/kallsyms| rg '_sys_bpf$'
> > > > > ffffffff811cb100 t __sys_bpf
> > > > > ffffffff811cd380 T bpf_sys_bpf
> > > > > ffffffff811cd520 T __x64_sys_bpf
> > > > > ffffffff811cd540 T __ia32_sys_bpf
> > > > > ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> > > > > ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> > > > > ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> > > > > ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> > > > > ffffffff83abc410 t _eil_addr___x64_sys_bpf
> > > > >
> > > > > $ sudo cat /proc/kallsyms| rg '_sys_mmap$'
> > > > > ffffffff81024480 T __x64_sys_mmap
> > > > > ffffffff810244c0 T __ia32_sys_mmap
> > > > > ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> > > > > ffffffff83abae40 t _eil_addr___x64_sys_mmap
> > > > >
> > > > > We have similar arch-specific switches in few other places (USDT and
> > > > > lib path detection, for example), so it's not a new precedent (for
> > > > > better or worse).
> > > > >
> > > > >
> > > > > > /proc/kallsyms has world read permissions:
> > > > > > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > > > > > unlike available_filter_functions.
> > > > > >
> > > > > > Also tracefs might be mounted in a different dir than
> > > > > > /sys/kernel/tracing/
> > > > > > like
> > > > > > /sys/kernel/debug/tracing/
> > > > >
> > > > > Yeah, good point, was trying to avoid parsing more expensive kallsyms,
> > > > > but given it's done once, it might not be a big deal.
> > > >
> > > > we could get that also from BTF?
> > >
> > > I'd rather not add dependency on BTF for this.
> >
> > A weird and non technical reason.
> > Care to explain this odd excuse?
> 
> Quite technical reason: minimizing unrelated dependencies. It's not
> necessary to have vmlinux BTF to use kprobes (especially for kprobing
> syscalls), so adding dependency on vmlinux BTF just to use
> SEC("ksyscall") seems completely unnecessary, given we have other
> alternatives.

If BTF and kallsyms were alternatives then it indeed would make
sense to avoid implementing different schemes for old and recent kernels.
But libbpf already loads vmlinux BTF for other reasons.
It caches it, and searching it is fast.
While libbpf also parses kallsyms, it doesn't cache it.
Yet another search through kallsyms will slow down libbpf loading,
while another search in cached BTF is close to free.
Also we have the bpf_btf_find_by_name_kind() in-kernel helper.
We can prog_run it and optimize libbpf's BTF search to be even faster.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support
  2022-07-11 18:25     ` Ilya Leoshkevich
@ 2022-07-12  4:24       ` Andrii Nakryiko
  2022-07-13  7:12         ` Alan Maguire
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-12  4:24 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Kenta Tada, Hengqi Chen

On Mon, Jul 11, 2022 at 11:25 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> On Thu, 2022-07-07 at 13:59 -0700, Andrii Nakryiko wrote:
> > On Thu, Jul 7, 2022 at 8:51 AM Ilya Leoshkevich <iii@linux.ibm.com>
> > wrote:
> > >
> > > On Wed, 2022-07-06 at 17:41 -0700, Andrii Nakryiko wrote:
> > > > This RFC patch set is to gather feedback about new
> > > > SEC("ksyscall") and SEC("kretsyscall") section definitions meant
> > > > to
> > > > simplify
> > > > life of BPF users that want to trace Linux syscalls without
> > > > having to
> > > > know or
> > > > care about things like CONFIG_ARCH_HAS_SYSCALL_WRAPPER and
> > > > related
> > > > arch-specific
> > > > vs arch-agnostic __<arch>_sys_xxx vs __se_sys_xxx function names,
> > > > calling
> > > > convention woes ("nested" pt_regs), etc. All this is quite
> > > > annoying
> > > > to
> > > > remember and care about as BPF user, especially if the goal is to
> > > > write
> > > > architecture- and kernel version-agnostic BPF code (e.g., things
> > > > like
> > > > libbpf-tools, etc).
> > > >
> > > > By using SEC("ksyscall/xxx")/SEC("kretsyscall/xxx") user clearly
> > > > communicates
> > > > the desire to kprobe/kretprobe kernel function that corresponds
> > > > to
> > > > the
> > > > specified syscall. Libbpf will take care of all the details of
> > > > determining
> > > > correct function name and calling conventions.
> > > >
> > > > This patch set also improves BPF_KPROBE_SYSCALL (and renames it
> > > > to
> > > > BPF_KSYSCALL to match SEC("ksyscall")) macro to take into account
> > > > CONFIG_ARCH_HAS_SYSCALL_WRAPPER instead of hard-coding whether
> > > > host
> > > > architecture is expected to use syscall wrapper or not (which is
> > > > less
> > > > reliable
> > > > and can change over time).
> > > >
> > > > It would be great to get feedback about the overall feature, but
> > > > also
> > > > I'd
> > > > appreciate help with testing this, especially for non-x86_64
> > > > architectures.
> > > >
> > > > Cc: Ilya Leoshkevich <iii@linux.ibm.com>
> > > > Cc: Kenta Tada <kenta.tada@sony.com>
> > > > Cc: Hengqi Chen <hengqi.chen@gmail.com>
> > > >
> > > > Andrii Nakryiko (3):
> > > >   libbpf: improve and rename BPF_KPROBE_SYSCALL
> > > >   libbpf: add ksyscall/kretsyscall sections support for syscall
> > > > kprobes
> > > >   selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in
> > > > selftests
> > > >
> > > >  tools/lib/bpf/bpf_tracing.h                   |  44 +++++--
> > > >  tools/lib/bpf/libbpf.c                        | 109
> > > > ++++++++++++++++++
> > > >  tools/lib/bpf/libbpf.h                        |  16 +++
> > > >  tools/lib/bpf/libbpf.map                      |   1 +
> > > >  tools/lib/bpf/libbpf_internal.h               |   2 +
> > > >  .../selftests/bpf/progs/bpf_syscall_macro.c   |   6 +-
> > > >  .../selftests/bpf/progs/test_attach_probe.c   |   6 +-
> > > >  .../selftests/bpf/progs/test_probe_user.c     |  27 +----
> > > >  8 files changed, 172 insertions(+), 39 deletions(-)
> > >
> > > Hi Andrii,
> > >
> > > Looks interesting, I will give it a try on s390x a bit later.
> > >
> > > In the meantime just one remark: if we want to create a truly
> > > seamless
> > > solution, we might need to take care of quirks associated with the
> > > following kernel #defines:
> > >
> > > * __ARCH_WANT_SYS_OLD_MMAP (real arguments are in memory)
> > > * CONFIG_CLONE_BACKWARDS (child_tidptr/tls swapped)
> > > * CONFIG_CLONE_BACKWARDS2 (newsp/clone_flags swapped)
> > > * CONFIG_CLONE_BACKWARDS3 (extra arg: stack_size)
> > >
> > > or at least document that users need to be careful with mmap() and
> > > clone() probes. Also, there might be more of that out there, but
> > > that's what I'm constantly running into on s390x.
> > >
> >
> > Tbh, this space seems so messy that I don't think it's realistic to
> > try to have a completely seamless solution. As I replied to Alexei, I
> > didn't have an intention to support compat and 32-bit syscalls, for
> > example. This seems to be also a quirk that users will have to
> > discover and handle on their own. In my mind there is always plain
> > SEC("kprobe") if SEC("ksyscall") gets in the way of handling
> > compat/32-bit quirks like the ones you mentioned.
> >
> > But maybe the right answer is just to not add SEC("ksyscall") at all?
>
> I think it's a valuable feature, even if it doesn't handle compat
> syscalls and all the other calling convention quirks. IMHO these things
> just need to be clearly spelled out in the documentation.
>
> In order to keep the possibility to handle them in the future, I would
> write something like:
>
>     At the moment SEC("ksyscall") does not handle all the calling
>     convention quirks for mmap(), clone() and compat syscalls. This may
>     or may not change in the future. Therefore it is recommended to use
>     SEC("kprobe") for these syscalls.
>
> What do you think?

Sounds good! I'll add that to bpf_program__attach_ksyscall() doc
comment (and to commit message). I'll implement those new virtual
__kconfig variables that I mentioned in another thread and post it as
v1, hopefully some time this week.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
  2022-07-12  4:20               ` Alexei Starovoitov
@ 2022-07-12  5:00                 ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-12  5:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Ilya Leoshkevich, Kenta Tada,
	Hengqi Chen

On Mon, Jul 11, 2022 at 9:20 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jul 11, 2022 at 09:28:29AM -0700, Andrii Nakryiko wrote:
> > On Sat, Jul 9, 2022 at 5:38 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Jul 8, 2022 at 3:05 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Fri, Jul 8, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jul 07, 2022 at 12:10:30PM -0700, Andrii Nakryiko wrote:
> > > > >
> > > > > SNIP
> > > > >
> > > > > > > Maybe we should do it the other way around?
> > > > > > > cat /proc/kallsyms | grep sys_bpf
> > > > > > >
> > > > > > > and figure out the prefix from there?
> > > > > > > Then we won't need to do giant
> > > > > > > #if defined(__x86_64__)
> > > > > > > ...
> > > > > > >
> > > > > >
> > > > > > Unfortunately this won't work well due to compat and 32-bit APIs
> > > > > > (and the bpf() syscall is particularly bad, with bpf_sys_bpf on
> > > > > > top of that):
> > > > > >
> > > > > > $ sudo cat /proc/kallsyms| rg '_sys_bpf$'
> > > > > > ffffffff811cb100 t __sys_bpf
> > > > > > ffffffff811cd380 T bpf_sys_bpf
> > > > > > ffffffff811cd520 T __x64_sys_bpf
> > > > > > ffffffff811cd540 T __ia32_sys_bpf
> > > > > > ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> > > > > > ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> > > > > > ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> > > > > > ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> > > > > > ffffffff83abc410 t _eil_addr___x64_sys_bpf
> > > > > >
> > > > > > $ sudo cat /proc/kallsyms| rg '_sys_mmap$'
> > > > > > ffffffff81024480 T __x64_sys_mmap
> > > > > > ffffffff810244c0 T __ia32_sys_mmap
> > > > > > ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> > > > > > ffffffff83abae40 t _eil_addr___x64_sys_mmap
> > > > > >
> > > > > > We have similar arch-specific switches in a few other places
> > > > > > (USDT and lib path detection, for example), so it's not a new
> > > > > > precedent (for better or worse).
> > > > > >
> > > > > >
> > > > > > > /proc/kallsyms has world read permissions:
> > > > > > > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > > > > > > unlike available_filter_functions.
> > > > > > >
> > > > > > > Also tracefs might be mounted in a different dir than
> > > > > > > /sys/kernel/tracing/, like /sys/kernel/debug/tracing/.
> > > > > >
> > > > > > Yeah, good point, I was trying to avoid the more expensive
> > > > > > kallsyms parsing, but given it's done once, it might not be a
> > > > > > big deal.
> > > > >
> > > > > we could get that also from BTF?
> > > >
> > > > I'd rather not add dependency on BTF for this.
> > >
> > > A weird and non-technical reason.
> > > Care to explain this odd excuse?
> >
> > Quite technical reason: minimizing unrelated dependencies. It's not
> > necessary to have vmlinux BTF to use kprobes (especially for kprobing
> > syscalls), so adding dependency on vmlinux BTF just to use
> > SEC("ksyscall") seems completely unnecessary, given we have other
> > alternatives.
>
> If BTF and kallsyms were alternatives then it indeed would make
> sense to avoid implementing different schemes for old and recent
> kernels. But libbpf already loads vmlinux BTF for other reasons.

Not necessarily, only if bpf_object requires vmlinux BTF, see
obj_needs_vmlinux_btf().

> It caches it and searching it is fast.
> While libbpf also parses kallsyms, it doesn't cache it.
> Yet another search through kallsyms will slow down libbpf loading,
> while another search in cached BTF is close to free.
> Also we have the bpf_btf_find_by_name_kind() in-kernel helper.
> We can prog_run it and optimize libbpf's BTF search to be even faster.

I'm actually starting to lean towards just trying to create a
perf_event for __<arch>_sys_<syscall> as feature detection. It will be
fast and simple, with no need to parse kallsyms or
available_filter_functions, take an unnecessary dependency on vmlinux
BTF, etc. And I have all that code written already.


* Re: [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support
  2022-07-12  4:24       ` Andrii Nakryiko
@ 2022-07-13  7:12         ` Alan Maguire
  2022-07-13 17:52           ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Maguire @ 2022-07-13  7:12 UTC (permalink / raw)
  To: Andrii Nakryiko, Ilya Leoshkevich
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Kenta Tada, Hengqi Chen

On 12/07/2022 05:24, Andrii Nakryiko wrote:

> Sounds good! I'll add that to bpf_program__attach_ksyscall() doc
> comment (and to commit message). I'll implement those new virtual
> __kconfig variables that I mentioned in another thread and post it as
> v1, hopefully some time this week.
> 

This is really useful, thanks for doing it! I tested on arm64; the
only issue was the tracefs path problem that I think was already
mentioned, i.e. for me it took

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4749fb84e33d..a27f21619cfc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4685,7 +4685,7 @@ static int probe_kern_syscall_wrapper(void)
         * kernel was built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER and uses
         * syscall wrappers
         */
-       static const char *kprobes_file = "/sys/kernel/tracing/available_filter_functions";
+       static const char *kprobes_file = "/sys/kernel/debug/tracing/available_filter_functions";
        char func_name[128], syscall_name[128];
        const char *ksys_pfx;
        FILE *f;

...to get the feature probing - and hence auto-attach to the
right arch-specific probes - to work.

Alan


* Re: [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support
  2022-07-13  7:12         ` Alan Maguire
@ 2022-07-13 17:52           ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2022-07-13 17:52 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Ilya Leoshkevich, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Kenta Tada, Hengqi Chen

On Wed, Jul 13, 2022 at 12:12 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 12/07/2022 05:24, Andrii Nakryiko wrote:
>
> > Sounds good! I'll add that to bpf_program__attach_ksyscall() doc
> > comment (and to commit message). I'll implement those new virtual
> > __kconfig variables that I mentioned in another thread and post it as
> > v1, hopefully some time this week.
> >
>
> This is really useful, thanks for doing it! I tested on arm64,
> only issue was the tracefs path issue that I think was already
> mentioned, i.e. for me it took
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 4749fb84e33d..a27f21619cfc 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4685,7 +4685,7 @@ static int probe_kern_syscall_wrapper(void)
>          * kernel was built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER and uses
>          * syscall wrappers
>          */
> -       static const char *kprobes_file = "/sys/kernel/tracing/available_filter_functions";
> +       static const char *kprobes_file = "/sys/kernel/debug/tracing/available_filter_functions";
>         char func_name[128], syscall_name[128];
>         const char *ksys_pfx;
>         FILE *f;
>
> ...to get the feature probing - and hence auto-attach to the
> right arch-specific probes - to work.
>

So I remember there were some patches generalizing this, but I can't
remember why they haven't landed yet. I'll dig them up a bit later, but
I think this will be fixed separately and orthogonally. Also, the
non-RFC v1 doesn't rely on available_filter_functions anymore anyway.

It would be nice if you could try v1 as well, and if it looks good,
give your ack/tested-by. Thanks!

> Alan


Thread overview: 25+ messages
2022-07-07  0:41 [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Andrii Nakryiko
2022-07-07  0:41 ` [PATCH RFC bpf-next 1/3] libbpf: improve and rename BPF_KPROBE_SYSCALL Andrii Nakryiko
2022-07-07  0:41 ` [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes Andrii Nakryiko
2022-07-07 17:23   ` Alexei Starovoitov
2022-07-07 19:10     ` Andrii Nakryiko
2022-07-07 19:36       ` Alexei Starovoitov
2022-07-07 20:50         ` Andrii Nakryiko
2022-07-08 11:28       ` Jiri Olsa
2022-07-08 22:05         ` Andrii Nakryiko
2022-07-10  0:38           ` Alexei Starovoitov
2022-07-11 16:28             ` Andrii Nakryiko
2022-07-12  4:20               ` Alexei Starovoitov
2022-07-12  5:00                 ` Andrii Nakryiko
2022-07-08  9:35   ` Jiri Olsa
2022-07-08 22:04     ` Andrii Nakryiko
2022-07-07  0:41 ` [PATCH RFC bpf-next 3/3] selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests Andrii Nakryiko
2022-07-07  8:28 ` [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Yaniv Agman
2022-07-07 20:56   ` Andrii Nakryiko
2022-07-11 16:23     ` Andrii Nakryiko
2022-07-07 15:51 ` Ilya Leoshkevich
2022-07-07 20:59   ` Andrii Nakryiko
2022-07-11 18:25     ` Ilya Leoshkevich
2022-07-12  4:24       ` Andrii Nakryiko
2022-07-13  7:12         ` Alan Maguire
2022-07-13 17:52           ` Andrii Nakryiko
