All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH bpf-next 0/5] bpf: add get_reg_val helper
@ 2022-05-12  7:43 Dave Marchevsky
  2022-05-12  7:43 ` [RFC PATCH bpf-next 1/5] x86/fpu: Move context.h to include/asm Dave Marchevsky
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Dave Marchevsky @ 2022-05-12  7:43 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team,
	Dave Marchevsky

Currently, BPF programs can access register values from struct pt_regs.
Fetching other registers is not supported. For some usecases this limits
the usefulness of BPF programs. This series adds a helper meant to
fetch any register value for the architecture the program is running on.

Concrete motivating usecase: Tracing programs often attach to User
Statically-Defined Tracing (USDT) probes, which can pass arguments using
registers. The registers used to pass arguments for a specific probe are
determined at compile-time. 

Although general-purpose registers which can be accessed via pt_regs are 
usually chosen, register pressure can cause others to be used. Recently
we saw this happening in a Fedora libpthread library [0], where a xmm
register was used. Similarly, floating-point arguments in USDTs will
result in use of xmm register [1]. Since there is no way to access the
registers used to pass these arguments, BPF programs can't use them.

Another usecase: rdtsc access.

Initially the helper was meant to narrowly address the USDT xmm usecase
but conversation with Andrii highlighted the usefulness of a more
general helper. Although only x86 SSE reg fetching is added in this
patchset, the path forward for adding other register sets and
architectures should be clear.

Feedback from someone familiar with s390 or other arch regarding whether
the helper would be usable for other archs in current form would be
appreciated.


Summary of patches:

Patch 1 moves a header so fpregs_state_valid helper can be used.

Patches 2 and 3 contain the meat of the kernel- and libbpf-side
changes, respectively. Libbpf-side changes add use of the helper to usdt
lib in order to address USDT xmm issue that originally prompted this
work.

Patches 4 and 5 add tests.

Submitted as RFC for early feedback while failing usdt12 prog
verification is addressed (see patch 3).

  [0] - https://github.com/iovisor/bcc/pull/3880
	[1] - https://github.com/iovisor/bcc/issues/3875

Dave Marchevsky (5):
  x86/fpu: Move context.h to include/asm
  bpf: add get_reg_val helper
  libbpf: usdt lib wiring of xmm reads
  selftests/bpf: Add test for USDT parse of xmm reg
  selftests/bpf: get_reg_val test exercising fxsave fetch

 .../x86/{kernel => include/asm}/fpu/context.h |   2 +
 arch/x86/kernel/fpu/core.c                    |   2 +-
 arch/x86/kernel/fpu/regset.c                  |   2 +-
 arch/x86/kernel/fpu/signal.c                  |   2 +-
 arch/x86/kernel/fpu/xstate.c                  |   2 +-
 include/linux/bpf.h                           |   1 +
 include/uapi/linux/bpf.h                      |  40 +++++
 kernel/trace/bpf_trace.c                      | 148 ++++++++++++++++++
 kernel/trace/bpf_trace.h                      |   1 +
 net/bpf/bpf_dummy_struct_ops.c                |  13 ++
 tools/include/uapi/linux/bpf.h                |  40 +++++
 tools/lib/bpf/usdt.bpf.h                      |  36 +++--
 tools/lib/bpf/usdt.c                          |  51 +++++-
 tools/testing/selftests/bpf/Makefile          |   8 +-
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  13 ++
 tools/testing/selftests/bpf/prog_tests/usdt.c |  49 ++++++
 .../selftests/bpf/progs/test_urandom_usdt.c   |  37 +++++
 tools/testing/selftests/bpf/test_progs.c      |   7 +
 tools/testing/selftests/bpf/urandom_read.c    |   3 +
 .../selftests/bpf/urandom_read_lib_xmm.c      |  62 ++++++++
 20 files changed, 499 insertions(+), 20 deletions(-)
 rename arch/x86/{kernel => include/asm}/fpu/context.h (96%)
 create mode 100644 tools/testing/selftests/bpf/urandom_read_lib_xmm.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH bpf-next 1/5] x86/fpu: Move context.h to include/asm
  2022-05-12  7:43 [RFC PATCH bpf-next 0/5] bpf: add get_reg_val helper Dave Marchevsky
@ 2022-05-12  7:43 ` Dave Marchevsky
  2022-05-12 13:56   ` David Vernet
  2022-05-14  0:44   ` Alexei Starovoitov
  2022-05-12  7:43 ` [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper Dave Marchevsky
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 20+ messages in thread
From: Dave Marchevsky @ 2022-05-12  7:43 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team,
	Dave Marchevsky

The file's fpregs_state_valid function is useful outside of
arch/x86/kernel/fpu dir. Further commits in this series use
fpregs_state_valid to determine whether a BPF helper should fetch
fpu reg value from xsave'd memory or register.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 arch/x86/{kernel => include/asm}/fpu/context.h | 2 ++
 arch/x86/kernel/fpu/core.c                     | 2 +-
 arch/x86/kernel/fpu/regset.c                   | 2 +-
 arch/x86/kernel/fpu/signal.c                   | 2 +-
 arch/x86/kernel/fpu/xstate.c                   | 2 +-
 5 files changed, 6 insertions(+), 4 deletions(-)
 rename arch/x86/{kernel => include/asm}/fpu/context.h (96%)

diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/include/asm/fpu/context.h
similarity index 96%
rename from arch/x86/kernel/fpu/context.h
rename to arch/x86/include/asm/fpu/context.h
index 958accf2ccf0..39dac18cd22c 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/include/asm/fpu/context.h
@@ -51,6 +51,8 @@ static inline void fpregs_activate(struct fpu *fpu)
 	trace_x86_fpu_regs_activated(fpu);
 }
 
+extern void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask);
+
 /* Internal helper for switch_fpu_return() and signal frame setup */
 static inline void fpregs_restore_userregs(void)
 {
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index c049561f373a..5296112d4273 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -7,6 +7,7 @@
  *	Gareth Hughes <gareth@valinux.com>, May 2000
  */
 #include <asm/fpu/api.h>
+#include <asm/fpu/context.h>
 #include <asm/fpu/regset.h>
 #include <asm/fpu/sched.h>
 #include <asm/fpu/signal.h>
@@ -18,7 +19,6 @@
 #include <linux/pkeys.h>
 #include <linux/vmalloc.h>
 
-#include "context.h"
 #include "internal.h"
 #include "legacy.h"
 #include "xstate.h"
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 75ffaef8c299..f93336f332e3 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -6,10 +6,10 @@
 #include <linux/vmalloc.h>
 
 #include <asm/fpu/api.h>
+#include <asm/fpu/context.h>
 #include <asm/fpu/signal.h>
 #include <asm/fpu/regset.h>
 
-#include "context.h"
 #include "internal.h"
 #include "legacy.h"
 #include "xstate.h"
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 91d4b6de58ab..f099a56c9a93 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -7,6 +7,7 @@
 #include <linux/cpu.h>
 #include <linux/pagemap.h>
 
+#include <asm/fpu/context.h>
 #include <asm/fpu/signal.h>
 #include <asm/fpu/regset.h>
 #include <asm/fpu/xstate.h>
@@ -15,7 +16,6 @@
 #include <asm/trapnr.h>
 #include <asm/trace/fpu.h>
 
-#include "context.h"
 #include "internal.h"
 #include "legacy.h"
 #include "xstate.h"
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 39e1c8626ab9..ab5e26075716 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -15,6 +15,7 @@
 #include <linux/vmalloc.h>
 
 #include <asm/fpu/api.h>
+#include <asm/fpu/context.h>
 #include <asm/fpu/regset.h>
 #include <asm/fpu/signal.h>
 #include <asm/fpu/xcr.h>
@@ -23,7 +24,6 @@
 #include <asm/prctl.h>
 #include <asm/elf.h>
 
-#include "context.h"
 #include "internal.h"
 #include "legacy.h"
 #include "xstate.h"
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper
  2022-05-12  7:43 [RFC PATCH bpf-next 0/5] bpf: add get_reg_val helper Dave Marchevsky
  2022-05-12  7:43 ` [RFC PATCH bpf-next 1/5] x86/fpu: Move context.h to include/asm Dave Marchevsky
@ 2022-05-12  7:43 ` Dave Marchevsky
  2022-05-12 15:29   ` David Vernet
  2022-05-14  0:41   ` Alexei Starovoitov
  2022-05-12  7:43 ` [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads Dave Marchevsky
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 20+ messages in thread
From: Dave Marchevsky @ 2022-05-12  7:43 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team,
	Dave Marchevsky

Add a helper which reads the value of specified register into memory.

Currently, bpf programs only have access to general-purpose registers
via struct pt_regs. Other registers, like SSE regs %xmm0-15, are
inaccessible, which makes some tracing usecases impossible. For example,
User Statically-Defined Tracing (USDT) probes may use SSE registers to
pass their arguments on x86. While this patch adds support for %xmm0-15
only, the helper is meant to be generic enough to support fetching any
reg.

A useful "value of register" definition for bpf programs is "value of
register before control transfer to kernel". pt_regs gives us this
currently, so it's the default behavior of the new helper. Fetching the
actual _current_ reg value is possible, though, by passing
BPF_GETREG_F_CURRENT flag as part of input.

For SSE regs we try to avoid digging around in task's fpu state by first
reading _current_ value, then checking to see if the state of cpu's
floating point regs matches task's view of them. If so, we can just
return _current_ value.

Further usecases which are straightforward to support, but
unimplemented:
  * using the helper to fetch general-purpose register value.
  currently-unused pt_regs parameter exists for this reason.

  * fetching rdtsc (w/ BPF_GETREG_F_CURRENT)

  * other architectures. s390 specifically might benefit from similar
  fpu reg fetching as USDT library was recently updated to support that
  architecture.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/uapi/linux/bpf.h       |  40 +++++++++
 kernel/trace/bpf_trace.c       | 148 +++++++++++++++++++++++++++++++++
 kernel/trace/bpf_trace.h       |   1 +
 tools/include/uapi/linux/bpf.h |  40 +++++++++
 4 files changed, 229 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 444fe6f1cf35..3ef8f683ed9e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5154,6 +5154,18 @@ union bpf_attr {
  *		if not NULL, is a reference which must be released using its
  *		corresponding release function, or moved into a BPF map before
  *		program exit.
+ *
+ * long bpf_get_reg_val(void *dst, u32 size, u64 getreg_spec, struct pt_regs *regs, struct task_struct *tsk)
+ *	Description
+ *		Store the value of a SSE register specified by *getreg_spec*
+ *		into memory region of size *size* specified by *dst*. *getreg_spec*
+ *		is a combination of BPF_GETREG enum AND BPF_GETREG_F flag e.g.
+ *		(BPF_GETREG_X86_XMM0 << 32) | BPF_GETREG_F_CURRENT.*
+ *	Return
+ *		0 on success
+ *		**-ENOENT** if the system architecture does not have requested reg
+ *		**-EINVAL** if *getreg_spec* is invalid
+ *		**-EINVAL** if *size* != bytes necessary to store requested reg val
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5351,6 +5363,7 @@ union bpf_attr {
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
 	FN(kptr_xchg),			\
+	FN(get_reg_val),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -6318,6 +6331,33 @@ struct bpf_perf_event_value {
 	__u64 running;
 };
 
+/* bpf_get_reg_val register enum */
+enum {
+	BPF_GETREG_X86_XMM0 = 0,
+	BPF_GETREG_X86_XMM1,
+	BPF_GETREG_X86_XMM2,
+	BPF_GETREG_X86_XMM3,
+	BPF_GETREG_X86_XMM4,
+	BPF_GETREG_X86_XMM5,
+	BPF_GETREG_X86_XMM6,
+	BPF_GETREG_X86_XMM7,
+	BPF_GETREG_X86_XMM8,
+	BPF_GETREG_X86_XMM9,
+	BPF_GETREG_X86_XMM10,
+	BPF_GETREG_X86_XMM11,
+	BPF_GETREG_X86_XMM12,
+	BPF_GETREG_X86_XMM13,
+	BPF_GETREG_X86_XMM14,
+	BPF_GETREG_X86_XMM15,
+	__MAX_BPF_GETREG,
+};
+
+/* bpf_get_reg_val flags */
+enum {
+	BPF_GETREG_F_NONE = 0,
+	BPF_GETREG_F_CURRENT = (1U << 0),
+};
+
 enum {
 	BPF_DEVCG_ACC_MKNOD	= (1ULL << 0),
 	BPF_DEVCG_ACC_READ	= (1ULL << 1),
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f15b826f9899..0de7d6b3af5b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -28,6 +28,10 @@
 
 #include <asm/tlb.h>
 
+#ifdef CONFIG_X86
+#include <asm/fpu/context.h>
+#endif
+
 #include "trace_probe.h"
 #include "trace.h"
 
@@ -1166,6 +1170,148 @@ static const struct bpf_func_proto bpf_get_func_arg_cnt_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
+#define XMM_REG_SZ 16
+
+#define __xmm_space_off(regno)				\
+	case BPF_GETREG_X86_XMM ## regno:		\
+		xmm_space_off = regno * 16;		\
+		break;
+
+static long getreg_read_xmm_fxsave(u32 reg, struct task_struct *tsk,
+				   void *data)
+{
+	struct fxregs_state *fxsave;
+	u32 xmm_space_off;
+
+	switch (reg) {
+	__xmm_space_off(0);
+	__xmm_space_off(1);
+	__xmm_space_off(2);
+	__xmm_space_off(3);
+	__xmm_space_off(4);
+	__xmm_space_off(5);
+	__xmm_space_off(6);
+	__xmm_space_off(7);
+#ifdef	CONFIG_X86_64
+	__xmm_space_off(8);
+	__xmm_space_off(9);
+	__xmm_space_off(10);
+	__xmm_space_off(11);
+	__xmm_space_off(12);
+	__xmm_space_off(13);
+	__xmm_space_off(14);
+	__xmm_space_off(15);
+#endif
+	default:
+		return -EINVAL;
+	}
+
+	fxsave = &tsk->thread.fpu.fpstate->regs.fxsave;
+	memcpy(data, (void *)&fxsave->xmm_space + xmm_space_off, XMM_REG_SZ);
+	return 0;
+}
+
+#undef __xmm_space_off
+
+static bool getreg_is_xmm(u32 reg)
+{
+	return reg >= BPF_GETREG_X86_XMM0 && reg <= BPF_GETREG_X86_XMM15;
+}
+
+#define __bpf_sse_read(regno)							\
+	case BPF_GETREG_X86_XMM ## regno:					\
+		asm("movdqa %%xmm" #regno ", %0" : "=m"(*(char *)data));	\
+		break;
+
+static long bpf_read_sse_reg(u32 reg, u32 flags, struct task_struct *tsk,
+			     void *data)
+{
+#ifdef CONFIG_X86
+	unsigned long irq_flags;
+	long err;
+
+	switch (reg) {
+	__bpf_sse_read(0);
+	__bpf_sse_read(1);
+	__bpf_sse_read(2);
+	__bpf_sse_read(3);
+	__bpf_sse_read(4);
+	__bpf_sse_read(5);
+	__bpf_sse_read(6);
+	__bpf_sse_read(7);
+#ifdef CONFIG_X86_64
+	__bpf_sse_read(8);
+	__bpf_sse_read(9);
+	__bpf_sse_read(10);
+	__bpf_sse_read(11);
+	__bpf_sse_read(12);
+	__bpf_sse_read(13);
+	__bpf_sse_read(14);
+	__bpf_sse_read(15);
+#endif /* CONFIG_X86_64 */
+	default:
+		return -EINVAL;
+	}
+
+	if (flags & BPF_GETREG_F_CURRENT)
+		return 0;
+
+	if (!fpregs_state_valid(&tsk->thread.fpu, smp_processor_id())) {
+		local_irq_save(irq_flags);
+		err = getreg_read_xmm_fxsave(reg, tsk, data);
+		local_irq_restore(irq_flags);
+		return err;
+	}
+
+	return 0;
+#else
+	return -ENOENT;
+#endif /* CONFIG_X86 */
+}
+
+#undef __bpf_sse_read
+
+BPF_CALL_5(get_reg_val, void *, dst, u32, size,
+	   u64, getreg_spec, struct pt_regs *, regs,
+	   struct task_struct *, tsk)
+{
+	u32 reg, flags;
+
+	reg = (u32)(getreg_spec >> 32);
+	flags = (u32)getreg_spec;
+	if (reg >= __MAX_BPF_GETREG)
+		return -EINVAL;
+
+	if (getreg_is_xmm(reg)) {
+#ifndef CONFIG_X86
+		return -ENOENT;
+#else
+		if (size != XMM_REG_SZ)
+			return -EINVAL;
+
+		return bpf_read_sse_reg(reg, flags, tsk, dst);
+	}
+
+	return -EINVAL;
+#endif
+}
+
+BTF_ID_LIST(bpf_get_reg_val_ids)
+BTF_ID(struct, pt_regs)
+
+static const struct bpf_func_proto bpf_get_reg_val_proto = {
+	.func	= get_reg_val,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_BTF_ID_OR_NULL,
+	.arg4_btf_id	= &bpf_get_reg_val_ids[0],
+	.arg5_type	= ARG_PTR_TO_BTF_ID_OR_NULL,
+	.arg5_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
+};
+
 static const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1287,6 +1433,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_find_vma_proto;
 	case BPF_FUNC_trace_vprintk:
 		return bpf_get_trace_vprintk_proto();
+	case BPF_FUNC_get_reg_val:
+		return &bpf_get_reg_val_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
diff --git a/kernel/trace/bpf_trace.h b/kernel/trace/bpf_trace.h
index 9acbc11ac7bb..b4b55706c2dd 100644
--- a/kernel/trace/bpf_trace.h
+++ b/kernel/trace/bpf_trace.h
@@ -29,6 +29,7 @@ TRACE_EVENT(bpf_trace_printk,
 
 #undef TRACE_INCLUDE_PATH
 #define TRACE_INCLUDE_PATH .
+#undef TRACE_INCLUDE_FILE
 #define TRACE_INCLUDE_FILE bpf_trace
 
 #include <trace/define_trace.h>
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 444fe6f1cf35..3ef8f683ed9e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5154,6 +5154,18 @@ union bpf_attr {
  *		if not NULL, is a reference which must be released using its
  *		corresponding release function, or moved into a BPF map before
  *		program exit.
+ *
+ * long bpf_get_reg_val(void *dst, u32 size, u64 getreg_spec, struct pt_regs *regs, struct task_struct *tsk)
+ *	Description
+ *		Store the value of a SSE register specified by *getreg_spec*
+ *		into memory region of size *size* specified by *dst*. *getreg_spec*
+ *		is a combination of BPF_GETREG enum AND BPF_GETREG_F flag e.g.
+ *		(BPF_GETREG_X86_XMM0 << 32) | BPF_GETREG_F_CURRENT.*
+ *	Return
+ *		0 on success
+ *		**-ENOENT** if the system architecture does not have requested reg
+ *		**-EINVAL** if *getreg_spec* is invalid
+ *		**-EINVAL** if *size* != bytes necessary to store requested reg val
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5351,6 +5363,7 @@ union bpf_attr {
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
 	FN(kptr_xchg),			\
+	FN(get_reg_val),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -6318,6 +6331,33 @@ struct bpf_perf_event_value {
 	__u64 running;
 };
 
+/* bpf_get_reg_val register enum */
+enum {
+	BPF_GETREG_X86_XMM0 = 0,
+	BPF_GETREG_X86_XMM1,
+	BPF_GETREG_X86_XMM2,
+	BPF_GETREG_X86_XMM3,
+	BPF_GETREG_X86_XMM4,
+	BPF_GETREG_X86_XMM5,
+	BPF_GETREG_X86_XMM6,
+	BPF_GETREG_X86_XMM7,
+	BPF_GETREG_X86_XMM8,
+	BPF_GETREG_X86_XMM9,
+	BPF_GETREG_X86_XMM10,
+	BPF_GETREG_X86_XMM11,
+	BPF_GETREG_X86_XMM12,
+	BPF_GETREG_X86_XMM13,
+	BPF_GETREG_X86_XMM14,
+	BPF_GETREG_X86_XMM15,
+	__MAX_BPF_GETREG,
+};
+
+/* bpf_get_reg_val flags */
+enum {
+	BPF_GETREG_F_NONE = 0,
+	BPF_GETREG_F_CURRENT = (1U << 0),
+};
+
 enum {
 	BPF_DEVCG_ACC_MKNOD	= (1ULL << 0),
 	BPF_DEVCG_ACC_READ	= (1ULL << 1),
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads
  2022-05-12  7:43 [RFC PATCH bpf-next 0/5] bpf: add get_reg_val helper Dave Marchevsky
  2022-05-12  7:43 ` [RFC PATCH bpf-next 1/5] x86/fpu: Move context.h to include/asm Dave Marchevsky
  2022-05-12  7:43 ` [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper Dave Marchevsky
@ 2022-05-12  7:43 ` Dave Marchevsky
  2022-05-14  0:43   ` Alexei Starovoitov
  2022-05-16 23:26   ` Andrii Nakryiko
  2022-05-12  7:43 ` [RFC PATCH bpf-next 4/5] selftests/bpf: Add test for USDT parse of xmm reg Dave Marchevsky
  2022-05-12  7:43 ` [RFC PATCH bpf-next 5/5] selftests/bpf: get_reg_val test exercising fxsave fetch Dave Marchevsky
  4 siblings, 2 replies; 20+ messages in thread
From: Dave Marchevsky @ 2022-05-12  7:43 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team,
	Dave Marchevsky

Handle xmm0,...,xmm15 registers when parsing USDT arguments. Currently
only the first 64 bits of the fetched value are returned as I haven't
seen the rest of the register used in practice.

This patch also handles floats in USDT arg spec by ignoring the fact
that they're floats and considering them scalar. Currently we can't do
float math in BPF programs anyways, so might as well support passing to
userspace and converting there.

We can use existing ARG_REG sscanf + logic, adding XMM-specific logic
when calc_pt_regs_off fails. If the reg is xmm, arg_spec's reg_off is
repurposed to hold reg_no, which is passed to bpf_get_reg_val. Since the
helper does the digging around in fxregs_state it's not necessary to
calculate offset in bpf code for these regs.

NOTE: Changes here cause verification failure for existing USDT tests.
Specifically, BPF_USDT prog 'usdt12' fails to verify due to too many
insns despite not having its insn count significantly changed.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 tools/lib/bpf/usdt.bpf.h | 36 ++++++++++++++++++++--------
 tools/lib/bpf/usdt.c     | 51 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 73 insertions(+), 14 deletions(-)

diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
index 4181fddb3687..7b5ed4cbaa2f 100644
--- a/tools/lib/bpf/usdt.bpf.h
+++ b/tools/lib/bpf/usdt.bpf.h
@@ -43,6 +43,7 @@ enum __bpf_usdt_arg_type {
 	BPF_USDT_ARG_CONST,
 	BPF_USDT_ARG_REG,
 	BPF_USDT_ARG_REG_DEREF,
+	BPF_USDT_ARG_XMM_REG,
 };
 
 struct __bpf_usdt_arg_spec {
@@ -129,7 +130,9 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
 {
 	struct __bpf_usdt_spec *spec;
 	struct __bpf_usdt_arg_spec *arg_spec;
-	unsigned long val;
+	struct pt_regs *btf_regs;
+	struct task_struct *btf_task;
+	struct { __u64 a; __u64 unused; } val = {};
 	int err, spec_id;
 
 	*res = 0;
@@ -151,7 +154,7 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
 		/* Arg is just a constant ("-4@$-9" in USDT arg spec).
 		 * value is recorded in arg_spec->val_off directly.
 		 */
-		val = arg_spec->val_off;
+		val.a = arg_spec->val_off;
 		break;
 	case BPF_USDT_ARG_REG:
 		/* Arg is in a register (e.g, "8@%rax" in USDT arg spec),
@@ -159,7 +162,20 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
 		 * struct pt_regs. To keep things simple user-space parts
 		 * record offsetof(struct pt_regs, <regname>) in arg_spec->reg_off.
 		 */
-		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
+		err = bpf_probe_read_kernel(&val.a, sizeof(val.a), (void *)ctx + arg_spec->reg_off);
+		if (err)
+			return err;
+		break;
+	case BPF_USDT_ARG_XMM_REG:
+		/* Same as above, but arg is an xmm reg, so can't look
+		 * in pt_regs, need to use special helper.
+		 * reg_off is the regno ("xmm0" -> regno 0, etc)
+		 */
+		btf_task = bpf_get_current_task_btf();
+		btf_regs = (struct pt_regs *)bpf_task_pt_regs(btf_task);
+		err = bpf_get_reg_val(&val, sizeof(val),
+				     ((u64)arg_spec->reg_off + BPF_GETREG_X86_XMM0) << 32,
+				     btf_regs, btf_task);
 		if (err)
 			return err;
 		break;
@@ -171,14 +187,14 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
 		 * from pt_regs, then do another user-space probe read to
 		 * fetch argument value itself.
 		 */
-		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
+		err = bpf_probe_read_kernel(&val.a, sizeof(val.a), (void *)ctx + arg_spec->reg_off);
 		if (err)
 			return err;
-		err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
+		err = bpf_probe_read_user(&val.a, sizeof(val.a), (void *)val.a + arg_spec->val_off);
 		if (err)
 			return err;
 #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
-		val >>= arg_spec->arg_bitshift;
+		val.a >>= arg_spec->arg_bitshift;
 #endif
 		break;
 	default:
@@ -189,12 +205,12 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
 	 * necessary upper arg_bitshift bits, with sign extension if argument
 	 * is signed
 	 */
-	val <<= arg_spec->arg_bitshift;
+	val.a <<= arg_spec->arg_bitshift;
 	if (arg_spec->arg_signed)
-		val = ((long)val) >> arg_spec->arg_bitshift;
+		val.a = ((long)val.a) >> arg_spec->arg_bitshift;
 	else
-		val = val >> arg_spec->arg_bitshift;
-	*res = val;
+		val.a = val.a >> arg_spec->arg_bitshift;
+	*res = val.a;
 	return 0;
 }
 
diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
index f1c9339cfbbc..3cac48959ff9 100644
--- a/tools/lib/bpf/usdt.c
+++ b/tools/lib/bpf/usdt.c
@@ -199,6 +199,7 @@ enum usdt_arg_type {
 	USDT_ARG_CONST,
 	USDT_ARG_REG,
 	USDT_ARG_REG_DEREF,
+	USDT_ARG_XMM_REG,
 };
 
 /* should match exactly struct __bpf_usdt_arg_spec from usdt.bpf.h */
@@ -1218,7 +1219,41 @@ static int calc_pt_regs_off(const char *reg_name)
 		}
 	}
 
-	pr_warn("usdt: unrecognized register '%s'\n", reg_name);
+	return -ENOENT;
+}
+
+static int calc_xmm_regno(const char *reg_name)
+{
+	static struct {
+		const char *name;
+		__u16 regno;
+	} xmm_reg_map[] = {
+		{ "xmm0",  0 },
+		{ "xmm1",  1 },
+		{ "xmm2",  2 },
+		{ "xmm3",  3 },
+		{ "xmm4",  4 },
+		{ "xmm5",  5 },
+		{ "xmm6",  6 },
+		{ "xmm7",  7 },
+#ifdef __x86_64__
+		{ "xmm8",  8 },
+		{ "xmm9",  9 },
+		{ "xmm10",  10 },
+		{ "xmm11",  11 },
+		{ "xmm12",  12 },
+		{ "xmm13",  13 },
+		{ "xmm14",  14 },
+		{ "xmm15",  15 },
+#endif
+	};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(xmm_reg_map); i++) {
+		if (strcmp(reg_name, xmm_reg_map[i].name) == 0)
+			return xmm_reg_map[i].regno;
+	}
+
 	return -ENOENT;
 }
 
@@ -1234,18 +1269,26 @@ static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec
 		arg->val_off = off;
 		reg_off = calc_pt_regs_off(reg_name);
 		free(reg_name);
-		if (reg_off < 0)
+		if (reg_off < 0) {
+			pr_warn("usdt: unrecognized register '%s'\n", reg_name);
 			return reg_off;
+		}
 		arg->reg_off = reg_off;
-	} else if (sscanf(arg_str, " %d @ %%%ms %n", &arg_sz, &reg_name, &len) == 2) {
+	} else if (sscanf(arg_str, " %d %*[f@] %%%ms %n", &arg_sz, &reg_name, &len) == 2) {
 		/* Register read case, e.g., -4@%eax */
 		arg->arg_type = USDT_ARG_REG;
 		arg->val_off = 0;
 
 		reg_off = calc_pt_regs_off(reg_name);
+		if (reg_off < 0) {
+			reg_off = calc_xmm_regno(reg_name);
+			arg->arg_type = USDT_ARG_XMM_REG;
+		}
 		free(reg_name);
-		if (reg_off < 0)
+		if (reg_off < 0) {
+			pr_warn("usdt: unrecognized register '%s'\n", reg_name);
 			return reg_off;
+		}
 		arg->reg_off = reg_off;
 	} else if (sscanf(arg_str, " %d @ $%ld %n", &arg_sz, &off, &len) == 2) {
 		/* Constant value case, e.g., 4@$71 */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH bpf-next 4/5] selftests/bpf: Add test for USDT parse of xmm reg
  2022-05-12  7:43 [RFC PATCH bpf-next 0/5] bpf: add get_reg_val helper Dave Marchevsky
                   ` (2 preceding siblings ...)
  2022-05-12  7:43 ` [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads Dave Marchevsky
@ 2022-05-12  7:43 ` Dave Marchevsky
  2022-05-16 23:31   ` Andrii Nakryiko
  2022-05-12  7:43 ` [RFC PATCH bpf-next 5/5] selftests/bpf: get_reg_val test exercising fxsave fetch Dave Marchevsky
  4 siblings, 1 reply; 20+ messages in thread
From: Dave Marchevsky @ 2022-05-12  7:43 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team,
	Dave Marchevsky

Validate that bpf_get_reg_val helper solves the motivating problem of
this patch series: USDT args passed through xmm regs. The userspace
portion of the test forces STAP_PROBE macro to use %xmm0 and %xmm1 regs
to pass a float and an int, which the bpf-side successfully reads using
BPF_USDT.

In the wild I discovered a sanely-configured USDT in Fedora libpthread
using xmm regs to pass scalar values, likely due to register pressure.
urandom_read_lib_xmm mimics this by using -ffixed-$REG flag to mark
r11-r14 unusable and passing many USDT args.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 tools/testing/selftests/bpf/Makefile          |  8 ++-
 tools/testing/selftests/bpf/prog_tests/usdt.c |  7 +++
 .../selftests/bpf/progs/test_urandom_usdt.c   | 13 ++++
 tools/testing/selftests/bpf/urandom_read.c    |  3 +
 .../selftests/bpf/urandom_read_lib_xmm.c      | 62 +++++++++++++++++++
 5 files changed, 91 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/urandom_read_lib_xmm.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 6bbc03161544..19246e34dfe1 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -172,10 +172,14 @@ $(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c
 	$(call msg,LIB,,$@)
 	$(Q)$(CC) $(CFLAGS) -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
 
-$(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_read.so
+$(OUTPUT)/liburandom_read_xmm.so: urandom_read_lib_xmm.c
+	$(call msg,LIB,,$@)
+	$(Q)$(CC) -O0 -ffixed-r11 -ffixed-r12 -ffixed-r13 -ffixed-r14 -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
+
+$(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_read.so $(OUTPUT)/liburandom_read_xmm.so
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.c,$^)			       \
-		  liburandom_read.so $(LDLIBS)	       			       \
+		  liburandom_read.so liburandom_read_xmm.so $(LDLIBS)          \
 		  -Wl,-rpath=. -Wl,--build-id=sha1 -o $@
 
 $(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch])
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
index a71f51bdc08d..f98749ac74a7 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -385,6 +385,12 @@ static void subtest_urandom_usdt(bool auto_attach)
 			goto cleanup;
 		skel->links.urandlib_read_with_sema = l;
 
+		l = bpf_program__attach_usdt(skel->progs.urandlib_xmm_reg_read,
+					     urand_pid, "./liburandom_read_xmm.so",
+					     "urandlib", "xmm_reg_read", NULL);
+		if (!ASSERT_OK_PTR(l, "urandlib_xmm_reg_read"))
+			goto cleanup;
+		skel->links.urandlib_xmm_reg_read = l;
 	}
 
 	/* trigger urandom_read USDTs */
@@ -402,6 +408,7 @@ static void subtest_urandom_usdt(bool auto_attach)
 	ASSERT_EQ(bss->urandlib_read_with_sema_call_cnt, 1, "urandlib_w_sema_cnt");
 	ASSERT_EQ(bss->urandlib_read_with_sema_buf_sz_sum, 256, "urandlib_w_sema_sum");
 
+	ASSERT_EQ(bss->urandlib_xmm_reg_read_buf_sz_sum, 256, "liburandom_read_xmm.so");
 cleanup:
 	if (urand_pipe)
 		pclose(urand_pipe);
diff --git a/tools/testing/selftests/bpf/progs/test_urandom_usdt.c b/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
index 3539b02bd5f7..575761863eb6 100644
--- a/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
+++ b/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
@@ -67,4 +67,17 @@ int BPF_USDT(urandlib_read_with_sema, int iter_num, int iter_cnt, int buf_sz)
 	return 0;
 }
 
+int urandlib_xmm_reg_read_buf_sz_sum;
+SEC("usdt/./liburandom_read_xmm.so:urandlib:xmm_reg_read")
+int BPF_USDT(urandlib_xmm_reg_read, int *f1, int *f2, int *f3, int a, int b,
+				     int c /*should be float */, int d, int e,
+				     int f, int g, int h, int buf_sz)
+{
+	if (urand_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&urandlib_xmm_reg_read_buf_sz_sum, buf_sz);
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/urandom_read.c b/tools/testing/selftests/bpf/urandom_read.c
index e92644d0fa75..0ee7aad014e1 100644
--- a/tools/testing/selftests/bpf/urandom_read.c
+++ b/tools/testing/selftests/bpf/urandom_read.c
@@ -20,6 +20,8 @@ void urand_read_without_sema(int iter_num, int iter_cnt, int read_sz);
 /* these are coming from urandom_read_lib{1,2}.c */
 void urandlib_read_with_sema(int iter_num, int iter_cnt, int read_sz);
 void urandlib_read_without_sema(int iter_num, int iter_cnt, int read_sz);
+/* defined in urandom_read_lib_xmm.c */
+void urandlib_read_xmm_args(int iter_num, int iter_cnt, int read_sz);
 
 unsigned short urand_read_with_sema_semaphore SEC(".probes");
 
@@ -39,6 +41,7 @@ void urandom_read(int fd, int count)
 		/* trigger USDTs defined in shared lib */
 		urandlib_read_without_sema(i, count, BUF_SIZE);
 		urandlib_read_with_sema(i, count, BUF_SIZE);
+		urandlib_read_xmm_args(i, count, BUF_SIZE);
 	}
 }
 
diff --git a/tools/testing/selftests/bpf/urandom_read_lib_xmm.c b/tools/testing/selftests/bpf/urandom_read_lib_xmm.c
new file mode 100644
index 000000000000..d5f9c9cf74e7
--- /dev/null
+++ b/tools/testing/selftests/bpf/urandom_read_lib_xmm.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#define STAP_SDT_ARG_CONSTRAINT norx
+/* For x86_64, this was changed from 'nor' default to 'norfxy' in systemtap
+ * commit eaa15b047 ("PR27829: Support floating point values passed through sdt.h markers")
+ * then changed to 'norx' in commit 1d3653936f ("sys/sdt.h fp constraints cont'd, x86-64 edition")
+ * It's not necessary to set STAP_SDT_ARG_CONSTRAINT for newer systemtap to see
+ * xmm regs used in this program
+ */
+
+#include "sdt.h"
+
+int *f1(void)
+{
+	return (int *)100;
+}
+
+int *f2(void)
+{
+	return (int *)200;
+}
+
+int *f3(void)
+{
+	return (int *)300;
+}
+
+/* Compile w/ -ffixed-r11 -ffixed-r12 -ffixed-r13 -ffixed-r14 -O0 */
+void urandlib_read_xmm_args(int iter_num, int iter_cnt, int read_sz)
+{
+	volatile int a, b, d, e, f, g, h, i;
+	a = b = d = e = f = g = h = 300;
+	i = read_sz;
+	volatile float c = 100;
+
+	STAP_PROBE12(urandlib, xmm_reg_read, f1(), f2(), f3(), a, b, c, d, e, f, g, h, i);
+}
+
+/*
+ * `readelf -n` results:
+ * On a test host outside of kernel toolchain (Ubuntu 20.04, 5.13.0-39-generic, gcc 9.4.0-1ubuntu1~20.04.1)
+ * w/ STAP_SDT_ARG_CONSTRAINT norfxy
+ * 	using system sdt.h:
+ * 		Provider: urandlib
+ * 		Name: xmm_reg_read
+ * 		Location: 0x00000000000011d8, Base: 0x0000000000002008, Semaphore: 0x0000000000000000
+ * 		Arguments: 8@%rbx 8@%r15 8@%xmm1 -4@%edx -4@%ecx 4@%xmm0 -4@%esi -4@%edi -4@%r8d -4@%r9d -4@%r10d -4@%eax
+ *
+ * 	Using a newer systemtap's sdt.h (commit acca4ae47 ("Don't run tls tests if debuginfo is missing")):
+ * 		Provider: urandlib
+ * 		Name: xmm_reg_read
+ * 		Location: 0x00000000000011d8, Base: 0x0000000000002008, Semaphore: 0x0000000000000000
+ * 		Arguments: 8@%rbx 8@%r15 8@%xmm1 -4@%edx -4@%ecx 4f@%xmm0 -4@%esi -4@%edi -4@%r8d -4@%r9d -4@%r10d -4@%eax
+ *
+ * Kernel toolchain:
+ * STAP_SDT_ARG_CONSTRAINT norfxy causes compiler error (due to the 'f'), so using 'norx'
+ * 		Provider: urandlib
+ * 		Name: xmm_reg_read
+ * 		Location: 0x0000000000000717, Base: 0x0000000000000738, Semaphore: 0x0000000000000000
+ * 		Arguments: 8@%rbx 8@%r15 8@-72(%rbp) -4@%edx -4@%ecx 4f@%xmm0 -4@%esi -4@%edi -4@%r8d -4@%r9d -4@%r10d -4@%xmm1
+ */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH bpf-next 5/5] selftests/bpf: get_reg_val test exercising fxsave fetch
  2022-05-12  7:43 [RFC PATCH bpf-next 0/5] bpf: add get_reg_val helper Dave Marchevsky
                   ` (3 preceding siblings ...)
  2022-05-12  7:43 ` [RFC PATCH bpf-next 4/5] selftests/bpf: Add test for USDT parse of xmm reg Dave Marchevsky
@ 2022-05-12  7:43 ` Dave Marchevsky
  2022-05-12 17:47   ` Dave Marchevsky
  2022-05-16 23:28   ` Andrii Nakryiko
  4 siblings, 2 replies; 20+ messages in thread
From: Dave Marchevsky @ 2022-05-12  7:43 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team,
	Dave Marchevsky

Add a test which calls bpf_get_reg_val with an xmm reg after forcing fpu
state save. The test program writes to %xmm10, then calls a BPF program
which forces fpu save and calls bpf_get_reg_val. This guarantees that
!fpregs_state_valid check will succeed, forcing bpf_get_reg_val to fetch
%xmm10's value from task's fpu state.

A bpf_testmod_save_fpregs kfunc helper is added to bpf_testmod to enable
'force fpu save'. Existing bpf_dummy_ops test infra is extended to
support calling the kfunc.

unload_bpf_testmod would often fail with -EAGAIN when running the test
added in this patch, so a single retry w/ 20ms sleep is added.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h                           |  1 +
 kernel/trace/bpf_trace.c                      |  2 +-
 net/bpf/bpf_dummy_struct_ops.c                | 13 ++++++
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   | 13 ++++++
 tools/testing/selftests/bpf/prog_tests/usdt.c | 42 +++++++++++++++++++
 .../selftests/bpf/progs/test_urandom_usdt.c   | 24 +++++++++++
 tools/testing/selftests/bpf/test_progs.c      |  7 ++++
 7 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index be94833d390a..e642e4b8a726 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2223,6 +2223,7 @@ extern const struct bpf_func_proto bpf_find_vma_proto;
 extern const struct bpf_func_proto bpf_loop_proto;
 extern const struct bpf_func_proto bpf_strncmp_proto;
 extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
+extern const struct bpf_func_proto bpf_get_reg_val_proto;
 
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0de7d6b3af5b..cb81142a751a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1300,7 +1300,7 @@ BPF_CALL_5(get_reg_val, void *, dst, u32, size,
 BTF_ID_LIST(bpf_get_reg_val_ids)
 BTF_ID(struct, pt_regs)
 
-static const struct bpf_func_proto bpf_get_reg_val_proto = {
+const struct bpf_func_proto bpf_get_reg_val_proto = {
 	.func	= get_reg_val,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index d0e54e30658a..1f3933cd8aa6 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -171,7 +171,20 @@ static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log,
 	return atype == BPF_READ ? err : NOT_INIT;
 }
 
+static const struct bpf_func_proto *
+bpf_dummy_ops_get_func_proto(enum bpf_func_id func_id,
+			     const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_get_reg_val:
+		return &bpf_get_reg_val_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
 static const struct bpf_verifier_ops bpf_dummy_verifier_ops = {
+	.get_func_proto = bpf_dummy_ops_get_func_proto,
 	.is_valid_access = bpf_dummy_ops_is_valid_access,
 	.btf_struct_access = bpf_dummy_ops_btf_struct_access,
 };
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index e585e1cefc77..b2b35138b097 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2020 Facebook */
+#include <asm/fpu/api.h>
 #include <linux/btf.h>
 #include <linux/btf_ids.h>
 #include <linux/error-injection.h>
@@ -25,6 +26,13 @@ bpf_testmod_test_mod_kfunc(int i)
 	*(int *)this_cpu_ptr(&bpf_testmod_ksym_percpu) = i;
 }
 
+noinline void
+bpf_testmod_save_fpregs(void)
+{
+	kernel_fpu_begin();
+	kernel_fpu_end();
+}
+
 struct bpf_testmod_btf_type_tag_1 {
 	int a;
 };
@@ -150,6 +158,7 @@ static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = {
 
 BTF_SET_START(bpf_testmod_check_kfunc_ids)
 BTF_ID(func, bpf_testmod_test_mod_kfunc)
+BTF_ID(func, bpf_testmod_save_fpregs)
 BTF_SET_END(bpf_testmod_check_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_testmod_kfunc_set = {
@@ -166,6 +175,10 @@ static int bpf_testmod_init(void)
 	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_testmod_kfunc_set);
 	if (ret < 0)
 		return ret;
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_testmod_kfunc_set);
+	if (ret < 0)
+		return ret;
+
 	if (bpf_fentry_test1(0) < 0)
 		return -EINVAL;
 	return sysfs_create_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file);
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
index f98749ac74a7..3866cb004b22 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -8,6 +8,11 @@
 #include "test_usdt.skel.h"
 #include "test_urandom_usdt.skel.h"
 
+/* Need to keep consistent with definition in include/linux/bpf.h */
+struct bpf_dummy_ops_state {
+	int val;
+};
+
 int lets_test_this(int);
 
 static volatile int idx = 2;
@@ -415,6 +420,41 @@ static void subtest_urandom_usdt(bool auto_attach)
 	test_urandom_usdt__destroy(skel);
 }
 
+static void subtest_reg_val_fpustate(void)
+{
+	struct bpf_dummy_ops_state in_state;
+	struct test_urandom_usdt__bss *bss;
+	struct test_urandom_usdt *skel;
+	u64 in_args[1];
+	u64 regval[2];
+	int err, fd;
+
+	in_state.val = 0; /* unused */
+	in_args[0] = (unsigned long)&in_state;
+
+	LIBBPF_OPTS(bpf_test_run_opts, attr,
+		   .ctx_in = in_args,
+		   .ctx_size_in = sizeof(in_args),
+	);
+
+	skel = test_urandom_usdt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+	bss = skel->bss;
+
+	fd = bpf_program__fd(skel->progs.save_fpregs_and_read);
+	regval[0] = 42;
+	regval[1] = 0;
+	asm("movdqa %0, %%xmm10" : "=m"(*(char *)regval));
+
+	err = bpf_prog_test_run_opts(fd, &attr);
+	ASSERT_OK(err, "save_fpregs_and_read");
+	ASSERT_EQ(bss->fpregs_dummy_opts_xmm_val, 42, "fpregs_dummy_opts_xmm_val");
+
+	close(fd);
+	test_urandom_usdt__destroy(skel);
+}
+
 void test_usdt(void)
 {
 	if (test__start_subtest("basic"))
@@ -425,4 +465,6 @@ void test_usdt(void)
 		subtest_urandom_usdt(true /* auto_attach */);
 	if (test__start_subtest("urand_pid_attach"))
 		subtest_urandom_usdt(false /* auto_attach */);
+	if (test__start_subtest("bpf_get_reg_val_fpustate"))
+		subtest_reg_val_fpustate();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_urandom_usdt.c b/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
index 575761863eb6..2c8b6709606a 100644
--- a/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
+++ b/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
@@ -67,6 +67,30 @@ int BPF_USDT(urandlib_read_with_sema, int iter_num, int iter_cnt, int buf_sz)
 	return 0;
 }
 
+extern void bpf_testmod_save_fpregs(void) __ksym;
+
+u64 fpregs_dummy_opts_xmm_val;
+
+SEC("struct_ops/save_fpregs_and_read")
+int BPF_PROG(save_fpregs_and_read, struct bpf_dummy_ops_state *unused)
+{
+	struct task_struct *tsk;
+	u64 val[2];
+
+	bpf_testmod_save_fpregs();
+	tsk = bpf_get_current_task_btf();
+
+	bpf_get_reg_val(&val[0], 16, (u64)BPF_GETREG_X86_XMM10 << 32, NULL, tsk);
+	__sync_fetch_and_add(&fpregs_dummy_opts_xmm_val, val[0]);
+
+	return 0;
+}
+
+SEC(".struct_ops")
+struct bpf_dummy_ops dummy_ops = {
+	.test_1 = (void *)save_fpregs_and_read,
+};
+
 int urandlib_xmm_reg_read_buf_sz_sum;
 SEC("usdt/./liburandom_read_xmm.so:urandlib:xmm_reg_read")
 int BPF_USDT(urandlib_xmm_reg_read, int *f1, int *f2, int *f3, int a, int b,
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index a07da648af3b..27a3e8cb9c36 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -620,6 +620,9 @@ int kern_sync_rcu(void)
 
 static void unload_bpf_testmod(void)
 {
+	bool tried_again = false;
+
+again:
 	if (kern_sync_rcu())
 		fprintf(env.stderr, "Failed to trigger kernel-side RCU sync!\n");
 	if (delete_module("bpf_testmod", 0)) {
@@ -627,6 +630,10 @@ static void unload_bpf_testmod(void)
 			if (verbose())
 				fprintf(stdout, "bpf_testmod.ko is already unloaded.\n");
 			return;
+		} else if (errno == EAGAIN && !tried_again) {
+			tried_again = true;
+			usleep(20 * 1000);
+			goto again;
 		}
 		fprintf(env.stderr, "Failed to unload bpf_testmod.ko from kernel: %d\n", -errno);
 		return;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 1/5] x86/fpu: Move context.h to include/asm
  2022-05-12  7:43 ` [RFC PATCH bpf-next 1/5] x86/fpu: Move context.h to include/asm Dave Marchevsky
@ 2022-05-12 13:56   ` David Vernet
  2022-05-14  0:44   ` Alexei Starovoitov
  1 sibling, 0 replies; 20+ messages in thread
From: David Vernet @ 2022-05-12 13:56 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team

On Thu, May 12, 2022 at 12:43:17AM -0700, Dave Marchevsky wrote:
> The file's fpregs_state_valid function is useful outside of
> arch/x86/kernel/fpu dir. Further commits in this series use
> fpregs_state_valid to determine whether a BPF helper should fetch
> fpu reg value from xsave'd memory or register.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  arch/x86/{kernel => include/asm}/fpu/context.h | 2 ++
>  arch/x86/kernel/fpu/core.c                     | 2 +-
>  arch/x86/kernel/fpu/regset.c                   | 2 +-
>  arch/x86/kernel/fpu/signal.c                   | 2 +-
>  arch/x86/kernel/fpu/xstate.c                   | 2 +-
>  5 files changed, 6 insertions(+), 4 deletions(-)
>  rename arch/x86/{kernel => include/asm}/fpu/context.h (96%)
> 
> diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/include/asm/fpu/context.h
> similarity index 96%
> rename from arch/x86/kernel/fpu/context.h
> rename to arch/x86/include/asm/fpu/context.h
> index 958accf2ccf0..39dac18cd22c 100644
> --- a/arch/x86/kernel/fpu/context.h
> +++ b/arch/x86/include/asm/fpu/context.h
> @@ -51,6 +51,8 @@ static inline void fpregs_activate(struct fpu *fpu)
>  	trace_x86_fpu_regs_activated(fpu);
>  }
>  
> +extern void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask);

This signature is already included in arch/x86/include/asm/fpu/signal.h.
Should we just include that header from arch/x86/include/asm/fpu/context.h
rather than declaring the signature twice?

> +
>  /* Internal helper for switch_fpu_return() and signal frame setup */
>  static inline void fpregs_restore_userregs(void)
>  {
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index c049561f373a..5296112d4273 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -7,6 +7,7 @@
>   *	Gareth Hughes <gareth@valinux.com>, May 2000
>   */
>  #include <asm/fpu/api.h>
> +#include <asm/fpu/context.h>
>  #include <asm/fpu/regset.h>
>  #include <asm/fpu/sched.h>
>  #include <asm/fpu/signal.h>
> @@ -18,7 +19,6 @@
>  #include <linux/pkeys.h>
>  #include <linux/vmalloc.h>
>  
> -#include "context.h"
>  #include "internal.h"
>  #include "legacy.h"
>  #include "xstate.h"
> diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
> index 75ffaef8c299..f93336f332e3 100644
> --- a/arch/x86/kernel/fpu/regset.c
> +++ b/arch/x86/kernel/fpu/regset.c
> @@ -6,10 +6,10 @@
>  #include <linux/vmalloc.h>
>  
>  #include <asm/fpu/api.h>
> +#include <asm/fpu/context.h>
>  #include <asm/fpu/signal.h>
>  #include <asm/fpu/regset.h>
>  
> -#include "context.h"
>  #include "internal.h"
>  #include "legacy.h"
>  #include "xstate.h"
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index 91d4b6de58ab..f099a56c9a93 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -7,6 +7,7 @@
>  #include <linux/cpu.h>
>  #include <linux/pagemap.h>
>  
> +#include <asm/fpu/context.h>
>  #include <asm/fpu/signal.h>
>  #include <asm/fpu/regset.h>
>  #include <asm/fpu/xstate.h>
> @@ -15,7 +16,6 @@
>  #include <asm/trapnr.h>
>  #include <asm/trace/fpu.h>
>  
> -#include "context.h"
>  #include "internal.h"
>  #include "legacy.h"
>  #include "xstate.h"
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 39e1c8626ab9..ab5e26075716 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -15,6 +15,7 @@
>  #include <linux/vmalloc.h>
>  
>  #include <asm/fpu/api.h>
> +#include <asm/fpu/context.h>
>  #include <asm/fpu/regset.h>
>  #include <asm/fpu/signal.h>
>  #include <asm/fpu/xcr.h>
> @@ -23,7 +24,6 @@
>  #include <asm/prctl.h>
>  #include <asm/elf.h>
>  
> -#include "context.h"
>  #include "internal.h"
>  #include "legacy.h"
>  #include "xstate.h"
> -- 
> 2.30.2
> 

Looks reasonable otherwise, though it's unclear to me (as I'm not an expert
in this code) whether we should or shouldn't export all of these functions
if some of them are specific to the logic in arch/x86/kernel/fpu. It looks
like there's already a precedent for doing that with e.g.
fpu__clear_user_states(), so it seems fine.

Thanks,
David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper
  2022-05-12  7:43 ` [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper Dave Marchevsky
@ 2022-05-12 15:29   ` David Vernet
  2022-05-18  8:07     ` Dave Marchevsky
  2022-05-14  0:41   ` Alexei Starovoitov
  1 sibling, 1 reply; 20+ messages in thread
From: David Vernet @ 2022-05-12 15:29 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team

On Thu, May 12, 2022 at 12:43:18AM -0700, Dave Marchevsky wrote:
> Add a helper which reads the value of specified register into memory.
> 
> Currently, bpf programs only have access to general-purpose registers
> via struct pt_regs. Other registers, like SSE regs %xmm0-15, are
> inaccessible, which makes some tracing usecases impossible. For example,
> User Statically-Defined Tracing (USDT) probes may use SSE registers to
> pass their arguments on x86. While this patch adds support for %xmm0-15
> only, the helper is meant to be generic enough to support fetching any
> reg.
> 
> A useful "value of register" definition for bpf programs is "value of
> register before control transfer to kernel". pt_regs gives us this
> currently, so it's the default behavior of the new helper. Fetching the
> actual _current_ reg value is possible, though, by passing
> BPF_GETREG_F_CURRENT flag as part of input.
> 
> For SSE regs we try to avoid digging around in task's fpu state by first
> reading _current_ value, then checking to see if the state of cpu's
> floating point regs matches task's view of them. If so, we can just
> return _current_ value.
> 
> Further usecases which are straightforward to support, but
> unimplemented:
>   * using the helper to fetch general-purpose register value.
>   currently-unused pt_regs parameter exists for this reason.
> 
>   * fetching rdtsc (w/ BPF_GETREG_F_CURRENT)
> 
>   * other architectures. s390 specifically might benefit from similar
>   fpu reg fetching as USDT library was recently updated to support that
>   architecture.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  include/uapi/linux/bpf.h       |  40 +++++++++
>  kernel/trace/bpf_trace.c       | 148 +++++++++++++++++++++++++++++++++
>  kernel/trace/bpf_trace.h       |   1 +
>  tools/include/uapi/linux/bpf.h |  40 +++++++++
>  4 files changed, 229 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 444fe6f1cf35..3ef8f683ed9e 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5154,6 +5154,18 @@ union bpf_attr {
>   *		if not NULL, is a reference which must be released using its
>   *		corresponding release function, or moved into a BPF map before
>   *		program exit.
> + *
> + * long bpf_get_reg_val(void *dst, u32 size, u64 getreg_spec, struct pt_regs *regs, struct task_struct *tsk)
> + *	Description
> + *		Store the value of a SSE register specified by *getreg_spec*

Even though this patch only adds support for SSE, if the helper is meant to
be generic enough to support fetching any register, should the description
be updated to not imply that it's only meant for fetching SSE? IMO the
example below is sufficient to indicate that it can be used to fetch SSE
regs.

> + *		into memory region of size *size* specified by *dst*. *getreg_spec*
> + *		is a combination of BPF_GETREG enum AND BPF_GETREG_F flag e.g.
> + *		(BPF_GETREG_X86_XMM0 << 32) | BPF_GETREG_F_CURRENT.*
> + *	Return
> + *		0 on success
> + *		**-ENOENT** if the system architecture does not have requested reg
> + *		**-EINVAL** if *getreg_spec* is invalid
> + *		**-EINVAL** if *size* != bytes necessary to store requested reg val
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -5351,6 +5363,7 @@ union bpf_attr {
>  	FN(skb_set_tstamp),		\
>  	FN(ima_file_hash),		\
>  	FN(kptr_xchg),			\
> +	FN(get_reg_val),		\
>  	/* */
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -6318,6 +6331,33 @@ struct bpf_perf_event_value {
>  	__u64 running;
>  };
>  
> +/* bpf_get_reg_val register enum */
> +enum {
> +	BPF_GETREG_X86_XMM0 = 0,

I know we do this in a few places in bpf.h, so please feel free to ignore
this, but the C standard (section 6.7.2.2.1) formally states that if no
value is specified for the first enumerator that its value is 0, so
specifying the value here is strictly unnecessary. We're inconsistent in
how we apply this in bpf.h, but IMHO if we're adding new enums, we should
do the "standard" thing and only define the first element if it's nonzero.

> +	BPF_GETREG_X86_XMM1,
> +	BPF_GETREG_X86_XMM2,
> +	BPF_GETREG_X86_XMM3,
> +	BPF_GETREG_X86_XMM4,
> +	BPF_GETREG_X86_XMM5,
> +	BPF_GETREG_X86_XMM6,
> +	BPF_GETREG_X86_XMM7,
> +	BPF_GETREG_X86_XMM8,
> +	BPF_GETREG_X86_XMM9,
> +	BPF_GETREG_X86_XMM10,
> +	BPF_GETREG_X86_XMM11,
> +	BPF_GETREG_X86_XMM12,
> +	BPF_GETREG_X86_XMM13,
> +	BPF_GETREG_X86_XMM14,
> +	BPF_GETREG_X86_XMM15,
> +	__MAX_BPF_GETREG,
> +};
> +
> +/* bpf_get_reg_val flags */
> +enum {
> +	BPF_GETREG_F_NONE = 0,
> +	BPF_GETREG_F_CURRENT = (1U << 0),
> +};

Can you add a comment specifying what the BPF_GETREG_F_CURRENT flag does?
The commit summary is very helpful, but it would be good to persist this in
code as well.

> +
>  enum {
>  	BPF_DEVCG_ACC_MKNOD	= (1ULL << 0),
>  	BPF_DEVCG_ACC_READ	= (1ULL << 1),
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f15b826f9899..0de7d6b3af5b 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -28,6 +28,10 @@
>  
>  #include <asm/tlb.h>
>  
> +#ifdef CONFIG_X86
> +#include <asm/fpu/context.h>
> +#endif
> +
>  #include "trace_probe.h"
>  #include "trace.h"
>  
> @@ -1166,6 +1170,148 @@ static const struct bpf_func_proto bpf_get_func_arg_cnt_proto = {
>  	.arg1_type	= ARG_PTR_TO_CTX,
>  };
>  
> +#define XMM_REG_SZ 16
> +
> +#define __xmm_space_off(regno)				\
> +	case BPF_GETREG_X86_XMM ## regno:		\
> +		xmm_space_off = regno * 16;		\
> +		break;
> +
> +static long getreg_read_xmm_fxsave(u32 reg, struct task_struct *tsk,
> +				   void *data)
> +{
> +	struct fxregs_state *fxsave;
> +	u32 xmm_space_off;
> +
> +	switch (reg) {
> +	__xmm_space_off(0);
> +	__xmm_space_off(1);
> +	__xmm_space_off(2);
> +	__xmm_space_off(3);
> +	__xmm_space_off(4);
> +	__xmm_space_off(5);
> +	__xmm_space_off(6);
> +	__xmm_space_off(7);
> +#ifdef	CONFIG_X86_64
> +	__xmm_space_off(8);
> +	__xmm_space_off(9);
> +	__xmm_space_off(10);
> +	__xmm_space_off(11);
> +	__xmm_space_off(12);
> +	__xmm_space_off(13);
> +	__xmm_space_off(14);
> +	__xmm_space_off(15);
> +#endif
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	fxsave = &tsk->thread.fpu.fpstate->regs.fxsave;
> +	memcpy(data, (void *)&fxsave->xmm_space + xmm_space_off, XMM_REG_SZ);
> +	return 0;
> +}

Does any of this also need to be wrapped in CONFIG_X86? IIUC, everything in
struct thread_struct is arch specific, so I think this may fail to compile
on a number of other architectures. Per my suggestion below, maybe we
should just compile all of this logic out if we're not on x86, and update
bpf_get_reg_val() to only call bpf_read_sse_reg() on x86?

> +
> +#undef __xmm_space_off
> +
> +static bool getreg_is_xmm(u32 reg)
> +{
> +	return reg >= BPF_GETREG_X86_XMM0 && reg <= BPF_GETREG_X86_XMM15;

I think it's a bit confusing that we have a function here which confirms
that a register is xmm, but then we have ifdef CONFIG_X86_64 in large
switch statements in functions where we actually read the register and then
return -EINVAL.  Should we just update this to do the CONFIG_X6_64
preprocessor check, and then we can assume in getreg_read_xmm_fxsave() and
bpf_read_sse_reg() that the register is a valid xmm register, and avoid
having to do these switch statements at all? Note that this wouldn't change
the existing behavior at all, as we'd still be returning -EINVAL on 32-bit
x86 in either case.

> +}
> +
> +#define __bpf_sse_read(regno)							\
> +	case BPF_GETREG_X86_XMM ## regno:					\
> +		asm("movdqa %%xmm" #regno ", %0" : "=m"(*(char *)data));	\
> +		break;
> +
> +static long bpf_read_sse_reg(u32 reg, u32 flags, struct task_struct *tsk,
> +			     void *data)
> +{
> +#ifdef CONFIG_X86
> +	unsigned long irq_flags;
> +	long err;
> +
> +	switch (reg) {
> +	__bpf_sse_read(0);
> +	__bpf_sse_read(1);
> +	__bpf_sse_read(2);
> +	__bpf_sse_read(3);
> +	__bpf_sse_read(4);
> +	__bpf_sse_read(5);
> +	__bpf_sse_read(6);
> +	__bpf_sse_read(7);
> +#ifdef CONFIG_X86_64
> +	__bpf_sse_read(8);
> +	__bpf_sse_read(9);
> +	__bpf_sse_read(10);
> +	__bpf_sse_read(11);
> +	__bpf_sse_read(12);
> +	__bpf_sse_read(13);
> +	__bpf_sse_read(14);
> +	__bpf_sse_read(15);
> +#endif /* CONFIG_X86_64 */
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (flags & BPF_GETREG_F_CURRENT)
> +		return 0;
> +
> +	if (!fpregs_state_valid(&tsk->thread.fpu, smp_processor_id())) {
> +		local_irq_save(irq_flags);
> +		err = getreg_read_xmm_fxsave(reg, tsk, data);
> +		local_irq_restore(irq_flags);
> +		return err;
> +	}

Should we move the checks for current and fpregs_state_valid() above where
we actually read the registers? That way we can avoid doing the xmm read if
we'd have to read the fxsave region anyways. Not sure if that's common in
practice or really necessary at all. What you have here seems fine as well.

> +
> +	return 0;
> +#else
> +	return -ENOENT;
> +#endif /* CONFIG_X86 */
> +}
> +
> +#undef __bpf_sse_read
> +
> +BPF_CALL_5(get_reg_val, void *, dst, u32, size,
> +	   u64, getreg_spec, struct pt_regs *, regs,
> +	   struct task_struct *, tsk)
> +{
> +	u32 reg, flags;
> +
> +	reg = (u32)(getreg_spec >> 32);
> +	flags = (u32)getreg_spec;
> +	if (reg >= __MAX_BPF_GETREG)
> +		return -EINVAL;
> +
> +	if (getreg_is_xmm(reg)) {
> +#ifndef CONFIG_X86
> +		return -ENOENT;

On CONFIG_X86 but !CONFIG_X86_64, we return -EINVAL if we try to access the
wrong xmm register. Should we just change this to be return -EINVAL to keep
the return value consistent between architectures? Or we should update the
32 bit x86 case to return -ENOENT as well, and probably update the last
return -EINVAL statement in the function to be return -ENOENT. In general,
I'd say that returning -ENOENT if a register is specified that's
< __MAX_BPF_GETREG seems like the most intuitive API.

> +#else

Is it necessary to have this ifdef check here if you also have it in
bpf_read_sse_reg()? Maybe it makes more sense to keep this preprocessor
check, and compile out bpf_read_sse_reg() altogether on other
architectures? I think that probably makes sense given that we likely also
want to wrap __bpf_sse_read() in an ifdef given that it emits x86 asm, and
getreg_read_xmm_fxsave() which relies on the x86 definition of struct
thread_struct.

> +		if (size != XMM_REG_SZ)
> +			return -EINVAL;
> +
> +		return bpf_read_sse_reg(reg, flags, tsk, dst);
> +	}
> +
> +	return -EINVAL;
> +#endif
> +}
> +
> +BTF_ID_LIST(bpf_get_reg_val_ids)
> +BTF_ID(struct, pt_regs)
> +
> +static const struct bpf_func_proto bpf_get_reg_val_proto = {
> +	.func	= get_reg_val,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_CONST_SIZE,
> +	.arg3_type	= ARG_ANYTHING,
> +	.arg4_type	= ARG_PTR_TO_BTF_ID_OR_NULL,
> +	.arg4_btf_id	= &bpf_get_reg_val_ids[0],
> +	.arg5_type	= ARG_PTR_TO_BTF_ID_OR_NULL,
> +	.arg5_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
> +};
> +
>  static const struct bpf_func_proto *
>  bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> @@ -1287,6 +1433,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_find_vma_proto;
>  	case BPF_FUNC_trace_vprintk:
>  		return bpf_get_trace_vprintk_proto();
> +	case BPF_FUNC_get_reg_val:
> +		return &bpf_get_reg_val_proto;
>  	default:
>  		return bpf_base_func_proto(func_id);
>  	}
> diff --git a/kernel/trace/bpf_trace.h b/kernel/trace/bpf_trace.h
> index 9acbc11ac7bb..b4b55706c2dd 100644
> --- a/kernel/trace/bpf_trace.h
> +++ b/kernel/trace/bpf_trace.h
> @@ -29,6 +29,7 @@ TRACE_EVENT(bpf_trace_printk,
>  
>  #undef TRACE_INCLUDE_PATH
>  #define TRACE_INCLUDE_PATH .
> +#undef TRACE_INCLUDE_FILE
>  #define TRACE_INCLUDE_FILE bpf_trace
>  
>  #include <trace/define_trace.h>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 444fe6f1cf35..3ef8f683ed9e 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5154,6 +5154,18 @@ union bpf_attr {
>   *		if not NULL, is a reference which must be released using its
>   *		corresponding release function, or moved into a BPF map before
>   *		program exit.
> + *
> + * long bpf_get_reg_val(void *dst, u32 size, u64 getreg_spec, struct pt_regs *regs, struct task_struct *tsk)
> + *	Description
> + *		Store the value of a SSE register specified by *getreg_spec*
> + *		into memory region of size *size* specified by *dst*. *getreg_spec*
> + *		is a combination of BPF_GETREG enum AND BPF_GETREG_F flag e.g.
> + *		(BPF_GETREG_X86_XMM0 << 32) | BPF_GETREG_F_CURRENT.*
> + *	Return
> + *		0 on success
> + *		**-ENOENT** if the system architecture does not have requested reg
> + *		**-EINVAL** if *getreg_spec* is invalid
> + *		**-EINVAL** if *size* != bytes necessary to store requested reg val
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -5351,6 +5363,7 @@ union bpf_attr {
>  	FN(skb_set_tstamp),		\
>  	FN(ima_file_hash),		\
>  	FN(kptr_xchg),			\
> +	FN(get_reg_val),		\
>  	/* */
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -6318,6 +6331,33 @@ struct bpf_perf_event_value {
>  	__u64 running;
>  };
>  
> +/* bpf_get_reg_val register enum */
> +enum {
> +	BPF_GETREG_X86_XMM0 = 0,
> +	BPF_GETREG_X86_XMM1,
> +	BPF_GETREG_X86_XMM2,
> +	BPF_GETREG_X86_XMM3,
> +	BPF_GETREG_X86_XMM4,
> +	BPF_GETREG_X86_XMM5,
> +	BPF_GETREG_X86_XMM6,
> +	BPF_GETREG_X86_XMM7,
> +	BPF_GETREG_X86_XMM8,
> +	BPF_GETREG_X86_XMM9,
> +	BPF_GETREG_X86_XMM10,
> +	BPF_GETREG_X86_XMM11,
> +	BPF_GETREG_X86_XMM12,
> +	BPF_GETREG_X86_XMM13,
> +	BPF_GETREG_X86_XMM14,
> +	BPF_GETREG_X86_XMM15,
> +	__MAX_BPF_GETREG,
> +};
> +
> +/* bpf_get_reg_val flags */
> +enum {
> +	BPF_GETREG_F_NONE = 0,
> +	BPF_GETREG_F_CURRENT = (1U << 0),
> +};
> +
>  enum {
>  	BPF_DEVCG_ACC_MKNOD	= (1ULL << 0),
>  	BPF_DEVCG_ACC_READ	= (1ULL << 1),
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 5/5] selftests/bpf: get_reg_val test exercising fxsave fetch
  2022-05-12  7:43 ` [RFC PATCH bpf-next 5/5] selftests/bpf: get_reg_val test exercising fxsave fetch Dave Marchevsky
@ 2022-05-12 17:47   ` Dave Marchevsky
  2022-05-16 23:28   ` Andrii Nakryiko
  1 sibling, 0 replies; 20+ messages in thread
From: Dave Marchevsky @ 2022-05-12 17:47 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team

On 5/12/22 3:43 AM, Dave Marchevsky wrote:   
> Add a test which calls bpf_get_reg_val with an xmm reg after forcing fpu
> state save. The test program writes to %xmm10, then calls a BPF program
> which forces fpu save and calls bpf_get_reg_val. This guarantees that
> !fpregs_state_valid check will succeed, forcing bpf_get_reg_val to fetch
> %xmm10's value from task's fpu state.
> 
> A bpf_testmod_save_fpregs kfunc helper is added to bpf_testmod to enable
> 'force fpu save'. Existing bpf_dummy_ops test infra is extended to
> support calling the kfunc.
> 
> unload_bpf_testmod would often fail with -EAGAIN when running the test
> added in this patch, so a single retry w/ 20ms sleep is added.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  include/linux/bpf.h                           |  1 +
>  kernel/trace/bpf_trace.c                      |  2 +-
>  net/bpf/bpf_dummy_struct_ops.c                | 13 ++++++
>  .../selftests/bpf/bpf_testmod/bpf_testmod.c   | 13 ++++++
>  tools/testing/selftests/bpf/prog_tests/usdt.c | 42 +++++++++++++++++++
>  .../selftests/bpf/progs/test_urandom_usdt.c   | 24 +++++++++++
>  tools/testing/selftests/bpf/test_progs.c      |  7 ++++
>  7 files changed, 101 insertions(+), 1 deletion(-)

This series wasn't based on latest bpf-next. After rebase, this test
causes kernel panic. Investigating, but patches are still worth a
look.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper
  2022-05-12  7:43 ` [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper Dave Marchevsky
  2022-05-12 15:29   ` David Vernet
@ 2022-05-14  0:41   ` Alexei Starovoitov
  2022-05-18  7:35     ` Dave Marchevsky
  1 sibling, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2022-05-14  0:41 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team

On Thu, May 12, 2022 at 12:43:18AM -0700, Dave Marchevsky wrote:
> Add a helper which reads the value of specified register into memory.
> 
> Currently, bpf programs only have access to general-purpose registers
> via struct pt_regs. Other registers, like SSE regs %xmm0-15, are
> inaccessible, which makes some tracing usecases impossible. For example,
> User Statically-Defined Tracing (USDT) probes may use SSE registers to
> pass their arguments on x86. While this patch adds support for %xmm0-15
> only, the helper is meant to be generic enough to support fetching any
> reg.
> 
> A useful "value of register" definition for bpf programs is "value of
> register before control transfer to kernel". pt_regs gives us this
> currently, so it's the default behavior of the new helper. Fetching the
> actual _current_ reg value is possible, though, by passing
> BPF_GETREG_F_CURRENT flag as part of input.
> 
> For SSE regs we try to avoid digging around in task's fpu state by first
> reading _current_ value, then checking to see if the state of cpu's
> floating point regs matches task's view of them. If so, we can just
> return _current_ value.
> 
> Further usecases which are straightforward to support, but
> unimplemented:
>   * using the helper to fetch general-purpose register value.
>   currently-unused pt_regs parameter exists for this reason.
> 
>   * fetching rdtsc (w/ BPF_GETREG_F_CURRENT)
> 
>   * other architectures. s390 specifically might benefit from similar
>   fpu reg fetching as USDT library was recently updated to support that
>   architecture.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  include/uapi/linux/bpf.h       |  40 +++++++++
>  kernel/trace/bpf_trace.c       | 148 +++++++++++++++++++++++++++++++++
>  kernel/trace/bpf_trace.h       |   1 +
>  tools/include/uapi/linux/bpf.h |  40 +++++++++
>  4 files changed, 229 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 444fe6f1cf35..3ef8f683ed9e 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5154,6 +5154,18 @@ union bpf_attr {
>   *		if not NULL, is a reference which must be released using its
>   *		corresponding release function, or moved into a BPF map before
>   *		program exit.
> + *
> + * long bpf_get_reg_val(void *dst, u32 size, u64 getreg_spec, struct pt_regs *regs, struct task_struct *tsk)
> + *	Description
> + *		Store the value of a SSE register specified by *getreg_spec*
> + *		into memory region of size *size* specified by *dst*. *getreg_spec*
> + *		is a combination of BPF_GETREG enum AND BPF_GETREG_F flag e.g.
> + *		(BPF_GETREG_X86_XMM0 << 32) | BPF_GETREG_F_CURRENT.*
> + *	Return
> + *		0 on success
> + *		**-ENOENT** if the system architecture does not have requested reg
> + *		**-EINVAL** if *getreg_spec* is invalid
> + *		**-EINVAL** if *size* != bytes necessary to store requested reg val
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -5351,6 +5363,7 @@ union bpf_attr {
>  	FN(skb_set_tstamp),		\
>  	FN(ima_file_hash),		\
>  	FN(kptr_xchg),			\
> +	FN(get_reg_val),		\
>  	/* */
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -6318,6 +6331,33 @@ struct bpf_perf_event_value {
>  	__u64 running;
>  };
>  
> +/* bpf_get_reg_val register enum */
> +enum {
> +	BPF_GETREG_X86_XMM0 = 0,
> +	BPF_GETREG_X86_XMM1,
> +	BPF_GETREG_X86_XMM2,
> +	BPF_GETREG_X86_XMM3,
> +	BPF_GETREG_X86_XMM4,
> +	BPF_GETREG_X86_XMM5,
> +	BPF_GETREG_X86_XMM6,
> +	BPF_GETREG_X86_XMM7,
> +	BPF_GETREG_X86_XMM8,
> +	BPF_GETREG_X86_XMM9,
> +	BPF_GETREG_X86_XMM10,
> +	BPF_GETREG_X86_XMM11,
> +	BPF_GETREG_X86_XMM12,
> +	BPF_GETREG_X86_XMM13,
> +	BPF_GETREG_X86_XMM14,
> +	BPF_GETREG_X86_XMM15,
> +	__MAX_BPF_GETREG,
> +};

Can we do BPF_GETREG_X86_XMM plus number instead?
Enumerating every possible register will take quite some space in uapi
and bpf progs probably won't be using these enum values directly anyway.
usdt spec will have something like "xmm5" as a string.

> +
> +/* bpf_get_reg_val flags */
> +enum {
> +	BPF_GETREG_F_NONE = 0,
> +	BPF_GETREG_F_CURRENT = (1U << 0),
> +};
> +
>  enum {
>  	BPF_DEVCG_ACC_MKNOD	= (1ULL << 0),
>  	BPF_DEVCG_ACC_READ	= (1ULL << 1),
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f15b826f9899..0de7d6b3af5b 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -28,6 +28,10 @@
>  
>  #include <asm/tlb.h>
>  
> +#ifdef CONFIG_X86
> +#include <asm/fpu/context.h>
> +#endif
> +
>  #include "trace_probe.h"
>  #include "trace.h"
>  
> @@ -1166,6 +1170,148 @@ static const struct bpf_func_proto bpf_get_func_arg_cnt_proto = {
>  	.arg1_type	= ARG_PTR_TO_CTX,
>  };
>  
> +#define XMM_REG_SZ 16
> +
> +#define __xmm_space_off(regno)				\
> +	case BPF_GETREG_X86_XMM ## regno:		\
> +		xmm_space_off = regno * 16;		\
> +		break;
> +
> +static long getreg_read_xmm_fxsave(u32 reg, struct task_struct *tsk,
> +				   void *data)
> +{
> +	struct fxregs_state *fxsave;
> +	u32 xmm_space_off;
> +
> +	switch (reg) {
> +	__xmm_space_off(0);
> +	__xmm_space_off(1);
> +	__xmm_space_off(2);
> +	__xmm_space_off(3);
> +	__xmm_space_off(4);
> +	__xmm_space_off(5);
> +	__xmm_space_off(6);
> +	__xmm_space_off(7);
> +#ifdef	CONFIG_X86_64
> +	__xmm_space_off(8);
> +	__xmm_space_off(9);
> +	__xmm_space_off(10);
> +	__xmm_space_off(11);
> +	__xmm_space_off(12);
> +	__xmm_space_off(13);
> +	__xmm_space_off(14);
> +	__xmm_space_off(15);
> +#endif
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	fxsave = &tsk->thread.fpu.fpstate->regs.fxsave;
> +	memcpy(data, (void *)&fxsave->xmm_space + xmm_space_off, XMM_REG_SZ);
> +	return 0;

It's all arch specific.
This one and majority of other functions should probably go
into arch/x86/net/bpf_jit_comp.c? instead of generic code.
bpf_trace.c doesn't fit.

Try to avoid all ifdef-s. It's a red flag.

> +static long bpf_read_sse_reg(u32 reg, u32 flags, struct task_struct *tsk,
> +			     void *data)
> +{
> +#ifdef CONFIG_X86
> +	unsigned long irq_flags;
> +	long err;
> +
> +	switch (reg) {
> +	__bpf_sse_read(0);
> +	__bpf_sse_read(1);
> +	__bpf_sse_read(2);
> +	__bpf_sse_read(3);
> +	__bpf_sse_read(4);
> +	__bpf_sse_read(5);
> +	__bpf_sse_read(6);
> +	__bpf_sse_read(7);
> +#ifdef CONFIG_X86_64
> +	__bpf_sse_read(8);
> +	__bpf_sse_read(9);
> +	__bpf_sse_read(10);
> +	__bpf_sse_read(11);
> +	__bpf_sse_read(12);
> +	__bpf_sse_read(13);
> +	__bpf_sse_read(14);
> +	__bpf_sse_read(15);
> +#endif /* CONFIG_X86_64 */
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (flags & BPF_GETREG_F_CURRENT)
> +		return 0;
> +
> +	if (!fpregs_state_valid(&tsk->thread.fpu, smp_processor_id())) {
> +		local_irq_save(irq_flags);

why disable irqs?

> +		err = getreg_read_xmm_fxsave(reg, tsk, data);
> +		local_irq_restore(irq_flags);
> +		return err;
> +	}

What is the use case to read other task regs?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads
  2022-05-12  7:43 ` [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads Dave Marchevsky
@ 2022-05-14  0:43   ` Alexei Starovoitov
  2022-05-16 23:26   ` Andrii Nakryiko
  1 sibling, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2022-05-14  0:43 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team

On Thu, May 12, 2022 at 12:43:19AM -0700, Dave Marchevsky wrote:
> +		err = bpf_get_reg_val(&val, sizeof(val),
> +				     ((u64)arg_spec->reg_off + BPF_GETREG_X86_XMM0) << 32,
> +				     btf_regs, btf_task);

That illustrated the point from patch 2.
The above is probably the typical usage.
Just BPF_GETREG_X86_XMM in uapi enum would have been enough.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 1/5] x86/fpu: Move context.h to include/asm
  2022-05-12  7:43 ` [RFC PATCH bpf-next 1/5] x86/fpu: Move context.h to include/asm Dave Marchevsky
  2022-05-12 13:56   ` David Vernet
@ 2022-05-14  0:44   ` Alexei Starovoitov
  1 sibling, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2022-05-14  0:44 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team

On Thu, May 12, 2022 at 12:43:17AM -0700, Dave Marchevsky wrote:
> The file's fpregs_state_valid function is useful outside of
> arch/x86/kernel/fpu dir. Further commits in this series use
> fpregs_state_valid to determine whether a BPF helper should fetch
> fpu reg value from xsave'd memory or register.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  arch/x86/{kernel => include/asm}/fpu/context.h | 2 ++
>  arch/x86/kernel/fpu/core.c                     | 2 +-
>  arch/x86/kernel/fpu/regset.c                   | 2 +-
>  arch/x86/kernel/fpu/signal.c                   | 2 +-
>  arch/x86/kernel/fpu/xstate.c                   | 2 +-
>  5 files changed, 6 insertions(+), 4 deletions(-)
>  rename arch/x86/{kernel => include/asm}/fpu/context.h (96%)

Why rename and cause all this churn?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads
  2022-05-12  7:43 ` [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads Dave Marchevsky
  2022-05-14  0:43   ` Alexei Starovoitov
@ 2022-05-16 23:26   ` Andrii Nakryiko
  2022-05-18  8:20     ` Dave Marchevsky
  1 sibling, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2022-05-16 23:26 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, Kernel Team

On Thu, May 12, 2022 at 12:43 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> Handle xmm0,...,xmm15 registers when parsing USDT arguments. Currently
> only the first 64 bits of the fetched value are returned as I haven't
> seen the rest of the register used in practice.
>
> This patch also handles floats in USDT arg spec by ignoring the fact
> that they're floats and considering them scalar. Currently we can't do
> float math in BPF programs anyways, so might as well support passing to
> userspace and converting there.
>
> We can use existing ARG_REG sscanf + logic, adding XMM-specific logic
> when calc_pt_regs_off fails. If the reg is xmm, arg_spec's reg_off is
> repurposed to hold reg_no, which is passed to bpf_get_reg_val. Since the
> helper does the digging around in fxregs_state it's not necessary to
> calculate offset in bpf code for these regs.
>
> NOTE: Changes here cause verification failure for existing USDT tests.
> Specifically, BPF_USDT prog 'usdt12' fails to verify due to too many
> insns despite not having its insn count significantly changed.
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  tools/lib/bpf/usdt.bpf.h | 36 ++++++++++++++++++++--------
>  tools/lib/bpf/usdt.c     | 51 ++++++++++++++++++++++++++++++++++++----
>  2 files changed, 73 insertions(+), 14 deletions(-)
>
> diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
> index 4181fddb3687..7b5ed4cbaa2f 100644
> --- a/tools/lib/bpf/usdt.bpf.h
> +++ b/tools/lib/bpf/usdt.bpf.h
> @@ -43,6 +43,7 @@ enum __bpf_usdt_arg_type {
>         BPF_USDT_ARG_CONST,
>         BPF_USDT_ARG_REG,
>         BPF_USDT_ARG_REG_DEREF,
> +       BPF_USDT_ARG_XMM_REG,
>  };
>
>  struct __bpf_usdt_arg_spec {
> @@ -129,7 +130,9 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>  {
>         struct __bpf_usdt_spec *spec;
>         struct __bpf_usdt_arg_spec *arg_spec;
> -       unsigned long val;
> +       struct pt_regs *btf_regs;
> +       struct task_struct *btf_task;
> +       struct { __u64 a; __u64 unused; } val = {};
>         int err, spec_id;
>
>         *res = 0;
> @@ -151,7 +154,7 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>                 /* Arg is just a constant ("-4@$-9" in USDT arg spec).
>                  * value is recorded in arg_spec->val_off directly.
>                  */
> -               val = arg_spec->val_off;
> +               val.a = arg_spec->val_off;
>                 break;
>         case BPF_USDT_ARG_REG:
>                 /* Arg is in a register (e.g, "8@%rax" in USDT arg spec),
> @@ -159,7 +162,20 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>                  * struct pt_regs. To keep things simple user-space parts
>                  * record offsetof(struct pt_regs, <regname>) in arg_spec->reg_off.
>                  */
> -               err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> +               err = bpf_probe_read_kernel(&val.a, sizeof(val.a), (void *)ctx + arg_spec->reg_off);
> +               if (err)
> +                       return err;
> +               break;
> +       case BPF_USDT_ARG_XMM_REG:

nit: a bit too XMM-specific name here, we probably want to keep it a bit

> +               /* Same as above, but arg is an xmm reg, so can't look
> +                * in pt_regs, need to use special helper.
> +                * reg_off is the regno ("xmm0" -> regno 0, etc)
> +                */
> +               btf_task = bpf_get_current_task_btf();
> +               btf_regs = (struct pt_regs *)bpf_task_pt_regs(btf_task);

I'd like to avoid taking dependency on bpf_get_current_task_btf() for
rare case of XMM register, which makes it impossible to do USDT on
older kernels. It seems like supporting reading registers from current
(and maybe current pt_regs context) should cover a lot of practical
uses.

> +               err = bpf_get_reg_val(&val, sizeof(val),

But regardless of the above, we'll need to use CO-RE to detect support
for this new BPF helper (probably using bpf_core_enum_value_exists()?)
to allow using USDTs on older kernels.


> +                                    ((u64)arg_spec->reg_off + BPF_GETREG_X86_XMM0) << 32,
> +                                    btf_regs, btf_task);
>                 if (err)
>                         return err;
>                 break;
> @@ -171,14 +187,14 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>                  * from pt_regs, then do another user-space probe read to
>                  * fetch argument value itself.
>                  */
> -               err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> +               err = bpf_probe_read_kernel(&val.a, sizeof(val.a), (void *)ctx + arg_spec->reg_off);
>                 if (err)
>                         return err;
> -               err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
> +               err = bpf_probe_read_user(&val.a, sizeof(val.a), (void *)val.a + arg_spec->val_off);

is the useful value in xmm register normally in lower 64-bits of it?
is it possible to just request reading just the first 64 bits from
bpf_get_reg_val() and avoid this ugly union?

>                 if (err)
>                         return err;
>  #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> -               val >>= arg_spec->arg_bitshift;
> +               val.a >>= arg_spec->arg_bitshift;
>  #endif
>                 break;
>         default:

[...]

> +static int calc_xmm_regno(const char *reg_name)
> +{
> +       static struct {
> +               const char *name;
> +               __u16 regno;
> +       } xmm_reg_map[] = {
> +               { "xmm0",  0 },
> +               { "xmm1",  1 },
> +               { "xmm2",  2 },
> +               { "xmm3",  3 },
> +               { "xmm4",  4 },
> +               { "xmm5",  5 },
> +               { "xmm6",  6 },
> +               { "xmm7",  7 },
> +#ifdef __x86_64__
> +               { "xmm8",  8 },
> +               { "xmm9",  9 },
> +               { "xmm10",  10 },
> +               { "xmm11",  11 },
> +               { "xmm12",  12 },
> +               { "xmm13",  13 },
> +               { "xmm14",  14 },
> +               { "xmm15",  15 },

no-x86 arches parse this generically with sscanf(), seems like we can
do this simple approach here as well?


> +#endif
> +       };
> +       int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(xmm_reg_map); i++) {
> +               if (strcmp(reg_name, xmm_reg_map[i].name) == 0)
> +                       return xmm_reg_map[i].regno;
> +       }
> +
>         return -ENOENT;
>  }
>

[...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 5/5] selftests/bpf: get_reg_val test exercising fxsave fetch
  2022-05-12  7:43 ` [RFC PATCH bpf-next 5/5] selftests/bpf: get_reg_val test exercising fxsave fetch Dave Marchevsky
  2022-05-12 17:47   ` Dave Marchevsky
@ 2022-05-16 23:28   ` Andrii Nakryiko
  1 sibling, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2022-05-16 23:28 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, Kernel Team

On Thu, May 12, 2022 at 12:43 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> Add a test which calls bpf_get_reg_val with an xmm reg after forcing fpu
> state save. The test program writes to %xmm10, then calls a BPF program
> which forces fpu save and calls bpf_get_reg_val. This guarantees that
> !fpregs_state_valid check will succeed, forcing bpf_get_reg_val to fetch
> %xmm10's value from task's fpu state.
>
> A bpf_testmod_save_fpregs kfunc helper is added to bpf_testmod to enable
> 'force fpu save'. Existing bpf_dummy_ops test infra is extended to
> support calling the kfunc.
>
> unload_bpf_testmod would often fail with -EAGAIN when running the test
> added in this patch, so a single retry w/ 20ms sleep is added.
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  include/linux/bpf.h                           |  1 +
>  kernel/trace/bpf_trace.c                      |  2 +-
>  net/bpf/bpf_dummy_struct_ops.c                | 13 ++++++

split kernel changes from selftests?

>  .../selftests/bpf/bpf_testmod/bpf_testmod.c   | 13 ++++++
>  tools/testing/selftests/bpf/prog_tests/usdt.c | 42 +++++++++++++++++++
>  .../selftests/bpf/progs/test_urandom_usdt.c   | 24 +++++++++++
>  tools/testing/selftests/bpf/test_progs.c      |  7 ++++
>  7 files changed, 101 insertions(+), 1 deletion(-)
>

[...]

> diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> index e585e1cefc77..b2b35138b097 100644
> --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright (c) 2020 Facebook */
> +#include <asm/fpu/api.h>
>  #include <linux/btf.h>
>  #include <linux/btf_ids.h>
>  #include <linux/error-injection.h>
> @@ -25,6 +26,13 @@ bpf_testmod_test_mod_kfunc(int i)
>         *(int *)this_cpu_ptr(&bpf_testmod_ksym_percpu) = i;
>  }
>
> +noinline void
> +bpf_testmod_save_fpregs(void)
> +{
> +       kernel_fpu_begin();
> +       kernel_fpu_end();

this seems to be x86-specific kernel functions, we need to think about
building selftests (including bpf_testmod) on other architectures


> +}
> +
>  struct bpf_testmod_btf_type_tag_1 {
>         int a;
>  };
> @@ -150,6 +158,7 @@ static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = {
>
>  BTF_SET_START(bpf_testmod_check_kfunc_ids)
>  BTF_ID(func, bpf_testmod_test_mod_kfunc)
> +BTF_ID(func, bpf_testmod_save_fpregs)
>  BTF_SET_END(bpf_testmod_check_kfunc_ids)
>
>  static const struct btf_kfunc_id_set bpf_testmod_kfunc_set = {
> @@ -166,6 +175,10 @@ static int bpf_testmod_init(void)
>         ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_testmod_kfunc_set);
>         if (ret < 0)
>                 return ret;
> +       ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_testmod_kfunc_set);
> +       if (ret < 0)
> +               return ret;
> +
>         if (bpf_fentry_test1(0) < 0)
>                 return -EINVAL;
>         return sysfs_create_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file);
> diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> index f98749ac74a7..3866cb004b22 100644
> --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> @@ -8,6 +8,11 @@
>  #include "test_usdt.skel.h"
>  #include "test_urandom_usdt.skel.h"
>
> +/* Need to keep consistent with definition in include/linux/bpf.h */
> +struct bpf_dummy_ops_state {
> +       int val;
> +};
> +
>  int lets_test_this(int);
>
>  static volatile int idx = 2;
> @@ -415,6 +420,41 @@ static void subtest_urandom_usdt(bool auto_attach)
>         test_urandom_usdt__destroy(skel);
>  }
>
> +static void subtest_reg_val_fpustate(void)
> +{
> +       struct bpf_dummy_ops_state in_state;
> +       struct test_urandom_usdt__bss *bss;
> +       struct test_urandom_usdt *skel;
> +       u64 in_args[1];
> +       u64 regval[2];
> +       int err, fd;
> +
> +       in_state.val = 0; /* unused */
> +       in_args[0] = (unsigned long)&in_state;
> +
> +       LIBBPF_OPTS(bpf_test_run_opts, attr,

nit: LIBBPF_OPTS declares variable, so it had to be in variable
declaration block

> +                  .ctx_in = in_args,
> +                  .ctx_size_in = sizeof(in_args),
> +       );
> +
> +       skel = test_urandom_usdt__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "skel_open"))
> +               return;
> +       bss = skel->bss;
> +
> +       fd = bpf_program__fd(skel->progs.save_fpregs_and_read);
> +       regval[0] = 42;
> +       regval[1] = 0;
> +       asm("movdqa %0, %%xmm10" : "=m"(*(char *)regval));
> +
> +       err = bpf_prog_test_run_opts(fd, &attr);
> +       ASSERT_OK(err, "save_fpregs_and_read");
> +       ASSERT_EQ(bss->fpregs_dummy_opts_xmm_val, 42, "fpregs_dummy_opts_xmm_val");
> +
> +       close(fd);
> +       test_urandom_usdt__destroy(skel);
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 4/5] selftests/bpf: Add test for USDT parse of xmm reg
  2022-05-12  7:43 ` [RFC PATCH bpf-next 4/5] selftests/bpf: Add test for USDT parse of xmm reg Dave Marchevsky
@ 2022-05-16 23:31   ` Andrii Nakryiko
  2022-05-17  1:17     ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2022-05-16 23:31 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, Kernel Team

On Thu, May 12, 2022 at 12:43 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> Validate that bpf_get_reg_val helper solves the motivating problem of
> this patch series: USDT args passed through xmm regs. The userspace
> portion of the test forces STAP_PROBE macro to use %xmm0 and %xmm1 regs
> to pass a float and an int, which the bpf-side successfully reads using
> BPF_USDT.
>
> In the wild I discovered a sanely-configured USDT in Fedora libpthread
> using xmm regs to pass scalar values, likely due to register pressure.
> urandom_read_lib_xmm mimics this by using -ffixed-$REG flag to mark
> r11-r14 unusable and passing many USDT args.
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  tools/testing/selftests/bpf/Makefile          |  8 ++-
>  tools/testing/selftests/bpf/prog_tests/usdt.c |  7 +++
>  .../selftests/bpf/progs/test_urandom_usdt.c   | 13 ++++
>  tools/testing/selftests/bpf/urandom_read.c    |  3 +
>  .../selftests/bpf/urandom_read_lib_xmm.c      | 62 +++++++++++++++++++
>  5 files changed, 91 insertions(+), 2 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/urandom_read_lib_xmm.c
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 6bbc03161544..19246e34dfe1 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -172,10 +172,14 @@ $(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c
>         $(call msg,LIB,,$@)
>         $(Q)$(CC) $(CFLAGS) -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
>
> -$(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_read.so
> +$(OUTPUT)/liburandom_read_xmm.so: urandom_read_lib_xmm.c
> +       $(call msg,LIB,,$@)
> +       $(Q)$(CC) -O0 -ffixed-r11 -ffixed-r12 -ffixed-r13 -ffixed-r14 -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@

this looks very x86-specific, but we support other architectures as well

looking at sdt.h, it seems like STAP_PROBEx() macros support being
called from assembly code, I wonder if it would be better to try to
figure out how to use it from assembly and use some xmm register
directly in inline assembly? I have never done that before, but am
hopeful :)

> +
> +$(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_read.so $(OUTPUT)/liburandom_read_xmm.so
>         $(call msg,BINARY,,$@)
>         $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.c,$^)                        \
> -                 liburandom_read.so $(LDLIBS)                                 \
> +                 liburandom_read.so liburandom_read_xmm.so $(LDLIBS)          \
>                   -Wl,-rpath=. -Wl,--build-id=sha1 -o $@
>
>  $(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch])
> diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> index a71f51bdc08d..f98749ac74a7 100644
> --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> @@ -385,6 +385,12 @@ static void subtest_urandom_usdt(bool auto_attach)
>                         goto cleanup;
>                 skel->links.urandlib_read_with_sema = l;
>
> +               l = bpf_program__attach_usdt(skel->progs.urandlib_xmm_reg_read,
> +                                            urand_pid, "./liburandom_read_xmm.so",
> +                                            "urandlib", "xmm_reg_read", NULL);
> +               if (!ASSERT_OK_PTR(l, "urandlib_xmm_reg_read"))
> +                       goto cleanup;
> +               skel->links.urandlib_xmm_reg_read = l;
>         }
>
>         /* trigger urandom_read USDTs */
> @@ -402,6 +408,7 @@ static void subtest_urandom_usdt(bool auto_attach)
>         ASSERT_EQ(bss->urandlib_read_with_sema_call_cnt, 1, "urandlib_w_sema_cnt");
>         ASSERT_EQ(bss->urandlib_read_with_sema_buf_sz_sum, 256, "urandlib_w_sema_sum");
>
> +       ASSERT_EQ(bss->urandlib_xmm_reg_read_buf_sz_sum, 256, "liburandom_read_xmm.so");
>  cleanup:
>         if (urand_pipe)
>                 pclose(urand_pipe);
> diff --git a/tools/testing/selftests/bpf/progs/test_urandom_usdt.c b/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
> index 3539b02bd5f7..575761863eb6 100644
> --- a/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
> +++ b/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
> @@ -67,4 +67,17 @@ int BPF_USDT(urandlib_read_with_sema, int iter_num, int iter_cnt, int buf_sz)
>         return 0;
>  }
>
> +int urandlib_xmm_reg_read_buf_sz_sum;

nit: empty line here

> +SEC("usdt/./liburandom_read_xmm.so:urandlib:xmm_reg_read")
> +int BPF_USDT(urandlib_xmm_reg_read, int *f1, int *f2, int *f3, int a, int b,
> +                                    int c /*should be float */, int d, int e,
> +                                    int f, int g, int h, int buf_sz)
> +{
> +       if (urand_pid != (bpf_get_current_pid_tgid() >> 32))
> +               return 0;
> +
> +       __sync_fetch_and_add(&urandlib_xmm_reg_read_buf_sz_sum, buf_sz);
> +       return 0;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 4/5] selftests/bpf: Add test for USDT parse of xmm reg
  2022-05-16 23:31   ` Andrii Nakryiko
@ 2022-05-17  1:17     ` Alexei Starovoitov
  2022-05-18 23:56       ` Andrii Nakryiko
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2022-05-17  1:17 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Rik van Riel, Ilya Leoshkevich, Yonghong Song,
	Kernel Team

On Mon, May 16, 2022 at 04:31:55PM -0700, Andrii Nakryiko wrote:
> On Thu, May 12, 2022 at 12:43 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> >
> > Validate that bpf_get_reg_val helper solves the motivating problem of
> > this patch series: USDT args passed through xmm regs. The userspace
> > portion of the test forces STAP_PROBE macro to use %xmm0 and %xmm1 regs
> > to pass a float and an int, which the bpf-side successfully reads using
> > BPF_USDT.
> >
> > In the wild I discovered a sanely-configured USDT in Fedora libpthread
> > using xmm regs to pass scalar values, likely due to register pressure.
> > urandom_read_lib_xmm mimics this by using -ffixed-$REG flag to mark
> > r11-r14 unusable and passing many USDT args.
> >
> > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > ---
> >  tools/testing/selftests/bpf/Makefile          |  8 ++-
> >  tools/testing/selftests/bpf/prog_tests/usdt.c |  7 +++
> >  .../selftests/bpf/progs/test_urandom_usdt.c   | 13 ++++
> >  tools/testing/selftests/bpf/urandom_read.c    |  3 +
> >  .../selftests/bpf/urandom_read_lib_xmm.c      | 62 +++++++++++++++++++
> >  5 files changed, 91 insertions(+), 2 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/urandom_read_lib_xmm.c
> >
> > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > index 6bbc03161544..19246e34dfe1 100644
> > --- a/tools/testing/selftests/bpf/Makefile
> > +++ b/tools/testing/selftests/bpf/Makefile
> > @@ -172,10 +172,14 @@ $(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c
> >         $(call msg,LIB,,$@)
> >         $(Q)$(CC) $(CFLAGS) -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
> >
> > -$(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_read.so
> > +$(OUTPUT)/liburandom_read_xmm.so: urandom_read_lib_xmm.c
> > +       $(call msg,LIB,,$@)
> > +       $(Q)$(CC) -O0 -ffixed-r11 -ffixed-r12 -ffixed-r13 -ffixed-r14 -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
> 
> this looks very x86-specific, but we support other architectures as well
> 
> looking at sdt.h, it seems like STAP_PROBEx() macros support being
> called from assembly code, I wonder if it would be better to try to
> figure out how to use it from assembly and use some xmm register
> directly in inline assembly? I have never done that before, but am
> hopeful :)

stap_probe in asm won't help cross arch issue.
It's better to stay with C.
Maybe some makefile magic can make the above x86 only?
The test should be skipped on other archs anyway.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper
  2022-05-14  0:41   ` Alexei Starovoitov
@ 2022-05-18  7:35     ` Dave Marchevsky
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Marchevsky @ 2022-05-18  7:35 UTC (permalink / raw)
  To: Alexei Starovoitov, Rik van Riel
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Ilya Leoshkevich, Yonghong Song, kernel-team

On 5/13/22 8:41 PM, Alexei Starovoitov wrote:   
> On Thu, May 12, 2022 at 12:43:18AM -0700, Dave Marchevsky wrote:
>> Add a helper which reads the value of specified register into memory.
>>
>> Currently, bpf programs only have access to general-purpose registers
>> via struct pt_regs. Other registers, like SSE regs %xmm0-15, are
>> inaccessible, which makes some tracing usecases impossible. For example,
>> User Statically-Defined Tracing (USDT) probes may use SSE registers to
>> pass their arguments on x86. While this patch adds support for %xmm0-15
>> only, the helper is meant to be generic enough to support fetching any
>> reg.
>>
>> A useful "value of register" definition for bpf programs is "value of
>> register before control transfer to kernel". pt_regs gives us this
>> currently, so it's the default behavior of the new helper. Fetching the
>> actual _current_ reg value is possible, though, by passing
>> BPF_GETREG_F_CURRENT flag as part of input.
>>
>> For SSE regs we try to avoid digging around in task's fpu state by first
>> reading _current_ value, then checking to see if the state of cpu's
>> floating point regs matches task's view of them. If so, we can just
>> return _current_ value.
>>
>> Further usecases which are straightforward to support, but
>> unimplemented:
>>   * using the helper to fetch general-purpose register value.
>>   currently-unused pt_regs parameter exists for this reason.
>>
>>   * fetching rdtsc (w/ BPF_GETREG_F_CURRENT)
>>
>>   * other architectures. s390 specifically might benefit from similar
>>   fpu reg fetching as USDT library was recently updated to support that
>>   architecture.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>  include/uapi/linux/bpf.h       |  40 +++++++++
>>  kernel/trace/bpf_trace.c       | 148 +++++++++++++++++++++++++++++++++
>>  kernel/trace/bpf_trace.h       |   1 +
>>  tools/include/uapi/linux/bpf.h |  40 +++++++++
>>  4 files changed, 229 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 444fe6f1cf35..3ef8f683ed9e 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -5154,6 +5154,18 @@ union bpf_attr {
>>   *		if not NULL, is a reference which must be released using its
>>   *		corresponding release function, or moved into a BPF map before
>>   *		program exit.
>> + *
>> + * long bpf_get_reg_val(void *dst, u32 size, u64 getreg_spec, struct pt_regs *regs, struct task_struct *tsk)
>> + *	Description
>> + *		Store the value of a SSE register specified by *getreg_spec*
>> + *		into memory region of size *size* specified by *dst*. *getreg_spec*
>> + *		is a combination of BPF_GETREG enum AND BPF_GETREG_F flag e.g.
>> + *		(BPF_GETREG_X86_XMM0 << 32) | BPF_GETREG_F_CURRENT.*
>> + *	Return
>> + *		0 on success
>> + *		**-ENOENT** if the system architecture does not have requested reg
>> + *		**-EINVAL** if *getreg_spec* is invalid
>> + *		**-EINVAL** if *size* != bytes necessary to store requested reg val
>>   */
>>  #define __BPF_FUNC_MAPPER(FN)		\
>>  	FN(unspec),			\
>> @@ -5351,6 +5363,7 @@ union bpf_attr {
>>  	FN(skb_set_tstamp),		\
>>  	FN(ima_file_hash),		\
>>  	FN(kptr_xchg),			\
>> +	FN(get_reg_val),		\
>>  	/* */
>>  
>>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>> @@ -6318,6 +6331,33 @@ struct bpf_perf_event_value {
>>  	__u64 running;
>>  };
>>  
>> +/* bpf_get_reg_val register enum */
>> +enum {
>> +	BPF_GETREG_X86_XMM0 = 0,
>> +	BPF_GETREG_X86_XMM1,
>> +	BPF_GETREG_X86_XMM2,
>> +	BPF_GETREG_X86_XMM3,
>> +	BPF_GETREG_X86_XMM4,
>> +	BPF_GETREG_X86_XMM5,
>> +	BPF_GETREG_X86_XMM6,
>> +	BPF_GETREG_X86_XMM7,
>> +	BPF_GETREG_X86_XMM8,
>> +	BPF_GETREG_X86_XMM9,
>> +	BPF_GETREG_X86_XMM10,
>> +	BPF_GETREG_X86_XMM11,
>> +	BPF_GETREG_X86_XMM12,
>> +	BPF_GETREG_X86_XMM13,
>> +	BPF_GETREG_X86_XMM14,
>> +	BPF_GETREG_X86_XMM15,
>> +	__MAX_BPF_GETREG,
>> +};
> 
> Can we do BPF_GETREG_X86_XMM plus number instead?
> Enumerating every possible register will take quite some space in uapi
> and bpf progs probably won't be using these enum values directly anyway.
> usdt spec will have something like "xmm5" as a string.

Works for me. I was doing something like that originally, Andrii preferred it
this way. I didn't have strong feeling either way at the time, but your "will
take space in uapi" argument is compelling.

> 
>> +
>> +/* bpf_get_reg_val flags */
>> +enum {
>> +	BPF_GETREG_F_NONE = 0,
>> +	BPF_GETREG_F_CURRENT = (1U << 0),
>> +};
>> +
>>  enum {
>>  	BPF_DEVCG_ACC_MKNOD	= (1ULL << 0),
>>  	BPF_DEVCG_ACC_READ	= (1ULL << 1),
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index f15b826f9899..0de7d6b3af5b 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -28,6 +28,10 @@
>>  
>>  #include <asm/tlb.h>
>>  
>> +#ifdef CONFIG_X86
>> +#include <asm/fpu/context.h>
>> +#endif
>> +
>>  #include "trace_probe.h"
>>  #include "trace.h"
>>  
>> @@ -1166,6 +1170,148 @@ static const struct bpf_func_proto bpf_get_func_arg_cnt_proto = {
>>  	.arg1_type	= ARG_PTR_TO_CTX,
>>  };
>>  
>> +#define XMM_REG_SZ 16
>> +
>> +#define __xmm_space_off(regno)				\
>> +	case BPF_GETREG_X86_XMM ## regno:		\
>> +		xmm_space_off = regno * 16;		\
>> +		break;
>> +
>> +static long getreg_read_xmm_fxsave(u32 reg, struct task_struct *tsk,
>> +				   void *data)
>> +{
>> +	struct fxregs_state *fxsave;
>> +	u32 xmm_space_off;
>> +
>> +	switch (reg) {
>> +	__xmm_space_off(0);
>> +	__xmm_space_off(1);
>> +	__xmm_space_off(2);
>> +	__xmm_space_off(3);
>> +	__xmm_space_off(4);
>> +	__xmm_space_off(5);
>> +	__xmm_space_off(6);
>> +	__xmm_space_off(7);
>> +#ifdef	CONFIG_X86_64
>> +	__xmm_space_off(8);
>> +	__xmm_space_off(9);
>> +	__xmm_space_off(10);
>> +	__xmm_space_off(11);
>> +	__xmm_space_off(12);
>> +	__xmm_space_off(13);
>> +	__xmm_space_off(14);
>> +	__xmm_space_off(15);
>> +#endif
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	fxsave = &tsk->thread.fpu.fpstate->regs.fxsave;
>> +	memcpy(data, (void *)&fxsave->xmm_space + xmm_space_off, XMM_REG_SZ);
>> +	return 0;
> 
> It's all arch specific.
> This one and majority of other functions should probably go
> into arch/x86/net/bpf_jit_comp.c? instead of generic code.
> bpf_trace.c doesn't fit.
> 
> Try to avoid all ifdef-s. It's a red flag.

Will move these.

> 
>> +static long bpf_read_sse_reg(u32 reg, u32 flags, struct task_struct *tsk,
>> +			     void *data)
>> +{
>> +#ifdef CONFIG_X86
>> +	unsigned long irq_flags;
>> +	long err;
>> +
>> +	switch (reg) {
>> +	__bpf_sse_read(0);
>> +	__bpf_sse_read(1);
>> +	__bpf_sse_read(2);
>> +	__bpf_sse_read(3);
>> +	__bpf_sse_read(4);
>> +	__bpf_sse_read(5);
>> +	__bpf_sse_read(6);
>> +	__bpf_sse_read(7);
>> +#ifdef CONFIG_X86_64
>> +	__bpf_sse_read(8);
>> +	__bpf_sse_read(9);
>> +	__bpf_sse_read(10);
>> +	__bpf_sse_read(11);
>> +	__bpf_sse_read(12);
>> +	__bpf_sse_read(13);
>> +	__bpf_sse_read(14);
>> +	__bpf_sse_read(15);
>> +#endif /* CONFIG_X86_64 */
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (flags & BPF_GETREG_F_CURRENT)
>> +		return 0;
>> +
>> +	if (!fpregs_state_valid(&tsk->thread.fpu, smp_processor_id())) {
>> +		local_irq_save(irq_flags);
> 
> why disable irqs?

An irq can use those regs, which requires calling kernel_fpu_begin + 
kernel_fpu_end to ensure registers are restored before returning to userspace.
So for a sleepable program an irq might come and overwrite fxsave state.

Actually, now I'm not sure if it's necessary. kernel_fpu_begin_mask impl has:

  if (!(current->flags & PF_KTHREAD) &&
      !test_thread_flag(TIF_NEED_FPU_LOAD)) {
    set_thread_flag(TIF_NEED_FPU_LOAD);
    save_fpregs_to_fpstate(&current->thread.fpu);
  }
  __cpu_invalidate_fpregs_state();

So for the above scenario, the handler, which only tries to read fxsave if 
!fpregs_state_valid, should only ever read fxsave if TIF_NEED_FPU_LOAD is set,
and save_fpregs_to_fpstate will never execute.

Rik, can you confirm my logic here? We talked previously about disabling
interrupts being necessary, but I may be misremembering.

IIUC TIF_NEED_FPU_LOAD exists to provide this kind of optimization. If it's 
unset and we're returning to user, then we don't need to restore regs, if it's
already set and we're getting ready to do something with fpregs in the kernel,
we don't need to save regs. If we can rely on this, maybe can avoid disabling
irqs?

> 
>> +		err = getreg_read_xmm_fxsave(reg, tsk, data);
>> +		local_irq_restore(irq_flags);
>> +		return err;
>> +	}
> 
> What is the use case to read other task regs?

I don't have a clear one, will remove.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper
  2022-05-12 15:29   ` David Vernet
@ 2022-05-18  8:07     ` Dave Marchevsky
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Marchevsky @ 2022-05-18  8:07 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, kernel-team

On 5/12/22 11:29 AM, David Vernet wrote:   
> On Thu, May 12, 2022 at 12:43:18AM -0700, Dave Marchevsky wrote:
>> Add a helper which reads the value of specified register into memory.
>>
>> Currently, bpf programs only have access to general-purpose registers
>> via struct pt_regs. Other registers, like SSE regs %xmm0-15, are
>> inaccessible, which makes some tracing usecases impossible. For example,
>> User Statically-Defined Tracing (USDT) probes may use SSE registers to
>> pass their arguments on x86. While this patch adds support for %xmm0-15
>> only, the helper is meant to be generic enough to support fetching any
>> reg.
>>
>> A useful "value of register" definition for bpf programs is "value of
>> register before control transfer to kernel". pt_regs gives us this
>> currently, so it's the default behavior of the new helper. Fetching the
>> actual _current_ reg value is possible, though, by passing
>> BPF_GETREG_F_CURRENT flag as part of input.
>>
>> For SSE regs we try to avoid digging around in task's fpu state by first
>> reading _current_ value, then checking to see if the state of cpu's
>> floating point regs matches task's view of them. If so, we can just
>> return _current_ value.
>>
>> Further usecases which are straightforward to support, but
>> unimplemented:
>>   * using the helper to fetch general-purpose register value.
>>   currently-unused pt_regs parameter exists for this reason.
>>
>>   * fetching rdtsc (w/ BPF_GETREG_F_CURRENT)
>>
>>   * other architectures. s390 specifically might benefit from similar
>>   fpu reg fetching as USDT library was recently updated to support that
>>   architecture.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>  include/uapi/linux/bpf.h       |  40 +++++++++
>>  kernel/trace/bpf_trace.c       | 148 +++++++++++++++++++++++++++++++++
>>  kernel/trace/bpf_trace.h       |   1 +
>>  tools/include/uapi/linux/bpf.h |  40 +++++++++
>>  4 files changed, 229 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 444fe6f1cf35..3ef8f683ed9e 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -5154,6 +5154,18 @@ union bpf_attr {
>>   *		if not NULL, is a reference which must be released using its
>>   *		corresponding release function, or moved into a BPF map before
>>   *		program exit.
>> + *
>> + * long bpf_get_reg_val(void *dst, u32 size, u64 getreg_spec, struct pt_regs *regs, struct task_struct *tsk)
>> + *	Description
>> + *		Store the value of a SSE register specified by *getreg_spec*
> 
> Even though this patch only adds support for SSE, if the helper is meant to
> be generic enough to support fetching any register, should the description
> be updated to not imply that it's only meant for fetching SSE? IMO the
> example below is sufficient to indicate that it can be used to fetch SSE
> regs.

Relic from a less-general initial pass. Will update.

> 
>> + *		into memory region of size *size* specified by *dst*. *getreg_spec*
>> + *		is a combination of BPF_GETREG enum AND BPF_GETREG_F flag e.g.
>> + *		(BPF_GETREG_X86_XMM0 << 32) | BPF_GETREG_F_CURRENT.*
>> + *	Return
>> + *		0 on success
>> + *		**-ENOENT** if the system architecture does not have requested reg
>> + *		**-EINVAL** if *getreg_spec* is invalid
>> + *		**-EINVAL** if *size* != bytes necessary to store requested reg val
>>   */
>>  #define __BPF_FUNC_MAPPER(FN)		\
>>  	FN(unspec),			\
>> @@ -5351,6 +5363,7 @@ union bpf_attr {
>>  	FN(skb_set_tstamp),		\
>>  	FN(ima_file_hash),		\
>>  	FN(kptr_xchg),			\
>> +	FN(get_reg_val),		\
>>  	/* */
>>  
>>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>> @@ -6318,6 +6331,33 @@ struct bpf_perf_event_value {
>>  	__u64 running;
>>  };
>>  
>> +/* bpf_get_reg_val register enum */
>> +enum {
>> +	BPF_GETREG_X86_XMM0 = 0,
> 
> I know we do this in a few places in bpf.h, so please feel free to ignore
> this, but the C standard (section 6.7.2.2.1) formally states that if no
> value is specified for the first enumerator that its value is 0, so
> specifying the value here is strictly unnecessary. We're inconsistent in
> how we apply this in bpf.h, but IMHO if we're adding new enums, we should
> do the "standard" thing and only define the first element if it's nonzero.
> 

I don't feel strongly about it. I'd say "explicit is better than implicit", but
that doesn't apply here unless the specific value of BPF_GETREG_X86_XMM0 or 
others matters. But this isn't the case. Will remove.

>> +	BPF_GETREG_X86_XMM1,
>> +	BPF_GETREG_X86_XMM2,
>> +	BPF_GETREG_X86_XMM3,
>> +	BPF_GETREG_X86_XMM4,
>> +	BPF_GETREG_X86_XMM5,
>> +	BPF_GETREG_X86_XMM6,
>> +	BPF_GETREG_X86_XMM7,
>> +	BPF_GETREG_X86_XMM8,
>> +	BPF_GETREG_X86_XMM9,
>> +	BPF_GETREG_X86_XMM10,
>> +	BPF_GETREG_X86_XMM11,
>> +	BPF_GETREG_X86_XMM12,
>> +	BPF_GETREG_X86_XMM13,
>> +	BPF_GETREG_X86_XMM14,
>> +	BPF_GETREG_X86_XMM15,
>> +	__MAX_BPF_GETREG,
>> +};
>> +
>> +/* bpf_get_reg_val flags */
>> +enum {
>> +	BPF_GETREG_F_NONE = 0,
>> +	BPF_GETREG_F_CURRENT = (1U << 0),
>> +};
> 
> Can you add a comment specifying what the BPF_GETREG_F_CURRENT flag does?
> The commit summary is very helpful, but it would be good to persist this in
> code as well.
> 

Yep.

>> +
>>  enum {
>>  	BPF_DEVCG_ACC_MKNOD	= (1ULL << 0),
>>  	BPF_DEVCG_ACC_READ	= (1ULL << 1),
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index f15b826f9899..0de7d6b3af5b 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -28,6 +28,10 @@
>>  
>>  #include <asm/tlb.h>
>>  
>> +#ifdef CONFIG_X86
>> +#include <asm/fpu/context.h>
>> +#endif
>> +
>>  #include "trace_probe.h"
>>  #include "trace.h"
>>  
>> @@ -1166,6 +1170,148 @@ static const struct bpf_func_proto bpf_get_func_arg_cnt_proto = {
>>  	.arg1_type	= ARG_PTR_TO_CTX,
>>  };
>>  
>> +#define XMM_REG_SZ 16
>> +
>> +#define __xmm_space_off(regno)				\
>> +	case BPF_GETREG_X86_XMM ## regno:		\
>> +		xmm_space_off = regno * 16;		\
>> +		break;
>> +
>> +static long getreg_read_xmm_fxsave(u32 reg, struct task_struct *tsk,
>> +				   void *data)
>> +{
>> +	struct fxregs_state *fxsave;
>> +	u32 xmm_space_off;
>> +
>> +	switch (reg) {
>> +	__xmm_space_off(0);
>> +	__xmm_space_off(1);
>> +	__xmm_space_off(2);
>> +	__xmm_space_off(3);
>> +	__xmm_space_off(4);
>> +	__xmm_space_off(5);
>> +	__xmm_space_off(6);
>> +	__xmm_space_off(7);
>> +#ifdef	CONFIG_X86_64
>> +	__xmm_space_off(8);
>> +	__xmm_space_off(9);
>> +	__xmm_space_off(10);
>> +	__xmm_space_off(11);
>> +	__xmm_space_off(12);
>> +	__xmm_space_off(13);
>> +	__xmm_space_off(14);
>> +	__xmm_space_off(15);
>> +#endif
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	fxsave = &tsk->thread.fpu.fpstate->regs.fxsave;
>> +	memcpy(data, (void *)&fxsave->xmm_space + xmm_space_off, XMM_REG_SZ);
>> +	return 0;
>> +}
> 
> Does any of this also need to be wrapped in CONFIG_X86? IIUC, everything in
> struct thread_struct is arch specific, so I think this may fail to compile
> on a number of other architectures. Per my suggestion below, maybe we
> should just compile all of this logic out if we're not on x86, and update
> bpf_get_reg_val() to only call bpf_read_sse_reg() on x86?
> 

I generally agree with this comment and rest of "make ifdefs more sane"
comments. Per Alexei's comments on this patch some refactoring is in order to
try to avoid ifdefs altogether, but will do a sanity pass on any that remain.

>> +
>> +#undef __xmm_space_off
>> +
>> +static bool getreg_is_xmm(u32 reg)
>> +{
>> +	return reg >= BPF_GETREG_X86_XMM0 && reg <= BPF_GETREG_X86_XMM15;
> 
> I think it's a bit confusing that we have a function here which confirms
> that a register is xmm, but then we have ifdef CONFIG_X86_64 in large
> switch statements in functions where we actually read the register and then
> return -EINVAL.  Should we just update this to do the CONFIG_X6_64
> preprocessor check, and then we can assume in getreg_read_xmm_fxsave() and
> bpf_read_sse_reg() that the register is a valid xmm register, and avoid
> having to do these switch statements at all? Note that this wouldn't change
> the existing behavior at all, as we'd still be returning -EINVAL on 32-bit
> x86 in either case.
> 
>> +}
>> +
>> +#define __bpf_sse_read(regno)							\
>> +	case BPF_GETREG_X86_XMM ## regno:					\
>> +		asm("movdqa %%xmm" #regno ", %0" : "=m"(*(char *)data));	\
>> +		break;
>> +
>> +static long bpf_read_sse_reg(u32 reg, u32 flags, struct task_struct *tsk,
>> +			     void *data)
>> +{
>> +#ifdef CONFIG_X86
>> +	unsigned long irq_flags;
>> +	long err;
>> +
>> +	switch (reg) {
>> +	__bpf_sse_read(0);
>> +	__bpf_sse_read(1);
>> +	__bpf_sse_read(2);
>> +	__bpf_sse_read(3);
>> +	__bpf_sse_read(4);
>> +	__bpf_sse_read(5);
>> +	__bpf_sse_read(6);
>> +	__bpf_sse_read(7);
>> +#ifdef CONFIG_X86_64
>> +	__bpf_sse_read(8);
>> +	__bpf_sse_read(9);
>> +	__bpf_sse_read(10);
>> +	__bpf_sse_read(11);
>> +	__bpf_sse_read(12);
>> +	__bpf_sse_read(13);
>> +	__bpf_sse_read(14);
>> +	__bpf_sse_read(15);
>> +#endif /* CONFIG_X86_64 */
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (flags & BPF_GETREG_F_CURRENT)
>> +		return 0;
>> +
>> +	if (!fpregs_state_valid(&tsk->thread.fpu, smp_processor_id())) {
>> +		local_irq_save(irq_flags);
>> +		err = getreg_read_xmm_fxsave(reg, tsk, data);
>> +		local_irq_restore(irq_flags);
>> +		return err;
>> +	}
> 
> Should we move the checks for current and fpregs_state_valid() above where
> we actually read the registers? That way we can avoid doing the xmm read if
> we'd have to read the fxsave region anyways. Not sure if that's common in
> practice or really necessary at all. What you have here seems fine as well.

I think if we check that fpregs_state_valid is true before reading the reg val,
we'd need to disable irqs for both the check and the read, since an irq could
come and clobber regs between check and read in a sleepable bpf program.

> 
>> +
>> +	return 0;
>> +#else
>> +	return -ENOENT;
>> +#endif /* CONFIG_X86 */
>> +}
>> +
>> +#undef __bpf_sse_read
>> +
>> +BPF_CALL_5(get_reg_val, void *, dst, u32, size,
>> +	   u64, getreg_spec, struct pt_regs *, regs,
>> +	   struct task_struct *, tsk)
>> +{
>> +	u32 reg, flags;
>> +
>> +	reg = (u32)(getreg_spec >> 32);
>> +	flags = (u32)getreg_spec;
>> +	if (reg >= __MAX_BPF_GETREG)
>> +		return -EINVAL;
>> +
>> +	if (getreg_is_xmm(reg)) {
>> +#ifndef CONFIG_X86
>> +		return -ENOENT;
> 
> On CONFIG_X86 but !CONFIG_X86_64, we return -EINVAL if we try to access the
> wrong xmm register. Should we just change this to be return -EINVAL to keep
> the return value consistent between architectures? Or we should update the
> 32 bit x86 case to return -ENOENT as well, and probably update the last
> return -EINVAL statement in the function to be return -ENOENT. In general,
> I'd say that returning -ENOENT if a register is specified that's
> < __MAX_BPF_GETREG seems like the most intuitive API.

They're both handling erroneous input, but "this isn't even valid for the entire
arch" feels worth distinguishing from other bad input complaints. I don't feel
strongly about this.

> 
>> +#else
> 
> Is it necessary to have this ifdef check here if you also have it in
> bpf_read_sse_reg()? Maybe it makes more sense to keep this preprocessor
> check, and compile out bpf_read_sse_reg() altogether on other
> architectures? I think that probably makes sense given that we likely also
> want to wrap __bpf_sse_read() in an ifdef given that it emits x86 asm, and
> getreg_read_xmm_fxsave() which relies on the x86 definition of struct
> thread_struct.
> 
>> +		if (size != XMM_REG_SZ)
>> +			return -EINVAL;
>> +
>> +		return bpf_read_sse_reg(reg, flags, tsk, dst);
>> +	}
>> +
>> +	return -EINVAL;
>> +#endif
>> +}
>> +
>> +BTF_ID_LIST(bpf_get_reg_val_ids)
>> +BTF_ID(struct, pt_regs)
>> +
>> +static const struct bpf_func_proto bpf_get_reg_val_proto = {
>> +	.func	= get_reg_val,
>> +	.ret_type	= RET_INTEGER,
>> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
>> +	.arg2_type	= ARG_CONST_SIZE,
>> +	.arg3_type	= ARG_ANYTHING,
>> +	.arg4_type	= ARG_PTR_TO_BTF_ID_OR_NULL,
>> +	.arg4_btf_id	= &bpf_get_reg_val_ids[0],
>> +	.arg5_type	= ARG_PTR_TO_BTF_ID_OR_NULL,
>> +	.arg5_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
>> +};
>> +
>>  static const struct bpf_func_proto *
>>  bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>  {
>> @@ -1287,6 +1433,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>  		return &bpf_find_vma_proto;
>>  	case BPF_FUNC_trace_vprintk:
>>  		return bpf_get_trace_vprintk_proto();
>> +	case BPF_FUNC_get_reg_val:
>> +		return &bpf_get_reg_val_proto;
>>  	default:
>>  		return bpf_base_func_proto(func_id);
>>  	}
>> diff --git a/kernel/trace/bpf_trace.h b/kernel/trace/bpf_trace.h
>> index 9acbc11ac7bb..b4b55706c2dd 100644
>> --- a/kernel/trace/bpf_trace.h
>> +++ b/kernel/trace/bpf_trace.h
>> @@ -29,6 +29,7 @@ TRACE_EVENT(bpf_trace_printk,
>>  
>>  #undef TRACE_INCLUDE_PATH
>>  #define TRACE_INCLUDE_PATH .
>> +#undef TRACE_INCLUDE_FILE
>>  #define TRACE_INCLUDE_FILE bpf_trace
>>  
>>  #include <trace/define_trace.h>
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 444fe6f1cf35..3ef8f683ed9e 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -5154,6 +5154,18 @@ union bpf_attr {
>>   *		if not NULL, is a reference which must be released using its
>>   *		corresponding release function, or moved into a BPF map before
>>   *		program exit.
>> + *
>> + * long bpf_get_reg_val(void *dst, u32 size, u64 getreg_spec, struct pt_regs *regs, struct task_struct *tsk)
>> + *	Description
>> + *		Store the value of a SSE register specified by *getreg_spec*
>> + *		into memory region of size *size* specified by *dst*. *getreg_spec*
>> + *		is a combination of BPF_GETREG enum AND BPF_GETREG_F flag e.g.
>> + *		(BPF_GETREG_X86_XMM0 << 32) | BPF_GETREG_F_CURRENT.*
>> + *	Return
>> + *		0 on success
>> + *		**-ENOENT** if the system architecture does not have requested reg
>> + *		**-EINVAL** if *getreg_spec* is invalid
>> + *		**-EINVAL** if *size* != bytes necessary to store requested reg val
>>   */
>>  #define __BPF_FUNC_MAPPER(FN)		\
>>  	FN(unspec),			\
>> @@ -5351,6 +5363,7 @@ union bpf_attr {
>>  	FN(skb_set_tstamp),		\
>>  	FN(ima_file_hash),		\
>>  	FN(kptr_xchg),			\
>> +	FN(get_reg_val),		\
>>  	/* */
>>  
>>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>> @@ -6318,6 +6331,33 @@ struct bpf_perf_event_value {
>>  	__u64 running;
>>  };
>>  
>> +/* bpf_get_reg_val register enum */
>> +enum {
>> +	BPF_GETREG_X86_XMM0 = 0,
>> +	BPF_GETREG_X86_XMM1,
>> +	BPF_GETREG_X86_XMM2,
>> +	BPF_GETREG_X86_XMM3,
>> +	BPF_GETREG_X86_XMM4,
>> +	BPF_GETREG_X86_XMM5,
>> +	BPF_GETREG_X86_XMM6,
>> +	BPF_GETREG_X86_XMM7,
>> +	BPF_GETREG_X86_XMM8,
>> +	BPF_GETREG_X86_XMM9,
>> +	BPF_GETREG_X86_XMM10,
>> +	BPF_GETREG_X86_XMM11,
>> +	BPF_GETREG_X86_XMM12,
>> +	BPF_GETREG_X86_XMM13,
>> +	BPF_GETREG_X86_XMM14,
>> +	BPF_GETREG_X86_XMM15,
>> +	__MAX_BPF_GETREG,
>> +};
>> +
>> +/* bpf_get_reg_val flags */
>> +enum {
>> +	BPF_GETREG_F_NONE = 0,
>> +	BPF_GETREG_F_CURRENT = (1U << 0),
>> +};
>> +
>>  enum {
>>  	BPF_DEVCG_ACC_MKNOD	= (1ULL << 0),
>>  	BPF_DEVCG_ACC_READ	= (1ULL << 1),
>> -- 
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads
  2022-05-16 23:26   ` Andrii Nakryiko
@ 2022-05-18  8:20     ` Dave Marchevsky
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Marchevsky @ 2022-05-18  8:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Rik van Riel, Ilya Leoshkevich, Yonghong Song, Kernel Team

On 5/16/22 7:26 PM, Andrii Nakryiko wrote:   
> On Thu, May 12, 2022 at 12:43 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>
>> Handle xmm0,...,xmm15 registers when parsing USDT arguments. Currently
>> only the first 64 bits of the fetched value are returned as I haven't
>> seen the rest of the register used in practice.
>>
>> This patch also handles floats in USDT arg spec by ignoring the fact
>> that they're floats and considering them scalar. Currently we can't do
>> float math in BPF programs anyways, so might as well support passing to
>> userspace and converting there.
>>
>> We can use existing ARG_REG sscanf + logic, adding XMM-specific logic
>> when calc_pt_regs_off fails. If the reg is xmm, arg_spec's reg_off is
>> repurposed to hold reg_no, which is passed to bpf_get_reg_val. Since the
>> helper does the digging around in fxregs_state it's not necessary to
>> calculate offset in bpf code for these regs.
>>
>> NOTE: Changes here cause verification failure for existing USDT tests.
>> Specifically, BPF_USDT prog 'usdt12' fails to verify due to too many
>> insns despite not having its insn count significantly changed.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>  tools/lib/bpf/usdt.bpf.h | 36 ++++++++++++++++++++--------
>>  tools/lib/bpf/usdt.c     | 51 ++++++++++++++++++++++++++++++++++++----
>>  2 files changed, 73 insertions(+), 14 deletions(-)
>>
>> diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
>> index 4181fddb3687..7b5ed4cbaa2f 100644
>> --- a/tools/lib/bpf/usdt.bpf.h
>> +++ b/tools/lib/bpf/usdt.bpf.h
>> @@ -43,6 +43,7 @@ enum __bpf_usdt_arg_type {
>>         BPF_USDT_ARG_CONST,
>>         BPF_USDT_ARG_REG,
>>         BPF_USDT_ARG_REG_DEREF,
>> +       BPF_USDT_ARG_XMM_REG,
>>  };
>>
>>  struct __bpf_usdt_arg_spec {
>> @@ -129,7 +130,9 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>>  {
>>         struct __bpf_usdt_spec *spec;
>>         struct __bpf_usdt_arg_spec *arg_spec;
>> -       unsigned long val;
>> +       struct pt_regs *btf_regs;
>> +       struct task_struct *btf_task;
>> +       struct { __u64 a; __u64 unused; } val = {};
>>         int err, spec_id;
>>
>>         *res = 0;
>> @@ -151,7 +154,7 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>>                 /* Arg is just a constant ("-4@$-9" in USDT arg spec).
>>                  * value is recorded in arg_spec->val_off directly.
>>                  */
>> -               val = arg_spec->val_off;
>> +               val.a = arg_spec->val_off;
>>                 break;
>>         case BPF_USDT_ARG_REG:
>>                 /* Arg is in a register (e.g, "8@%rax" in USDT arg spec),
>> @@ -159,7 +162,20 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>>                  * struct pt_regs. To keep things simple user-space parts
>>                  * record offsetof(struct pt_regs, <regname>) in arg_spec->reg_off.
>>                  */
>> -               err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
>> +               err = bpf_probe_read_kernel(&val.a, sizeof(val.a), (void *)ctx + arg_spec->reg_off);
>> +               if (err)
>> +                       return err;
>> +               break;
>> +       case BPF_USDT_ARG_XMM_REG:
> 
> nit: a bit too XMM-specific name here, we probably want to keep it a bit

Agreed.

> 
>> +               /* Same as above, but arg is an xmm reg, so can't look
>> +                * in pt_regs, need to use special helper.
>> +                * reg_off is the regno ("xmm0" -> regno 0, etc)
>> +                */
>> +               btf_task = bpf_get_current_task_btf();
>> +               btf_regs = (struct pt_regs *)bpf_task_pt_regs(btf_task);
> 
> I'd like to avoid taking dependency on bpf_get_current_task_btf() for
> rare case of XMM register, which makes it impossible to do USDT on
> older kernels. It seems like supporting reading registers from current
> (and maybe current pt_regs context) should cover a lot of practical
> uses.
> 

Yep. We talked about this today. Will remove.

>> +               err = bpf_get_reg_val(&val, sizeof(val),
> 
> But regardless of the above, we'll need to use CO-RE to detect support
> for this new BPF helper (probably using bpf_core_enum_value_exists()?)
> to allow using USDTs on older kernels.
> 

Will add.

> 
>> +                                    ((u64)arg_spec->reg_off + BPF_GETREG_X86_XMM0) << 32,
>> +                                    btf_regs, btf_task);
>>                 if (err)
>>                         return err;
>>                 break;
>> @@ -171,14 +187,14 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>>                  * from pt_regs, then do another user-space probe read to
>>                  * fetch argument value itself.
>>                  */
>> -               err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
>> +               err = bpf_probe_read_kernel(&val.a, sizeof(val.a), (void *)ctx + arg_spec->reg_off);
>>                 if (err)
>>                         return err;
>> -               err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
>> +               err = bpf_probe_read_user(&val.a, sizeof(val.a), (void *)val.a + arg_spec->val_off);
> 
> is the useful value in xmm register normally in lower 64-bits of it?
> is it possible to just request reading just the first 64 bits from
> bpf_get_reg_val() and avoid this ugly union?

For USDT usecase, I've only seen lower 64 bits used. Should be possible to just
grab those, will see if there's a clean way to integrate such an option.

> 
>>                 if (err)
>>                         return err;
>>  #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
>> -               val >>= arg_spec->arg_bitshift;
>> +               val.a >>= arg_spec->arg_bitshift;
>>  #endif
>>                 break;
>>         default:
> 
> [...]
> 
>> +static int calc_xmm_regno(const char *reg_name)
>> +{
>> +       static struct {
>> +               const char *name;
>> +               __u16 regno;
>> +       } xmm_reg_map[] = {
>> +               { "xmm0",  0 },
>> +               { "xmm1",  1 },
>> +               { "xmm2",  2 },
>> +               { "xmm3",  3 },
>> +               { "xmm4",  4 },
>> +               { "xmm5",  5 },
>> +               { "xmm6",  6 },
>> +               { "xmm7",  7 },
>> +#ifdef __x86_64__
>> +               { "xmm8",  8 },
>> +               { "xmm9",  9 },
>> +               { "xmm10",  10 },
>> +               { "xmm11",  11 },
>> +               { "xmm12",  12 },
>> +               { "xmm13",  13 },
>> +               { "xmm14",  14 },
>> +               { "xmm15",  15 },
> 
> no-x86 arches parse this generically with sscanf(), seems like we can
> do this simple approach here as well?
> 

Agreed.

> 
>> +#endif
>> +       };
>> +       int i;
>> +
>> +       for (i = 0; i < ARRAY_SIZE(xmm_reg_map); i++) {
>> +               if (strcmp(reg_name, xmm_reg_map[i].name) == 0)
>> +                       return xmm_reg_map[i].regno;
>> +       }
>> +
>>         return -ENOENT;
>>  }
>>
> 
> [...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH bpf-next 4/5] selftests/bpf: Add test for USDT parse of xmm reg
  2022-05-17  1:17     ` Alexei Starovoitov
@ 2022-05-18 23:56       ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2022-05-18 23:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Rik van Riel, Ilya Leoshkevich, Yonghong Song,
	Kernel Team

On Mon, May 16, 2022 at 6:17 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, May 16, 2022 at 04:31:55PM -0700, Andrii Nakryiko wrote:
> > On Thu, May 12, 2022 at 12:43 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> > >
> > > Validate that bpf_get_reg_val helper solves the motivating problem of
> > > this patch series: USDT args passed through xmm regs. The userspace
> > > portion of the test forces STAP_PROBE macro to use %xmm0 and %xmm1 regs
> > > to pass a float and an int, which the bpf-side successfully reads using
> > > BPF_USDT.
> > >
> > > In the wild I discovered a sanely-configured USDT in Fedora libpthread
> > > using xmm regs to pass scalar values, likely due to register pressure.
> > > urandom_read_lib_xmm mimics this by using -ffixed-$REG flag to mark
> > > r11-r14 unusable and passing many USDT args.
> > >
> > > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > > ---
> > >  tools/testing/selftests/bpf/Makefile          |  8 ++-
> > >  tools/testing/selftests/bpf/prog_tests/usdt.c |  7 +++
> > >  .../selftests/bpf/progs/test_urandom_usdt.c   | 13 ++++
> > >  tools/testing/selftests/bpf/urandom_read.c    |  3 +
> > >  .../selftests/bpf/urandom_read_lib_xmm.c      | 62 +++++++++++++++++++
> > >  5 files changed, 91 insertions(+), 2 deletions(-)
> > >  create mode 100644 tools/testing/selftests/bpf/urandom_read_lib_xmm.c
> > >
> > > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > > index 6bbc03161544..19246e34dfe1 100644
> > > --- a/tools/testing/selftests/bpf/Makefile
> > > +++ b/tools/testing/selftests/bpf/Makefile
> > > @@ -172,10 +172,14 @@ $(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c
> > >         $(call msg,LIB,,$@)
> > >         $(Q)$(CC) $(CFLAGS) -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
> > >
> > > -$(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_read.so
> > > +$(OUTPUT)/liburandom_read_xmm.so: urandom_read_lib_xmm.c
> > > +       $(call msg,LIB,,$@)
> > > +       $(Q)$(CC) -O0 -ffixed-r11 -ffixed-r12 -ffixed-r13 -ffixed-r14 -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
> >
> > this looks very x86-specific, but we support other architectures as well
> >
> > looking at sdt.h, it seems like STAP_PROBEx() macros support being
> > called from assembly code, I wonder if it would be better to try to
> > figure out how to use it from assembly and use some xmm register
> > directly in inline assembly? I have never done that before, but am
> > hopeful :)
>
> stap_probe in asm won't help cross arch issue.
> It's better to stay with C.
> Maybe some makefile magic can make the above x86 only?
> The test should be skipped on other archs anyway.

It seems easier (or rather cleaner) to maintain arch-specific pieces
within .c file through #if __x86_64__ instead of doing equivalent
changes to Makefile. Suggestion for using assembly was due to needing
to make sure xmmX register is used and desired to avoid maintaining
arch-specific compile-time flags like -ffixed-r11. But truth be told I
have zero idea how STAP_PROBE() use from assembly would look like, so
just a suggestion to consider.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-05-18 23:56 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-12  7:43 [RFC PATCH bpf-next 0/5] bpf: add get_reg_val helper Dave Marchevsky
2022-05-12  7:43 ` [RFC PATCH bpf-next 1/5] x86/fpu: Move context.h to include/asm Dave Marchevsky
2022-05-12 13:56   ` David Vernet
2022-05-14  0:44   ` Alexei Starovoitov
2022-05-12  7:43 ` [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper Dave Marchevsky
2022-05-12 15:29   ` David Vernet
2022-05-18  8:07     ` Dave Marchevsky
2022-05-14  0:41   ` Alexei Starovoitov
2022-05-18  7:35     ` Dave Marchevsky
2022-05-12  7:43 ` [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads Dave Marchevsky
2022-05-14  0:43   ` Alexei Starovoitov
2022-05-16 23:26   ` Andrii Nakryiko
2022-05-18  8:20     ` Dave Marchevsky
2022-05-12  7:43 ` [RFC PATCH bpf-next 4/5] selftests/bpf: Add test for USDT parse of xmm reg Dave Marchevsky
2022-05-16 23:31   ` Andrii Nakryiko
2022-05-17  1:17     ` Alexei Starovoitov
2022-05-18 23:56       ` Andrii Nakryiko
2022-05-12  7:43 ` [RFC PATCH bpf-next 5/5] selftests/bpf: get_reg_val test exercising fxsave fetch Dave Marchevsky
2022-05-12 17:47   ` Dave Marchevsky
2022-05-16 23:28   ` Andrii Nakryiko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.