* [PATCH RFC bpf-next 0/3] uprobe: uretprobe speed up
@ 2024-03-18  9:31 Jiri Olsa
From: Jiri Olsa @ 2024-03-18  9:31 UTC (permalink / raw)
  To: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

hi,
as part of the effort on speeding up the uprobes [0], this series
comes with a return uprobe optimization that uses a syscall instead
of the trap on the uretprobe trampoline.

The speed up depends on the type of instruction the uprobe is installed
on and on the specific HW type, please check patch 1 for details.

I added an extra bpf selftest that checks register values before and
after the uretprobe is hit to make sure the syscall preserves all the
needed regs.
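
The selftest can be run the usual way, something like (just a sketch,
assuming the standard selftests/bpf runner and the test name from patch 2):

  ./test_progs -t uprobe_syscall

and the numbers in patch 1 come from the existing bench_trigger
uprobe/uretprobe benchmarks.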

reasons for this being RFC:

 - I'm not sure whether we want a specific uretprobe syscall or a generic
   uprobe syscall that we could extend in the future.. when I added extra
   code to read registers from the stack I did not see any notable
   performance degradation, so I think we could survive restoring another
   register for an argument (see the rough sketch after this list)

 - I'm not sure how we want to handle assembly code in bpf selftests
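
Rough (and entirely hypothetical) sketch of the generic variant from the
first point - __NR_uprobe and URETPROBE_CMD do not exist, they are just
placeholders for a generic syscall number and a command value:

	pushq %rax
	pushq %rcx
	pushq %r11
	pushq %rdi			/* extra register used for the argument */
	movq  $URETPROBE_CMD, %rdi	/* hypothetical command value */
	movq  $__NR_uprobe, %rax	/* hypothetical generic syscall number */
	syscall

with the syscall reading and restoring rdi from the stack the same way
rax/rcx/r11 are handled in patch 1.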

thanks,
jirka


[0] https://lore.kernel.org/bpf/ZeCXHKJ--iYYbmLj@krava/
---
Jiri Olsa (3):
      uprobe: Add uretprobe syscall to speed up return probe
      selftests/bpf: Add uretprobe syscall test
      selftests/bpf: Mark uprobe trigger functions with nocf_check attribute

 arch/x86/entry/syscalls/syscall_64.tbl                           |  1 +
 arch/x86/kernel/uprobes.c                                        | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/syscalls.h                                         |  2 ++
 include/linux/uprobes.h                                          |  2 ++
 include/uapi/asm-generic/unistd.h                                |  5 ++++-
 kernel/events/uprobes.c                                          | 18 ++++++++++++++----
 kernel/sys_ni.c                                                  |  2 ++
 tools/include/linux/compiler.h                                   |  4 ++++
 tools/testing/selftests/bpf/Makefile                             | 15 +++++++++++++--
 tools/testing/selftests/bpf/benchs/bench_trigger.c               |  6 +++---
 tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c          | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/uprobe_syscall.c               | 15 +++++++++++++++
 13 files changed, 281 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c


* [PATCH RFC bpf-next 1/3] uprobe: Add uretprobe syscall to speed up return probe
From: Jiri Olsa @ 2024-03-18  9:31 UTC (permalink / raw)
  To: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

Adding the uretprobe syscall instead of the trap to speed up the return probe.

At the moment the uretprobe setup/path is:

  - install entry uprobe

  - when the uprobe is hit, overwrite the probed function's return address on
    stack with the address of the trampoline that contains the breakpoint
    instruction

  - the breakpoint trap code handles the uretprobe consumers' execution and
    jumps back to the original return address

This patch changes the above trampoline's breakpoint instruction to the new
uretprobe syscall. This syscall does exactly the same job as the trap
with some extra work:

  - the syscall trampoline must save the original values of the rax/r11/rcx
    registers on the stack - rax is set to the syscall number and r11/rcx
    are clobbered by the syscall instruction

  - the syscall code reads the original values of those registers and restores
    them in the task's pt_regs area

Even with the extra register handling code, having uretprobes handled
by the syscall shows a speed improvement.

  On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)

  current:

    base           :   15.173 ± 0.052M/s
    uprobe-nop     :    2.797 ± 0.090M/s
    uprobe-push    :    2.580 ± 0.007M/s
    uprobe-ret     :    1.001 ± 0.003M/s
    uretprobe-nop  :    1.373 ± 0.002M/s
    uretprobe-push :    1.346 ± 0.002M/s
    uretprobe-ret  :    0.747 ± 0.001M/s

  with the fix:

    base           :   15.704 ± 0.076M/s
    uprobe-nop     :    2.841 ± 0.008M/s
    uprobe-push    :    2.666 ± 0.029M/s
    uprobe-ret     :    1.037 ± 0.008M/s
    uretprobe-nop  :    1.718 ± 0.010M/s  < ~25% speed up
    uretprobe-push :    1.658 ± 0.008M/s  < ~23% speed up
    uretprobe-ret  :    0.853 ± 0.004M/s  < ~14% speed up

  On Amd (AMD Ryzen 7 5700U)

  current:

    base           :    5.702 ± 0.003M/s
    uprobe-nop     :    1.505 ± 0.011M/s
    uprobe-push    :    1.388 ± 0.008M/s
    uprobe-ret     :    0.825 ± 0.001M/s
    uretprobe-nop  :    0.782 ± 0.001M/s
    uretprobe-push :    0.750 ± 0.001M/s
    uretprobe-ret  :    0.544 ± 0.001M/s

  with the fix:

    base           :    5.669 ± 0.004M/s
    uprobe-nop     :    1.539 ± 0.001M/s
    uprobe-push    :    1.385 ± 0.003M/s
    uprobe-ret     :    0.819 ± 0.001M/s
    uretprobe-nop  :    0.889 ± 0.001M/s < ~13% speed up
    uretprobe-push :    0.846 ± 0.001M/s < ~12% speed up
    uretprobe-ret  :    0.594 ± 0.000M/s <  ~9% speed up

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 arch/x86/kernel/uprobes.c              | 48 ++++++++++++++++++++++++++
 include/linux/syscalls.h               |  2 ++
 include/linux/uprobes.h                |  2 ++
 include/uapi/asm-generic/unistd.h      |  5 ++-
 kernel/events/uprobes.c                | 18 +++++++---
 kernel/sys_ni.c                        |  2 ++
 7 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 7e8d46f4147f..af0a33ab06ee 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -383,6 +383,7 @@
 459	common	lsm_get_self_attr	sys_lsm_get_self_attr
 460	common	lsm_set_self_attr	sys_lsm_set_self_attr
 461	common	lsm_list_modules	sys_lsm_list_modules
+462	64	uretprobe		sys_uretprobe
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 6c07f6daaa22..069371e86180 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -12,6 +12,7 @@
 #include <linux/ptrace.h>
 #include <linux/uprobes.h>
 #include <linux/uaccess.h>
+#include <linux/syscalls.h>
 
 #include <linux/kdebug.h>
 #include <asm/processor.h>
@@ -308,6 +309,53 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool
 }
 
 #ifdef CONFIG_X86_64
+
+asm (
+	".pushsection .rodata\n"
+	".global uretprobe_syscall_entry\n"
+	"uretprobe_syscall_entry:\n"
+	"pushq %rax\n"
+	"pushq %rcx\n"
+	"pushq %r11\n"
+	"movq $462, %rax\n"
+	"syscall\n"
+	".global uretprobe_syscall_end\n"
+	"uretprobe_syscall_end:\n"
+	".popsection\n"
+);
+
+extern u8 uretprobe_syscall_entry[];
+extern u8 uretprobe_syscall_end[];
+
+void *arch_uprobe_trampoline(unsigned long *psize)
+{
+	*psize = uretprobe_syscall_end - uretprobe_syscall_entry;
+	return uretprobe_syscall_entry;
+}
+
+SYSCALL_DEFINE0(uretprobe)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+	unsigned long sregs[3], err;
+
+	/*
+	 * We set rax and syscall itself changes rcx and r11, so the syscall
+	 * trampoline saves their original values on stack. We need to read
+	 * them and set original register values and fix the rsp pointer back.
+	 */
+	err = copy_from_user((void *) &sregs, (void *) regs->sp, sizeof(sregs));
+	WARN_ON_ONCE(err);
+
+	regs->r11 = sregs[0];
+	regs->cx = sregs[1];
+	regs->ax = sregs[2];
+	regs->orig_ax = -1;
+	regs->sp += sizeof(sregs);
+
+	uprobe_handle_trampoline(regs);
+	return regs->ax;
+}
+
 /*
  * If arch_uprobe->insn doesn't use rip-relative addressing, return
  * immediately.  Otherwise, rewrite the instruction so that it accesses
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 77eb9b0e7685..db150794f89d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -972,6 +972,8 @@ asmlinkage long sys_lsm_list_modules(u64 *ids, size_t *size, u32 flags);
 /* x86 */
 asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int on);
 
+asmlinkage long sys_uretprobe(void);
+
 /* pciconfig: alpha, arm, arm64, ia64, sparc */
 asmlinkage long sys_pciconfig_read(unsigned long bus, unsigned long dfn,
 				unsigned long off, unsigned long len,
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f46e0ca0169c..a490146ad89d 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -138,6 +138,8 @@ extern bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check c
 extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
 extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 					 void *src, unsigned long len);
+extern void uprobe_handle_trampoline(struct pt_regs *regs);
+extern void *arch_uprobe_trampoline(unsigned long *psize);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 75f00965ab15..8a747cd1d735 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -842,8 +842,11 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
 #define __NR_lsm_list_modules 461
 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
 
+#define __NR_uretprobe 462
+__SYSCALL(__NR_uretprobe, sys_uretprobe)
+
 #undef __NR_syscalls
-#define __NR_syscalls 462
+#define __NR_syscalls 463
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 929e98c62965..90395b16bde0 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1474,11 +1474,20 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 	return ret;
 }
 
+void * __weak arch_uprobe_trampoline(unsigned long *psize)
+{
+	static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
+
+	*psize = UPROBE_SWBP_INSN_SIZE;
+	return &insn;
+}
+
 static struct xol_area *__create_xol_area(unsigned long vaddr)
 {
 	struct mm_struct *mm = current->mm;
-	uprobe_opcode_t insn = UPROBE_SWBP_INSN;
+	unsigned long insns_size;
 	struct xol_area *area;
+	void *insns;
 
 	area = kmalloc(sizeof(*area), GFP_KERNEL);
 	if (unlikely(!area))
@@ -1502,7 +1511,8 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
 	/* Reserve the 1st slot for get_trampoline_vaddr() */
 	set_bit(0, area->bitmap);
 	atomic_set(&area->slot_count, 1);
-	arch_uprobe_copy_ixol(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
+	insns = arch_uprobe_trampoline(&insns_size);
+	arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
 
 	if (!xol_add_vma(mm, area))
 		return area;
@@ -2123,7 +2133,7 @@ static struct return_instance *find_next_ret_chain(struct return_instance *ri)
 	return ri;
 }
 
-static void handle_trampoline(struct pt_regs *regs)
+void uprobe_handle_trampoline(struct pt_regs *regs)
 {
 	struct uprobe_task *utask;
 	struct return_instance *ri, *next;
@@ -2188,7 +2198,7 @@ static void handle_swbp(struct pt_regs *regs)
 
 	bp_vaddr = uprobe_get_swbp_addr(regs);
 	if (bp_vaddr == get_trampoline_vaddr())
-		return handle_trampoline(regs);
+		return uprobe_handle_trampoline(regs);
 
 	uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
 	if (!uprobe) {
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index faad00cce269..be6195e0d078 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -391,3 +391,5 @@ COND_SYSCALL(setuid16);
 
 /* restartable sequence */
 COND_SYSCALL(rseq);
+
+COND_SYSCALL(uretprobe);
-- 
2.44.0



* [PATCH RFC bpf-next 2/3] selftests/bpf: Add uretprobe syscall test
From: Jiri Olsa @ 2024-03-18  9:31 UTC (permalink / raw)
  To: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

Add a uretprobe syscall test that compares register values before
and after the uretprobe is hit. Also compare the register values
seen from the attached bpf program.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/Makefile          | 13 ++-
 .../bpf/prog_tests/arch/x86/uprobe_syscall.S  | 89 +++++++++++++++++++
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 84 +++++++++++++++++
 .../selftests/bpf/progs/uprobe_syscall.c      | 15 ++++
 4 files changed, 200 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 3b9eb40d6343..e425a946276b 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -490,6 +490,9 @@ TRUNNER_TEST_OBJS := $$(patsubst %.c,$$(TRUNNER_OUTPUT)/%.test.o,	\
 				 $$(notdir $$(wildcard $(TRUNNER_TESTS_DIR)/*.c)))
 TRUNNER_EXTRA_OBJS := $$(patsubst %.c,$$(TRUNNER_OUTPUT)/%.o,		\
 				 $$(filter %.c,$(TRUNNER_EXTRA_SOURCES)))
+TRUNNER_ASM_OBJS := $$(patsubst %.S,$$(TRUNNER_OUTPUT)/%.arch.o,	\
+				 $$(notdir $$(wildcard $(TRUNNER_TESTS_DIR)/arch/$(SRCARCH)/*.S)))
+
 TRUNNER_EXTRA_HDRS := $$(filter %.h,$(TRUNNER_EXTRA_SOURCES))
 TRUNNER_TESTS_HDR := $(TRUNNER_TESTS_DIR)/tests.h
 TRUNNER_BPF_SRCS := $$(notdir $$(wildcard $(TRUNNER_BPF_PROGS_DIR)/*.c))
@@ -597,6 +600,13 @@ $(TRUNNER_EXTRA_OBJS): $(TRUNNER_OUTPUT)/%.o:				\
 	$$(call msg,EXT-OBJ,$(TRUNNER_BINARY),$$@)
 	$(Q)$$(CC) $$(CFLAGS) -c $$< $$(LDLIBS) -o $$@
 
+$(TRUNNER_ASM_OBJS): $(TRUNNER_OUTPUT)/%.arch.o:			\
+		       $(TRUNNER_TESTS_DIR)/arch/$(SRCARCH)/%.S		\
+		       $(TRUNNER_TESTS_HDR)				\
+		       $$(BPFOBJ) | $(TRUNNER_OUTPUT)
+	$$(call msg,ASM-OBJ,$(TRUNNER_BINARY),$$@)
+	$(Q)$$(CC) $$(CFLAGS) -c $$< $$(LDLIBS) -o $$@
+
 # non-flavored in-srctree builds receive special treatment, in particular, we
 # do not need to copy extra resources (see e.g. test_btf_dump_case())
 $(TRUNNER_BINARY)-extras: $(TRUNNER_EXTRA_FILES) | $(TRUNNER_OUTPUT)
@@ -606,7 +616,8 @@ ifneq ($2:$(OUTPUT),:$(shell pwd))
 endif
 
 $(OUTPUT)/$(TRUNNER_BINARY): $(TRUNNER_TEST_OBJS)			\
-			     $(TRUNNER_EXTRA_OBJS) $$(BPFOBJ)		\
+			     $(TRUNNER_EXTRA_OBJS) $(TRUNNER_ASM_OBJS)	\
+			     $$(BPFOBJ)					\
 			     $(RESOLVE_BTFIDS)				\
 			     $(TRUNNER_BPFTOOL)				\
 			     | $(TRUNNER_BINARY)-extras
diff --git a/tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S b/tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S
new file mode 100644
index 000000000000..bcbad218c4d6
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef ASM_NL
+#define ASM_NL	 ;
+#endif
+
+#define SYM_ENTRY(name)			\
+	.globl name ASM_NL		\
+	name:
+
+#define SYM_END(name)			\
+	.type name STT_FUNC ASM_NL	\
+	.size name, .-name ASM_NL
+
+.code64
+.section .text, "ax"
+
+SYM_ENTRY(uprobe_syscall_arch_test)
+	movq $0xdeadbeef, %rax
+	ret
+SYM_END(uprobe_syscall_arch_test)
+
+.globl uprobe_syscall_arch
+uprobe_syscall_arch:
+	movq %r15,   0(%rdi)
+	movq %r14,   8(%rdi)
+	movq %r13,  16(%rdi)
+	movq %r12,  24(%rdi)
+	movq %rbp,  32(%rdi)
+	movq %rbx,  40(%rdi)
+	movq %r11,  48(%rdi)
+	movq %r10,  56(%rdi)
+	movq  %r9,  64(%rdi)
+	movq  %r8,  72(%rdi)
+	movq %rax,  80(%rdi)
+	movq %rcx,  88(%rdi)
+	movq %rdx,  96(%rdi)
+	movq %rsi, 104(%rdi)
+	movq %rdi, 112(%rdi)
+	movq   $0, 120(%rdi) /* orig_rax */
+	movq   $0, 128(%rdi) /* rip      */
+	movq   $0, 136(%rdi) /* cs       */
+
+	pushf
+	pop %rax
+
+	movq %rax, 144(%rdi) /* eflags   */
+	movq %rsp, 152(%rdi) /* rsp      */
+	movq   $0, 160(%rdi) /* ss       */
+
+	pushq %rsi
+	call uprobe_syscall_arch_test
+
+	/* store return value and get second argument pointer  to rax */
+	pushq %rax
+	movq 8(%rsp), %rax
+
+	movq %r15,   0(%rax)
+	movq %r14,   8(%rax)
+	movq %r13,  16(%rax)
+	movq %r12,  24(%rax)
+	movq %rbp,  32(%rax)
+	movq %rbx,  40(%rax)
+	movq %r11,  48(%rax)
+	movq %r10,  56(%rax)
+	movq  %r9,  64(%rax)
+	movq  %r8,  72(%rax)
+	movq %rcx,  88(%rax)
+	movq %rdx,  96(%rax)
+	movq %rsi, 104(%rax)
+	movq %rdi, 112(%rax)
+	movq   $0, 120(%rax) /* orig_rax */
+	movq   $0, 128(%rax) /* rip      */
+	movq   $0, 136(%rax) /* cs       */
+
+	pop %rax
+	pop %rsi
+	movq %rax,  80(%rsi)
+
+	pushf
+	pop %rax
+
+	movq %rax, 144(%rsi) /* eflags   */
+	movq %rsp, 152(%rsi) /* rsp      */
+	movq   $0, 160(%rsi) /* ss       */
+
+	ret
+
+.section .note.GNU-stack,"",@progbits
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
new file mode 100644
index 000000000000..0df205fea957
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+
+#ifdef __x86_64__
+
+#include <unistd.h>
+#include <asm/ptrace.h>
+#include "uprobe_syscall.skel.h"
+
+extern int uprobe_syscall_arch(struct pt_regs *before, struct pt_regs *after);
+
+static void test_uretprobe(void)
+{
+	struct pt_regs before = {}, after = {};
+	unsigned long *pb = (unsigned long *) &before;
+	unsigned long *pa = (unsigned long *) &after;
+	unsigned long *prog_regs;
+	struct uprobe_syscall *skel = NULL;
+	unsigned int i, cnt;
+	int err;
+
+	skel = uprobe_syscall__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "uprobe_syscall__open_and_load"))
+		goto cleanup;
+
+	err = uprobe_syscall__attach(skel);
+	if (!ASSERT_OK(err, "uprobe_syscall__attach"))
+		goto cleanup;
+
+	uprobe_syscall_arch(&before, &after);
+
+	prog_regs = (unsigned long *) &skel->bss->regs;
+	cnt = sizeof(before)/sizeof(*pb);
+
+	for (i = 0; i < cnt; i++) {
+		unsigned int offset = i * sizeof(unsigned long);
+
+		/*
+		 * Check register before and after uprobe_syscall_arch_test call
+		 * that triggers the uretprobe.
+		 */
+		switch (offset) {
+		case offsetof(struct pt_regs, rax):
+			ASSERT_EQ(pa[i], 0xdeadbeef, "return value");
+			break;
+		default:
+			if (!ASSERT_EQ(pb[i], pa[i], "register before-after value check"))
+				fprintf(stdout, "failed register offset %u\n", offset);
+		}
+
+		/*
+		 * Check register seen from bpf program and register after
+		 * uprobe_syscall_arch_test call
+		 */
+		switch (offset) {
+		/*
+		 * These will be different (not set in uprobe_syscall_arch),
+		 * we don't care.
+		 */
+		case offsetof(struct pt_regs, orig_rax):
+		case offsetof(struct pt_regs, rip):
+		case offsetof(struct pt_regs, cs):
+		case offsetof(struct pt_regs, rsp):
+		case offsetof(struct pt_regs, ss):
+			break;
+		default:
+			if (!ASSERT_EQ(prog_regs[i], pa[i], "register prog-after value check"))
+				fprintf(stdout, "failed register offset %u\n", offset);
+		}
+	}
+
+cleanup:
+	uprobe_syscall__destroy(skel);
+}
+#else
+static void test_uretprobe(void) { }
+#endif
+
+void test_uprobe_syscall(void)
+{
+	if (test__start_subtest("uretprobe"))
+		test_uretprobe();
+}
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall.c b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
new file mode 100644
index 000000000000..0cc7e8761410
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <string.h>
+
+struct pt_regs regs;
+
+char _license[] SEC("license") = "GPL";
+
+SEC("uretprobe//proc/self/exe:uprobe_syscall_arch_test")
+int uretprobe(struct pt_regs *ctx)
+{
+	memcpy(&regs, ctx, sizeof(regs));
+	return 0;
+}
-- 
2.44.0



* [PATCH RFC bpf-next 3/3] selftests/bpf: Mark uprobe trigger functions with nocf_check attribute
From: Jiri Olsa @ 2024-03-18  9:31 UTC (permalink / raw)
  To: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

Some distros seem to enable -fcf-protection=branch by default, which
places an endbr64 instruction at the first instruction of the uprobe
trigger functions and breaks our setup.

Marking them with the nocf_check attribute to skip that.

Adding -Wno-attributes for the bench objects, because nocf_check can
be used only when -fcf-protection=branch is enabled; otherwise we get
a warning that breaks the compilation.
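
For illustration only (hand-written sketch, not actual compiler output),
with -fcf-protection=branch a trigger function starts roughly as

  uprobe_target_nop:
          endbr64
          nop
          ret

so the uprobe lands on the endbr64 instead of the instruction the
benchmark is meant to measure, while with nocf_check the endbr64 is
not emitted.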

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/include/linux/compiler.h                     | 4 ++++
 tools/testing/selftests/bpf/Makefile               | 2 +-
 tools/testing/selftests/bpf/benchs/bench_trigger.c | 6 +++---
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
index 7b65566f3e42..14038ce04ca4 100644
--- a/tools/include/linux/compiler.h
+++ b/tools/include/linux/compiler.h
@@ -58,6 +58,10 @@
 #define noinline
 #endif
 
+#ifndef __nocfcheck
+#define __nocfcheck __attribute__((nocf_check))
+#endif
+
 /* Are two types/vars the same type (ignoring qualifiers)? */
 #ifndef __same_type
 # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index e425a946276b..506d3d592093 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -726,7 +726,7 @@ $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ)
 # Benchmark runner
 $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h $(BPFOBJ)
 	$(call msg,CC,,$@)
-	$(Q)$(CC) $(CFLAGS) -O2 -c $(filter %.c,$^) $(LDLIBS) -o $@
+	$(Q)$(CC) $(CFLAGS) -O2 -Wno-attributes -c $(filter %.c,$^) $(LDLIBS) -o $@
 $(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h
 $(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h
 $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index ace0d1011a8e..3aecc3ef74e9 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -137,7 +137,7 @@ static void trigger_fmodret_setup(void)
  * GCC doesn't generate stack setup preample for these functions due to them
  * having no input arguments and doing nothing in the body.
  */
-__weak void uprobe_target_nop(void)
+__nocfcheck __weak void uprobe_target_nop(void)
 {
 	asm volatile ("nop");
 }
@@ -146,7 +146,7 @@ __weak void opaque_noop_func(void)
 {
 }
 
-__weak int uprobe_target_push(void)
+__nocfcheck __weak int uprobe_target_push(void)
 {
 	/* overhead of function call is negligible compared to uprobe
 	 * triggering, so this shouldn't affect benchmark results much
@@ -155,7 +155,7 @@ __weak int uprobe_target_push(void)
 	return 1;
 }
 
-__weak void uprobe_target_ret(void)
+__nocfcheck __weak void uprobe_target_ret(void)
 {
 	asm volatile ("");
 }
-- 
2.44.0



* Re: [PATCH RFC bpf-next 1/3] uprobe: Add uretprobe syscall to speed up return probe
From: Oleg Nesterov @ 2024-03-18 14:22 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On 03/18, Jiri Olsa wrote:
>
> +SYSCALL_DEFINE0(uretprobe)
> +{
> +	struct pt_regs *regs = task_pt_regs(current);
> +	unsigned long sregs[3], err;
> +
> +	/*
> +	 * We set rax and syscall itself changes rcx and r11, so the syscall
> +	 * trampoline saves their original values on stack. We need to read
> +	 * them and set original register values and fix the rsp pointer back.
> +	 */
> +	err = copy_from_user((void *) &sregs, (void *) regs->sp, sizeof(sregs));
                                              ^^^^^^^^^^^^^^^^^

IIUC, it should be (void __user *)regs->sp to shut up the sparse checks.
The 1st "(void *)" typecast is not needed.

Correctness-wise looks good to me, FWIW

Reviewed-by: Oleg Nesterov <oleg@redhat.com>



* Re: [PATCH RFC bpf-next 1/3] uprobe: Add uretprobe syscall to speed up return probe
From: Andrii Nakryiko @ 2024-03-19  1:11 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Mon, Mar 18, 2024 at 2:31 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding uretprobe syscall instead of trap to speed up return probe.
>
> At the moment the uretprobe setup/path is:
>
>   - install entry uprobe
>
>   - when the uprobe is hit overwrite probed function's return address on stack
>     with address of the trampoline that contains breakpoint instruction
>
>   - the breakpoint trap code handles the uretprobe consumers execution and
>     jumps back to original return address
>
> This patch changes the above trampoline's breakpoint instruction to new
> ureprobe syscall call. This syscall does exactly the same job as the trap
> with some extra work:
>
>   - syscall trampoline must save original value for rax/r11/rcx registers
>     on stack - rax is set to syscall number and r11/rcx are changed and
>     used by syscall instruction
>
>   - the syscall code reads the original values of those registers and restore
>     those values in task's pt_regs area
>
> Even with the extra registers handling code the having uretprobes handled
> by syscalls shows speed improvement.
>
>   On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)
>
>   current:
>
>     base           :   15.173 ± 0.052M/s
>     uprobe-nop     :    2.797 ± 0.090M/s
>     uprobe-push    :    2.580 ± 0.007M/s
>     uprobe-ret     :    1.001 ± 0.003M/s
>     uretprobe-nop  :    1.373 ± 0.002M/s
>     uretprobe-push :    1.346 ± 0.002M/s
>     uretprobe-ret  :    0.747 ± 0.001M/s
>
>   with the fix:
>
>     base           :   15.704 ± 0.076M/s
>     uprobe-nop     :    2.841 ± 0.008M/s
>     uprobe-push    :    2.666 ± 0.029M/s
>     uprobe-ret     :    1.037 ± 0.008M/s
>     uretprobe-nop  :    1.718 ± 0.010M/s  < ~25% speed up
>     uretprobe-push :    1.658 ± 0.008M/s  < ~23% speed up
>     uretprobe-ret  :    0.853 ± 0.004M/s  < ~14% speed up
>
>   On Amd (AMD Ryzen 7 5700U)
>
>   current:
>
>     base           :    5.702 ± 0.003M/s
>     uprobe-nop     :    1.505 ± 0.011M/s
>     uprobe-push    :    1.388 ± 0.008M/s
>     uprobe-ret     :    0.825 ± 0.001M/s
>     uretprobe-nop  :    0.782 ± 0.001M/s
>     uretprobe-push :    0.750 ± 0.001M/s
>     uretprobe-ret  :    0.544 ± 0.001M/s
>
>   with the fix:
>
>     base           :    5.669 ± 0.004M/s
>     uprobe-nop     :    1.539 ± 0.001M/s
>     uprobe-push    :    1.385 ± 0.003M/s
>     uprobe-ret     :    0.819 ± 0.001M/s
>     uretprobe-nop  :    0.889 ± 0.001M/s < ~13% speed up
>     uretprobe-push :    0.846 ± 0.001M/s < ~12% speed up
>     uretprobe-ret  :    0.594 ± 0.000M/s <  ~9% speed up
>
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  arch/x86/entry/syscalls/syscall_64.tbl |  1 +
>  arch/x86/kernel/uprobes.c              | 48 ++++++++++++++++++++++++++
>  include/linux/syscalls.h               |  2 ++
>  include/linux/uprobes.h                |  2 ++
>  include/uapi/asm-generic/unistd.h      |  5 ++-
>  kernel/events/uprobes.c                | 18 +++++++---
>  kernel/sys_ni.c                        |  2 ++
>  7 files changed, 73 insertions(+), 5 deletions(-)
>

LGTM, but see some questions and nits below:

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 7e8d46f4147f..af0a33ab06ee 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -383,6 +383,7 @@
>  459    common  lsm_get_self_attr       sys_lsm_get_self_attr
>  460    common  lsm_set_self_attr       sys_lsm_set_self_attr
>  461    common  lsm_list_modules        sys_lsm_list_modules
> +462    64      uretprobe               sys_uretprobe
>
>  #
>  # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 6c07f6daaa22..069371e86180 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -12,6 +12,7 @@
>  #include <linux/ptrace.h>
>  #include <linux/uprobes.h>
>  #include <linux/uaccess.h>
> +#include <linux/syscalls.h>
>
>  #include <linux/kdebug.h>
>  #include <asm/processor.h>
> @@ -308,6 +309,53 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool
>  }
>
>  #ifdef CONFIG_X86_64
> +
> +asm (
> +       ".pushsection .rodata\n"
> +       ".global uretprobe_syscall_entry\n"
> +       "uretprobe_syscall_entry:\n"
> +       "pushq %rax\n"
> +       "pushq %rcx\n"
> +       "pushq %r11\n"
> +       "movq $462, %rax\n"

nit: is it possible to avoid hardcoding 462 here? Can we use
__NR_uretprobe instead?
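
Something like this, perhaps (sketch; assumes __NR_uretprobe and
__stringify() are visible in this file):

	"movq $" __stringify(__NR_uretprobe) ", %rax\n"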

> +       "syscall\n"

oh, btw, do we need to save flags register as well or it's handled
somehow? I think according to manual syscall instruction does
something to rflags register. So do we need pushfq before syscall?

> +       ".global uretprobe_syscall_end\n"
> +       "uretprobe_syscall_end:\n"
> +       ".popsection\n"
> +);
> +
> +extern u8 uretprobe_syscall_entry[];
> +extern u8 uretprobe_syscall_end[];
> +
> +void *arch_uprobe_trampoline(unsigned long *psize)
> +{
> +       *psize = uretprobe_syscall_end - uretprobe_syscall_entry;
> +       return uretprobe_syscall_entry;
> +}
> +
> +SYSCALL_DEFINE0(uretprobe)
> +{
> +       struct pt_regs *regs = task_pt_regs(current);
> +       unsigned long sregs[3], err;
> +
> +       /*
> +        * We set rax and syscall itself changes rcx and r11, so the syscall
> +        * trampoline saves their original values on stack. We need to read
> +        * them and set original register values and fix the rsp pointer back.
> +        */
> +       err = copy_from_user((void *) &sregs, (void *) regs->sp, sizeof(sregs));
> +       WARN_ON_ONCE(err);
> +
> +       regs->r11 = sregs[0];
> +       regs->cx = sregs[1];
> +       regs->ax = sregs[2];
> +       regs->orig_ax = -1;
> +       regs->sp += sizeof(sregs);
> +
> +       uprobe_handle_trampoline(regs);

probably worth leaving a comment that uprobe_handle_trampoline() is
rewriting userspace RIP and so syscall "returns" to the original
caller.

> +       return regs->ax;

and this is making sure that caller function gets the correct function
return value, right? It's all a bit magical, so worth leaving a
comment here, IMO.
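
Something along these lines, maybe (just a suggested wording):

	/*
	 * uprobe_handle_trampoline() overwrites regs->ip with the original
	 * return address, so the syscall "returns" to the probed function's
	 * caller, and returning regs->ax keeps the caller's expected return
	 * value in rax.
	 */
	uprobe_handle_trampoline(regs);
	return regs->ax;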

> +}
> +
>  /*
>   * If arch_uprobe->insn doesn't use rip-relative addressing, return
>   * immediately.  Otherwise, rewrite the instruction so that it accesses
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 77eb9b0e7685..db150794f89d 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -972,6 +972,8 @@ asmlinkage long sys_lsm_list_modules(u64 *ids, size_t *size, u32 flags);
>  /* x86 */
>  asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int on);
>
> +asmlinkage long sys_uretprobe(void);
> +
>  /* pciconfig: alpha, arm, arm64, ia64, sparc */
>  asmlinkage long sys_pciconfig_read(unsigned long bus, unsigned long dfn,
>                                 unsigned long off, unsigned long len,
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index f46e0ca0169c..a490146ad89d 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -138,6 +138,8 @@ extern bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check c
>  extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
>  extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
>                                          void *src, unsigned long len);
> +extern void uprobe_handle_trampoline(struct pt_regs *regs);
> +extern void *arch_uprobe_trampoline(unsigned long *psize);
>  #else /* !CONFIG_UPROBES */
>  struct uprobes_state {
>  };
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 75f00965ab15..8a747cd1d735 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -842,8 +842,11 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
>  #define __NR_lsm_list_modules 461
>  __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
>
> +#define __NR_uretprobe 462
> +__SYSCALL(__NR_uretprobe, sys_uretprobe)
> +
>  #undef __NR_syscalls
> -#define __NR_syscalls 462
> +#define __NR_syscalls 463
>
>  /*
>   * 32 bit systems traditionally used different
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 929e98c62965..90395b16bde0 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1474,11 +1474,20 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
>         return ret;
>  }
>
> +void * __weak arch_uprobe_trampoline(unsigned long *psize)
> +{
> +       static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
> +
> +       *psize = UPROBE_SWBP_INSN_SIZE;
> +       return &insn;
> +}
> +
>  static struct xol_area *__create_xol_area(unsigned long vaddr)
>  {
>         struct mm_struct *mm = current->mm;
> -       uprobe_opcode_t insn = UPROBE_SWBP_INSN;
> +       unsigned long insns_size;
>         struct xol_area *area;
> +       void *insns;
>
>         area = kmalloc(sizeof(*area), GFP_KERNEL);
>         if (unlikely(!area))
> @@ -1502,7 +1511,8 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
>         /* Reserve the 1st slot for get_trampoline_vaddr() */
>         set_bit(0, area->bitmap);
>         atomic_set(&area->slot_count, 1);
> -       arch_uprobe_copy_ixol(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
> +       insns = arch_uprobe_trampoline(&insns_size);
> +       arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
>
>         if (!xol_add_vma(mm, area))
>                 return area;
> @@ -2123,7 +2133,7 @@ static struct return_instance *find_next_ret_chain(struct return_instance *ri)
>         return ri;
>  }
>
> -static void handle_trampoline(struct pt_regs *regs)
> +void uprobe_handle_trampoline(struct pt_regs *regs)
>  {
>         struct uprobe_task *utask;
>         struct return_instance *ri, *next;
> @@ -2188,7 +2198,7 @@ static void handle_swbp(struct pt_regs *regs)
>
>         bp_vaddr = uprobe_get_swbp_addr(regs);
>         if (bp_vaddr == get_trampoline_vaddr())
> -               return handle_trampoline(regs);
> +               return uprobe_handle_trampoline(regs);
>
>         uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
>         if (!uprobe) {
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index faad00cce269..be6195e0d078 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -391,3 +391,5 @@ COND_SYSCALL(setuid16);
>
>  /* restartable sequence */
>  COND_SYSCALL(rseq);
> +
> +COND_SYSCALL(uretprobe);
> --
> 2.44.0
>


* Re: [PATCH RFC bpf-next 2/3] selftests/bpf: Add uretprobe syscall test
From: Andrii Nakryiko @ 2024-03-19  1:16 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Mon, Mar 18, 2024 at 2:32 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Add uretprobe syscall test and compare register values before
> and after the uretprobe is hit. Also compare the register values
> seen from attached bpf program.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/testing/selftests/bpf/Makefile          | 13 ++-
>  .../bpf/prog_tests/arch/x86/uprobe_syscall.S  | 89 +++++++++++++++++++
>  .../selftests/bpf/prog_tests/uprobe_syscall.c | 84 +++++++++++++++++
>  .../selftests/bpf/progs/uprobe_syscall.c      | 15 ++++
>  4 files changed, 200 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
>  create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c
>

Can all the above be achieved with inline assembly inside .c files? It
would probably simplify logistics overall. We can guard with
arch-specific #ifdefs, of course.
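
E.g. roughly like this in the prog_tests .c file (untested sketch,
reusing the uprobe_syscall_arch_test body from the .S file):

	#ifdef __x86_64__
	asm (
		".pushsection .text\n"
		".globl uprobe_syscall_arch_test\n"
		"uprobe_syscall_arch_test:\n"
		"movq $0xdeadbeef, %rax\n"
		"ret\n"
		".popsection\n"
	);
	extern unsigned long uprobe_syscall_arch_test(void);
	#endif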

[...]

> diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall.c b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
> new file mode 100644
> index 000000000000..0cc7e8761410
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <string.h>
> +
> +struct pt_regs regs;
> +
> +char _license[] SEC("license") = "GPL";
> +
> +SEC("uretprobe//proc/self/exe:uprobe_syscall_arch_test")
> +int uretprobe(struct pt_regs *ctx)
> +{
> +       memcpy(&regs, ctx, sizeof(regs));

nit: please use __builtin_memcpy(), given this is BPF code. And we
don't need string.h include.
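
i.e. just:

	__builtin_memcpy(&regs, ctx, sizeof(regs));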

> +       return 0;
> +}
> --
> 2.44.0
>


* Re: [PATCH RFC bpf-next 3/3] selftests/bpf: Mark uprobe trigger functions with nocf_check attribute
From: Andrii Nakryiko @ 2024-03-19  1:22 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Mon, Mar 18, 2024 at 2:32 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Some distros seem to enable the -fcf-protection=branch by default,
> which breaks our setup on first instruction of uprobe trigger
> functions and place there endbr64 instruction.
>
> Marking them with nocf_check attribute to skip that.
>
> Adding -Wno-attributes for bench objects, becase nocf_check can
> be used only when -fcf-protection=branch is enabled, otherwise
> we get a warning and break compilation.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/include/linux/compiler.h                     | 4 ++++
>  tools/testing/selftests/bpf/Makefile               | 2 +-
>  tools/testing/selftests/bpf/benchs/bench_trigger.c | 6 +++---
>  3 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
> index 7b65566f3e42..14038ce04ca4 100644
> --- a/tools/include/linux/compiler.h
> +++ b/tools/include/linux/compiler.h
> @@ -58,6 +58,10 @@
>  #define noinline
>  #endif
>
> +#ifndef __nocfcheck
> +#define __nocfcheck __attribute__((nocf_check))
> +#endif

Let's preserve the spelling of the attribute, __nocf_check ?

BTW, just FYI, seems like kernel is defining it as:

#define __noendbr    __attribute__((nocf_check))

Though somewhere deep in x86-specific code, so probably not a good
idea to use it here?
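
I.e. something like (sketch):

	#ifndef __nocf_check
	#define __nocf_check __attribute__((nocf_check))
	#endif

and then __nocf_check __weak on the trigger functions.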

> +
>  /* Are two types/vars the same type (ignoring qualifiers)? */
>  #ifndef __same_type
>  # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index e425a946276b..506d3d592093 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -726,7 +726,7 @@ $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ)
>  # Benchmark runner
>  $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h $(BPFOBJ)
>         $(call msg,CC,,$@)
> -       $(Q)$(CC) $(CFLAGS) -O2 -c $(filter %.c,$^) $(LDLIBS) -o $@
> +       $(Q)$(CC) $(CFLAGS) -O2 -Wno-attributes -c $(filter %.c,$^) $(LDLIBS) -o $@

let's better use `#pragma warning disable` in relevant .c files,
instead of this global flag?
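
(assuming -Wattributes is the warning in question, the gcc/clang spelling
would be something like

	#pragma GCC diagnostic ignored "-Wattributes"

at the top of bench_trigger.c)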

>  $(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h
>  $(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h
>  $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \

[...]


* Re: [PATCH RFC bpf-next 1/3] uprobe: Add uretprobe syscall to speed up return probe
From: Oleg Nesterov @ 2024-03-19  6:32 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On 03/18, Andrii Nakryiko wrote:
>
> > +       "syscall\n"
>
> oh, btw, do we need to save flags register as well or it's handled
> somehow? I think according to manual syscall instruction does
> something to rflags register. So do we need pushfq before syscall?

The comment above entry_SYSCALL_64() says

	64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11

then entry_SYSCALL_64 does

	pushq   %r11                                    /* pt_regs->flags */

which should be restored on return to userspace. So I think that
only X86_EFLAGS_RF can be lost, we probably do not care.

Oleg.



* [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
From: Oleg Nesterov @ 2024-03-19 10:25 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
sure it makes sense.

But I am wondering if this change can speed up uretprobes a bit more. Any chance
you can test it?

With 1/3 sys_uretprobe() changes regs->r11/cx, this is correct but implies iret.
See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in do_syscall_64().

With this patch uretprobe_syscall_entry restores rcx/r11 itself and does retq,
sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
make it possible.

Comments?

Oleg.
---

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 069371e86180..b99f1d80a8c8 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -319,6 +319,9 @@ asm (
 	"pushq %r11\n"
 	"movq $462, %rax\n"
 	"syscall\n"
+	"popq %r11\n"
+	"popq %rcx\n"
+	"retq\n"
 	".global uretprobe_syscall_end\n"
 	"uretprobe_syscall_end:\n"
 	".popsection\n"
@@ -336,23 +339,20 @@ void *arch_uprobe_trampoline(unsigned long *psize)
 SYSCALL_DEFINE0(uretprobe)
 {
 	struct pt_regs *regs = task_pt_regs(current);
-	unsigned long sregs[3], err;
+	unsigned long __user *ax_and_ret = (unsigned long __user *)regs->sp + 2;
+	unsigned long ip, err;
 
-	/*
-	 * We set rax and syscall itself changes rcx and r11, so the syscall
-	 * trampoline saves their original values on stack. We need to read
-	 * them and set original register values and fix the rsp pointer back.
-	 */
-	err = copy_from_user((void *) &sregs, (void *) regs->sp, sizeof(sregs));
-	WARN_ON_ONCE(err);
-
-	regs->r11 = sregs[0];
-	regs->cx = sregs[1];
-	regs->ax = sregs[2];
+	ip = regs->ip;
 	regs->orig_ax = -1;
-	regs->sp += sizeof(sregs);
+	err = get_user(regs->ax, ax_and_ret);
+	WARN_ON_ONCE(err);
 
 	uprobe_handle_trampoline(regs);
+
+	err = put_user(regs->ip, ax_and_ret);
+	WARN_ON_ONCE(err);
+	regs->ip = ip;
+
 	return regs->ax;
 }
 



* Re: [PATCH RFC bpf-next 1/3] uprobe: Add uretprobe syscall to speed up return probe
From: Jiri Olsa @ 2024-03-19 10:54 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Mon, Mar 18, 2024 at 06:11:06PM -0700, Andrii Nakryiko wrote:

SNIP

> > diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> > index 7e8d46f4147f..af0a33ab06ee 100644
> > --- a/arch/x86/entry/syscalls/syscall_64.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> > @@ -383,6 +383,7 @@
> >  459    common  lsm_get_self_attr       sys_lsm_get_self_attr
> >  460    common  lsm_set_self_attr       sys_lsm_set_self_attr
> >  461    common  lsm_list_modules        sys_lsm_list_modules
> > +462    64      uretprobe               sys_uretprobe
> >
> >  #
> >  # Due to a historical design error, certain syscalls are numbered differently
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 6c07f6daaa22..069371e86180 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -12,6 +12,7 @@
> >  #include <linux/ptrace.h>
> >  #include <linux/uprobes.h>
> >  #include <linux/uaccess.h>
> > +#include <linux/syscalls.h>
> >
> >  #include <linux/kdebug.h>
> >  #include <asm/processor.h>
> > @@ -308,6 +309,53 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool
> >  }
> >
> >  #ifdef CONFIG_X86_64
> > +
> > +asm (
> > +       ".pushsection .rodata\n"
> > +       ".global uretprobe_syscall_entry\n"
> > +       "uretprobe_syscall_entry:\n"
> > +       "pushq %rax\n"
> > +       "pushq %rcx\n"
> > +       "pushq %r11\n"
> > +       "movq $462, %rax\n"
> 
> nit: is it possible to avoid hardcoding 462 here? Can we use
> __NR_uretprobe instead?

yep, will do that

> 
> > +       "syscall\n"
> 
> oh, btw, do we need to save flags register as well or it's handled
> somehow? I think according to manual syscall instruction does
> something to rflags register. So do we need pushfq before syscall?

it's saved and restored by syscall instruction.. but apart from RF
flag as Oleg mentioned, it looks like we don't need to care about
that one

> 
> > +       ".global uretprobe_syscall_end\n"
> > +       "uretprobe_syscall_end:\n"
> > +       ".popsection\n"
> > +);
> > +
> > +extern u8 uretprobe_syscall_entry[];
> > +extern u8 uretprobe_syscall_end[];
> > +
> > +void *arch_uprobe_trampoline(unsigned long *psize)
> > +{
> > +       *psize = uretprobe_syscall_end - uretprobe_syscall_entry;
> > +       return uretprobe_syscall_entry;
> > +}
> > +
> > +SYSCALL_DEFINE0(uretprobe)
> > +{
> > +       struct pt_regs *regs = task_pt_regs(current);
> > +       unsigned long sregs[3], err;
> > +
> > +       /*
> > +        * We set rax and syscall itself changes rcx and r11, so the syscall
> > +        * trampoline saves their original values on stack. We need to read
> > +        * them and set original register values and fix the rsp pointer back.
> > +        */
> > +       err = copy_from_user((void *) &sregs, (void *) regs->sp, sizeof(sregs));
> > +       WARN_ON_ONCE(err);
> > +
> > +       regs->r11 = sregs[0];
> > +       regs->cx = sregs[1];
> > +       regs->ax = sregs[2];
> > +       regs->orig_ax = -1;
> > +       regs->sp += sizeof(sregs);
> > +
> > +       uprobe_handle_trampoline(regs);
> 
> probably worth leaving a comment that uprobe_handle_trampoline() is
> rewriting userspace RIP and so syscall "returns" to the original
> caller.
> 
> > +       return regs->ax;
> 
> and this is making sure that caller function gets the correct function
> return value, right? It's all a bit magical, so worth leaving a
> comment here, IMO.

ok will add more comments about that

thanks,
jirka


* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
From: Jiri Olsa @ 2024-03-19 11:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Tue, Mar 19, 2024 at 11:25:24AM +0100, Oleg Nesterov wrote:
> Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
> sure it makes sense.
> 
> But I am wondering if this change can speedup uretprobes a bit more. Any chance
> you can test it?
> 
> With 1/3 sys_uretprobe() changes regs->r11/cx, this is correct but implies iret.
> See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in do_syscall_64().

nice idea, looks like sysret should be faster

> 
> With this patch uretprobe_syscall_entry restores rcx/r11 itself and does retq,
> sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
> make it possible.
> 
> Comments?
> 
> Oleg.
> ---
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 069371e86180..b99f1d80a8c8 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -319,6 +319,9 @@ asm (
>  	"pushq %r11\n"
>  	"movq $462, %rax\n"
>  	"syscall\n"
> +	"popq %r11\n"
> +	"popq %rcx\n"
> +	"retq\n"

using rax space on stack for return pointer, cool :)

I'll run the test with this change

thanks,
jirka

>  	".global uretprobe_syscall_end\n"
>  	"uretprobe_syscall_end:\n"
>  	".popsection\n"
> @@ -336,23 +339,20 @@ void *arch_uprobe_trampoline(unsigned long *psize)
>  SYSCALL_DEFINE0(uretprobe)
>  {
>  	struct pt_regs *regs = task_pt_regs(current);
> -	unsigned long sregs[3], err;
> +	unsigned long __user *ax_and_ret = (unsigned long __user *)regs->sp + 2;
> +	unsigned long ip, err;
>  
> -	/*
> -	 * We set rax and syscall itself changes rcx and r11, so the syscall
> -	 * trampoline saves their original values on stack. We need to read
> -	 * them and set original register values and fix the rsp pointer back.
> -	 */
> -	err = copy_from_user((void *) &sregs, (void *) regs->sp, sizeof(sregs));
> -	WARN_ON_ONCE(err);
> -
> -	regs->r11 = sregs[0];
> -	regs->cx = sregs[1];
> -	regs->ax = sregs[2];
> +	ip = regs->ip;
>  	regs->orig_ax = -1;
> -	regs->sp += sizeof(sregs);
> +	err = get_user(regs->ax, ax_and_ret);
> +	WARN_ON_ONCE(err);
>  
>  	uprobe_handle_trampoline(regs);
> +
> +	err = put_user(regs->ip, ax_and_ret);
> +	WARN_ON_ONCE(err);
> +	regs->ip = ip;
> +
>  	return regs->ax;
>  }
>  
> 


* Re: [PATCH RFC bpf-next 2/3] selftests/bpf: Add uretprobe syscall test
From: Jiri Olsa @ 2024-03-19 11:09 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Mon, Mar 18, 2024 at 06:16:09PM -0700, Andrii Nakryiko wrote:
> On Mon, Mar 18, 2024 at 2:32 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Add uretprobe syscall test and compare register values before
> > and after the uretprobe is hit. Also compare the register values
> > seen from attached bpf program.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  tools/testing/selftests/bpf/Makefile          | 13 ++-
> >  .../bpf/prog_tests/arch/x86/uprobe_syscall.S  | 89 +++++++++++++++++++
> >  .../selftests/bpf/prog_tests/uprobe_syscall.c | 84 +++++++++++++++++
> >  .../selftests/bpf/progs/uprobe_syscall.c      | 15 ++++
> >  4 files changed, 200 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c
> >
> 
> Can all the above be achieved with inline assembly inside .c files? It
> would probably simplify logistics overall. We can guard with
> arch-specific #ifdefs, of course.

ok, probably yes.. I'll check

> 
> [...]
> 
> > diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall.c b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
> > new file mode 100644
> > index 000000000000..0cc7e8761410
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
> > @@ -0,0 +1,15 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include "vmlinux.h"
> > +#include <bpf/bpf_helpers.h>
> > +#include <string.h>
> > +
> > +struct pt_regs regs;
> > +
> > +char _license[] SEC("license") = "GPL";
> > +
> > +SEC("uretprobe//proc/self/exe:uprobe_syscall_arch_test")
> > +int uretprobe(struct pt_regs *ctx)
> > +{
> > +       memcpy(&regs, ctx, sizeof(regs));
> 
> nit: please use __builtin_memcpy(), given this is BPF code. And we
> don't need string.h include.

right, thanks
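
so the program shrinks to something like this (same program as above,
just with the string.h include dropped and the builtin used):

	// SPDX-License-Identifier: GPL-2.0
	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>

	struct pt_regs regs;

	char _license[] SEC("license") = "GPL";

	SEC("uretprobe//proc/self/exe:uprobe_syscall_arch_test")
	int uretprobe(struct pt_regs *ctx)
	{
		__builtin_memcpy(&regs, ctx, sizeof(regs));
		return 0;
	}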

jirka

> 
> > +       return 0;
> > +}
> > --
> > 2.44.0
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 3/3] selftests/bpf: Mark uprobe trigger functions with nocf_check attribute
  2024-03-19  1:22   ` Andrii Nakryiko
@ 2024-03-19 11:11     ` Jiri Olsa
  2024-03-22 13:40       ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2024-03-19 11:11 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Mon, Mar 18, 2024 at 06:22:02PM -0700, Andrii Nakryiko wrote:
> On Mon, Mar 18, 2024 at 2:32 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Some distros seem to enable the -fcf-protection=branch by default,
> > which breaks our setup on first instruction of uprobe trigger
> > functions and place there endbr64 instruction.
> >
> > Marking them with nocf_check attribute to skip that.
> >
> > Adding -Wno-attributes for bench objects, becase nocf_check can
> > be used only when -fcf-protection=branch is enabled, otherwise
> > we get a warning and break compilation.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  tools/include/linux/compiler.h                     | 4 ++++
> >  tools/testing/selftests/bpf/Makefile               | 2 +-
> >  tools/testing/selftests/bpf/benchs/bench_trigger.c | 6 +++---
> >  3 files changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
> > index 7b65566f3e42..14038ce04ca4 100644
> > --- a/tools/include/linux/compiler.h
> > +++ b/tools/include/linux/compiler.h
> > @@ -58,6 +58,10 @@
> >  #define noinline
> >  #endif
> >
> > +#ifndef __nocfcheck
> > +#define __nocfcheck __attribute__((nocf_check))
> > +#endif
> 
> Let's preserve spelling of the attribut, __nocf_check ?
> 
> BTW, just FYI, seems like kernel is defining it as:
> 
> #define __noendbr    __attribute__((nocf_check))
> 
> Thought somewhere deep in x86-specific code, so probably not a good
> idea to use it here?

ugh, I missed it.. better to use __noendbr

> 
> > +
> >  /* Are two types/vars the same type (ignoring qualifiers)? */
> >  #ifndef __same_type
> >  # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
> > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > index e425a946276b..506d3d592093 100644
> > --- a/tools/testing/selftests/bpf/Makefile
> > +++ b/tools/testing/selftests/bpf/Makefile
> > @@ -726,7 +726,7 @@ $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ)
> >  # Benchmark runner
> >  $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h $(BPFOBJ)
> >         $(call msg,CC,,$@)
> > -       $(Q)$(CC) $(CFLAGS) -O2 -c $(filter %.c,$^) $(LDLIBS) -o $@
> > +       $(Q)$(CC) $(CFLAGS) -O2 -Wno-attributes -c $(filter %.c,$^) $(LDLIBS) -o $@
> 
> let's better use `#pragma warning disable` in relevant .c files,
> instead of this global flag?

ok, will try that
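
probably something like this at the top of bench_trigger.c instead of
the global flag (just a sketch, assuming both gcc and clang honor the
GCC diagnostic pragma here):

	/*
	 * nocf_check is only accepted without a warning when
	 * -fcf-protection=branch is enabled, so silence -Wattributes
	 * locally instead of for all bench objects.
	 */
	#pragma GCC diagnostic ignored "-Wattributes"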

thanks,
jirka

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 1/3] uprobe: Add uretprobe syscall to speed up return probe
  2024-03-19  6:32     ` Oleg Nesterov
@ 2024-03-19 16:20       ` Andrii Nakryiko
  0 siblings, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2024-03-19 16:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Mon, Mar 18, 2024 at 11:33 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 03/18, Andrii Nakryiko wrote:
> >
> > > +       "syscall\n"
> >
> > oh, btw, do we need to save flags register as well or it's handled
> > somehow? I think according to manual syscall instruction does
> > something to rflags register. So do we need pushfq before syscall?
>
> The comment above entry_SYSCALL_64() says
>
>         64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11
>
> then entry_SYSCALL_64 does
>
>         pushq   %r11                                    /* pt_regs->flags */
>
> which should be restored on return to userspace. So I think that
> only X86_EFLAGS_RF can be lost, we probably do not care.
>

Cool, thanks for elaborating!


> Oleg.
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-19 11:08   ` Jiri Olsa
@ 2024-03-19 16:25     ` Andrii Nakryiko
  2024-03-19 16:38       ` Oleg Nesterov
  2024-03-19 19:35       ` Jiri Olsa
  2024-03-19 19:31     ` Jiri Olsa
  1 sibling, 2 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2024-03-19 16:25 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Tue, Mar 19, 2024 at 4:08 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Tue, Mar 19, 2024 at 11:25:24AM +0100, Oleg Nesterov wrote:
> > Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
> > sure it makes sense.
> >
> > But I am wondering if this change can speedup uretprobes a bit more. Any chance
> > you can test it?
> >
> > With 1/3 sys_uretprobe() changes regs->r11/cx, this is correct but implies iret.
> > See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in do_syscall_64().
>
> nice idea, looks like sysexit should be faster
>
> >
> > With this patch uretprobe_syscall_entry restores rcx/r11 itself and does retq,
> > sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
> > make it possible.
> >
> > Comments?
> >
> > Oleg.
> > ---
> >
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 069371e86180..b99f1d80a8c8 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -319,6 +319,9 @@ asm (
> >       "pushq %r11\n"
> >       "movq $462, %rax\n"
> >       "syscall\n"
> > +     "popq %r11\n"
> > +     "popq %rcx\n"
> > +     "retq\n"
>
> using rax space on stack for return pointer, cool :)
>
> I'll run the test with this change
>

I can do some benchmarking on my side as well, given I have everything
set up for this anyways. Thanks for the help with speeding all this
up!

BTW, Jiri, what are your plans regarding sys_uprobe (entry probe
optimization through syscall), while we are on the topic?


> thanks,
> jirka
>
> >       ".global uretprobe_syscall_end\n"
> >       "uretprobe_syscall_end:\n"
> >       ".popsection\n"
> > @@ -336,23 +339,20 @@ void *arch_uprobe_trampoline(unsigned long *psize)
> >  SYSCALL_DEFINE0(uretprobe)
> >  {
> >       struct pt_regs *regs = task_pt_regs(current);
> > -     unsigned long sregs[3], err;
> > +     unsigned long __user *ax_and_ret = (unsigned long __user *)regs->sp + 2;
> > +     unsigned long ip, err;
> >
> > -     /*
> > -      * We set rax and syscall itself changes rcx and r11, so the syscall
> > -      * trampoline saves their original values on stack. We need to read
> > -      * them and set original register values and fix the rsp pointer back.
> > -      */
> > -     err = copy_from_user((void *) &sregs, (void *) regs->sp, sizeof(sregs));
> > -     WARN_ON_ONCE(err);
> > -
> > -     regs->r11 = sregs[0];
> > -     regs->cx = sregs[1];
> > -     regs->ax = sregs[2];
> > +     ip = regs->ip;
> >       regs->orig_ax = -1;
> > -     regs->sp += sizeof(sregs);
> > +     err = get_user(regs->ax, ax_and_ret);
> > +     WARN_ON_ONCE(err);
> >
> >       uprobe_handle_trampoline(regs);
> > +
> > +     err = put_user(regs->ip, ax_and_ret);
> > +     WARN_ON_ONCE(err);
> > +     regs->ip = ip;
> > +
> >       return regs->ax;
> >  }
> >
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-19 16:25     ` Andrii Nakryiko
@ 2024-03-19 16:38       ` Oleg Nesterov
  2024-03-19 19:35       ` Jiri Olsa
  1 sibling, 0 replies; 33+ messages in thread
From: Oleg Nesterov @ 2024-03-19 16:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On 03/19, Andrii Nakryiko wrote:
>
> On Tue, Mar 19, 2024 at 4:08 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> >
> > I'll run the test with this change
> >
>
> I can do some benchmarking on my side as well, given I have everything
> set up for this anyways. Thanks for the help with speeding all this
> up!

Great, thank you both!

but just in case let me repeat, this is just a poc, the patch is obviously
incomplete and needs more discussion even _if_ it actually makes the syscall
faster, and I am not sure it does.

Oleg.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-19 11:08   ` Jiri Olsa
  2024-03-19 16:25     ` Andrii Nakryiko
@ 2024-03-19 19:31     ` Jiri Olsa
  2024-03-19 20:13       ` Andrii Nakryiko
  2024-03-20 11:04       ` Jiri Olsa
  1 sibling, 2 replies; 33+ messages in thread
From: Jiri Olsa @ 2024-03-19 19:31 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Tue, Mar 19, 2024 at 12:08:35PM +0100, Jiri Olsa wrote:
> On Tue, Mar 19, 2024 at 11:25:24AM +0100, Oleg Nesterov wrote:
> > Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
> > sure it makes sense.
> > 
> > But I am wondering if this change can speedup uretprobes a bit more. Any chance
> > you can test it?
> > 
> > With 1/3 sys_uretprobe() changes regs->r11/cx, this is correct but implies iret.
> > See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in do_syscall_64().
> 
> nice idea, looks like sysexit should be faster
> 
> > 
> > With this patch uretprobe_syscall_entry restores rcx/r11 itself and does retq,
> > sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
> > make it possible.
> > 
> > Comments?
> > 
> > Oleg.
> > ---
> > 
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 069371e86180..b99f1d80a8c8 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -319,6 +319,9 @@ asm (
> >  	"pushq %r11\n"
> >  	"movq $462, %rax\n"
> >  	"syscall\n"
> > +	"popq %r11\n"
> > +	"popq %rcx\n"
> > +	"retq\n"
> 
> using rax space on stack for return pointer, cool :)
> 
> I'll run the test with this change

I got bigger speed up on intel, amd stays the same (I'll double check that)

current:
  base           :   16.133 ± 0.035M/s
  uprobe-nop     :    3.003 ± 0.002M/s
  uprobe-push    :    2.829 ± 0.001M/s
  uprobe-ret     :    1.101 ± 0.001M/s
  uretprobe-nop  :    1.485 ± 0.001M/s
  uretprobe-push :    1.447 ± 0.000M/s
  uretprobe-ret  :    0.812 ± 0.000M/s

fix:
  base           :   16.522 ± 0.103M/s
  uprobe-nop     :    2.920 ± 0.034M/s
  uprobe-push    :    2.749 ± 0.047M/s
  uprobe-ret     :    1.094 ± 0.003M/s
  uretprobe-nop  :    2.004 ± 0.006M/s  < ~34% speed up
  uretprobe-push :    1.940 ± 0.003M/s  < ~34% speed up
  uretprobe-ret  :    0.921 ± 0.050M/s  < ~13% speed up

original fix:
  base           :   15.704 ± 0.076M/s
  uprobe-nop     :    2.841 ± 0.008M/s
  uprobe-push    :    2.666 ± 0.029M/s
  uprobe-ret     :    1.037 ± 0.008M/s
  uretprobe-nop  :    1.718 ± 0.010M/s  < ~25% speed up
  uretprobe-push :    1.658 ± 0.008M/s  < ~23% speed up
  uretprobe-ret  :    0.853 ± 0.004M/s  < ~14% speed up


jirka

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-19 16:25     ` Andrii Nakryiko
  2024-03-19 16:38       ` Oleg Nesterov
@ 2024-03-19 19:35       ` Jiri Olsa
  1 sibling, 0 replies; 33+ messages in thread
From: Jiri Olsa @ 2024-03-19 19:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Tue, Mar 19, 2024 at 09:25:57AM -0700, Andrii Nakryiko wrote:
> On Tue, Mar 19, 2024 at 4:08 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Tue, Mar 19, 2024 at 11:25:24AM +0100, Oleg Nesterov wrote:
> > > Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
> > > sure it makes sense.
> > >
> > > But I am wondering if this change can speedup uretprobes a bit more. Any chance
> > > you can test it?
> > >
> > > With 1/3 sys_uretprobe() changes regs->r11/cx, this is correct but implies iret.
> > > See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in do_syscall_64().
> >
> > nice idea, looks like sysexit should be faster
> >
> > >
> > > With this patch uretprobe_syscall_entry restores rcx/r11 itself and does retq,
> > > sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
> > > make it possible.
> > >
> > > Comments?
> > >
> > > Oleg.
> > > ---
> > >
> > > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > > index 069371e86180..b99f1d80a8c8 100644
> > > --- a/arch/x86/kernel/uprobes.c
> > > +++ b/arch/x86/kernel/uprobes.c
> > > @@ -319,6 +319,9 @@ asm (
> > >       "pushq %r11\n"
> > >       "movq $462, %rax\n"
> > >       "syscall\n"
> > > +     "popq %r11\n"
> > > +     "popq %rcx\n"
> > > +     "retq\n"
> >
> > using rax space on stack for return pointer, cool :)
> >
> > I'll run the test with this change
> >
> 
> I can do some benchmarking on my side as well, given I have everything

that'd be great, thanks

> set up for this anyways. Thanks for the help with speeding all this
> up!
> 
> BTW, Jiri, what are you plans regarding sys_uprobe (entry probe
> optimization through syscall), while we are on the topic?

I plan to work on that after this one is sorted out

jirka

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-19 19:31     ` Jiri Olsa
@ 2024-03-19 20:13       ` Andrii Nakryiko
  2024-03-20 11:04       ` Jiri Olsa
  1 sibling, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2024-03-19 20:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Tue, Mar 19, 2024 at 12:32 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Tue, Mar 19, 2024 at 12:08:35PM +0100, Jiri Olsa wrote:
> > On Tue, Mar 19, 2024 at 11:25:24AM +0100, Oleg Nesterov wrote:
> > > Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
> > > sure it makes sense.
> > >
> > > But I am wondering if this change can speedup uretprobes a bit more. Any chance
> > > you can test it?
> > >
> > > With 1/3 sys_uretprobe() changes regs->r11/cx, this is correct but implies iret.
> > > See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in do_syscall_64().
> >
> > nice idea, looks like sysexit should be faster
> >
> > >
> > > With this patch uretprobe_syscall_entry restores rcx/r11 itself and does retq,
> > > sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
> > > make it possible.
> > >
> > > Comments?
> > >
> > > Oleg.
> > > ---
> > >
> > > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > > index 069371e86180..b99f1d80a8c8 100644
> > > --- a/arch/x86/kernel/uprobes.c
> > > +++ b/arch/x86/kernel/uprobes.c
> > > @@ -319,6 +319,9 @@ asm (
> > >     "pushq %r11\n"
> > >     "movq $462, %rax\n"
> > >     "syscall\n"
> > > +   "popq %r11\n"
> > > +   "popq %rcx\n"
> > > +   "retq\n"
> >
> > using rax space on stack for return pointer, cool :)
> >
> > I'll run the test with this change
>
> I got bigger speed up on intel, amd stays the same (I'll double check that)
>
> current:
>   base           :   16.133 ± 0.035M/s
>   uprobe-nop     :    3.003 ± 0.002M/s
>   uprobe-push    :    2.829 ± 0.001M/s
>   uprobe-ret     :    1.101 ± 0.001M/s
>   uretprobe-nop  :    1.485 ± 0.001M/s
>   uretprobe-push :    1.447 ± 0.000M/s
>   uretprobe-ret  :    0.812 ± 0.000M/s
>
> fix:
>   base           :   16.522 ± 0.103M/s
>   uprobe-nop     :    2.920 ± 0.034M/s
>   uprobe-push    :    2.749 ± 0.047M/s
>   uprobe-ret     :    1.094 ± 0.003M/s
>   uretprobe-nop  :    2.004 ± 0.006M/s  < ~34% speed up
>   uretprobe-push :    1.940 ± 0.003M/s  < ~34% speed up
>   uretprobe-ret  :    0.921 ± 0.050M/s  < ~13% speed up
>
> original fix:
>   base           :   15.704 ± 0.076M/s
>   uprobe-nop     :    2.841 ± 0.008M/s
>   uprobe-push    :    2.666 ± 0.029M/s
>   uprobe-ret     :    1.037 ± 0.008M/s
>   uretprobe-nop  :    1.718 ± 0.010M/s  < ~25% speed up
>   uretprobe-push :    1.658 ± 0.008M/s  < ~23% speed up
>   uretprobe-ret  :    0.853 ± 0.004M/s  < ~14% speed up
>

My machine is slower; even though I turned off mitigations and stuff
like that, I feel like there are still some slowdowns. But either
way, the data is at least consistent as far as the baseline goes (it's
called syscall-count now in my local changes that I'm yet to submit),
and yes, Oleg's change does produce a noticeable speed up:

baseline
========
usermode-count :   79.509 ± 0.038M/s
syscall-count  :    9.550 ± 0.002M/s
uprobe-nop     :    1.530 ± 0.000M/s
uprobe-push    :    1.457 ± 0.000M/s
uprobe-ret     :    0.642 ± 0.000M/s
uretprobe-nop  :    0.777 ± 0.000M/s
uretprobe-push :    0.761 ± 0.000M/s
uretprobe-ret  :    0.459 ± 0.000M/s

Jiri
====
usermode-count :   79.515 ± 0.014M/s
syscall-count  :    9.439 ± 0.006M/s
uprobe-nop     :    1.520 ± 0.001M/s
uprobe-push    :    1.464 ± 0.000M/s
uprobe-ret     :    0.640 ± 0.000M/s
uretprobe-nop  :    0.893 ± 0.000M/s (+15%)
uretprobe-push :    0.867 ± 0.000M/s (+14%)
uretprobe-ret  :    0.498 ± 0.000M/s (+8.5%)

Oleg+Jiri
=========
usermode-count :   79.471 ± 0.078M/s
syscall-count  :    9.434 ± 0.007M/s
uprobe-nop     :    1.516 ± 0.003M/s
uprobe-push    :    1.463 ± 0.000M/s
uprobe-ret     :    0.640 ± 0.001M/s
uretprobe-nop  :    1.020 ± 0.001M/s (+31%)
uretprobe-push :    0.998 ± 0.001M/s (+31%)
uretprobe-ret  :    0.537 ± 0.000M/s (+17%)

So it's 2x the speed up of just Jiri's changes, which is a very nice
boost! I only tested on an Intel CPU.


>
> jirka

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-19 19:31     ` Jiri Olsa
  2024-03-19 20:13       ` Andrii Nakryiko
@ 2024-03-20 11:04       ` Jiri Olsa
  2024-03-20 14:37         ` Oleg Nesterov
  1 sibling, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2024-03-20 11:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Tue, Mar 19, 2024 at 08:31:55PM +0100, Jiri Olsa wrote:
> On Tue, Mar 19, 2024 at 12:08:35PM +0100, Jiri Olsa wrote:
> > On Tue, Mar 19, 2024 at 11:25:24AM +0100, Oleg Nesterov wrote:
> > > Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
> > > sure it makes sense.
> > > 
> > > But I am wondering if this change can speedup uretprobes a bit more. Any chance
> > > you can test it?
> > > 
> > > With 1/3 sys_uretprobe() changes regs->r11/cx, this is correct but implies iret.
> > > See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in do_syscall_64().
> > 
> > nice idea, looks like sysexit should be faster
> > 
> > > 
> > > With this patch uretprobe_syscall_entry restores rcx/r11 itself and does retq,
> > > sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
> > > make it possible.
> > > 
> > > Comments?
> > > 
> > > Oleg.
> > > ---
> > > 
> > > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > > index 069371e86180..b99f1d80a8c8 100644
> > > --- a/arch/x86/kernel/uprobes.c
> > > +++ b/arch/x86/kernel/uprobes.c
> > > @@ -319,6 +319,9 @@ asm (
> > >  	"pushq %r11\n"
> > >  	"movq $462, %rax\n"
> > >  	"syscall\n"
> > > +	"popq %r11\n"
> > > +	"popq %rcx\n"
> > > +	"retq\n"
> > 
> > using rax space on stack for return pointer, cool :)
> > 
> > I'll run the test with this change
> 
> I got bigger speed up on intel, amd stays the same (I'll double check that)

yes, I'm getting no speed up on AMD, but Intel's great

Oleg,
are you ok if I squash the patches together, or do you
want to send it separately?

jirka


> 
> current:
>   base           :   16.133 ± 0.035M/s
>   uprobe-nop     :    3.003 ± 0.002M/s
>   uprobe-push    :    2.829 ± 0.001M/s
>   uprobe-ret     :    1.101 ± 0.001M/s
>   uretprobe-nop  :    1.485 ± 0.001M/s
>   uretprobe-push :    1.447 ± 0.000M/s
>   uretprobe-ret  :    0.812 ± 0.000M/s
> 
> fix:
>   base           :   16.522 ± 0.103M/s
>   uprobe-nop     :    2.920 ± 0.034M/s
>   uprobe-push    :    2.749 ± 0.047M/s
>   uprobe-ret     :    1.094 ± 0.003M/s
>   uretprobe-nop  :    2.004 ± 0.006M/s  < ~34% speed up
>   uretprobe-push :    1.940 ± 0.003M/s  < ~34% speed up
>   uretprobe-ret  :    0.921 ± 0.050M/s  < ~13% speed up
> 
> original fix:
>   base           :   15.704 ± 0.076M/s
>   uprobe-nop     :    2.841 ± 0.008M/s
>   uprobe-push    :    2.666 ± 0.029M/s
>   uprobe-ret     :    1.037 ± 0.008M/s
>   uretprobe-nop  :    1.718 ± 0.010M/s  < ~25% speed up
>   uretprobe-push :    1.658 ± 0.008M/s  < ~23% speed up
>   uretprobe-ret  :    0.853 ± 0.004M/s  < ~14% speed up
> 
> 
> jirka

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-20 11:04       ` Jiri Olsa
@ 2024-03-20 14:37         ` Oleg Nesterov
  2024-03-20 15:28           ` Oleg Nesterov
  0 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2024-03-20 14:37 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On 03/20, Jiri Olsa wrote:
>
> are you ok if I squash the patches together

Yes, thanks, I am fine.

But let's discuss this change a bit more. So, with this PoC we have the
(intentionally) oversimplified

	SYSCALL_DEFINE0(uretprobe)
	{
		struct pt_regs *regs = task_pt_regs(current);
		unsigned long __user *ax_and_ret = (unsigned long __user *)regs->sp + 2;
		unsigned long ip, err;

		ip = regs->ip;
		regs->orig_ax = -1;
		err = get_user(regs->ax, ax_and_ret);
		WARN_ON_ONCE(err);

		uprobe_handle_trampoline(regs);

		err = put_user(regs->ip, ax_and_ret);
		WARN_ON_ONCE(err);
		regs->ip = ip;

		return regs->ax;
	}

I have no idea what uprobe consumers / bpf programs can do, so let me ask:

	- uprobe_consumer's will see the "wrong" values of regs->cx/r11/sp
	  Is it OK? If not - easy to fix.

	- can uprobe_consumer change regs->cx/r11 ? If yes - easy to fix.

	- can uprobe_consumer change regs->sp ? If yes - easy to fix too,
	  but needs a separate check/code.

Oleg.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-20 14:37         ` Oleg Nesterov
@ 2024-03-20 15:28           ` Oleg Nesterov
  2024-03-20 17:44             ` Andrii Nakryiko
  2024-03-21  9:59             ` Jiri Olsa
  0 siblings, 2 replies; 33+ messages in thread
From: Oleg Nesterov @ 2024-03-20 15:28 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On 03/20, Oleg Nesterov wrote:
>
> On 03/20, Jiri Olsa wrote:
> >
> > are you ok if I squash the patches together
>
> Yes, thanks, I am fine.
>
> But lets discuss this change a bit more. So, with this poc we have the
> (intentionally) oversimplified
>
> 	SYSCALL_DEFINE0(uretprobe)
> 	{
> 		struct pt_regs *regs = task_pt_regs(current);
> 		unsigned long __user *ax_and_ret = (unsigned long __user *)regs->sp + 2;
> 		unsigned long ip, err;
>
> 		ip = regs->ip;
> 		regs->orig_ax = -1;
> 		err = get_user(regs->ax, ax_and_ret);
> 		WARN_ON_ONCE(err);
>
> 		uprobe_handle_trampoline(regs);
>
> 		err = put_user(regs->ip, ax_and_ret);
> 		WARN_ON_ONCE(err);
> 		regs->ip = ip;
>
> 		return regs->ax;
> 	}
>
> I have no idea what uprobe consumers / bpf programs can do, so let me ask:
>
> 	- uprobe_consumer's will see the "wrong" values of regs->cx/r11/sp
> 	  Is it OK? If not - easy to fix.
>
> 	- can uprobe_consumer change regs->cx/r11 ? If yes - easy to fix.
>
> 	- can uprobe_consumer change regs->sp ? If yes - easy to fix too,
> 	  but needs a separate check/code.

IOW. If answer is "yes" to all the questions above, then we probably need
something like

	SYSCALL_DEFINE0(uretprobe)
	{
		struct pt_regs *regs = task_pt_regs(current);
		unsigned long err, ip, sp, r11_cx_ax[3];

		err = copy_from_user(r11_cx_ax, (void __user*)regs->sp, sizeof(r11_cx_ax));
		WARN_ON_ONCE(err);

		// Q1: apart from ax, do we really care?
		// expose the "right" values of r11/cx/ax/sp to uprobe_consumer's
		regs->r11 = r11_cx_ax[0];
		regs->cx  = r11_cx_ax[1];
		regs->ax  = r11_cx_ax[2];
		regs->sp += sizeof(r11_cx_ax);
		regs->orig_ax = -1;

		ip = regs->ip;
		sp = regs->sp;

		uprobe_handle_trampoline(regs);

		// Q2: is it possible? do we care?
		// uprobe_consumer has changed sp, we can do nothing,
		// just return via iret.
		if (regs->sp != sp)
			return regs->ax;
		regs->sp -= sizeof(r11_cx_ax);

		// Q3: is it possible? do we care?
		// for the case uprobe_consumer has changed r11/cx
		r11_cx_ax[0] = regs->r11;
		r11_cx_ax[1] = regs->cx;

		// comment to explain this hack
		r11_cx_ax[2] = regs->ip;
		regs->ip = ip;

		err = copy_to_user((void __user*)regs->sp, r11_cx_ax, sizeof(r11_cx_ax));
		WARN_ON_ONCE(err);

		// ensure sysret, see do_syscall_64()
		regs->r11 = regs->flags;
		regs->cx  = regs->ip;

		return regs->ax;
	}

Oleg.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-20 15:28           ` Oleg Nesterov
@ 2024-03-20 17:44             ` Andrii Nakryiko
  2024-03-20 19:08               ` Jiri Olsa
  2024-03-21  9:59             ` Jiri Olsa
  1 sibling, 1 reply; 33+ messages in thread
From: Andrii Nakryiko @ 2024-03-20 17:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Wed, Mar 20, 2024 at 8:30 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 03/20, Oleg Nesterov wrote:
> >
> > On 03/20, Jiri Olsa wrote:
> > >
> > > are you ok if I squash the patches together
> >
> > Yes, thanks, I am fine.
> >
> > But lets discuss this change a bit more. So, with this poc we have the
> > (intentionally) oversimplified
> >
> >       SYSCALL_DEFINE0(uretprobe)
> >       {
> >               struct pt_regs *regs = task_pt_regs(current);
> >               unsigned long __user *ax_and_ret = (unsigned long __user *)regs->sp + 2;
> >               unsigned long ip, err;
> >
> >               ip = regs->ip;
> >               regs->orig_ax = -1;
> >               err = get_user(regs->ax, ax_and_ret);
> >               WARN_ON_ONCE(err);
> >
> >               uprobe_handle_trampoline(regs);
> >
> >               err = put_user(regs->ip, ax_and_ret);
> >               WARN_ON_ONCE(err);
> >               regs->ip = ip;
> >
> >               return regs->ax;
> >       }
> >
> > I have no idea what uprobe consumers / bpf programs can do, so let me ask:
> >
> >       - uprobe_consumer's will see the "wrong" values of regs->cx/r11/sp
> >         Is it OK? If not - easy to fix.
> >
> >       - can uprobe_consumer change regs->cx/r11 ? If yes - easy to fix.
> >
> >       - can uprobe_consumer change regs->sp ? If yes - easy to fix too,
> >         but needs a separate check/code.
>
> IOW. If answer is "yes" to all the questions above, then we probably need
> something like

yes to first, so ideally we fix registers to "correct" values
(especially sp), but no to the last two (at least as far as BPF is
concerned)

>
>         SYSCALL_DEFINE0(uretprobe)
>         {
>                 struct pt_regs *regs = task_pt_regs(current);
>                 unsigned long err, ip, sp, r11_cx_ax[3];
>
>                 err = copy_from_user(r11_cx_ax, (void __user*)regs->sp, sizeof(r11_cx_ax));
>                 WARN_ON_ONCE(err);
>
>                 // Q1: apart from ax, do we really care?
>                 // expose the "right" values of r11/cx/ax/sp to uprobe_consumer's
>                 regs->r11 = r11_cx_ax[0];
>                 regs->cx  = r11_cx_ax[1];
>                 regs->ax  = r11_cx_ax[2];
>                 regs->sp += sizeof(r11_cx_ax);
>                 regs->orig_ax = -1;
>
>                 ip = regs->ip;
>                 sp = regs->sp;
>
>                 uprobe_handle_trampoline(regs);
>
>                 // Q2: is it possible? do we care?
>                 // uprobe_consumer has changed sp, we can do nothing,
>                 // just return via iret.
>                 if (regs->sp != sp)
>                         return regs->ax;
>                 regs->sp -= sizeof(r11_cx_ax);
>
>                 // Q3: is it possible? do we care?
>                 // for the case uprobe_consumer has changed r11/cx
>                 r11_cx_ax[0] = regs->r11;
>                 r11_cx_ax[1] = regs->cx;
>
>                 // comment to explain this hack
>                 r11_cx_ax[2] = regs->ip;
>                 regs->ip = ip;
>
>                 err = copy_to_user((void __user*)regs->sp, r11_cx_ax, sizeof(r11_cx_ax));
>                 WARN_ON_ONCE(err);
>
>                 // ensure sysret, see do_syscall_64()
>                 regs->r11 = regs->flags;
>                 regs->cx  = regs->ip;
>
>                 return regs->ax;
>         }
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-20 17:44             ` Andrii Nakryiko
@ 2024-03-20 19:08               ` Jiri Olsa
  2024-03-21 10:10                 ` Oleg Nesterov
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2024-03-20 19:08 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Wed, Mar 20, 2024 at 10:44:30AM -0700, Andrii Nakryiko wrote:
> On Wed, Mar 20, 2024 at 8:30 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 03/20, Oleg Nesterov wrote:
> > >
> > > On 03/20, Jiri Olsa wrote:
> > > >
> > > > are you ok if I squash the patches together
> > >
> > > Yes, thanks, I am fine.
> > >
> > > But lets discuss this change a bit more. So, with this poc we have the
> > > (intentionally) oversimplified
> > >
> > >       SYSCALL_DEFINE0(uretprobe)
> > >       {
> > >               struct pt_regs *regs = task_pt_regs(current);
> > >               unsigned long __user *ax_and_ret = (unsigned long __user *)regs->sp + 2;
> > >               unsigned long ip, err;
> > >
> > >               ip = regs->ip;
> > >               regs->orig_ax = -1;
> > >               err = get_user(regs->ax, ax_and_ret);
> > >               WARN_ON_ONCE(err);
> > >
> > >               uprobe_handle_trampoline(regs);
> > >
> > >               err = put_user(regs->ip, ax_and_ret);
> > >               WARN_ON_ONCE(err);
> > >               regs->ip = ip;
> > >
> > >               return regs->ax;
> > >       }
> > >
> > > I have no idea what uprobe consumers / bpf programs can do, so let me ask:
> > >
> > >       - uprobe_consumer's will see the "wrong" values of regs->cx/r11/sp
> > >         Is it OK? If not - easy to fix.
> > >
> > >       - can uprobe_consumer change regs->cx/r11 ? If yes - easy to fix.
> > >
> > >       - can uprobe_consumer change regs->sp ? If yes - easy to fix too,
> > >         but needs a separate check/code.
> >
> > IOW. If answer is "yes" to all the questions above, then we probably need
> > something like
> 
> yes to first, so ideally we fix registers to "correct" values
> (especially sp), but no to the last two (at least as far as BPF is
> concerned)

I think we should keep the same behaviour as it was for the trap,
so we should restore all registers and allow consumers to change them

jirka

> 
> >
> >         SYSCALL_DEFINE0(uretprobe)
> >         {
> >                 struct pt_regs *regs = task_pt_regs(current);
> >                 unsigned long err, ip, sp, r11_cx_ax[3];
> >
> >                 err = copy_from_user(r11_cx_ax, (void __user*)regs->sp, sizeof(r11_cx_ax));
> >                 WARN_ON_ONCE(err);
> >
> >                 // Q1: apart from ax, do we really care?
> >                 // expose the "right" values of r11/cx/ax/sp to uprobe_consumer's
> >                 regs->r11 = r11_cx_ax[0];
> >                 regs->cx  = r11_cx_ax[1];
> >                 regs->ax  = r11_cx_ax[2];
> >                 regs->sp += sizeof(r11_cx_ax);
> >                 regs->orig_ax = -1;
> >
> >                 ip = regs->ip;
> >                 sp = regs->sp;
> >
> >                 uprobe_handle_trampoline(regs);
> >
> >                 // Q2: is it possible? do we care?
> >                 // uprobe_consumer has changed sp, we can do nothing,
> >                 // just return via iret.
> >                 if (regs->sp != sp)
> >                         return regs->ax;
> >                 regs->sp -= sizeof(r11_cx_ax);
> >
> >                 // Q3: is it possible? do we care?
> >                 // for the case uprobe_consumer has changed r11/cx
> >                 r11_cx_ax[0] = regs->r11;
> >                 r11_cx_ax[1] = regs->cx;
> >
> >                 // comment to explain this hack
> >                 r11_cx_ax[2] = regs->ip;
> >                 regs->ip = ip;
> >
> >                 err = copy_to_user((void __user*)regs->sp, r11_cx_ax, sizeof(r11_cx_ax));
> >                 WARN_ON_ONCE(err);
> >
> >                 // ensure sysret, see do_syscall_64()
> >                 regs->r11 = regs->flags;
> >                 regs->cx  = regs->ip;
> >
> >                 return regs->ax;
> >         }
> >
> > Oleg.
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-20 15:28           ` Oleg Nesterov
  2024-03-20 17:44             ` Andrii Nakryiko
@ 2024-03-21  9:59             ` Jiri Olsa
  2024-03-21 10:17               ` Oleg Nesterov
  1 sibling, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2024-03-21  9:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Wed, Mar 20, 2024 at 04:28:48PM +0100, Oleg Nesterov wrote:

SNIP

> 	SYSCALL_DEFINE0(uretprobe)
> 	{
> 		struct pt_regs *regs = task_pt_regs(current);
> 		unsigned long err, ip, sp, r11_cx_ax[3];
> 
> 		err = copy_from_user(r11_cx_ax, (void __user*)regs->sp, sizeof(r11_cx_ax));
> 		WARN_ON_ONCE(err);
> 
> 		// Q1: apart from ax, do we really care?
> 		// expose the "right" values of r11/cx/ax/sp to uprobe_consumer's
> 		regs->r11 = r11_cx_ax[0];
> 		regs->cx  = r11_cx_ax[1];
> 		regs->ax  = r11_cx_ax[2];
> 		regs->sp += sizeof(r11_cx_ax);
> 		regs->orig_ax = -1;
> 
> 		ip = regs->ip;
> 		sp = regs->sp;
> 
> 		uprobe_handle_trampoline(regs);
> 
> 		// Q2: is it possible? do we care?
> 		// uprobe_consumer has changed sp, we can do nothing,
> 		// just return via iret.
> 		if (regs->sp != sp)
> 			return regs->ax;
> 		regs->sp -= sizeof(r11_cx_ax);
> 
> 		// Q3: is it possible? do we care?
> 		// for the case uprobe_consumer has changed r11/cx
> 		r11_cx_ax[0] = regs->r11;
> 		r11_cx_ax[1] = regs->cx;

I wonder if we could add a test for this as well, and check that we return
proper register values in case the consumer changed them, will check

> 
> 		// comment to explain this hack
> 		r11_cx_ax[2] = regs->ip;
> 		regs->ip = ip;

we still need to restore regs->ip in case do_syscall_64 decides to do
iret for some reason, right?

overall lgtm, thanks

jirka

> 
> 		err = copy_to_user((void __user*)regs->sp, r11_cx_ax, sizeof(r11_cx_ax));
> 		WARN_ON_ONCE(err);
> 
> 		// ensure sysret, see do_syscall_64()
> 		regs->r11 = regs->flags;
> 		regs->cx  = regs->ip;
> 
> 		return regs->ax;
> 	}
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-20 19:08               ` Jiri Olsa
@ 2024-03-21 10:10                 ` Oleg Nesterov
  0 siblings, 0 replies; 33+ messages in thread
From: Oleg Nesterov @ 2024-03-21 10:10 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov (AMD),
	x86

On 03/20, Jiri Olsa wrote:
>
> On Wed, Mar 20, 2024 at 10:44:30AM -0700, Andrii Nakryiko wrote:
> > On Wed, Mar 20, 2024 at 8:30 AM Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > > I have no idea what uprobe consumers / bpf programs can do, so let me ask:
> > > >
> > > >       - uprobe_consumer's will see the "wrong" values of regs->cx/r11/sp
> > > >         Is it OK? If not - easy to fix.
> > > >
> > > >       - can uprobe_consumer change regs->cx/r11 ? If yes - easy to fix.
> > > >
> > > >       - can uprobe_consumer change regs->sp ? If yes - easy to fix too,
> > > >         but needs a separate check/code.
> > >
> > > IOW. If answer is "yes" to all the questions above, then we probably need
> > > something like
> >
> > yes to first, so ideally we fix registers to "correct" values
> > (especially sp), but no to the last two (at least as far as BPF is
> > concerned)
>
> I think we should keep the same behaviour as it was for the trap,
> so I think we should restore all registers and allow consumer to change it

OK, agreed. Then something like the code below.

Oleg.

> > >         SYSCALL_DEFINE0(uretprobe)
> > >         {
> > >                 struct pt_regs *regs = task_pt_regs(current);
> > >                 unsigned long err, ip, sp, r11_cx_ax[3];
> > >
> > >                 err = copy_from_user(r11_cx_ax, (void __user*)regs->sp, sizeof(r11_cx_ax));
> > >                 WARN_ON_ONCE(err);
> > >
> > >                 // Q1: apart from ax, do we really care?
> > >                 // expose the "right" values of r11/cx/ax/sp to uprobe_consumer's
> > >                 regs->r11 = r11_cx_ax[0];
> > >                 regs->cx  = r11_cx_ax[1];
> > >                 regs->ax  = r11_cx_ax[2];
> > >                 regs->sp += sizeof(r11_cx_ax);
> > >                 regs->orig_ax = -1;
> > >
> > >                 ip = regs->ip;
> > >                 sp = regs->sp;
> > >
> > >                 uprobe_handle_trampoline(regs);
> > >
> > >                 // Q2: is it possible? do we care?
> > >                 // uprobe_consumer has changed sp, we can do nothing,
> > >                 // just return via iret.
> > >                 if (regs->sp != sp)
> > >                         return regs->ax;
> > >                 regs->sp -= sizeof(r11_cx_ax);
> > >
> > >                 // Q3: is it possible? do we care?
> > >                 // for the case uprobe_consumer has changed r11/cx
> > >                 r11_cx_ax[0] = regs->r11;
> > >                 r11_cx_ax[1] = regs->cx;
> > >
> > >                 // comment to explain this hack
> > >                 r11_cx_ax[2] = regs->ip;
> > >                 regs->ip = ip;
> > >
> > >                 err = copy_to_user((void __user*)regs->sp, r11_cx_ax, sizeof(r11_cx_ax));
> > >                 WARN_ON_ONCE(err);
> > >
> > >                 // ensure sysret, see do_syscall_64()
> > >                 regs->r11 = regs->flags;
> > >                 regs->cx  = regs->ip;
> > >
> > >                 return regs->ax;
> > >         }
> > >
> > > Oleg.
> > >
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-21  9:59             ` Jiri Olsa
@ 2024-03-21 10:17               ` Oleg Nesterov
  2024-03-21 10:52                 ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2024-03-21 10:17 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On 03/21, Jiri Olsa wrote:
>
> On Wed, Mar 20, 2024 at 04:28:48PM +0100, Oleg Nesterov wrote:
>
> SNIP
>
> > 	SYSCALL_DEFINE0(uretprobe)
> > 	{
> > 		struct pt_regs *regs = task_pt_regs(current);
> > 		unsigned long err, ip, sp, r11_cx_ax[3];
> >
> > 		err = copy_from_user(r11_cx_ax, (void __user*)regs->sp, sizeof(r11_cx_ax));
> > 		WARN_ON_ONCE(err);
> >
> > 		// Q1: apart from ax, do we really care?
> > 		// expose the "right" values of r11/cx/ax/sp to uprobe_consumer's
> > 		regs->r11 = r11_cx_ax[0];
> > 		regs->cx  = r11_cx_ax[1];
> > 		regs->ax  = r11_cx_ax[2];
> > 		regs->sp += sizeof(r11_cx_ax);
> > 		regs->orig_ax = -1;
> >
> > 		ip = regs->ip;
> > 		sp = regs->sp;
> >
> > 		uprobe_handle_trampoline(regs);
> >
> > 		// Q2: is it possible? do we care?
> > 		// uprobe_consumer has changed sp, we can do nothing,
> > 		// just return via iret.
> > 		if (regs->sp != sp)
> > 			return regs->ax;
> > 		regs->sp -= sizeof(r11_cx_ax);
> >
> > 		// Q3: is it possible? do we care?
> > 		// for the case uprobe_consumer has changed r11/cx
> > 		r11_cx_ax[0] = regs->r11;
> > 		r11_cx_ax[1] = regs->cx;
>
> I wonder we could add test for this as well, and check we return
> proper register values in case the consuer changed them, will check
>
> >
> > 		// comment to explain this hack
> > 		r11_cx_ax[2] = regs->ip;
> > 		regs->ip = ip;
>
> we still need restore regs->ip in case do_syscall_64 decides to do
> iret for some reason, right?

I don't understand... could you spell?

AFAICS everything should work correctly even if do_syscall_64() returns false
and entry_SYSCALL_64() returns via iret. No?

> overall lgtm, thanks

OK, great, feel free to update this code according to your preferences and
use it in V2.

Oleg.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-21 10:17               ` Oleg Nesterov
@ 2024-03-21 10:52                 ` Jiri Olsa
  2024-03-21 12:14                   ` Oleg Nesterov
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2024-03-21 10:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Thu, Mar 21, 2024 at 11:17:51AM +0100, Oleg Nesterov wrote:
> On 03/21, Jiri Olsa wrote:
> >
> > On Wed, Mar 20, 2024 at 04:28:48PM +0100, Oleg Nesterov wrote:
> >
> > SNIP
> >
> > > 	SYSCALL_DEFINE0(uretprobe)
> > > 	{
> > > 		struct pt_regs *regs = task_pt_regs(current);
> > > 		unsigned long err, ip, sp, r11_cx_ax[3];
> > >
> > > 		err = copy_from_user(r11_cx_ax, (void __user*)regs->sp, sizeof(r11_cx_ax));
> > > 		WARN_ON_ONCE(err);
> > >
> > > 		// Q1: apart from ax, do we really care?
> > > 		// expose the "right" values of r11/cx/ax/sp to uprobe_consumer's
> > > 		regs->r11 = r11_cx_ax[0];
> > > 		regs->cx  = r11_cx_ax[1];
> > > 		regs->ax  = r11_cx_ax[2];
> > > 		regs->sp += sizeof(r11_cx_ax);
> > > 		regs->orig_ax = -1;
> > >
> > > 		ip = regs->ip;
> > > 		sp = regs->sp;
> > >
> > > 		uprobe_handle_trampoline(regs);
> > >
> > > 		// Q2: is it possible? do we care?
> > > 		// uprobe_consumer has changed sp, we can do nothing,
> > > 		// just return via iret.
> > > 		if (regs->sp != sp)
> > > 			return regs->ax;
> > > 		regs->sp -= sizeof(r11_cx_ax);
> > >
> > > 		// Q3: is it possible? do we care?
> > > 		// for the case uprobe_consumer has changed r11/cx
> > > 		r11_cx_ax[0] = regs->r11;
> > > 		r11_cx_ax[1] = regs->cx;
> >
> > I wonder we could add test for this as well, and check we return
> > proper register values in case the consuer changed them, will check
> >
> > >
> > > 		// comment to explain this hack
> > > 		r11_cx_ax[2] = regs->ip;
> > > 		regs->ip = ip;
> >
> > we still need restore regs->ip in case do_syscall_64 decides to do
> > iret for some reason, right?
> 
> I don't understand... could you spell?

I was wondering why we need to restore regs->ip for the sysret path, but
do_syscall_64 can decide to do an iret return (for which we need the proper
regs->ip) even if we prepare the cx/r11 registers for sysexit

>
> AFAICS everything should work correctly even if do_syscall_64() returns F
> and entry_SYSCALL_64() returns via iret. No?
> 
> > overall lgtm, thanks
> 
> OK, great, feel free to update this code according to your preferences and
> use it in V2.

will do, thanks

jirka

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-21 10:52                 ` Jiri Olsa
@ 2024-03-21 12:14                   ` Oleg Nesterov
  2024-03-21 20:29                     ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2024-03-21 12:14 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On 03/21, Jiri Olsa wrote:
>
> On Thu, Mar 21, 2024 at 11:17:51AM +0100, Oleg Nesterov wrote:
> > On 03/21, Jiri Olsa wrote:
> > >
> > > On Wed, Mar 20, 2024 at 04:28:48PM +0100, Oleg Nesterov wrote:
> > >
> > > SNIP
> > >
> > > > 	SYSCALL_DEFINE0(uretprobe)
> > > > 	{
> > > > 		struct pt_regs *regs = task_pt_regs(current);
> > > > 		unsigned long err, ip, sp, r11_cx_ax[3];
> > > >
> > > > 		err = copy_from_user(r11_cx_ax, (void __user*)regs->sp, sizeof(r11_cx_ax));
> > > > 		WARN_ON_ONCE(err);
> > > >
> > > > 		// Q1: apart from ax, do we really care?
> > > > 		// expose the "right" values of r11/cx/ax/sp to uprobe_consumer's
> > > > 		regs->r11 = r11_cx_ax[0];
> > > > 		regs->cx  = r11_cx_ax[1];
> > > > 		regs->ax  = r11_cx_ax[2];
> > > > 		regs->sp += sizeof(r11_cx_ax);
> > > > 		regs->orig_ax = -1;
> > > >
> > > > 		ip = regs->ip;
> > > > 		sp = regs->sp;
> > > >
> > > > 		uprobe_handle_trampoline(regs);
> > > >
> > > > 		// Q2: is it possible? do we care?
> > > > 		// uprobe_consumer has changed sp, we can do nothing,
> > > > 		// just return via iret.
> > > > 		if (regs->sp != sp)
> > > > 			return regs->ax;
> > > > 		regs->sp -= sizeof(r11_cx_ax);
> > > >
> > > > 		// Q3: is it possible? do we care?
> > > > 		// for the case uprobe_consumer has changed r11/cx
> > > > 		r11_cx_ax[0] = regs->r11;
> > > > 		r11_cx_ax[1] = regs->cx;
> > >
> > > I wonder we could add test for this as well, and check we return
> > > proper register values in case the consuer changed them, will check
> > >
> > > >
> > > > 		// comment to explain this hack
> > > > 		r11_cx_ax[2] = regs->ip;
> > > > 		regs->ip = ip;
> > >
> > > we still need restore regs->ip in case do_syscall_64 decides to do
> > > iret for some reason, right?
> >
> > I don't understand... could you spell?
>
> I was wondering why to restore regs->ip for sysret path, but do_syscall_64
> can decide to do iret return (for which we need proper regs->ip) even if we
> prepare cx/r11 registers for sysexit

Still don't understand... Yes, we prepare cx/r11 to avoid iret if possible.
But (apart from performance) we do not care if do_syscall_64() picks iret.
Either way

			regs->ip = ip;

above ensures that usermode returns to uretprobe_syscall_entry right after
the syscall insn. Then popq %r11/cx will restore r11/cx even if they were
changed by uprobe_consumer's. And then "retq" will return to the address
"returned" by handle_trampoline(regs) because we do

			// comment to explain this hack
			r11_cx_ax[2] = regs->ip;

after handle_trampoline(). This all doesn't depend on iret-or-sysret.

OK, I am sure you understand this, so I guess I misunderstood your concerns.

Oleg.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-21 12:14                   ` Oleg Nesterov
@ 2024-03-21 20:29                     ` Jiri Olsa
  2024-03-22  8:48                       ` Oleg Nesterov
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2024-03-21 20:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	bpf, Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On Thu, Mar 21, 2024 at 01:14:56PM +0100, Oleg Nesterov wrote:
> On 03/21, Jiri Olsa wrote:
> >
> > On Thu, Mar 21, 2024 at 11:17:51AM +0100, Oleg Nesterov wrote:
> > > On 03/21, Jiri Olsa wrote:
> > > >
> > > > On Wed, Mar 20, 2024 at 04:28:48PM +0100, Oleg Nesterov wrote:
> > > >
> > > > SNIP
> > > >
> > > > > 	SYSCALL_DEFINE0(uretprobe)
> > > > > 	{
> > > > > 		struct pt_regs *regs = task_pt_regs(current);
> > > > > 		unsigned long err, ip, sp, r11_cx_ax[3];
> > > > >
> > > > > 		err = copy_from_user(r11_cx_ax, (void __user*)regs->sp, sizeof(r11_cx_ax));
> > > > > 		WARN_ON_ONCE(err);
> > > > >
> > > > > 		// Q1: apart from ax, do we really care?
> > > > > 		// expose the "right" values of r11/cx/ax/sp to uprobe_consumer's
> > > > > 		regs->r11 = r11_cx_ax[0];
> > > > > 		regs->cx  = r11_cx_ax[1];
> > > > > 		regs->ax  = r11_cx_ax[2];
> > > > > 		regs->sp += sizeof(r11_cx_ax);
> > > > > 		regs->orig_ax = -1;
> > > > >
> > > > > 		ip = regs->ip;
> > > > > 		sp = regs->sp;
> > > > >
> > > > > 		uprobe_handle_trampoline(regs);
> > > > >
> > > > > 		// Q2: is it possible? do we care?
> > > > > 		// uprobe_consumer has changed sp, we can do nothing,
> > > > > 		// just return via iret.
> > > > > 		if (regs->sp != sp)
> > > > > 			return regs->ax;
> > > > > 		regs->sp -= sizeof(r11_cx_ax);
> > > > >
> > > > > 		// Q3: is it possible? do we care?
> > > > > 		// for the case uprobe_consumer has changed r11/cx
> > > > > 		r11_cx_ax[0] = regs->r11;
> > > > > 		r11_cx_ax[1] = regs->cx;
> > > >
> > > > I wonder we could add test for this as well, and check we return
> > > > proper register values in case the consuer changed them, will check
> > > >
> > > > >
> > > > > 		// comment to explain this hack
> > > > > 		r11_cx_ax[2] = regs->ip;
> > > > > 		regs->ip = ip;
> > > >
> > > > we still need restore regs->ip in case do_syscall_64 decides to do
> > > > iret for some reason, right?
> > >
> > > I don't understand... could you spell?
> >
> > I was wondering why to restore regs->ip for sysret path, but do_syscall_64
> > can decide to do iret return (for which we need proper regs->ip) even if we
> > prepare cx/r11 registers for sysexit
> 
> Still don't understand... Yes, we prepare cx/r11 to avoid iret if possible.
> But (apart from performance) we do not care if do_syscall_64() picks iret.
> Either way
> 
> 			regs->ip = ip;
> 
> above ensures that usermode returns to uretprobe_syscall_entry right after
> the syscall insn. 

hm, I think the above ensures that do_syscall_64 will skip the 'regs->cx != regs->ip'
check.. and after that, sysret returns to the rcx register value and ignores regs->ip

but in any case we need to set it

> ... Then popq %r11/cx will restore r11/cx even if they were
> changed by uprobe_consumer's. And then "retq" will return to the address
> "returned" by handle_trampoline(regs) because we do
> 
> 			// comment to explain this hack
> 			r11_cx_ax[2] = regs->ip;
> 
> after handle_trampoline(). This all doesn't depend on iret-or-sysret.
> 
> OK, I am sure you understand this, so I guess I misunderstood your concerns.

thanks for the patience ;-)

jirka

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
  2024-03-21 20:29                     ` Jiri Olsa
@ 2024-03-22  8:48                       ` Oleg Nesterov
  0 siblings, 0 replies; 33+ messages in thread
From: Oleg Nesterov @ 2024-03-22  8:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	x86

On 03/21, Jiri Olsa wrote:
>
> On Thu, Mar 21, 2024 at 01:14:56PM +0100, Oleg Nesterov wrote:
> > Either way
> >
> > 			regs->ip = ip;
> >
> > above ensures that usermode returns to uretprobe_syscall_entry right after
> > the syscall insn.
>
> hm, I think above ensures that do_syscall_64 will skip the 'regs->cx != regs->ip'
> check.. and after the sysret returns to rcx register value and ignores regs->ip
                                                                 ^^^^^^^^^^^^^^^^

Yes, that is why do_syscall_64() returns false if regs->cx != regs->ip.

IOW. To oversimplify, the logic is

	// %rcx == ret ip
	entry_SYSCALL_64:

		pt_regs->ip = %rcx;
		pt_regs->cx = %rcx;

		do-syscall;

		%rcx = pt_regs->cx; // POP_REGS

		if (%rcx == pt_regs->ip) {
			// OK, we can use sysret which returns to rcx
			sysret;
		} else {
			// debugger/whatever changed rcx or ip, can't use sysret.
			// return to pt_regs->ip, see the "Return frame for iretq"
			// comment in struct pt_regs.
			iret;
		}

So return-to-usermode always returns to regs->ip and restores all the registers
from pt_regs, just it can be faster if we can use sysret.

> but in any case we need to set it

Yes, yes, sure.

Oleg.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC bpf-next 3/3] selftests/bpf: Mark uprobe trigger functions with nocf_check attribute
  2024-03-19 11:11     ` Jiri Olsa
@ 2024-03-22 13:40       ` Jiri Olsa
  0 siblings, 0 replies; 33+ messages in thread
From: Jiri Olsa @ 2024-03-22 13:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Oleg Nesterov, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, bpf, Song Liu, Yonghong Song,
	John Fastabend, Peter Zijlstra, Thomas Gleixner,
	Borislav Petkov (AMD),
	x86

On Tue, Mar 19, 2024 at 12:11:06PM +0100, Jiri Olsa wrote:
> On Mon, Mar 18, 2024 at 06:22:02PM -0700, Andrii Nakryiko wrote:
> > On Mon, Mar 18, 2024 at 2:32 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > Some distros seem to enable the -fcf-protection=branch by default,
> > > which breaks our setup on first instruction of uprobe trigger
> > > functions and place there endbr64 instruction.
> > >
> > > Marking them with nocf_check attribute to skip that.
> > >
> > > Adding -Wno-attributes for bench objects, becase nocf_check can
> > > be used only when -fcf-protection=branch is enabled, otherwise
> > > we get a warning and break compilation.
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > >  tools/include/linux/compiler.h                     | 4 ++++
> > >  tools/testing/selftests/bpf/Makefile               | 2 +-
> > >  tools/testing/selftests/bpf/benchs/bench_trigger.c | 6 +++---
> > >  3 files changed, 8 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
> > > index 7b65566f3e42..14038ce04ca4 100644
> > > --- a/tools/include/linux/compiler.h
> > > +++ b/tools/include/linux/compiler.h
> > > @@ -58,6 +58,10 @@
> > >  #define noinline
> > >  #endif
> > >
> > > +#ifndef __nocfcheck
> > > +#define __nocfcheck __attribute__((nocf_check))
> > > +#endif
> > 
> > Let's preserve spelling of the attribut, __nocf_check ?
> > 
> > BTW, just FYI, seems like kernel is defining it as:
> > 
> > #define __noendbr    __attribute__((nocf_check))
> > 
> > Thought somewhere deep in x86-specific code, so probably not a good
> > idea to use it here?
> 
> ugh, I missed it.. better to use __noendbr

nah, I'll keep using __nocf_check, __noendbr is very arch
specific as you said
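
so the compiler.h bit ends up as just the renamed define:

	#ifndef __nocf_check
	#define __nocf_check __attribute__((nocf_check))
	#endif

and the trigger functions get marked with it, roughly like below (the
function name and body here are only an example, not the exact
bench_trigger.c code):

	__nocf_check void uprobe_target_nop(void)
	{
		asm volatile ("nop");
	}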

jirka

> 
> > 
> > > +
> > >  /* Are two types/vars the same type (ignoring qualifiers)? */
> > >  #ifndef __same_type
> > >  # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
> > > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > > index e425a946276b..506d3d592093 100644
> > > --- a/tools/testing/selftests/bpf/Makefile
> > > +++ b/tools/testing/selftests/bpf/Makefile
> > > @@ -726,7 +726,7 @@ $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ)
> > >  # Benchmark runner
> > >  $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h $(BPFOBJ)
> > >         $(call msg,CC,,$@)
> > > -       $(Q)$(CC) $(CFLAGS) -O2 -c $(filter %.c,$^) $(LDLIBS) -o $@
> > > +       $(Q)$(CC) $(CFLAGS) -O2 -Wno-attributes -c $(filter %.c,$^) $(LDLIBS) -o $@
> > 
> > let's better use `#pragma warning disable` in relevant .c files,
> > instead of this global flag?
> 
> ok, will try that
> 
> thanks,
> jirka

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread

Thread overview: 33+ messages
2024-03-18  9:31 [PATCH RFC bpf-next 0/3] uprobe: uretprobe speed up Jiri Olsa
2024-03-18  9:31 ` [PATCH RFC bpf-next 1/3] uprobe: Add uretprobe syscall to speed up return probe Jiri Olsa
2024-03-18 14:22   ` Oleg Nesterov
2024-03-19  1:11   ` Andrii Nakryiko
2024-03-19  6:32     ` Oleg Nesterov
2024-03-19 16:20       ` Andrii Nakryiko
2024-03-19 10:54     ` Jiri Olsa
2024-03-18  9:31 ` [PATCH RFC bpf-next 2/3] selftests/bpf: Add uretprobe syscall test Jiri Olsa
2024-03-19  1:16   ` Andrii Nakryiko
2024-03-19 11:09     ` Jiri Olsa
2024-03-18  9:31 ` [PATCH RFC bpf-next 3/3] selftests/bpf: Mark uprobe trigger functions with nocf_check attribute Jiri Olsa
2024-03-19  1:22   ` Andrii Nakryiko
2024-03-19 11:11     ` Jiri Olsa
2024-03-22 13:40       ` Jiri Olsa
2024-03-19 10:25 ` [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret Oleg Nesterov
2024-03-19 11:08   ` Jiri Olsa
2024-03-19 16:25     ` Andrii Nakryiko
2024-03-19 16:38       ` Oleg Nesterov
2024-03-19 19:35       ` Jiri Olsa
2024-03-19 19:31     ` Jiri Olsa
2024-03-19 20:13       ` Andrii Nakryiko
2024-03-20 11:04       ` Jiri Olsa
2024-03-20 14:37         ` Oleg Nesterov
2024-03-20 15:28           ` Oleg Nesterov
2024-03-20 17:44             ` Andrii Nakryiko
2024-03-20 19:08               ` Jiri Olsa
2024-03-21 10:10                 ` Oleg Nesterov
2024-03-21  9:59             ` Jiri Olsa
2024-03-21 10:17               ` Oleg Nesterov
2024-03-21 10:52                 ` Jiri Olsa
2024-03-21 12:14                   ` Oleg Nesterov
2024-03-21 20:29                     ` Jiri Olsa
2024-03-22  8:48                       ` Oleg Nesterov
