linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86
@ 2021-06-18  7:05 Masami Hiramatsu
  2021-06-18  7:05 ` [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler Masami Hiramatsu
                   ` (14 more replies)
  0 siblings, 15 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:05 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

Hello,

Here is the 8th version of the series to fix the stacktrace with kretprobe on x86.

The previous version is;

 https://lore.kernel.org/bpf/162209754288.436794.3904335049560916855.stgit@devnote2/

This version fixes to call appropriate function and drop some unneeded
patches.


Changes from v7:
[03/13]: Call dereference_kernel_function_descriptor() for getting the
  address of kretprobe_trampoline.
[09/13]: Update the title and description to explain why it is needed.
[10/13][11/13]: Add Josh's Acked-by.



With this series, unwinder can unwind stack correctly from ftrace as below;

  # cd /sys/kernel/debug/tracing
  # echo > trace
  # echo 1 > options/sym-offset
  # echo r vfs_read >> kprobe_events
  # echo r full_proxy_read >> kprobe_events
  # echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
  # echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
  # echo 1 > events/kprobes/enable
  # cat /sys/kernel/debug/kprobes/list
ffffffff8133b740  r  full_proxy_read+0x0    [FTRACE]
ffffffff812560b0  r  vfs_read+0x0    [FTRACE]
  # echo 0 > events/kprobes/enable
  # cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 3/3   #P:8
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |
           <...>-134     [007] ...1    16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
           <...>-134     [007] ...1    16.185901: <stack trace>
 => kretprobe_trace_func+0x209/0x300
 => kretprobe_dispatcher+0x4a/0x70
 => __kretprobe_trampoline_handler+0xd4/0x170
 => trampoline_handler+0x43/0x60
 => kretprobe_trampoline+0x2a/0x50
 => vfs_read+0x98/0x180
 => ksys_read+0x5f/0xe0
 => do_syscall_64+0x37/0x90
 => entry_SYSCALL_64_after_hwframe+0x44/0xae
           <...>-134     [007] ...1    16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)

This shows the double return probes (vfs_read and full_proxy_read) on the stack
correctly unwinded. (vfs_read will return to ksys_read+0x5f and full_proxy_read
will return to vfs_read+0x98)

This actually changes the kretprobe behavisor a bit, now the instraction pointer in
the pt_regs passed to kretprobe user handler is correctly set the real return
address. So user handlers can get it via instruction_pointer() API, and can use
stack_trace_save_regs().

You can also get this series from 
 git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v8


Thank you,

---

Josh Poimboeuf (1):
      x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code

Masami Hiramatsu (12):
      ia64: kprobes: Fix to pass correct trampoline address to the handler
      kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
      kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
      kprobes: Add kretprobe_find_ret_addr() for searching return address
      ARC: Add instruction_pointer_set() API
      ia64: Add instruction_pointer_set() API
      arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline
      kprobes: Enable stacktrace from pt_regs in kretprobe handler
      x86/kprobes: Push a fake return address at kretprobe_trampoline
      x86/unwind: Recover kretprobe trampoline entry
      tracing: Show kretprobe unknown indicator only for kretprobe_trampoline
      x86/kprobes: Fixup return address in generic trampoline handler


 arch/arc/include/asm/ptrace.h       |    5 ++
 arch/arc/kernel/kprobes.c           |    2 -
 arch/arm/probes/kprobes/core.c      |    5 +-
 arch/arm64/kernel/probes/kprobes.c  |    3 -
 arch/csky/kernel/probes/kprobes.c   |    2 -
 arch/ia64/include/asm/ptrace.h      |    5 ++
 arch/ia64/kernel/kprobes.c          |   15 ++---
 arch/mips/kernel/kprobes.c          |    3 -
 arch/parisc/kernel/kprobes.c        |    4 +
 arch/powerpc/kernel/kprobes.c       |   13 ----
 arch/riscv/kernel/probes/kprobes.c  |    2 -
 arch/s390/kernel/kprobes.c          |    2 -
 arch/sh/kernel/kprobes.c            |    2 -
 arch/sparc/kernel/kprobes.c         |    2 -
 arch/x86/include/asm/kprobes.h      |    1 
 arch/x86/include/asm/unwind.h       |   23 +++++++
 arch/x86/include/asm/unwind_hints.h |    5 ++
 arch/x86/kernel/kprobes/core.c      |   53 +++++++++++++++--
 arch/x86/kernel/unwind_frame.c      |    3 -
 arch/x86/kernel/unwind_guess.c      |    3 -
 arch/x86/kernel/unwind_orc.c        |   18 +++++-
 include/linux/kprobes.h             |   44 ++++++++++++--
 kernel/kprobes.c                    |  108 +++++++++++++++++++++++++----------
 kernel/trace/trace_output.c         |   17 +-----
 lib/error-inject.c                  |    3 +
 25 files changed, 238 insertions(+), 105 deletions(-)

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
@ 2021-06-18  7:05 ` Masami Hiramatsu
  2021-07-05  7:46   ` Ingo Molnar
  2021-06-18  7:05 ` [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor() Masami Hiramatsu
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:05 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
missed to pass the wrong trampoline address (it passes the descriptor address
instead of function entry address).
This fixes it to pass correct trampoline address to __kretprobe_trampoline_handler().
This also changes to use correct symbol dereference function to get the
function address from the kretprobe_trampoline.

Fixes: e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v5:
  - Fix a compile error typo.
---
 arch/ia64/kernel/kprobes.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index 441ed04b1037..d4048518a1d7 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -398,7 +398,8 @@ static void kretprobe_trampoline(void)
 
 int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	regs->cr_iip = __kretprobe_trampoline_handler(regs, kretprobe_trampoline, NULL);
+	regs->cr_iip = __kretprobe_trampoline_handler(regs,
+		dereference_function_descriptor(kretprobe_trampoline), NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler
@@ -414,7 +415,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 	ri->fp = NULL;
 
 	/* Replace the return addr with trampoline addr */
-	regs->b0 = ((struct fnptr *)kretprobe_trampoline)->ip;
+	regs->b0 = (unsigned long)dereference_function_descriptor(kretprobe_trampoline);
 }
 
 /* Check the instruction in the slot is break */
@@ -902,14 +903,14 @@ static struct kprobe trampoline_p = {
 int __init arch_init_kprobes(void)
 {
 	trampoline_p.addr =
-		(kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip;
+		dereference_function_descriptor(kretprobe_trampoline);
 	return register_kprobe(&trampoline_p);
 }
 
 int __kprobes arch_trampoline_kprobe(struct kprobe *p)
 {
 	if (p->addr ==
-		(kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip)
+		dereference_function_descriptor(kretprobe_trampoline))
 		return 1;
 
 	return 0;


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
  2021-06-18  7:05 ` [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler Masami Hiramatsu
@ 2021-06-18  7:05 ` Masami Hiramatsu
  2021-07-05  7:48   ` Ingo Molnar
  2021-07-07 18:28   ` Andrii Nakryiko
  2021-06-18  7:05 ` [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler() Masami Hiramatsu
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:05 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

Replace arch_deref_entry_point() with dereference_symbol_descriptor()
because those are doing same thing.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryik <andrii@kernel.org>
---
 Changes in v6:
  - Use dereference_symbol_descriptor() so that it can handle address in
    modules correctly.
---
 arch/ia64/kernel/kprobes.c    |    5 -----
 arch/powerpc/kernel/kprobes.c |   11 -----------
 include/linux/kprobes.h       |    1 -
 kernel/kprobes.c              |    7 +------
 lib/error-inject.c            |    3 ++-
 5 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index d4048518a1d7..0f8573bbf520 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -891,11 +891,6 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
 	return ret;
 }
 
-unsigned long arch_deref_entry_point(void *entry)
-{
-	return ((struct fnptr *)entry)->ip;
-}
-
 static struct kprobe trampoline_p = {
 	.pre_handler = trampoline_probe_handler
 };
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index c64a5feaebbe..24472f2c2cfc 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -522,17 +522,6 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 }
 NOKPROBE_SYMBOL(kprobe_fault_handler);
 
-unsigned long arch_deref_entry_point(void *entry)
-{
-#ifdef PPC64_ELF_ABI_v1
-	if (!kernel_text_address((unsigned long)entry))
-		return ppc_global_function_entry(entry);
-	else
-#endif
-		return (unsigned long)entry;
-}
-NOKPROBE_SYMBOL(arch_deref_entry_point);
-
 static struct kprobe trampoline_p = {
 	.addr = (kprobe_opcode_t *) &kretprobe_trampoline,
 	.pre_handler = trampoline_probe_handler
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 523ffc7bc3a8..713c3a683011 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -382,7 +382,6 @@ int register_kprobe(struct kprobe *p);
 void unregister_kprobe(struct kprobe *p);
 int register_kprobes(struct kprobe **kps, int num);
 void unregister_kprobes(struct kprobe **kps, int num);
-unsigned long arch_deref_entry_point(void *);
 
 int register_kretprobe(struct kretprobe *rp);
 void unregister_kretprobe(struct kretprobe *rp);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index e41385afe79d..f8fe9d077b41 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1838,11 +1838,6 @@ static struct notifier_block kprobe_exceptions_nb = {
 	.priority = 0x7fffffff /* we need to be notified first */
 };
 
-unsigned long __weak arch_deref_entry_point(void *entry)
-{
-	return (unsigned long)entry;
-}
-
 #ifdef CONFIG_KRETPROBES
 
 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
@@ -2305,7 +2300,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
 	int ret;
 
 	for (iter = start; iter < end; iter++) {
-		entry = arch_deref_entry_point((void *)*iter);
+		entry = (unsigned long)dereference_symbol_descriptor((void *)*iter);
 		ret = kprobe_add_ksym_blacklist(entry);
 		if (ret == -EINVAL)
 			continue;
diff --git a/lib/error-inject.c b/lib/error-inject.c
index c73651b15b76..2ff5ef689d72 100644
--- a/lib/error-inject.c
+++ b/lib/error-inject.c
@@ -8,6 +8,7 @@
 #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <asm/sections.h>
 
 /* Whitelist of symbols that can be overridden for error injection. */
 static LIST_HEAD(error_injection_list);
@@ -64,7 +65,7 @@ static void populate_error_injection_list(struct error_injection_entry *start,
 
 	mutex_lock(&ei_mutex);
 	for (iter = start; iter < end; iter++) {
-		entry = arch_deref_entry_point((void *)iter->addr);
+		entry = (unsigned long)dereference_symbol_descriptor((void *)iter->addr);
 
 		if (!kernel_text_address(entry) ||
 		    !kallsyms_lookup_size_offset(entry, &size, &offset)) {


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
  2021-06-18  7:05 ` [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler Masami Hiramatsu
  2021-06-18  7:05 ` [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor() Masami Hiramatsu
@ 2021-06-18  7:05 ` Masami Hiramatsu
  2021-07-05  7:03   ` Ingo Molnar
  2021-07-05  7:49   ` Ingo Molnar
  2021-06-18  7:05 ` [PATCH -tip v8 04/13] kprobes: Add kretprobe_find_ret_addr() for searching return address Masami Hiramatsu
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:05 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

Remove trampoline_address from kretprobe_trampoline_handler().
Instead of passing the address, kretprobe_trampoline_handler()
can use new kretprobe_trampoline_addr().

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryik <andrii@kernel.org>
---
 Changes in v8:
   - Use dereference_kernel_function_descriptor() to
     get the kretprobe_trampoline address.
 Changes in v3:
   - Remove wrong kretprobe_trampoline declaration from
     arch/x86/include/asm/kprobes.h.
 Changes in v2:
   - Remove arch_deref_entry_point() from comment.
---
 arch/arc/kernel/kprobes.c          |    2 +-
 arch/arm/probes/kprobes/core.c     |    3 +--
 arch/arm64/kernel/probes/kprobes.c |    3 +--
 arch/csky/kernel/probes/kprobes.c  |    2 +-
 arch/ia64/kernel/kprobes.c         |    5 ++---
 arch/mips/kernel/kprobes.c         |    3 +--
 arch/parisc/kernel/kprobes.c       |    4 ++--
 arch/powerpc/kernel/kprobes.c      |    2 +-
 arch/riscv/kernel/probes/kprobes.c |    2 +-
 arch/s390/kernel/kprobes.c         |    2 +-
 arch/sh/kernel/kprobes.c           |    2 +-
 arch/sparc/kernel/kprobes.c        |    2 +-
 arch/x86/include/asm/kprobes.h     |    1 -
 arch/x86/kernel/kprobes/core.c     |    2 +-
 include/linux/kprobes.h            |   18 +++++++++++++-----
 kernel/kprobes.c                   |    3 +--
 16 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/arch/arc/kernel/kprobes.c b/arch/arc/kernel/kprobes.c
index 5f0415fc7328..3cee75c87f97 100644
--- a/arch/arc/kernel/kprobes.c
+++ b/arch/arc/kernel/kprobes.c
@@ -381,7 +381,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 static int __kprobes trampoline_probe_handler(struct kprobe *p,
 					      struct pt_regs *regs)
 {
-	regs->ret = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	regs->ret = __kretprobe_trampoline_handler(regs, NULL);
 
 	/* By returning a non zero value, we are telling the kprobe handler
 	 * that we don't want the post_handler to run
diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index 27e0af78e88b..583f6b1a2a6f 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -390,8 +390,7 @@ void __naked __kprobes kretprobe_trampoline(void)
 /* Called from kretprobe_trampoline */
 static __used __kprobes void *trampoline_handler(struct pt_regs *regs)
 {
-	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
-						    (void *)regs->ARM_fp);
+	return (void *)kretprobe_trampoline_handler(regs, (void *)regs->ARM_fp);
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 004b86eff9c2..649c970e65a2 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -396,8 +396,7 @@ int __init arch_populate_kprobe_blacklist(void)
 
 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
 {
-	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
-					(void *)kernel_stack_pointer(regs));
+	return (void *)kretprobe_trampoline_handler(regs, (void *)kernel_stack_pointer(regs));
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/csky/kernel/probes/kprobes.c b/arch/csky/kernel/probes/kprobes.c
index 68b22b499aeb..cc9dde2e4341 100644
--- a/arch/csky/kernel/probes/kprobes.c
+++ b/arch/csky/kernel/probes/kprobes.c
@@ -387,7 +387,7 @@ int __init arch_populate_kprobe_blacklist(void)
 
 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
 {
-	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	return (void *)kretprobe_trampoline_handler(regs, NULL);
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index 0f8573bbf520..44c84c20b626 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -392,14 +392,13 @@ static void __kprobes set_current_kprobe(struct kprobe *p,
 	__this_cpu_write(current_kprobe, p);
 }
 
-static void kretprobe_trampoline(void)
+void kretprobe_trampoline(void)
 {
 }
 
 int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	regs->cr_iip = __kretprobe_trampoline_handler(regs,
-		dereference_function_descriptor(kretprobe_trampoline), NULL);
+	regs->cr_iip = __kretprobe_trampoline_handler(regs, NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler
diff --git a/arch/mips/kernel/kprobes.c b/arch/mips/kernel/kprobes.c
index 75bff0f77319..21a4fda1e2cb 100644
--- a/arch/mips/kernel/kprobes.c
+++ b/arch/mips/kernel/kprobes.c
@@ -486,8 +486,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 static int __kprobes trampoline_probe_handler(struct kprobe *p,
 						struct pt_regs *regs)
 {
-	instruction_pointer(regs) = __kretprobe_trampoline_handler(regs,
-						kretprobe_trampoline, NULL);
+	instruction_pointer(regs) = __kretprobe_trampoline_handler(regs, NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler
diff --git a/arch/parisc/kernel/kprobes.c b/arch/parisc/kernel/kprobes.c
index 6d21a515eea5..4a35ac6e2ca2 100644
--- a/arch/parisc/kernel/kprobes.c
+++ b/arch/parisc/kernel/kprobes.c
@@ -175,7 +175,7 @@ int __kprobes parisc_kprobe_ss_handler(struct pt_regs *regs)
 	return 1;
 }
 
-static inline void kretprobe_trampoline(void)
+void kretprobe_trampoline(void)
 {
 	asm volatile("nop");
 	asm volatile("nop");
@@ -193,7 +193,7 @@ static int __kprobes trampoline_probe_handler(struct kprobe *p,
 {
 	unsigned long orig_ret_address;
 
-	orig_ret_address = __kretprobe_trampoline_handler(regs, trampoline_p.addr, NULL);
+	orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
 	instruction_pointer_set(regs, orig_ret_address);
 
 	return 1;
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 24472f2c2cfc..025a9f83ae88 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -399,7 +399,7 @@ static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
 	unsigned long orig_ret_address;
 
-	orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
 	/*
 	 * We get here through one of two paths:
 	 * 1. by taking a trap -> kprobe_handler() -> here
diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index 247e33fa5bc7..07bc8804643e 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -370,7 +370,7 @@ int __init arch_populate_kprobe_blacklist(void)
 
 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
 {
-	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	return (void *)kretprobe_trampoline_handler(regs, NULL);
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index 74b0bd2c24d4..1e6600765553 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -351,7 +351,7 @@ static void __used kretprobe_trampoline_holder(void)
  */
 static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	regs->psw.addr = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	regs->psw.addr = __kretprobe_trampoline_handler(regs, NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler
diff --git a/arch/sh/kernel/kprobes.c b/arch/sh/kernel/kprobes.c
index 1c7f358ef0be..8e76a35e6e33 100644
--- a/arch/sh/kernel/kprobes.c
+++ b/arch/sh/kernel/kprobes.c
@@ -303,7 +303,7 @@ static void __used kretprobe_trampoline_holder(void)
  */
 int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	regs->pc = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	regs->pc = __kretprobe_trampoline_handler(regs, NULL);
 
 	return 1;
 }
diff --git a/arch/sparc/kernel/kprobes.c b/arch/sparc/kernel/kprobes.c
index 4c05a4ee6a0e..401534236c2e 100644
--- a/arch/sparc/kernel/kprobes.c
+++ b/arch/sparc/kernel/kprobes.c
@@ -451,7 +451,7 @@ static int __kprobes trampoline_probe_handler(struct kprobe *p,
 {
 	unsigned long orig_ret_address = 0;
 
-	orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
 	regs->tpc = orig_ret_address;
 	regs->tnpc = orig_ret_address + 4;
 
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index bd7f5886a789..71ea2eab43d5 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -49,7 +49,6 @@ extern __visible kprobe_opcode_t optprobe_template_end[];
 extern const int kretprobe_blacklist_size;
 
 void arch_remove_kprobe(struct kprobe *p);
-asmlinkage void kretprobe_trampoline(void);
 
 extern void arch_kprobe_override_function(struct pt_regs *regs);
 
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index c492ad3001ca..2dccb4347453 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1070,7 +1070,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 	regs->ip = (unsigned long)&kretprobe_trampoline;
 	regs->orig_ax = ~0UL;
 
-	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, &regs->sp);
+	return (void *)kretprobe_trampoline_handler(regs, &regs->sp);
 }
 NOKPROBE_SYMBOL(trampoline_handler);
 
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 713c3a683011..5ce677819a25 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -197,15 +197,23 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				   struct pt_regs *regs);
 extern int arch_trampoline_kprobe(struct kprobe *p);
 
+void kretprobe_trampoline(void);
+/*
+ * Since some architecture uses structured function pointer,
+ * use dereference_function_descriptor() to get real function address.
+ */
+static nokprobe_inline void *kretprobe_trampoline_addr(void)
+{
+	return dereference_kernel_function_descriptor(kretprobe_trampoline);
+}
+
 /* If the trampoline handler called from a kprobe, use this version */
 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
-				void *trampoline_address,
-				void *frame_pointer);
+					     void *frame_pointer);
 
 static nokprobe_inline
 unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
-				void *trampoline_address,
-				void *frame_pointer)
+					   void *frame_pointer)
 {
 	unsigned long ret;
 	/*
@@ -214,7 +222,7 @@ unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
 	 * be running at this point.
 	 */
 	kprobe_busy_begin();
-	ret = __kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer);
+	ret = __kretprobe_trampoline_handler(regs, frame_pointer);
 	kprobe_busy_end();
 
 	return ret;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index f8fe9d077b41..ad7a8c81ab06 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1841,7 +1841,6 @@ static struct notifier_block kprobe_exceptions_nb = {
 #ifdef CONFIG_KRETPROBES
 
 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
-					     void *trampoline_address,
 					     void *frame_pointer)
 {
 	kprobe_opcode_t *correct_ret_addr = NULL;
@@ -1856,7 +1855,7 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 
 		BUG_ON(ri->fp != frame_pointer);
 
-		if (ri->ret_addr != trampoline_address) {
+		if (ri->ret_addr != kretprobe_trampoline_addr()) {
 			correct_ret_addr = ri->ret_addr;
 			/*
 			 * This is the real return address. Any other


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 04/13] kprobes: Add kretprobe_find_ret_addr() for searching return address
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (2 preceding siblings ...)
  2021-06-18  7:05 ` [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler() Masami Hiramatsu
@ 2021-06-18  7:05 ` Masami Hiramatsu
  2021-07-05  7:42   ` Ingo Molnar
  2021-06-18  7:06 ` [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code Masami Hiramatsu
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:05 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

Add kretprobe_find_ret_addr() for searching correct return address
from kretprobe instance list.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryik <andrii@kernel.org>
---
 Changes in v6:
  - Replace BUG_ON() with WARN_ON_ONCE() in __kretprobe_trampoline_handler().
 Changes in v3:
  - Remove generic stacktrace fixup. Instead, it should be solved in
    each unwinder. This just provide the generic interface.
 Changes in v2:
  - Add is_kretprobe_trampoline() for checking address outside of
    kretprobe_find_ret_addr()
  - Remove unneeded addr from kretprobe_find_ret_addr()
  - Rename fixup_kretprobe_tramp_addr() to fixup_kretprobe_trampoline()
---
 include/linux/kprobes.h |   22 +++++++++++
 kernel/kprobes.c        |   91 ++++++++++++++++++++++++++++++++++-------------
 2 files changed, 87 insertions(+), 26 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 5ce677819a25..08d3415e4418 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -207,6 +207,14 @@ static nokprobe_inline void *kretprobe_trampoline_addr(void)
 	return dereference_kernel_function_descriptor(kretprobe_trampoline);
 }
 
+static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
+{
+	return (void *)addr == kretprobe_trampoline_addr();
+}
+
+unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
+				      struct llist_node **cur);
+
 /* If the trampoline handler called from a kprobe, use this version */
 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 					     void *frame_pointer);
@@ -506,6 +514,20 @@ static inline bool is_kprobe_optinsn_slot(unsigned long addr)
 }
 #endif
 
+#if !defined(CONFIG_KRETPROBES)
+static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
+{
+	return false;
+}
+
+static nokprobe_inline
+unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
+				      struct llist_node **cur)
+{
+	return 0;
+}
+#endif
+
 /* Returns true if kprobes handled the fault */
 static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
 					      unsigned int trap)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ad7a8c81ab06..650cbe738666 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1840,45 +1840,69 @@ static struct notifier_block kprobe_exceptions_nb = {
 
 #ifdef CONFIG_KRETPROBES
 
-unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
-					     void *frame_pointer)
+/* This assumes the tsk is current or the task which is not running. */
+static unsigned long __kretprobe_find_ret_addr(struct task_struct *tsk,
+					       struct llist_node **cur)
 {
-	kprobe_opcode_t *correct_ret_addr = NULL;
 	struct kretprobe_instance *ri = NULL;
-	struct llist_node *first, *node;
-	struct kretprobe *rp;
+	struct llist_node *node = *cur;
+
+	if (!node)
+		node = tsk->kretprobe_instances.first;
+	else
+		node = node->next;
 
-	/* Find all nodes for this frame. */
-	first = node = current->kretprobe_instances.first;
 	while (node) {
 		ri = container_of(node, struct kretprobe_instance, llist);
-
-		BUG_ON(ri->fp != frame_pointer);
-
 		if (ri->ret_addr != kretprobe_trampoline_addr()) {
-			correct_ret_addr = ri->ret_addr;
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			goto found;
+			*cur = node;
+			return (unsigned long)ri->ret_addr;
 		}
-
 		node = node->next;
 	}
-	pr_err("Oops! Kretprobe fails to find correct return address.\n");
-	BUG_ON(1);
+	return 0;
+}
+NOKPROBE_SYMBOL(__kretprobe_find_ret_addr);
 
-found:
-	/* Unlink all nodes for this frame. */
-	current->kretprobe_instances.first = node->next;
-	node->next = NULL;
+unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
+				      struct llist_node **cur)
+{
+	struct kretprobe_instance *ri = NULL;
+	unsigned long ret;
+
+	do {
+		ret = __kretprobe_find_ret_addr(tsk, cur);
+		if (!ret)
+			return ret;
+		ri = container_of(*cur, struct kretprobe_instance, llist);
+	} while (ri->fp != fp);
+
+	return ret;
+}
+NOKPROBE_SYMBOL(kretprobe_find_ret_addr);
 
-	/* Run them..  */
+unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
+					     void *frame_pointer)
+{
+	kprobe_opcode_t *correct_ret_addr = NULL;
+	struct kretprobe_instance *ri = NULL;
+	struct llist_node *first, *node = NULL;
+	struct kretprobe *rp;
+
+	/* Find correct address and all nodes for this frame. */
+	correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);
+	if (!correct_ret_addr) {
+		pr_err("Oops! Kretprobe fails to find correct return address.\n");
+		BUG_ON(1);
+	}
+
+	/* Run them. */
+	first = current->kretprobe_instances.first;
 	while (first) {
 		ri = container_of(first, struct kretprobe_instance, llist);
-		first = first->next;
+
+		if (WARN_ON_ONCE(ri->fp != frame_pointer))
+			break;
 
 		rp = get_kretprobe(ri);
 		if (rp && rp->handler) {
@@ -1889,6 +1913,21 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 			rp->handler(ri, regs);
 			__this_cpu_write(current_kprobe, prev);
 		}
+		if (first == node)
+			break;
+
+		first = first->next;
+	}
+
+	/* Unlink all nodes for this frame. */
+	first = current->kretprobe_instances.first;
+	current->kretprobe_instances.first = node->next;
+	node->next = NULL;
+
+	/* Recycle them.  */
+	while (first) {
+		ri = container_of(first, struct kretprobe_instance, llist);
+		first = first->next;
 
 		recycle_rp_inst(ri);
 	}


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2021-06-18  7:05 ` [PATCH -tip v8 04/13] kprobes: Add kretprobe_find_ret_addr() for searching return address Masami Hiramatsu
@ 2021-06-18  7:06 ` Masami Hiramatsu
  2021-07-05  8:02   ` Ingo Molnar
  2021-06-18  7:06 ` [PATCH -tip v8 06/13] ARC: Add instruction_pointer_set() API Masami Hiramatsu
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:06 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

From: Josh Poimboeuf <jpoimboe@redhat.com>

Add UNWIND_HINT_FUNC on kretporbe_trampoline code so that ORC
information is generated on the kretprobe_trampoline correctly.

Note that when the CONFIG_FRAME_POINTER=y, since the
kretprobe_trampoline skips updating frame pointer, the stack frame
of the kretprobe_trampoline seems non-standard. So this marks it
is STACK_FRAME_NON_STANDARD() and undefine UNWIND_HINT_FUNC.
Anyway, with the frame pointer, FP unwinder can unwind the stack
frame correctly without that hint.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryik <andrii@kernel.org>
---
 Changes in v4:
  - Apply UNWIND_HINT_FUNC only if CONFIG_FRAME_POINTER=n.
---
 arch/x86/include/asm/unwind_hints.h |    5 +++++
 arch/x86/kernel/kprobes/core.c      |   17 +++++++++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/unwind_hints.h b/arch/x86/include/asm/unwind_hints.h
index 8e574c0afef8..8b33674288ea 100644
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -52,6 +52,11 @@
 	UNWIND_HINT sp_reg=ORC_REG_SP sp_offset=8 type=UNWIND_HINT_TYPE_FUNC
 .endm
 
+#else
+
+#define UNWIND_HINT_FUNC \
+	UNWIND_HINT(ORC_REG_SP, 8, UNWIND_HINT_TYPE_FUNC, 0)
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_UNWIND_HINTS_H */
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 2dccb4347453..74f049b6e77f 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1019,6 +1019,19 @@ int kprobe_int3_handler(struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(kprobe_int3_handler);
 
+#ifdef CONFIG_FRAME_POINTER
+/*
+ * kretprobe_trampoline skips updating frame pointer. The frame pointer
+ * saved in trampoline_handler points to the real caller function's
+ * frame pointer. Thus the kretprobe_trampoline doesn't seems to have a
+ * standard stack frame with CONFIG_FRAME_POINTER=y.
+ * Let's mark it non-standard function. Anyway, FP unwinder can correctly
+ * unwind without the hint.
+ */
+STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
+#undef UNWIND_HINT_FUNC
+#define UNWIND_HINT_FUNC
+#endif
 /*
  * When a retprobed function returns, this code saves registers and
  * calls trampoline_handler() runs, which calls the kretprobe's handler.
@@ -1031,6 +1044,7 @@ asm(
 	/* We don't bother saving the ss register */
 #ifdef CONFIG_X86_64
 	"	pushq %rsp\n"
+	UNWIND_HINT_FUNC
 	"	pushfq\n"
 	SAVE_REGS_STRING
 	"	movq %rsp, %rdi\n"
@@ -1041,6 +1055,7 @@ asm(
 	"	popfq\n"
 #else
 	"	pushl %esp\n"
+	UNWIND_HINT_FUNC
 	"	pushfl\n"
 	SAVE_REGS_STRING
 	"	movl %esp, %eax\n"
@@ -1054,8 +1069,6 @@ asm(
 	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
 );
 NOKPROBE_SYMBOL(kretprobe_trampoline);
-STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
-
 
 /*
  * Called from kretprobe_trampoline


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 06/13] ARC: Add instruction_pointer_set() API
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (4 preceding siblings ...)
  2021-06-18  7:06 ` [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code Masami Hiramatsu
@ 2021-06-18  7:06 ` Masami Hiramatsu
  2021-06-18  7:06 ` [PATCH -tip v8 07/13] ia64: " Masami Hiramatsu
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:06 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

Add instruction_pointer_set() API for arc.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/arc/include/asm/ptrace.h |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arc/include/asm/ptrace.h b/arch/arc/include/asm/ptrace.h
index 4c3c9be5bd16..cca8d6583e31 100644
--- a/arch/arc/include/asm/ptrace.h
+++ b/arch/arc/include/asm/ptrace.h
@@ -149,6 +149,11 @@ static inline long regs_return_value(struct pt_regs *regs)
 	return (long)regs->r0;
 }
 
+static inline void instruction_pointer_set(struct pt_regs *regs,
+					   unsigned long val)
+{
+	instruction_pointer(regs) = val;
+}
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PTRACE_H */


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 07/13] ia64: Add instruction_pointer_set() API
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (5 preceding siblings ...)
  2021-06-18  7:06 ` [PATCH -tip v8 06/13] ARC: Add instruction_pointer_set() API Masami Hiramatsu
@ 2021-06-18  7:06 ` Masami Hiramatsu
  2021-06-18  7:06 ` [PATCH -tip v8 08/13] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline Masami Hiramatsu
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:06 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

Add instruction_pointer_set() API for ia64.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
  Changes in v4:
   - Make the API macro for avoiding a build error.
---
 arch/ia64/include/asm/ptrace.h |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index 08179135905c..a024afbc70e5 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -51,6 +51,11 @@
  * the canonical representation by adding to instruction pointer.
  */
 # define instruction_pointer(regs) ((regs)->cr_iip + ia64_psr(regs)->ri)
+# define instruction_pointer_set(regs, val)	\
+  ({						\
+	ia64_psr(regs)->ri = (val & 0xf);	\
+	regs->cr_iip = (val & ~0xfULL);		\
+  })
 
 static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 {


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 08/13] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (6 preceding siblings ...)
  2021-06-18  7:06 ` [PATCH -tip v8 07/13] ia64: " Masami Hiramatsu
@ 2021-06-18  7:06 ` Masami Hiramatsu
  2021-07-05  8:04   ` Ingo Molnar
  2021-06-18  7:06 ` [PATCH -tip v8 09/13] kprobes: Enable stacktrace from pt_regs in kretprobe handler Masami Hiramatsu
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:06 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

Change kretprobe_trampoline to make a space for regs->ARM_pc so that
kretprobe_trampoline_handler can call instruction_pointer_set()
safely.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/arm/probes/kprobes/core.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index 583f6b1a2a6f..556b1560fb2c 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -374,11 +374,13 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
 void __naked __kprobes kretprobe_trampoline(void)
 {
 	__asm__ __volatile__ (
+		"sub	sp, sp, #16		\n\t"
 		"stmdb	sp!, {r0 - r11}		\n\t"
 		"mov	r0, sp			\n\t"
 		"bl	trampoline_handler	\n\t"
 		"mov	lr, r0			\n\t"
 		"ldmia	sp!, {r0 - r11}		\n\t"
+		"add	sp, sp, #16		\n\t"
 #ifdef CONFIG_THUMB2_KERNEL
 		"bx	lr			\n\t"
 #else


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 09/13] kprobes: Enable stacktrace from pt_regs in kretprobe handler
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (7 preceding siblings ...)
  2021-06-18  7:06 ` [PATCH -tip v8 08/13] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline Masami Hiramatsu
@ 2021-06-18  7:06 ` Masami Hiramatsu
  2021-06-18 14:04   ` Josh Poimboeuf
  2021-06-18  7:06 ` [PATCH -tip v8 10/13] x86/kprobes: Push a fake return address at kretprobe_trampoline Masami Hiramatsu
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:06 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

Since the ORC unwinder from pt_regs requires setting up regs->ip
correctly, set the correct return address to the regs->ip before
calling user kretprobe handler.

This allows the kretrprobe handler to trace stack from the
kretprobe's pt_regs by stack_trace_save_regs() (eBPF will do
this), instead of stack tracing from the handler context by
stack_trace_save() (ftrace will do this).

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryik <andrii@kernel.org>
---
 Changes in v8:
  - Update comment to clarify why this is needed.
 Changes in v3:
  - Cast the correct_ret_addr to unsigned long.
---
 kernel/kprobes.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 650cbe738666..ba729ed05cb3 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1896,6 +1896,9 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 		BUG_ON(1);
 	}
 
+	/* Set the instruction pointer to the correct address */
+	instruction_pointer_set(regs, (unsigned long)correct_ret_addr);
+
 	/* Run them. */
 	first = current->kretprobe_instances.first;
 	while (first) {


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 10/13] x86/kprobes: Push a fake return address at kretprobe_trampoline
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (8 preceding siblings ...)
  2021-06-18  7:06 ` [PATCH -tip v8 09/13] kprobes: Enable stacktrace from pt_regs in kretprobe handler Masami Hiramatsu
@ 2021-06-18  7:06 ` Masami Hiramatsu
  2021-07-05  8:17   ` Ingo Molnar
  2021-06-18  7:07 ` [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry Masami Hiramatsu
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:06 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

This changes x86/kretprobe stack frame on kretprobe_trampoline
a bit, which now push the kretprobe_trampoline as a fake return
address at the bottom of the stack frame. With this fix, the ORC
unwinder will see the kretprobe_trampoline as a return address.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Andrii Nakryik <andrii@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/kprobes/core.c |   31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 74f049b6e77f..4d040aaf969b 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1041,28 +1041,31 @@ asm(
 	".global kretprobe_trampoline\n"
 	".type kretprobe_trampoline, @function\n"
 	"kretprobe_trampoline:\n"
-	/* We don't bother saving the ss register */
 #ifdef CONFIG_X86_64
-	"	pushq %rsp\n"
+	/* Push fake return address to tell the unwinder it's a kretprobe */
+	"	pushq $kretprobe_trampoline\n"
 	UNWIND_HINT_FUNC
+	/* Save the sp-8, this will be fixed later */
+	"	pushq %rsp\n"
 	"	pushfq\n"
 	SAVE_REGS_STRING
 	"	movq %rsp, %rdi\n"
 	"	call trampoline_handler\n"
-	/* Replace saved sp with true return address. */
-	"	movq %rax, 19*8(%rsp)\n"
 	RESTORE_REGS_STRING
+	"	addq $8, %rsp\n"
 	"	popfq\n"
 #else
-	"	pushl %esp\n"
+	/* Push fake return address to tell the unwinder it's a kretprobe */
+	"	pushl $kretprobe_trampoline\n"
 	UNWIND_HINT_FUNC
+	/* Save the sp-4, this will be fixed later */
+	"	pushl %esp\n"
 	"	pushfl\n"
 	SAVE_REGS_STRING
 	"	movl %esp, %eax\n"
 	"	call trampoline_handler\n"
-	/* Replace saved sp with true return address. */
-	"	movl %eax, 15*4(%esp)\n"
 	RESTORE_REGS_STRING
+	"	addl $4, %esp\n"
 	"	popfl\n"
 #endif
 	"	ret\n"
@@ -1073,8 +1076,10 @@ NOKPROBE_SYMBOL(kretprobe_trampoline);
 /*
  * Called from kretprobe_trampoline
  */
-__used __visible void *trampoline_handler(struct pt_regs *regs)
+__used __visible void trampoline_handler(struct pt_regs *regs)
 {
+	unsigned long *frame_pointer;
+
 	/* fixup registers */
 	regs->cs = __KERNEL_CS;
 #ifdef CONFIG_X86_32
@@ -1082,8 +1087,16 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 #endif
 	regs->ip = (unsigned long)&kretprobe_trampoline;
 	regs->orig_ax = ~0UL;
+	regs->sp += sizeof(long);
+	frame_pointer = ((unsigned long *)&regs->sp) + 1;
 
-	return (void *)kretprobe_trampoline_handler(regs, &regs->sp);
+	/* Replace fake return address with real one. */
+	*frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);
+	/*
+	 * Move flags to sp so that kretprobe_trapmoline can return
+	 * right after popf.
+	 */
+	regs->sp = regs->flags;
 }
 NOKPROBE_SYMBOL(trampoline_handler);
 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (9 preceding siblings ...)
  2021-06-18  7:06 ` [PATCH -tip v8 10/13] x86/kprobes: Push a fake return address at kretprobe_trampoline Masami Hiramatsu
@ 2021-06-18  7:07 ` Masami Hiramatsu
  2021-07-05 11:36   ` Peter Zijlstra
  2021-06-18  7:07 ` [PATCH -tip v8 12/13] tracing: Show kretprobe unknown indicator only for kretprobe_trampoline Masami Hiramatsu
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:07 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

Since the kretprobe replaces the function return address with
the kretprobe_trampoline on the stack, x86 unwinders can not
continue the stack unwinding at that point, or record
kretprobe_trampoline instead of correct return address.

To fix this issue, find the correct return address from task's
kretprobe_instances as like as function-graph tracer does.

With this fix, the unwinder can correctly unwind the stack
from kretprobe event on x86, as below.

           <...>-135     [003] ...1     6.722338: r_full_proxy_read_0: (vfs_read+0xab/0x1a0 <- full_proxy_read)
           <...>-135     [003] ...1     6.722377: <stack trace>
 => kretprobe_trace_func+0x209/0x2f0
 => kretprobe_dispatcher+0x4a/0x70
 => __kretprobe_trampoline_handler+0xca/0x150
 => trampoline_handler+0x44/0x70
 => kretprobe_trampoline+0x2a/0x50
 => vfs_read+0xab/0x1a0
 => ksys_read+0x5f/0xe0
 => do_syscall_64+0x33/0x40
 => entry_SYSCALL_64_after_hwframe+0x44/0xae


Reported-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Andrii Nakryik <andrii@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
  Changes in v7:
   - Remove superfluous #include <linux/kprobes.h>.
  Changes in v5:
   - Fix the case of interrupt happens on kretprobe_trampoline+0.
  Changes in v3:
   - Split out the kretprobe side patch
   - Fix build error when CONFIG_KRETPROBES=n.
  Changes in v2:
   - Remove kretprobe wrapper functions from unwind_orc.c
   - Do not fixup state->ip when unwinding with regs because
     kretprobe fixup instruction pointer before calling handler.
---
 arch/x86/include/asm/unwind.h  |   23 +++++++++++++++++++++++
 arch/x86/kernel/unwind_frame.c |    3 +--
 arch/x86/kernel/unwind_guess.c |    3 +--
 arch/x86/kernel/unwind_orc.c   |   18 ++++++++++++++----
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 70fc159ebe69..36d3971c0a2c 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -4,6 +4,7 @@
 
 #include <linux/sched.h>
 #include <linux/ftrace.h>
+#include <linux/kprobes.h>
 #include <asm/ptrace.h>
 #include <asm/stacktrace.h>
 
@@ -15,6 +16,7 @@ struct unwind_state {
 	unsigned long stack_mask;
 	struct task_struct *task;
 	int graph_idx;
+	struct llist_node *kr_cur;
 	bool error;
 #if defined(CONFIG_UNWINDER_ORC)
 	bool signal, full_regs;
@@ -99,6 +101,27 @@ void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size,
 			void *orc, size_t orc_size) {}
 #endif
 
+static inline
+unsigned long unwind_recover_kretprobe(struct unwind_state *state,
+				       unsigned long addr, unsigned long *addr_p)
+{
+	return is_kretprobe_trampoline(addr) ?
+		kretprobe_find_ret_addr(state->task, addr_p, &state->kr_cur) :
+		addr;
+}
+
+/* Recover the return address modified by instrumentation (e.g. kretprobe) */
+static inline
+unsigned long unwind_recover_ret_addr(struct unwind_state *state,
+				     unsigned long addr, unsigned long *addr_p)
+{
+	unsigned long ret;
+
+	ret = ftrace_graph_ret_addr(state->task, &state->graph_idx,
+				    addr, addr_p);
+	return unwind_recover_kretprobe(state, ret, addr_p);
+}
+
 /*
  * This disables KASAN checking when reading a value from another task's stack,
  * since the other task could be running on another CPU and could have poisoned
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index d7c44b257f7f..8e1c50c86e5d 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -240,8 +240,7 @@ static bool update_stack_state(struct unwind_state *state,
 	else {
 		addr_p = unwind_get_return_address_ptr(state);
 		addr = READ_ONCE_TASK_STACK(state->task, *addr_p);
-		state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
-						  addr, addr_p);
+		state->ip = unwind_recover_ret_addr(state, addr, addr_p);
 	}
 
 	/* Save the original stack pointer for unwind_dump(): */
diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
index c49f10ffd8cd..884d68a6e714 100644
--- a/arch/x86/kernel/unwind_guess.c
+++ b/arch/x86/kernel/unwind_guess.c
@@ -15,8 +15,7 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 
 	addr = READ_ONCE_NOCHECK(*state->sp);
 
-	return ftrace_graph_ret_addr(state->task, &state->graph_idx,
-				     addr, state->sp);
+	return unwind_recover_ret_addr(state, addr, state->sp);
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index a1202536fc57..ad6a9aece379 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -534,9 +534,8 @@ bool unwind_next_frame(struct unwind_state *state)
 		if (!deref_stack_reg(state, ip_p, &state->ip))
 			goto err;
 
-		state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
-						  state->ip, (void *)ip_p);
-
+		state->ip = unwind_recover_ret_addr(state, state->ip,
+						    (unsigned long *)ip_p);
 		state->sp = sp;
 		state->regs = NULL;
 		state->prev_regs = NULL;
@@ -549,7 +548,15 @@ bool unwind_next_frame(struct unwind_state *state)
 					 (void *)orig_ip);
 			goto err;
 		}
-
+		/*
+		 * There is a small chance to interrupt at the entry of
+		 * kretprobe_trampoline where the ORC info doesn't exist.
+		 * That point is right after the RET to kretprobe_trampoline
+		 * which was modified return address. So the @addr_p must
+		 * be right before the regs->sp.
+		 */
+		state->ip = unwind_recover_kretprobe(state, state->ip,
+				(unsigned long *)(state->sp - sizeof(long)));
 		state->regs = (struct pt_regs *)sp;
 		state->prev_regs = NULL;
 		state->full_regs = true;
@@ -562,6 +569,9 @@ bool unwind_next_frame(struct unwind_state *state)
 					 (void *)orig_ip);
 			goto err;
 		}
+		/* See UNWIND_HINT_TYPE_REGS case comment. */
+		state->ip = unwind_recover_kretprobe(state, state->ip,
+				(unsigned long *)(state->sp - sizeof(long)));
 
 		if (state->full_regs)
 			state->prev_regs = state->regs;


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 12/13] tracing: Show kretprobe unknown indicator only for kretprobe_trampoline
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (10 preceding siblings ...)
  2021-06-18  7:07 ` [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry Masami Hiramatsu
@ 2021-06-18  7:07 ` Masami Hiramatsu
  2021-06-18  7:07 ` [PATCH -tip v8 13/13] x86/kprobes: Fixup return address in generic trampoline handler Masami Hiramatsu
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:07 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

ftrace shows "[unknown/kretprobe'd]" indicator all addresses in the
kretprobe_trampoline, but the modified address by kretprobe should
be only kretprobe_trampoline+0.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Andrii Nakryik <andrii@kernel.org>
---
 kernel/trace/trace_output.c |   17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index d0368a569bfa..e46780670742 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/ftrace.h>
+#include <linux/kprobes.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/mm.h>
 
@@ -346,22 +347,12 @@ int trace_output_call(struct trace_iterator *iter, char *name, char *fmt, ...)
 }
 EXPORT_SYMBOL_GPL(trace_output_call);
 
-#ifdef CONFIG_KRETPROBES
-static inline const char *kretprobed(const char *name)
+static inline const char *kretprobed(const char *name, unsigned long addr)
 {
-	static const char tramp_name[] = "kretprobe_trampoline";
-	int size = sizeof(tramp_name);
-
-	if (strncmp(tramp_name, name, size) == 0)
+	if (is_kretprobe_trampoline(addr))
 		return "[unknown/kretprobe'd]";
 	return name;
 }
-#else
-static inline const char *kretprobed(const char *name)
-{
-	return name;
-}
-#endif /* CONFIG_KRETPROBES */
 
 void
 trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset)
@@ -374,7 +365,7 @@ trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset)
 		sprint_symbol(str, address);
 	else
 		kallsyms_lookup(address, NULL, NULL, NULL, str);
-	name = kretprobed(str);
+	name = kretprobed(str, address);
 
 	if (name && strlen(name)) {
 		trace_seq_puts(s, name);


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH -tip v8 13/13] x86/kprobes: Fixup return address in generic trampoline handler
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (11 preceding siblings ...)
  2021-06-18  7:07 ` [PATCH -tip v8 12/13] tracing: Show kretprobe unknown indicator only for kretprobe_trampoline Masami Hiramatsu
@ 2021-06-18  7:07 ` Masami Hiramatsu
  2021-07-05  8:34   ` Ingo Molnar
  2021-06-18 17:44 ` [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Andrii Nakryiko
  2021-06-28 13:50 ` Masami Hiramatsu
  14 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  7:07 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko

In x86, kretprobe trampoline address on the stack frame will
be replaced with the real return address after returning from
trampoline_handler. Before fixing the return address, the real
return address can be found in the current->kretprobe_instances.

However, since there is a window between updating the
current->kretprobe_instances and fixing the address on the stack,
if an interrupt caused at that timing and the interrupt handler
does stacktrace, it may fail to unwind because it can not get
the correct return address from current->kretprobe_instances.

This will minimize that window by fixing the return address
right before updating current->kretprobe_instances.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryik <andrii@kernel.org>
---
 Changes in v7:
  - Add a prototype for arch_kretprobe_fixup_return()
---
 arch/x86/kernel/kprobes/core.c |   15 +++++++++++++--
 include/linux/kprobes.h        |    3 +++
 kernel/kprobes.c               |    8 ++++++++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d040aaf969b..53c1dcfcb145 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1032,6 +1032,7 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
 #undef UNWIND_HINT_FUNC
 #define UNWIND_HINT_FUNC
 #endif
+
 /*
  * When a retprobed function returns, this code saves registers and
  * calls trampoline_handler() runs, which calls the kretprobe's handler.
@@ -1073,6 +1074,17 @@ asm(
 );
 NOKPROBE_SYMBOL(kretprobe_trampoline);
 
+void arch_kretprobe_fixup_return(struct pt_regs *regs,
+				 unsigned long correct_ret_addr)
+{
+	unsigned long *frame_pointer;
+
+	frame_pointer = ((unsigned long *)&regs->sp) + 1;
+
+	/* Replace fake return address with real one. */
+	*frame_pointer = correct_ret_addr;
+}
+
 /*
  * Called from kretprobe_trampoline
  */
@@ -1090,8 +1102,7 @@ __used __visible void trampoline_handler(struct pt_regs *regs)
 	regs->sp += sizeof(long);
 	frame_pointer = ((unsigned long *)&regs->sp) + 1;
 
-	/* Replace fake return address with real one. */
-	*frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);
+	kretprobe_trampoline_handler(regs, frame_pointer);
 	/*
 	 * Move flags to sp so that kretprobe_trapmoline can return
 	 * right after popf.
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 08d3415e4418..259bdc80e708 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -197,6 +197,9 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				   struct pt_regs *regs);
 extern int arch_trampoline_kprobe(struct kprobe *p);
 
+void arch_kretprobe_fixup_return(struct pt_regs *regs,
+				 unsigned long correct_ret_addr);
+
 void kretprobe_trampoline(void);
 /*
  * Since some architecture uses structured function pointer,
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ba729ed05cb3..72e8125fb0e9 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1881,6 +1881,12 @@ unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
 }
 NOKPROBE_SYMBOL(kretprobe_find_ret_addr);
 
+void __weak arch_kretprobe_fixup_return(struct pt_regs *regs,
+					unsigned long correct_ret_addr)
+{
+	/* Do nothing by default. */
+}
+
 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 					     void *frame_pointer)
 {
@@ -1922,6 +1928,8 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 		first = first->next;
 	}
 
+	arch_kretprobe_fixup_return(regs, (unsigned long)correct_ret_addr);
+
 	/* Unlink all nodes for this frame. */
 	first = current->kretprobe_instances.first;
 	current->kretprobe_instances.first = node->next;


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 09/13] kprobes: Enable stacktrace from pt_regs in kretprobe handler
  2021-06-18  7:06 ` [PATCH -tip v8 09/13] kprobes: Enable stacktrace from pt_regs in kretprobe handler Masami Hiramatsu
@ 2021-06-18 14:04   ` Josh Poimboeuf
  0 siblings, 0 replies; 57+ messages in thread
From: Josh Poimboeuf @ 2021-06-18 14:04 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Ingo Molnar, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Fri, Jun 18, 2021 at 04:06:46PM +0900, Masami Hiramatsu wrote:
> Since the ORC unwinder from pt_regs requires setting up regs->ip
> correctly, set the correct return address to the regs->ip before
> calling user kretprobe handler.
> 
> This allows the kretrprobe handler to trace stack from the
> kretprobe's pt_regs by stack_trace_save_regs() (eBPF will do
> this), instead of stack tracing from the handler context by
> stack_trace_save() (ftrace will do this).
> 
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> Tested-by: Andrii Nakryik <andrii@kernel.org>

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

> ---
>  Changes in v8:
>   - Update comment to clarify why this is needed.
>  Changes in v3:
>   - Cast the correct_ret_addr to unsigned long.
> ---
>  kernel/kprobes.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 650cbe738666..ba729ed05cb3 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1896,6 +1896,9 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
>  		BUG_ON(1);
>  	}
>  
> +	/* Set the instruction pointer to the correct address */
> +	instruction_pointer_set(regs, (unsigned long)correct_ret_addr);
> +
>  	/* Run them. */
>  	first = current->kretprobe_instances.first;
>  	while (first) {
> 

-- 
Josh


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (12 preceding siblings ...)
  2021-06-18  7:07 ` [PATCH -tip v8 13/13] x86/kprobes: Fixup return address in generic trampoline handler Masami Hiramatsu
@ 2021-06-18 17:44 ` Andrii Nakryiko
  2021-06-28 13:50 ` Masami Hiramatsu
  14 siblings, 0 replies; 57+ messages in thread
From: Andrii Nakryiko @ 2021-06-18 17:44 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	open list, bpf, Jakub Kicinski, Ingo Molnar, Alexei Starovoitov,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Kernel Team,
	Yonghong Song, linux-ia64, Abhishek Sagar

On Fri, Jun 18, 2021 at 12:05 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hello,
>
> Here is the 8th version of the series to fix the stacktrace with kretprobe on x86.
>
> The previous version is;
>
>  https://lore.kernel.org/bpf/162209754288.436794.3904335049560916855.stgit@devnote2/
>
> This version fixes to call appropriate function and drop some unneeded
> patches.
>
>
> Changes from v7:
> [03/13]: Call dereference_kernel_function_descriptor() for getting the
>   address of kretprobe_trampoline.
> [09/13]: Update the title and description to explain why it is needed.
> [10/13][11/13]: Add Josh's Acked-by.
>
>
>
> With this series, unwinder can unwind stack correctly from ftrace as below;
>
>   # cd /sys/kernel/debug/tracing
>   # echo > trace
>   # echo 1 > options/sym-offset
>   # echo r vfs_read >> kprobe_events
>   # echo r full_proxy_read >> kprobe_events
>   # echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
>   # echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
>   # echo 1 > events/kprobes/enable
>   # cat /sys/kernel/debug/kprobes/list
> ffffffff8133b740  r  full_proxy_read+0x0    [FTRACE]
> ffffffff812560b0  r  vfs_read+0x0    [FTRACE]
>   # echo 0 > events/kprobes/enable
>   # cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 3/3   #P:8
> #
> #                                _-----=> irqs-off
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| /     delay
> #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
> #              | |         |   ||||      |         |
>            <...>-134     [007] ...1    16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
>            <...>-134     [007] ...1    16.185901: <stack trace>
>  => kretprobe_trace_func+0x209/0x300
>  => kretprobe_dispatcher+0x4a/0x70
>  => __kretprobe_trampoline_handler+0xd4/0x170
>  => trampoline_handler+0x43/0x60
>  => kretprobe_trampoline+0x2a/0x50
>  => vfs_read+0x98/0x180
>  => ksys_read+0x5f/0xe0
>  => do_syscall_64+0x37/0x90
>  => entry_SYSCALL_64_after_hwframe+0x44/0xae
>            <...>-134     [007] ...1    16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)
>
> This shows the double return probes (vfs_read and full_proxy_read) on the stack
> correctly unwinded. (vfs_read will return to ksys_read+0x5f and full_proxy_read
> will return to vfs_read+0x98)
>
> This actually changes the kretprobe behavisor a bit, now the instraction pointer in
> the pt_regs passed to kretprobe user handler is correctly set the real return
> address. So user handlers can get it via instruction_pointer() API, and can use
> stack_trace_save_regs().
>
> You can also get this series from
>  git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v8
>
>
> Thank you,
>
> ---
>
> Josh Poimboeuf (1):
>       x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code
>
> Masami Hiramatsu (12):
>       ia64: kprobes: Fix to pass correct trampoline address to the handler
>       kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
>       kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
>       kprobes: Add kretprobe_find_ret_addr() for searching return address
>       ARC: Add instruction_pointer_set() API
>       ia64: Add instruction_pointer_set() API
>       arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline
>       kprobes: Enable stacktrace from pt_regs in kretprobe handler
>       x86/kprobes: Push a fake return address at kretprobe_trampoline
>       x86/unwind: Recover kretprobe trampoline entry
>       tracing: Show kretprobe unknown indicator only for kretprobe_trampoline
>       x86/kprobes: Fixup return address in generic trampoline handler
>

It works for BPF cases. Thanks!

Tested-by: Andrii Nakryiko <andrii@kernel.org>


>
>  arch/arc/include/asm/ptrace.h       |    5 ++
>  arch/arc/kernel/kprobes.c           |    2 -
>  arch/arm/probes/kprobes/core.c      |    5 +-
>  arch/arm64/kernel/probes/kprobes.c  |    3 -
>  arch/csky/kernel/probes/kprobes.c   |    2 -
>  arch/ia64/include/asm/ptrace.h      |    5 ++
>  arch/ia64/kernel/kprobes.c          |   15 ++---
>  arch/mips/kernel/kprobes.c          |    3 -
>  arch/parisc/kernel/kprobes.c        |    4 +
>  arch/powerpc/kernel/kprobes.c       |   13 ----
>  arch/riscv/kernel/probes/kprobes.c  |    2 -
>  arch/s390/kernel/kprobes.c          |    2 -
>  arch/sh/kernel/kprobes.c            |    2 -
>  arch/sparc/kernel/kprobes.c         |    2 -
>  arch/x86/include/asm/kprobes.h      |    1
>  arch/x86/include/asm/unwind.h       |   23 +++++++
>  arch/x86/include/asm/unwind_hints.h |    5 ++
>  arch/x86/kernel/kprobes/core.c      |   53 +++++++++++++++--
>  arch/x86/kernel/unwind_frame.c      |    3 -
>  arch/x86/kernel/unwind_guess.c      |    3 -
>  arch/x86/kernel/unwind_orc.c        |   18 +++++-
>  include/linux/kprobes.h             |   44 ++++++++++++--
>  kernel/kprobes.c                    |  108 +++++++++++++++++++++++++----------
>  kernel/trace/trace_output.c         |   17 +-----
>  lib/error-inject.c                  |    3 +
>  25 files changed, 238 insertions(+), 105 deletions(-)
>
> --
> Masami Hiramatsu (Linaro) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86
  2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
                   ` (13 preceding siblings ...)
  2021-06-18 17:44 ` [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Andrii Nakryiko
@ 2021-06-28 13:50 ` Masami Hiramatsu
  14 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-06-28 13:50 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	linux-kernel, bpf, kuba, mingo, ast, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, kernel-team, yhs, linux-ia64,
	Abhishek Sagar, Andrii Nakryiko, Qiang Wang, Muchun Song

Hi Ingo and Peter,

Can you merge this series to tip tree?
Josh reviewed the ORC unwinder parts, so I think it is a good time to pick it.
(And recently I got same effort from Qiang, he thinks this can be a phishing risk *)

* https://lore.kernel.org/bpf/CAMZfGtWPi4CuVOtmUpy2N9J_mvp+5=gSAFvqV1nmvDKP+CAvQA@mail.gmail.com/

Thank you,

On Fri, 18 Jun 2021 16:05:22 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Hello,
> 
> Here is the 8th version of the series to fix the stacktrace with kretprobe on x86.
> 
> The previous version is;
> 
>  https://lore.kernel.org/bpf/162209754288.436794.3904335049560916855.stgit@devnote2/
> 
> This version fixes to call appropriate function and drop some unneeded
> patches.
> 
> 
> Changes from v7:
> [03/13]: Call dereference_kernel_function_descriptor() for getting the
>   address of kretprobe_trampoline.
> [09/13]: Update the title and description to explain why it is needed.
> [10/13][11/13]: Add Josh's Acked-by.
> 
> 
> 
> With this series, unwinder can unwind stack correctly from ftrace as below;
> 
>   # cd /sys/kernel/debug/tracing
>   # echo > trace
>   # echo 1 > options/sym-offset
>   # echo r vfs_read >> kprobe_events
>   # echo r full_proxy_read >> kprobe_events
>   # echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
>   # echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
>   # echo 1 > events/kprobes/enable
>   # cat /sys/kernel/debug/kprobes/list
> ffffffff8133b740  r  full_proxy_read+0x0    [FTRACE]
> ffffffff812560b0  r  vfs_read+0x0    [FTRACE]
>   # echo 0 > events/kprobes/enable
>   # cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 3/3   #P:8
> #
> #                                _-----=> irqs-off
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| /     delay
> #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
> #              | |         |   ||||      |         |
>            <...>-134     [007] ...1    16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
>            <...>-134     [007] ...1    16.185901: <stack trace>
>  => kretprobe_trace_func+0x209/0x300
>  => kretprobe_dispatcher+0x4a/0x70
>  => __kretprobe_trampoline_handler+0xd4/0x170
>  => trampoline_handler+0x43/0x60
>  => kretprobe_trampoline+0x2a/0x50
>  => vfs_read+0x98/0x180
>  => ksys_read+0x5f/0xe0
>  => do_syscall_64+0x37/0x90
>  => entry_SYSCALL_64_after_hwframe+0x44/0xae
>            <...>-134     [007] ...1    16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)
> 
> This shows the double return probes (vfs_read and full_proxy_read) on the stack
> correctly unwinded. (vfs_read will return to ksys_read+0x5f and full_proxy_read
> will return to vfs_read+0x98)
> 
> This actually changes the kretprobe behavisor a bit, now the instraction pointer in
> the pt_regs passed to kretprobe user handler is correctly set the real return
> address. So user handlers can get it via instruction_pointer() API, and can use
> stack_trace_save_regs().
> 
> You can also get this series from 
>  git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v8
> 
> 
> Thank you,
> 
> ---
> 
> Josh Poimboeuf (1):
>       x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code
> 
> Masami Hiramatsu (12):
>       ia64: kprobes: Fix to pass correct trampoline address to the handler
>       kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
>       kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
>       kprobes: Add kretprobe_find_ret_addr() for searching return address
>       ARC: Add instruction_pointer_set() API
>       ia64: Add instruction_pointer_set() API
>       arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline
>       kprobes: Enable stacktrace from pt_regs in kretprobe handler
>       x86/kprobes: Push a fake return address at kretprobe_trampoline
>       x86/unwind: Recover kretprobe trampoline entry
>       tracing: Show kretprobe unknown indicator only for kretprobe_trampoline
>       x86/kprobes: Fixup return address in generic trampoline handler
> 
> 
>  arch/arc/include/asm/ptrace.h       |    5 ++
>  arch/arc/kernel/kprobes.c           |    2 -
>  arch/arm/probes/kprobes/core.c      |    5 +-
>  arch/arm64/kernel/probes/kprobes.c  |    3 -
>  arch/csky/kernel/probes/kprobes.c   |    2 -
>  arch/ia64/include/asm/ptrace.h      |    5 ++
>  arch/ia64/kernel/kprobes.c          |   15 ++---
>  arch/mips/kernel/kprobes.c          |    3 -
>  arch/parisc/kernel/kprobes.c        |    4 +
>  arch/powerpc/kernel/kprobes.c       |   13 ----
>  arch/riscv/kernel/probes/kprobes.c  |    2 -
>  arch/s390/kernel/kprobes.c          |    2 -
>  arch/sh/kernel/kprobes.c            |    2 -
>  arch/sparc/kernel/kprobes.c         |    2 -
>  arch/x86/include/asm/kprobes.h      |    1 
>  arch/x86/include/asm/unwind.h       |   23 +++++++
>  arch/x86/include/asm/unwind_hints.h |    5 ++
>  arch/x86/kernel/kprobes/core.c      |   53 +++++++++++++++--
>  arch/x86/kernel/unwind_frame.c      |    3 -
>  arch/x86/kernel/unwind_guess.c      |    3 -
>  arch/x86/kernel/unwind_orc.c        |   18 +++++-
>  include/linux/kprobes.h             |   44 ++++++++++++--
>  kernel/kprobes.c                    |  108 +++++++++++++++++++++++++----------
>  kernel/trace/trace_output.c         |   17 +-----
>  lib/error-inject.c                  |    3 +
>  25 files changed, 238 insertions(+), 105 deletions(-)
> 
> --
> Masami Hiramatsu (Linaro) <mhiramat@kernel.org>


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
  2021-06-18  7:05 ` [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler() Masami Hiramatsu
@ 2021-07-05  7:03   ` Ingo Molnar
  2021-07-05 10:03     ` Masami Hiramatsu
  2021-07-05  7:49   ` Ingo Molnar
  1 sibling, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2021-07-05  7:03 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -197,15 +197,23 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
>  				   struct pt_regs *regs);
>  extern int arch_trampoline_kprobe(struct kprobe *p);
>  
> +void kretprobe_trampoline(void);
> +/*
> + * Since some architecture uses structured function pointer,
> + * use dereference_function_descriptor() to get real function address.
> + */
> +static nokprobe_inline void *kretprobe_trampoline_addr(void)
> +{
> +	return dereference_kernel_function_descriptor(kretprobe_trampoline);
> +}

Could we please also make 'kretprobe_trampoline' harder to use 
accidentally, such by naming it appropriately?

__kretprobe_trampoline would be a good first step I guess.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 04/13] kprobes: Add kretprobe_find_ret_addr() for searching return address
  2021-06-18  7:05 ` [PATCH -tip v8 04/13] kprobes: Add kretprobe_find_ret_addr() for searching return address Masami Hiramatsu
@ 2021-07-05  7:42   ` Ingo Molnar
  2021-07-05 14:11     ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2021-07-05  7:42 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Add kretprobe_find_ret_addr() for searching correct return address
> from kretprobe instance list.

A better changelog:

   Add kretprobe_find_ret_addr() for searching the correct return address 
   from the kretprobe instances list.

But an explanation of *why* we want to add this function would be even 
better. Is it a cleanup? Is it in preparation for future changes?

Plus:

>  include/linux/kprobes.h |   22 +++++++++++
>  kernel/kprobes.c        |   91 ++++++++++++++++++++++++++++++++++-------------
>  2 files changed, 87 insertions(+), 26 deletions(-)
> 
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 5ce677819a25..08d3415e4418 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -207,6 +207,14 @@ static nokprobe_inline void *kretprobe_trampoline_addr(void)
>  	return dereference_kernel_function_descriptor(kretprobe_trampoline);
>  }
>  
> +static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
> +{
> +	return (void *)addr == kretprobe_trampoline_addr();
> +}
> +
> +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> +				      struct llist_node **cur);

These prototypes for helpers are put into a section of:

  #ifdef CONFIG_KRETPROBES

But:

> +#if !defined(CONFIG_KRETPROBES)
> +static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
> +{
> +	return false;
> +}
> +
> +static nokprobe_inline
> +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> +				      struct llist_node **cur)
> +{
> +	return 0;
> +}
> +#endif

Why does this use such a weird pattern? What is wrong with:

   #ifndef CONFIG_KRETPROBES

But more importantly, why isn't this in the regular '#else' block of the 
CONFIG_KRETPROBES block you added the other functions to ??

Why this intentional obfuscation combined with poor changelogs - is the 
kprobes code too easy to read, does it have too few bugs?

And this series is on v8 already, and nobody noticed this?

> +/* This assumes the tsk is current or the task which is not running. */
> +static unsigned long __kretprobe_find_ret_addr(struct task_struct *tsk,
> +					       struct llist_node **cur)


A better comment:

    /* This assumes 'tsk' is the current task, or is not running. */

We always escape variable names in English sentences. This is nothing new.

> +			*cur = node;
> +			return (unsigned long)ri->ret_addr;

Don't just randomly add forced type casts (which are dangerous, 
bug-inducing patterns of code) without examining whether it's justified.

But a compiler warning is not justification!

In this case the examination would involve:

  kepler:~/tip> git grep -w ret_addr kernel/kprobes.c 

  kernel/kprobes.c:               if (ri->ret_addr != kretprobe_trampoline_addr()) {
  kernel/kprobes.c:                       return (unsigned long)ri->ret_addr;
  kernel/kprobes.c:                       ri->ret_addr = correct_ret_addr;

  kepler:~/tip> git grep -w correct_ret_addr kernel/kprobes.c 

  kernel/kprobes.c:       kprobe_opcode_t *correct_ret_addr = NULL;
  kernel/kprobes.c:       correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);
  kernel/kprobes.c:       if (!correct_ret_addr) {
  kernel/kprobes.c:                       ri->ret_addr = correct_ret_addr;
  kernel/kprobes.c:       return (unsigned long)correct_ret_addr;

what we can see here is unnecessary type confusion & friction of the first 
degree:

 - 'correct_ret_addr' is 'kprobe_opcode_t *' (which is good), but the newly 
   introduced __kretprobe_find_ret_addr() function doesn't return such a 
   type - why?

 - struct_kretprobe_instance::ret_address has a 'kprobe_opcode_t *' type as 
   well - which is good.

 - kretprobe_find_ret_addr() uses 'unsigned long', but it returns the value 
   to __kretprobe_trampoline_handler(), which does *another* forced type 
   cast:

  +       correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);


So we have the following type conversions:

  kprobe_opcode_t * => unsigned long => unsigned long => kprobe_opcode_t *

Is there a technical reason why we cannot just use 'kprobe_opcode_t *'.

All other type casts in the kprobes code should be reviewed as well.

> -	BUG_ON(1);
> +	return 0;

And in the proper, intact type propagation model this would become
'return NULL' - which is *far* more obviously a 'not found' condition
than a random zero that might mean anything...

> +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> +				      struct llist_node **cur)
> +{
> +	struct kretprobe_instance *ri = NULL;
> +	unsigned long ret;
> +
> +	do {
> +		ret = __kretprobe_find_ret_addr(tsk, cur);
> +		if (!ret)
> +			return ret;
> +		ri = container_of(*cur, struct kretprobe_instance, llist);
> +	} while (ri->fp != fp);
> +
> +	return ret;

Here I see another type model problem: why is the frame pointer 'void *', 
which makes it way too easy to mix up with text pointers such as 
'kprobe_opcode_t *'?

In the x86 unwinder we use 'unsigned long *' as the frame pointer:

     unsigned long *bp

but it might also make sense to introduce a more opaque dedicated type 
within the kprobes code, such as 'frame_pointer_t'.

> +unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
> +					     void *frame_pointer)
> +{
> +	kprobe_opcode_t *correct_ret_addr = NULL;
> +	struct kretprobe_instance *ri = NULL;
> +	struct llist_node *first, *node = NULL;
> +	struct kretprobe *rp;
> +
> +	/* Find correct address and all nodes for this frame. */
> +	correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);
> +	if (!correct_ret_addr) {
> +		pr_err("Oops! Kretprobe fails to find correct return address.\n");

Could we please make user-facing messages less random? Right now we have:

  kepler:~/tip> git grep -E 'pr_.*\(' kernel/kprobes.c include/linux/kprobes.h include/asm-generic/kprobes.h $(find arch/ -name kprobes.c)

  arch/arm64/kernel/probes/kprobes.c:             pr_warn("Unrecoverable kprobe detected.\n");
  arch/csky/kernel/probes/kprobes.c:              pr_warn("Address not aligned.\n");
  arch/csky/kernel/probes/kprobes.c:              pr_warn("Unrecoverable kprobe detected.\n");
  arch/mips/kernel/kprobes.c:             pr_notice("Kprobes for ll and sc instructions are not"
  arch/mips/kernel/kprobes.c:             pr_notice("Kprobes for branch delayslot are not supported\n");
  arch/mips/kernel/kprobes.c:             pr_notice("Kprobes for compact branches are not supported\n");
  arch/mips/kernel/kprobes.c:     pr_notice("%s: unaligned epc - sending SIGBUS.\n", current->comm);
  arch/mips/kernel/kprobes.c:                     pr_notice("Kprobes: Error in evaluating branch\n");
  arch/riscv/kernel/probes/kprobes.c:             pr_warn("Address not aligned.\n");
  arch/riscv/kernel/probes/kprobes.c:             pr_warn("Unrecoverable kprobe detected.\n");
  arch/s390/kernel/kprobes.c:             pr_err("Invalid kprobe detected.\n");
  kernel/kprobes.c:               pr_debug("Failed to arm kprobe-ftrace at %pS (%d)\n",
  kernel/kprobes.c:                       pr_debug("Failed to init kprobe-ftrace (%d)\n", ret);
  kernel/kprobes.c:               pr_err("Oops! Kretprobe fails to find correct return address.\n");
  kernel/kprobes.c:       pr_err("Dumping kprobe:\n");
  kernel/kprobes.c:       pr_err("Name: %s\nOffset: %x\nAddress: %pS\n",
  kernel/kprobes.c:               pr_err("kprobes: failed to populate blacklist: %d\n", err);
  kernel/kprobes.c:               pr_err("Please take care of using kprobes.\n");
  kernel/kprobes.c:               pr_warn("Kprobes globally enabled, but failed to arm %d out of %d probes\n",
  kernel/kprobes.c:               pr_info("Kprobes globally enabled\n");
  kernel/kprobes.c:               pr_warn("Kprobes globally disabled, but failed to disarm %d out of %d probes\n",
  kernel/kprobes.c:               pr_info("Kprobes globally disabled\n");

In particular, what users may see in their syslog, when the kprobes code 
runs into trouble, is, roughly:

  kepler:~/tip> git grep -E 'pr_.*\(' kernel/kprobes.c include/linux/kprobes.h include/asm-generic/kprobes.h $(find arch/ -name kprobes.c) | cut -d\" -f2

  Unrecoverable kprobe detected.\n
  Address not aligned.\n
  Unrecoverable kprobe detected.\n
  Kprobes for ll and sc instructions are not
  Kprobes for branch delayslot are not supported\n
  Kprobes for compact branches are not supported\n
  %s: unaligned epc - sending SIGBUS.\n
  Kprobes: Error in evaluating branch\n
  Address not aligned.\n
  Unrecoverable kprobe detected.\n
  Invalid kprobe detected.\n
  Failed to arm kprobe-ftrace at %pS (%d)\n
  Failed to init kprobe-ftrace (%d)\n
  Oops! Kretprobe fails to find correct return address.\n
  Dumping kprobe:\n
  Name: %s\nOffset: %x\nAddress: %pS\n
  kprobes: failed to populate blacklist: %d\n
  Please take care of using kprobes.\n
  Kprobes globally enabled, but failed to arm %d out of %d probes\n
  Kprobes globally enabled\n
  Kprobes globally disabled, but failed to disarm %d out of %d probes\n
  Kprobes globally disabled\n

Ugh. Some of the messages don't even have 'kprobes' in them...

So my suggestion would be:

 - Introduce a subsystem syslog message prefix, via the standard pattern  of:

     #define pr_fmt(fmt) "kprobes: " fmt

 - Standardize the messages:

    - Start each message with a key noun that stresses the nature of the 
      failure.

    - *Make each message self-explanatory*, don't leave users hanging in 
      the air about what is going to happen next. Messages like:

               Address not aligned.\n

    - Check spelling. This:

                pr_err("kprobes: failed to populate blacklist: %d\n", err);
                pr_err("Please take care of using kprobes.\n");

      should be on a single line and should probably say something like:

                pr_err("kprobes: Failed to populate blacklist (error: %d), kprobes not restricted, be careful using them!.\n", err);

      and if checkpatch complains that the line is 'too long', ignore 
      checkpatch and keep the message self-contained.

 - and most importantly: provide a suggested *resolution*. 
   Passive-aggressive messages like:

      Oops! Kretprobe fails to find correct return address.\n

   are next to useless. Instead, always describe:

     - what happened,
     - what is the kernel going to do or not do,
     - is the kernel fine,
     - what can the user do about it.

   In this case, a better message would be:

      kretprobes: Return address not found, not executing handler. Kernel is probably fine, but check the system tool that did this.

Each and every message should be reviewed & fixed to meet these standards - 
or should be removed and replaced with a WARN_ON() if it's indicating an 
internal bug that cannot be caused by kprobes using tools, such as this 
one:

> +		if (WARN_ON_ONCE(ri->fp != frame_pointer))
> +			break;

I can help double checking the fixed messages, if you are unsure about any 
of them.

> +	/* Recycle them.  */
> +	while (first) {
> +		ri = container_of(first, struct kretprobe_instance, llist);
> +		first = first->next;
>  
>  		recycle_rp_inst(ri);
>  	}

It would be helpful to explain, a bit more verbose comment, what 
'recycling' is in this context. The code is not very helpful:

  NOKPROBE_SYMBOL(free_rp_inst_rcu);

  static void recycle_rp_inst(struct kretprobe_instance *ri)
  {
          struct kretprobe *rp = get_kretprobe(ri);

          if (likely(rp)) {
                  freelist_add(&ri->freelist, &rp->freelist);
          } else
                  call_rcu(&ri->rcu, free_rp_inst_rcu);
  }
  NOKPROBE_SYMBOL(recycle_rp_inst);

BTW., why are unnecessary curly braces used here?

The kprobes code urgently needs a quality boost.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler
  2021-06-18  7:05 ` [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler Masami Hiramatsu
@ 2021-07-05  7:46   ` Ingo Molnar
  2021-07-05 10:05     ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2021-07-05  7:46 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
> missed to pass the wrong trampoline address (it passes the descriptor address
> instead of function entry address).
> This fixes it to pass correct trampoline address to __kretprobe_trampoline_handler().
> This also changes to use correct symbol dereference function to get the
> function address from the kretprobe_trampoline.
> 
> Fixes: e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>

A better changelog:

  The following commit:

     Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")

  Passed the wrong trampoline address to __kretprobe_trampoline_handler(): it
  passes the descriptor address instead of function entry address.

  Pass the right parameter.

  Also use correct symbol dereference function to get the function address
  from 'kretprobe_trampoline' - an IA64 special.

(Although I realize that much of this goes away just a couple of patches 
later.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
  2021-06-18  7:05 ` [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor() Masami Hiramatsu
@ 2021-07-05  7:48   ` Ingo Molnar
  2021-07-05 12:03     ` Masami Hiramatsu
  2021-07-07 18:28   ` Andrii Nakryiko
  1 sibling, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2021-07-05  7:48 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Replace arch_deref_entry_point() with dereference_symbol_descriptor()
> because those are doing same thing.
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> Tested-by: Andrii Nakryik <andrii@kernel.org>

A better changelog:

  ~15 years ago kprobes grew the 'arch_deref_entry_point()' __weak function:

    3d7e33825d87: ("jprobes: make jprobes a little safer for users")

  But this is just open-coded dereference_symbol_descriptor() in essence, and
  its obscure nature was causing bugs.

  Just use the real thing.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
  2021-06-18  7:05 ` [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler() Masami Hiramatsu
  2021-07-05  7:03   ` Ingo Molnar
@ 2021-07-05  7:49   ` Ingo Molnar
  1 sibling, 0 replies; 57+ messages in thread
From: Ingo Molnar @ 2021-07-05  7:49 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Remove trampoline_address from kretprobe_trampoline_handler().
> Instead of passing the address, kretprobe_trampoline_handler()
> can use new kretprobe_trampoline_addr().
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> Tested-by: Andrii Nakryik <andrii@kernel.org>

A better changelog:

   The __kretprobe_trampoline_handler() callback, called from low level 
   arch kprobes methods, has the 'trampoline_address' parameter, which is 
   entirely superfluous as it basically just replicates:

        dereference_kernel_function_descriptor(kretprobe_trampoline)

   In fact we had bugs in arch code where it wasn't replicated correctly.

   So remove this superfluous parameter and use kretprobe_trampoline_addr() 
   instead.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code
  2021-06-18  7:06 ` [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code Masami Hiramatsu
@ 2021-07-05  8:02   ` Ingo Molnar
  2021-07-09 15:31     ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2021-07-05  8:02 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> From: Josh Poimboeuf <jpoimboe@redhat.com>
> 
> Add UNWIND_HINT_FUNC on kretporbe_trampoline code so that ORC
> information is generated on the kretprobe_trampoline correctly.

What is a 'kretporbe'?

> Note that when the CONFIG_FRAME_POINTER=y, since the
> kretprobe_trampoline skips updating frame pointer, the stack frame
> of the kretprobe_trampoline seems non-standard. So this marks it
> is STACK_FRAME_NON_STANDARD() and undefine UNWIND_HINT_FUNC.

What does 'marks it is' mean?

'undefine' UNWIND_HINT_FUNC?

Doesn't the patch do the exact opposite:

  > +#define UNWIND_HINT_FUNC \
  > +	UNWIND_HINT(ORC_REG_SP, 8, UNWIND_HINT_TYPE_FUNC, 0)

But it does undefine it in a specific spot:



> Anyway, with the frame pointer, FP unwinder can unwind the stack
> frame correctly without that hint.
> 
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> Tested-by: Andrii Nakryik <andrii@kernel.org>

I have to say these changelogs are very careless.

> +#else
> +

In headers, in longer CPP blocks, please always mark the '#else' branch 
with what it is the else branch of.

See the output of:

   kepler:~/tip> git grep '#else' arch/x86/include/asm/ | head

> +#ifdef CONFIG_FRAME_POINTER
> +/*
> + * kretprobe_trampoline skips updating frame pointer. The frame pointer
> + * saved in trampoline_handler points to the real caller function's
> + * frame pointer. Thus the kretprobe_trampoline doesn't seems to have a
> + * standard stack frame with CONFIG_FRAME_POINTER=y.
> + * Let's mark it non-standard function. Anyway, FP unwinder can correctly
> + * unwind without the hint.

s/doesn't seems to have a standard stack frame
 /doesn't have a standard stack frame

There's nothing 'seems' about the situation - it's a non-standard function 
entry and stack frame situation, and the unwinder needs to know about it.

> +STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> +#undef UNWIND_HINT_FUNC
> +#define UNWIND_HINT_FUNC
> +#endif
>  /*
>   * When a retprobed function returns, this code saves registers and
>   * calls trampoline_handler() runs, which calls the kretprobe's handler.
> @@ -1031,6 +1044,7 @@ asm(
>  	/* We don't bother saving the ss register */
>  #ifdef CONFIG_X86_64
>  	"	pushq %rsp\n"
> +	UNWIND_HINT_FUNC
>  	"	pushfq\n"
>  	SAVE_REGS_STRING
>  	"	movq %rsp, %rdi\n"
> @@ -1041,6 +1055,7 @@ asm(
>  	"	popfq\n"
>  #else
>  	"	pushl %esp\n"
> +	UNWIND_HINT_FUNC
>  	"	pushfl\n"
>  	SAVE_REGS_STRING
>  	"	movl %esp, %eax\n"

Why not provide an appropriate annotation method in <asm/unwind_hints.h>, 
so that other future code can use it too instead of reinventing the wheel?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 08/13] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline
  2021-06-18  7:06 ` [PATCH -tip v8 08/13] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline Masami Hiramatsu
@ 2021-07-05  8:04   ` Ingo Molnar
  2021-07-05 14:40     ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2021-07-05  8:04 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Change kretprobe_trampoline to make a space for regs->ARM_pc so that
> kretprobe_trampoline_handler can call instruction_pointer_set()
> safely.

The idiom is "make space", but in any case, what does this mean?

Was the stack frame set up in kretprobe_trampoline() and calling 
trampoline_handler() buggy?

If yes, then explain the bad effects of the bug, and make all of this clear 
in the title & changelog.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 10/13] x86/kprobes: Push a fake return address at kretprobe_trampoline
  2021-06-18  7:06 ` [PATCH -tip v8 10/13] x86/kprobes: Push a fake return address at kretprobe_trampoline Masami Hiramatsu
@ 2021-07-05  8:17   ` Ingo Molnar
  2021-07-09 14:55     ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2021-07-05  8:17 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> +	/* Replace fake return address with real one. */
> +	*frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);
> +	/*
> +	 * Move flags to sp so that kretprobe_trapmoline can return
> +	 * right after popf.

What is a trapmoline?

Also, in the x86 code we capitalize register and instruction names so that 
they are more distinctive and easier to read in the flow of English text.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 13/13] x86/kprobes: Fixup return address in generic trampoline handler
  2021-06-18  7:07 ` [PATCH -tip v8 13/13] x86/kprobes: Fixup return address in generic trampoline handler Masami Hiramatsu
@ 2021-07-05  8:34   ` Ingo Molnar
  2021-07-06 12:57     ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2021-07-05  8:34 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> In x86, kretprobe trampoline address on the stack frame will
> be replaced with the real return address after returning from
> trampoline_handler. Before fixing the return address, the real
> return address can be found in the current->kretprobe_instances.
> 
> However, since there is a window between updating the
> current->kretprobe_instances and fixing the address on the stack,
> if an interrupt caused at that timing and the interrupt handler
> does stacktrace, it may fail to unwind because it can not get
> the correct return address from current->kretprobe_instances.
> 
> This will minimize that window by fixing the return address
> right before updating current->kretprobe_instances.

Is there still a window? I.e. is it "minimized" (to how big of a window?), 
or eliminated?

> +void arch_kretprobe_fixup_return(struct pt_regs *regs,
> +				 unsigned long correct_ret_addr)
> +{
> +	unsigned long *frame_pointer;
> +
> +	frame_pointer = ((unsigned long *)&regs->sp) + 1;
> +
> +	/* Replace fake return address with real one. */
> +	*frame_pointer = correct_ret_addr;

Firstly, why does &regs->sp have to be forced to 'unsigned long *'? 

pt_regs::sp is 'unsigned long' on both 32-bit and 64-bit kernels AFAICS.

Secondly, the new code modified by your patch now looks like this:

        frame_pointer = ((unsigned long *)&regs->sp) + 1;
 
+       kretprobe_trampoline_handler(regs, frame_pointer);

where:

+void arch_kretprobe_fixup_return(struct pt_regs *regs,
+                                unsigned long correct_ret_addr)
+{
+       unsigned long *frame_pointer;
+
+       frame_pointer = ((unsigned long *)&regs->sp) + 1;
+
+       /* Replace fake return address with real one. */
+       *frame_pointer = correct_ret_addr;
+}

So we first do:

        frame_pointer = ((unsigned long *)&regs->sp) + 1;

... and pass that in to arch_kretprobe_fixup_return() as 
'correct_ret_addr', which does:

+       frame_pointer = ((unsigned long *)&regs->sp) + 1;
+	*frame_pointer = correct_ret_addr;

... which looks like the exact same thing as:

	*frame_pointer = frame_pointer;

... obfuscated through a thick layer of type casts?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
  2021-07-05  7:03   ` Ingo Molnar
@ 2021-07-05 10:03     ` Masami Hiramatsu
  0 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-05 10:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

Hi Ingo,

On Mon, 5 Jul 2021 09:03:19 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -197,15 +197,23 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
> >  				   struct pt_regs *regs);
> >  extern int arch_trampoline_kprobe(struct kprobe *p);
> >  
> > +void kretprobe_trampoline(void);
> > +/*
> > + * Since some architecture uses structured function pointer,
> > + * use dereference_function_descriptor() to get real function address.
> > + */
> > +static nokprobe_inline void *kretprobe_trampoline_addr(void)
> > +{
> > +	return dereference_kernel_function_descriptor(kretprobe_trampoline);
> > +}
> 
> Could we please also make 'kretprobe_trampoline' harder to use 
> accidentally, such by naming it appropriately?
> 
> __kretprobe_trampoline would be a good first step I guess.

Good idea! Let me update it.

Thank you,



-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler
  2021-07-05  7:46   ` Ingo Molnar
@ 2021-07-05 10:05     ` Masami Hiramatsu
  0 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-05 10:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Mon, 5 Jul 2021 09:46:33 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
> > missed to pass the wrong trampoline address (it passes the descriptor address
> > instead of function entry address).
> > This fixes it to pass correct trampoline address to __kretprobe_trampoline_handler().
> > This also changes to use correct symbol dereference function to get the
> > function address from the kretprobe_trampoline.
> > 
> > Fixes: e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> 
> A better changelog:
> 
>   The following commit:
> 
>      Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
> 
>   Passed the wrong trampoline address to __kretprobe_trampoline_handler(): it
>   passes the descriptor address instead of function entry address.
> 
>   Pass the right parameter.
> 
>   Also use correct symbol dereference function to get the function address
>   from 'kretprobe_trampoline' - an IA64 special.

Thanks for rewriting! OK, I'll update it.

> 
> (Although I realize that much of this goes away just a couple of patches 
> later.)

Yes, but since this is a real bug. I think I should split it for backporting
to stable trees. (Oh, I also forgot to add Cc: stable. Sorry about that.)

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-06-18  7:07 ` [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry Masami Hiramatsu
@ 2021-07-05 11:36   ` Peter Zijlstra
  2021-07-05 15:42     ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2021-07-05 11:36 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	linux-kernel, bpf, kuba, mingo, ast, Thomas Gleixner,
	Borislav Petkov, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Fri, Jun 18, 2021 at 04:07:06PM +0900, Masami Hiramatsu wrote:
> @@ -549,7 +548,15 @@ bool unwind_next_frame(struct unwind_state *state)
>  					 (void *)orig_ip);
>  			goto err;
>  		}
> -
> +		/*
> +		 * There is a small chance to interrupt at the entry of
> +		 * kretprobe_trampoline where the ORC info doesn't exist.
> +		 * That point is right after the RET to kretprobe_trampoline
> +		 * which was modified return address. So the @addr_p must
> +		 * be right before the regs->sp.
> +		 */
> +		state->ip = unwind_recover_kretprobe(state, state->ip,
> +				(unsigned long *)(state->sp - sizeof(long)));
>  		state->regs = (struct pt_regs *)sp;
>  		state->prev_regs = NULL;
>  		state->full_regs = true;
> @@ -562,6 +569,9 @@ bool unwind_next_frame(struct unwind_state *state)
>  					 (void *)orig_ip);
>  			goto err;
>  		}
> +		/* See UNWIND_HINT_TYPE_REGS case comment. */
> +		state->ip = unwind_recover_kretprobe(state, state->ip,
> +				(unsigned long *)(state->sp - sizeof(long)));
>  
>  		if (state->full_regs)
>  			state->prev_regs = state->regs;

Why doesn't the ftrace case have this? That is, why aren't both return
trampolines having the same general shape?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
  2021-07-05  7:48   ` Ingo Molnar
@ 2021-07-05 12:03     ` Masami Hiramatsu
  0 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-05 12:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Mon, 5 Jul 2021 09:48:09 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > Replace arch_deref_entry_point() with dereference_symbol_descriptor()
> > because those are doing same thing.
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> > Tested-by: Andrii Nakryik <andrii@kernel.org>
> 
> A better changelog:
> 
>   ~15 years ago kprobes grew the 'arch_deref_entry_point()' __weak function:
> 
>     3d7e33825d87: ("jprobes: make jprobes a little safer for users")
> 
>   But this is just open-coded dereference_symbol_descriptor() in essence, and
>   its obscure nature was causing bugs.
> 
>   Just use the real thing.

OK. BTW, I couldn't find actual bugs from it. What about this?

"its obscure nature was causing problems in the past."

Thank you,

> 
> Thanks,
> 
> 	Ingo


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 04/13] kprobes: Add kretprobe_find_ret_addr() for searching return address
  2021-07-05  7:42   ` Ingo Molnar
@ 2021-07-05 14:11     ` Masami Hiramatsu
  0 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-05 14:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

Hi Ingo,

On Mon, 5 Jul 2021 09:42:44 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > Add kretprobe_find_ret_addr() for searching correct return address
> > from kretprobe instance list.
> 
> A better changelog:
> 
>    Add kretprobe_find_ret_addr() for searching the correct return address 
>    from the kretprobe instances list.
> 
> But an explanation of *why* we want to add this function would be even 
> better. Is it a cleanup? Is it in preparation for future changes?

It's latter. This is for exposing kretprobe_find_ret_addr() and
is_kretprobe_trampoline(), which will be used in the 11/13.

> 
> Plus:
> 
> >  include/linux/kprobes.h |   22 +++++++++++
> >  kernel/kprobes.c        |   91 ++++++++++++++++++++++++++++++++++-------------
> >  2 files changed, 87 insertions(+), 26 deletions(-)
> > 
> > diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> > index 5ce677819a25..08d3415e4418 100644
> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -207,6 +207,14 @@ static nokprobe_inline void *kretprobe_trampoline_addr(void)
> >  	return dereference_kernel_function_descriptor(kretprobe_trampoline);
> >  }
> >  
> > +static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
> > +{
> > +	return (void *)addr == kretprobe_trampoline_addr();
> > +}
> > +
> > +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> > +				      struct llist_node **cur);
> 
> These prototypes for helpers are put into a section of:
> 
>   #ifdef CONFIG_KRETPROBES
> 
> But:
> 
> > +#if !defined(CONFIG_KRETPROBES)
> > +static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
> > +{
> > +	return false;
> > +}
> > +
> > +static nokprobe_inline
> > +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> > +				      struct llist_node **cur)
> > +{
> > +	return 0;
> > +}
> > +#endif
> 
> Why does this use such a weird pattern? What is wrong with:
> 
>    #ifndef CONFIG_KRETPROBES
> 
> But more importantly, why isn't this in the regular '#else' block of the 
> CONFIG_KRETPROBES block you added the other functions to ??

This is because there can be CONFIG_KPROBES=y but CONFIG_KRETPROBES=n case.

There are 3 combinations
1. CONFIG_KPROBES=y && CONFIG_KRETPROBES=y
2. CONFIG_KPROBES=y && CONFIG_KRETPROBES=n
3. CONFIG_KPROBES=n && CONFIG_KRETPROBES=n
The former definition covers case#1(note that this is in the #ifdef CONFIG_KPROBES),
and latter covers case #2 and #3.
(BTW, nowadays case #2 doesn't exist, so I think I should remove CONFIG_KRETPROBES)

Anyway, I'll put both at the last so that easier to read, something like

#ifdef CONFIG_KRETPROBES
static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
...
#else
static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
...
#endif

> 
> Why this intentional obfuscation combined with poor changelogs - is the 
> kprobes code too easy to read, does it have too few bugs?
> 
> And this series is on v8 already, and nobody noticed this?
> 
> > +/* This assumes the tsk is current or the task which is not running. */
> > +static unsigned long __kretprobe_find_ret_addr(struct task_struct *tsk,
> > +					       struct llist_node **cur)
> 
> 
> A better comment:
> 
>     /* This assumes 'tsk' is the current task, or is not running. */
> 
> We always escape variable names in English sentences. This is nothing new.

OK.

> 
> > +			*cur = node;
> > +			return (unsigned long)ri->ret_addr;
> 
> Don't just randomly add forced type casts (which are dangerous, 
> bug-inducing patterns of code) without examining whether it's justified.

Yes, I need to cleanup kprobes code, it seems too many '*' <-> 'unsinged long'
type castings.

> But a compiler warning is not justification!
> 
> In this case the examination would involve:
> 
>   kepler:~/tip> git grep -w ret_addr kernel/kprobes.c 
> 
>   kernel/kprobes.c:               if (ri->ret_addr != kretprobe_trampoline_addr()) {
>   kernel/kprobes.c:                       return (unsigned long)ri->ret_addr;
>   kernel/kprobes.c:                       ri->ret_addr = correct_ret_addr;
> 
>   kepler:~/tip> git grep -w correct_ret_addr kernel/kprobes.c 
> 
>   kernel/kprobes.c:       kprobe_opcode_t *correct_ret_addr = NULL;
>   kernel/kprobes.c:       correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);
>   kernel/kprobes.c:       if (!correct_ret_addr) {
>   kernel/kprobes.c:                       ri->ret_addr = correct_ret_addr;
>   kernel/kprobes.c:       return (unsigned long)correct_ret_addr;
> 
> what we can see here is unnecessary type confusion & friction of the first 
> degree:
> 
>  - 'correct_ret_addr' is 'kprobe_opcode_t *' (which is good), but the newly 
>    introduced __kretprobe_find_ret_addr() function doesn't return such a 
>    type - why?

OK, this is my mistake. Since 'kprobe_opcode_t *' is used only inside the
kprobes, I would like to make kretprobe_find_ret_addr() returning 'unsigned
long'. But I used 'unsigned long' for internal function too.

>  - struct_kretprobe_instance::ret_address has a 'kprobe_opcode_t *' type as 
>    well - which is good.
> 
>  - kretprobe_find_ret_addr() uses 'unsigned long', but it returns the value 
>    to __kretprobe_trampoline_handler(), which does *another* forced type 
>    cast:
> 
>   +       correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);

This is '__kretprobe_find_ret_addr()', an internal function, which should
be fixed to return 'kprobe_opcode_t *'.

But I would like to keep the 'kretprobe_find_ret_addr()' returns 'unsigned long'
because it is used from stack unwinder, which uses 'unsigned long' for the
address type. What would you think?

> So we have the following type conversions:
> 
>   kprobe_opcode_t * => unsigned long => unsigned long => kprobe_opcode_t *
> 
> Is there a technical reason why we cannot just use 'kprobe_opcode_t *'.

OK, I'll use the 'kprobe_opcode_t *' unless it is exposed to other subsystem.

> 
> All other type casts in the kprobes code should be reviewed as well.
> 
> > -	BUG_ON(1);
> > +	return 0;
> 
> And in the proper, intact type propagation model this would become
> 'return NULL' - which is *far* more obviously a 'not found' condition
> than a random zero that might mean anything...

OK.

> 
> > +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> > +				      struct llist_node **cur)
> > +{
> > +	struct kretprobe_instance *ri = NULL;
> > +	unsigned long ret;
> > +
> > +	do {
> > +		ret = __kretprobe_find_ret_addr(tsk, cur);
> > +		if (!ret)
> > +			return ret;
> > +		ri = container_of(*cur, struct kretprobe_instance, llist);
> > +	} while (ri->fp != fp);
> > +
> > +	return ret;
> 
> Here I see another type model problem: why is the frame pointer 'void *', 
> which makes it way too easy to mix up with text pointers such as 
> 'kprobe_opcode_t *'?

(at that moment, I just used same type of 'kretprobe_instance->fp')

> 
> In the x86 unwinder we use 'unsigned long *' as the frame pointer:
> 
>      unsigned long *bp
> 
> but it might also make sense to introduce a more opaque dedicated type 
> within the kprobes code, such as 'frame_pointer_t'.
> 
> > +unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
> > +					     void *frame_pointer)
> > +{
> > +	kprobe_opcode_t *correct_ret_addr = NULL;
> > +	struct kretprobe_instance *ri = NULL;
> > +	struct llist_node *first, *node = NULL;
> > +	struct kretprobe *rp;
> > +
> > +	/* Find correct address and all nodes for this frame. */
> > +	correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);
> > +	if (!correct_ret_addr) {
> > +		pr_err("Oops! Kretprobe fails to find correct return address.\n");
> 
> Could we please make user-facing messages less random? Right now we have:

OK. Those are historically randomly expanded. Now the time to clean up.

> 
>   kepler:~/tip> git grep -E 'pr_.*\(' kernel/kprobes.c include/linux/kprobes.h include/asm-generic/kprobes.h $(find arch/ -name kprobes.c)
> 
>   arch/arm64/kernel/probes/kprobes.c:             pr_warn("Unrecoverable kprobe detected.\n");
>   arch/csky/kernel/probes/kprobes.c:              pr_warn("Address not aligned.\n");
>   arch/csky/kernel/probes/kprobes.c:              pr_warn("Unrecoverable kprobe detected.\n");
>   arch/mips/kernel/kprobes.c:             pr_notice("Kprobes for ll and sc instructions are not"
>   arch/mips/kernel/kprobes.c:             pr_notice("Kprobes for branch delayslot are not supported\n");
>   arch/mips/kernel/kprobes.c:             pr_notice("Kprobes for compact branches are not supported\n");
>   arch/mips/kernel/kprobes.c:     pr_notice("%s: unaligned epc - sending SIGBUS.\n", current->comm);
>   arch/mips/kernel/kprobes.c:                     pr_notice("Kprobes: Error in evaluating branch\n");
>   arch/riscv/kernel/probes/kprobes.c:             pr_warn("Address not aligned.\n");
>   arch/riscv/kernel/probes/kprobes.c:             pr_warn("Unrecoverable kprobe detected.\n");
>   arch/s390/kernel/kprobes.c:             pr_err("Invalid kprobe detected.\n");
>   kernel/kprobes.c:               pr_debug("Failed to arm kprobe-ftrace at %pS (%d)\n",
>   kernel/kprobes.c:                       pr_debug("Failed to init kprobe-ftrace (%d)\n", ret);
>   kernel/kprobes.c:               pr_err("Oops! Kretprobe fails to find correct return address.\n");
>   kernel/kprobes.c:       pr_err("Dumping kprobe:\n");
>   kernel/kprobes.c:       pr_err("Name: %s\nOffset: %x\nAddress: %pS\n",
>   kernel/kprobes.c:               pr_err("kprobes: failed to populate blacklist: %d\n", err);
>   kernel/kprobes.c:               pr_err("Please take care of using kprobes.\n");
>   kernel/kprobes.c:               pr_warn("Kprobes globally enabled, but failed to arm %d out of %d probes\n",
>   kernel/kprobes.c:               pr_info("Kprobes globally enabled\n");
>   kernel/kprobes.c:               pr_warn("Kprobes globally disabled, but failed to disarm %d out of %d probes\n",
>   kernel/kprobes.c:               pr_info("Kprobes globally disabled\n");
> 
> In particular, what users may see in their syslog, when the kprobes code 
> runs into trouble, is, roughly:
> 
>   kepler:~/tip> git grep -E 'pr_.*\(' kernel/kprobes.c include/linux/kprobes.h include/asm-generic/kprobes.h $(find arch/ -name kprobes.c) | cut -d\" -f2
> 
>   Unrecoverable kprobe detected.\n
>   Address not aligned.\n
>   Unrecoverable kprobe detected.\n
>   Kprobes for ll and sc instructions are not
>   Kprobes for branch delayslot are not supported\n
>   Kprobes for compact branches are not supported\n
>   %s: unaligned epc - sending SIGBUS.\n
>   Kprobes: Error in evaluating branch\n
>   Address not aligned.\n
>   Unrecoverable kprobe detected.\n
>   Invalid kprobe detected.\n
>   Failed to arm kprobe-ftrace at %pS (%d)\n
>   Failed to init kprobe-ftrace (%d)\n
>   Oops! Kretprobe fails to find correct return address.\n
>   Dumping kprobe:\n
>   Name: %s\nOffset: %x\nAddress: %pS\n
>   kprobes: failed to populate blacklist: %d\n
>   Please take care of using kprobes.\n
>   Kprobes globally enabled, but failed to arm %d out of %d probes\n
>   Kprobes globally enabled\n
>   Kprobes globally disabled, but failed to disarm %d out of %d probes\n
>   Kprobes globally disabled\n
> 
> Ugh. Some of the messages don't even have 'kprobes' in them...

Indeed.

> 
> So my suggestion would be:
> 
>  - Introduce a subsystem syslog message prefix, via the standard pattern  of:
> 
>      #define pr_fmt(fmt) "kprobes: " fmt

OK.

> 
>  - Standardize the messages:
> 
>     - Start each message with a key noun that stresses the nature of the 
>       failure.
> 
>     - *Make each message self-explanatory*, don't leave users hanging in 
>       the air about what is going to happen next. Messages like:
> 
>                Address not aligned.\n
> 
>     - Check spelling. This:
> 
>                 pr_err("kprobes: failed to populate blacklist: %d\n", err);
>                 pr_err("Please take care of using kprobes.\n");
> 
>       should be on a single line and should probably say something like:
> 
>                 pr_err("kprobes: Failed to populate blacklist (error: %d), kprobes not restricted, be careful using them!.\n", err);
> 
>       and if checkpatch complains that the line is 'too long', ignore 
>       checkpatch and keep the message self-contained.

OK. 

> 
>  - and most importantly: provide a suggested *resolution*. 
>    Passive-aggressive messages like:
> 
>       Oops! Kretprobe fails to find correct return address.\n
> 
>    are next to useless. Instead, always describe:
> 
>      - what happened,
>      - what is the kernel going to do or not do,
>      - is the kernel fine,
>      - what can the user do about it.

OK, since above message is a kind of dying message, it must be important to notice
that the thread may not possible to continue to work.

> 
>    In this case, a better message would be:
> 
>       kretprobes: Return address not found, not executing handler. Kernel is probably fine, but check the system tool that did this.

If this happens, the kernel calls BUG_ON(1) because it is unrecovarable error
and there may be kernel bug. Can I say "kernel is probably fine"?

> 
> Each and every message should be reviewed & fixed to meet these standards - 
> or should be removed and replaced with a WARN_ON() if it's indicating an 
> internal bug that cannot be caused by kprobes using tools, such as this 
> one:
> 
> > +		if (WARN_ON_ONCE(ri->fp != frame_pointer))
> > +			break;
> 
> I can help double checking the fixed messages, if you are unsure about any 
> of them.

Thanks for you help!

> 
> > +	/* Recycle them.  */
> > +	while (first) {
> > +		ri = container_of(first, struct kretprobe_instance, llist);
> > +		first = first->next;
> >  
> >  		recycle_rp_inst(ri);
> >  	}
> 
> It would be helpful to explain, a bit more verbose comment, what 
> 'recycling' is in this context. The code is not very helpful:

OK.

> 
>   NOKPROBE_SYMBOL(free_rp_inst_rcu);
> 
>   static void recycle_rp_inst(struct kretprobe_instance *ri)
>   {
>           struct kretprobe *rp = get_kretprobe(ri);
> 
>           if (likely(rp)) {
>                   freelist_add(&ri->freelist, &rp->freelist);
>           } else
>                   call_rcu(&ri->rcu, free_rp_inst_rcu);
>   }
>   NOKPROBE_SYMBOL(recycle_rp_inst);
> 
> BTW., why are unnecessary curly braces used here?

Maybe I forgot to remove it when I introduced freelist_add() there...

> 
> The kprobes code urgently needs a quality boost.

OK, before this series, I'll clean it up first.

Thank you,

> 
> Thanks,
> 
> 	Ingo


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 08/13] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline
  2021-07-05  8:04   ` Ingo Molnar
@ 2021-07-05 14:40     ` Masami Hiramatsu
  0 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-05 14:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Mon, 5 Jul 2021 10:04:41 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > Change kretprobe_trampoline to make a space for regs->ARM_pc so that
> > kretprobe_trampoline_handler can call instruction_pointer_set()
> > safely.
> 
> The idiom is "make space", but in any case, what does this mean?

Since arm's kretprobe_trampoline() saves partial pt_regs, regs->ARM_pc
is not accessible (it points the caller function's stack frame).
Therefore, this extends the stack frame for storing regs->ARM_pc.

> 
> Was the stack frame set up in kretprobe_trampoline() and calling 
> trampoline_handler() buggy?
> 
> If yes, then explain the bad effects of the bug, and make all of this clear 
> in the title & changelog.

This is actually buggy from the specification viewpoint. And if
the kretprobe handler sets the instruction pointer, it must be
ignored, but in reallty, it breaks the stack frame (this does
not happen in the ftrace/perf dynamic events, but a custom kretprobe
kernel module can do this.)

Thank you,

> 
> Thanks,
> 
> 	Ingo


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-05 11:36   ` Peter Zijlstra
@ 2021-07-05 15:42     ` Masami Hiramatsu
  2021-07-06  7:55       ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-05 15:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	linux-kernel, bpf, kuba, mingo, ast, Thomas Gleixner,
	Borislav Petkov, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Mon, 5 Jul 2021 13:36:14 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Jun 18, 2021 at 04:07:06PM +0900, Masami Hiramatsu wrote:
> > @@ -549,7 +548,15 @@ bool unwind_next_frame(struct unwind_state *state)
> >  					 (void *)orig_ip);
> >  			goto err;
> >  		}
> > -
> > +		/*
> > +		 * There is a small chance to interrupt at the entry of
> > +		 * kretprobe_trampoline where the ORC info doesn't exist.
> > +		 * That point is right after the RET to kretprobe_trampoline
> > +		 * which was modified return address. So the @addr_p must
> > +		 * be right before the regs->sp.
> > +		 */
> > +		state->ip = unwind_recover_kretprobe(state, state->ip,
> > +				(unsigned long *)(state->sp - sizeof(long)));
> >  		state->regs = (struct pt_regs *)sp;
> >  		state->prev_regs = NULL;
> >  		state->full_regs = true;
> > @@ -562,6 +569,9 @@ bool unwind_next_frame(struct unwind_state *state)
> >  					 (void *)orig_ip);
> >  			goto err;
> >  		}
> > +		/* See UNWIND_HINT_TYPE_REGS case comment. */
> > +		state->ip = unwind_recover_kretprobe(state, state->ip,
> > +				(unsigned long *)(state->sp - sizeof(long)));
> >  
> >  		if (state->full_regs)
> >  			state->prev_regs = state->regs;
> 
> Why doesn't the ftrace case have this? That is, why aren't both return
> trampolines having the same general shape?

Ah, this strongly depends what the trampoline code does.
For the kretprobe case, the PUSHQ at the entry of the kretprobe_trampoline()
does not covered by UNWIND_HINT_FUNC. Thus it needs to find 'correct_ret_addr'
by the frame pointer (which is next to the sp).

        "kretprobe_trampoline:\n"
#ifdef CONFIG_X86_64
        /* Push fake return address to tell the unwinder it's a kretprobe */
        "       pushq $kretprobe_trampoline\n"
        UNWIND_HINT_FUNC

But I'm not so sure how ftrace treat it. It seems that the return_to_handler()
doesn't care such case. (anyway, return_to_handler() does not return but jump
to the original call-site, in that case, the information will be lost.)

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-05 15:42     ` Masami Hiramatsu
@ 2021-07-06  7:55       ` Peter Zijlstra
  2021-07-06 15:11         ` Steven Rostedt
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2021-07-06  7:55 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	linux-kernel, bpf, kuba, mingo, ast, Thomas Gleixner,
	Borislav Petkov, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Tue, Jul 06, 2021 at 12:42:57AM +0900, Masami Hiramatsu wrote:
> On Mon, 5 Jul 2021 13:36:14 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Jun 18, 2021 at 04:07:06PM +0900, Masami Hiramatsu wrote:
> > > @@ -549,7 +548,15 @@ bool unwind_next_frame(struct unwind_state *state)
> > >  					 (void *)orig_ip);
> > >  			goto err;
> > >  		}
> > > -
> > > +		/*
> > > +		 * There is a small chance to interrupt at the entry of
> > > +		 * kretprobe_trampoline where the ORC info doesn't exist.
> > > +		 * That point is right after the RET to kretprobe_trampoline
> > > +		 * which was modified return address. So the @addr_p must
> > > +		 * be right before the regs->sp.
> > > +		 */
> > > +		state->ip = unwind_recover_kretprobe(state, state->ip,
> > > +				(unsigned long *)(state->sp - sizeof(long)));
> > >  		state->regs = (struct pt_regs *)sp;
> > >  		state->prev_regs = NULL;
> > >  		state->full_regs = true;
> > > @@ -562,6 +569,9 @@ bool unwind_next_frame(struct unwind_state *state)
> > >  					 (void *)orig_ip);
> > >  			goto err;
> > >  		}
> > > +		/* See UNWIND_HINT_TYPE_REGS case comment. */
> > > +		state->ip = unwind_recover_kretprobe(state, state->ip,
> > > +				(unsigned long *)(state->sp - sizeof(long)));
> > >  
> > >  		if (state->full_regs)
> > >  			state->prev_regs = state->regs;
> > 
> > Why doesn't the ftrace case have this? That is, why aren't both return
> > trampolines having the same general shape?
> 
> Ah, this strongly depends what the trampoline code does.
> For the kretprobe case, the PUSHQ at the entry of the kretprobe_trampoline()
> does not covered by UNWIND_HINT_FUNC. Thus it needs to find 'correct_ret_addr'
> by the frame pointer (which is next to the sp).
> 
>         "kretprobe_trampoline:\n"
> #ifdef CONFIG_X86_64
>         /* Push fake return address to tell the unwinder it's a kretprobe */
>         "       pushq $kretprobe_trampoline\n"
>         UNWIND_HINT_FUNC
> 
> But I'm not so sure how ftrace treat it. It seems that the return_to_handler()
> doesn't care such case. (anyway, return_to_handler() does not return but jump
> to the original call-site, in that case, the information will be lost.)

I find it bothersome (OCD, sorry :-) that both return trampolines behave
differently. Doubly so because I know people (Steve in particular) have
been talking about unifying them.

Steve, can you clarify the ftrace side here? Afaict return_to_handler()
is similarly affected.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 13/13] x86/kprobes: Fixup return address in generic trampoline handler
  2021-07-05  8:34   ` Ingo Molnar
@ 2021-07-06 12:57     ` Masami Hiramatsu
  0 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-06 12:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Mon, 5 Jul 2021 10:34:58 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > In x86, kretprobe trampoline address on the stack frame will
> > be replaced with the real return address after returning from
> > trampoline_handler. Before fixing the return address, the real
> > return address can be found in the current->kretprobe_instances.
> > 
> > However, since there is a window between updating the
> > current->kretprobe_instances and fixing the address on the stack,
> > if an interrupt caused at that timing and the interrupt handler
> > does stacktrace, it may fail to unwind because it can not get
> > the correct return address from current->kretprobe_instances.
> > 
> > This will minimize that window by fixing the return address
> > right before updating current->kretprobe_instances.
> 
> Is there still a window? I.e. is it "minimized" (to how big of a window?), 
> or eliminated?

Oh, this will eliminate the window, because the return address is
fixed before updating the 'current->kretprobe_instance'.


> 
> > +void arch_kretprobe_fixup_return(struct pt_regs *regs,
> > +				 unsigned long correct_ret_addr)
> > +{
> > +	unsigned long *frame_pointer;
> > +
> > +	frame_pointer = ((unsigned long *)&regs->sp) + 1;
> > +
> > +	/* Replace fake return address with real one. */
> > +	*frame_pointer = correct_ret_addr;
> 
> Firstly, why does &regs->sp have to be forced to 'unsigned long *'? 
> 
> pt_regs::sp is 'unsigned long' on both 32-bit and 64-bit kernels AFAICS.

Ah, right.

> 
> Secondly, the new code modified by your patch now looks like this:
> 
>         frame_pointer = ((unsigned long *)&regs->sp) + 1;
>  
> +       kretprobe_trampoline_handler(regs, frame_pointer);
> 
> where:
> 
> +void arch_kretprobe_fixup_return(struct pt_regs *regs,
> +                                unsigned long correct_ret_addr)
> +{
> +       unsigned long *frame_pointer;
> +
> +       frame_pointer = ((unsigned long *)&regs->sp) + 1;
> +
> +       /* Replace fake return address with real one. */
> +       *frame_pointer = correct_ret_addr;
> +}
> 
> So we first do:
> 
>         frame_pointer = ((unsigned long *)&regs->sp) + 1;
> 
> ... and pass that in to arch_kretprobe_fixup_return() as 
> 'correct_ret_addr', which does:

No, 'correct_ret_addr' is found from 'current->kretprobe_instances'

        /* Find correct address and all nodes for this frame. */
        correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);

> 
> +       frame_pointer = ((unsigned long *)&regs->sp) + 1;
> +	*frame_pointer = correct_ret_addr;
> 
> ... which looks like the exact same thing as:
> 
> 	*frame_pointer = frame_pointer;
> 
> ... obfuscated through a thick layer of type casts?

Thus it will be the same thing as

	*frame_pointer = __kretprobe_find_ret_addr(current, &node);

Actually, this is a bit confusing because same 'frame_pointer' is
calcurated twice from 'regs->sp'. This is because the return address
is stored at 'frame_pointer' or not depends on the architecture.


Thank you,

> 
> Thanks,
> 
> 	Ingo


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-06  7:55       ` Peter Zijlstra
@ 2021-07-06 15:11         ` Steven Rostedt
  2021-07-07  8:20           ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Steven Rostedt @ 2021-07-06 15:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	linux-kernel, bpf, kuba, mingo, ast, Thomas Gleixner,
	Borislav Petkov, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Tue, 6 Jul 2021 09:55:03 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > But I'm not so sure how ftrace treat it. It seems that the return_to_handler()
> > doesn't care such case. (anyway, return_to_handler() does not return but jump
> > to the original call-site, in that case, the information will be lost.)  
> 
> I find it bothersome (OCD, sorry :-) that both return trampolines behave
> differently. Doubly so because I know people (Steve in particular) have
> been talking about unifying them.

They were developed separately, and designed differently with different
goals in mind. Yes, I want to unify them, but trying to get the
different goals together, compounded by the fact that almost every arch
also implemented them differently (in which case, we need to find a way
to do it one arch at a time), makes the process extremely frustrating.

> 
> Steve, can you clarify the ftrace side here? Afaict return_to_handler()
> is similarly affected.

I'm not exactly sure what the issue is. As Masami stated, kretprobe
uses a ret to return to the calling function, but ftrace uses a jmp.

kretprobe return tracing is more complex than the function graph return
tracing is (which is one of the issues I need to overcome to unify
them), and when the function graph return trampoline was created, it
did things as simple as possible (and before ORC existed).

Is this something to worry about now, or should we look to fix his in
the unifying process?

-- Steve

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-06 15:11         ` Steven Rostedt
@ 2021-07-07  8:20           ` Peter Zijlstra
  2021-07-07  8:36             ` Peter Zijlstra
  2021-07-07 10:15             ` Masami Hiramatsu
  0 siblings, 2 replies; 57+ messages in thread
From: Peter Zijlstra @ 2021-07-07  8:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	linux-kernel, bpf, kuba, mingo, ast, Thomas Gleixner,
	Borislav Petkov, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko, wuqiang.matt

On Tue, Jul 06, 2021 at 11:11:36AM -0400, Steven Rostedt wrote:
> On Tue, 6 Jul 2021 09:55:03 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > But I'm not so sure how ftrace treat it. It seems that the return_to_handler()
> > > doesn't care such case. (anyway, return_to_handler() does not return but jump
> > > to the original call-site, in that case, the information will be lost.)  
> > 
> > I find it bothersome (OCD, sorry :-) that both return trampolines behave
> > differently. Doubly so because I know people (Steve in particular) have
> > been talking about unifying them.
> 
> They were developed separately, and designed differently with different
> goals in mind. Yes, I want to unify them, but trying to get the
> different goals together, compounded by the fact that almost every arch
> also implemented them differently (in which case, we need to find a way
> to do it one arch at a time), makes the process extremely frustrating.

Yeah.. that's going to be somewhat painful.

> > Steve, can you clarify the ftrace side here? Afaict return_to_handler()
> > is similarly affected.
> 
> I'm not exactly sure what the issue is. As Masami stated, kretprobe
> uses a ret to return to the calling function, but ftrace uses a jmp.

I'll have to re-read the ftrace bits, but from the top of my head you
cannot do an indirect jump and preserve all registers at the same time,
so a return stub must use jump from stack aka. ret.

> kretprobe return tracing is more complex than the function graph return
> tracing is (which is one of the issues I need to overcome to unify
> them),

I'm not sure it is. IIRC the biggest pain point with kretprobe is that
'silly' property that the kretprobe_instance are not the same between
kretprobes. Luckily, that's not actually used anywhere, so we can simply
rip that out.

That should also help Matt make the whole freelist thing faster, because
now the kretprobe instances are global.

> and when the function graph return trampoline was created, it
> did things as simple as possible (and before ORC existed).
> 
> Is this something to worry about now, or should we look to fix his in
> the unifying process?

There seems to be a lot of kretprobe activity now; so I figured we ought
to at least consider where we want to go so we don't make it harder
still.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-07  8:20           ` Peter Zijlstra
@ 2021-07-07  8:36             ` Peter Zijlstra
  2021-07-07 10:15             ` Masami Hiramatsu
  1 sibling, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2021-07-07  8:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	linux-kernel, bpf, kuba, mingo, ast, Thomas Gleixner,
	Borislav Petkov, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko, wuqiang.matt

On Wed, Jul 07, 2021 at 10:20:41AM +0200, Peter Zijlstra wrote:

> > > Steve, can you clarify the ftrace side here? Afaict return_to_handler()
> > > is similarly affected.
> > 
> > I'm not exactly sure what the issue is. As Masami stated, kretprobe
> > uses a ret to return to the calling function, but ftrace uses a jmp.
> 
> I'll have to re-read the ftrace bits, but from the top of my head you
> cannot do an indirect jump and preserve all registers at the same time,
> so a return stub must use jump from stack aka. ret.

Hmm... there's callee clobbered regs ofcourse, which don't need to be
preserved. And that's exactly what ftrace seems to be doing, and I don't
think there's any reason why kretprobe cannot do the same. Lemme try.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-07  8:20           ` Peter Zijlstra
  2021-07-07  8:36             ` Peter Zijlstra
@ 2021-07-07 10:15             ` Masami Hiramatsu
  2021-07-07 10:20               ` Peter Zijlstra
  1 sibling, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-07 10:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Masami Hiramatsu, Josh Poimboeuf, Ingo Molnar,
	X86 ML, Daniel Xu, linux-kernel, bpf, kuba, mingo, ast,
	Thomas Gleixner, Borislav Petkov, kernel-team, yhs, linux-ia64,
	Abhishek Sagar, Andrii Nakryiko, wuqiang.matt

On Wed, 7 Jul 2021 10:20:41 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > > Steve, can you clarify the ftrace side here? Afaict return_to_handler()
> > > is similarly affected.
> > 
> > I'm not exactly sure what the issue is. As Masami stated, kretprobe
> > uses a ret to return to the calling function, but ftrace uses a jmp.
> 
> I'll have to re-read the ftrace bits, but from the top of my head you
> cannot do an indirect jump and preserve all registers at the same time,
> so a return stub must use jump from stack aka. ret.
> 
> > kretprobe return tracing is more complex than the function graph return
> > tracing is (which is one of the issues I need to overcome to unify
> > them),
> 
> I'm not sure it is. IIRC the biggest pain point with kretprobe is that
> 'silly' property that the kretprobe_instance are not the same between
> kretprobes. Luckily, that's not actually used anywhere, so we can simply
> rip that out.

I actually don't want to keep this feature because no one use it.
(only systemtap needs it?)

Anyway, if we keep the idea-level compatibility (not code level),
what we need is 'void *data' in the struct kretprobe_instance.
User who needs it can allocate their own instance data for their
kretprobes when initialising it and sets in their entry handler.

Then we can have a simple kretprobe_instance.

Thank you,



-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-07 10:15             ` Masami Hiramatsu
@ 2021-07-07 10:20               ` Peter Zijlstra
  2021-07-07 10:45                 ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2021-07-07 10:20 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	linux-kernel, bpf, kuba, mingo, ast, Thomas Gleixner,
	Borislav Petkov, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko, wuqiang.matt

On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:

> I actually don't want to keep this feature because no one use it.
> (only systemtap needs it?)

Yeah, you mentioned systemtap, but since that's out-of-tree I don't
care. Their problem.

> Anyway, if we keep the idea-level compatibility (not code level),
> what we need is 'void *data' in the struct kretprobe_instance.
> User who needs it can allocate their own instance data for their
> kretprobes when initialising it and sets in their entry handler.
> 
> Then we can have a simple kretprobe_instance.

When would you do the alloc? When installing the retprobe, but that
might be inside the allocator, which means you can't call the allocator
etc.. :-)

If we look at struct ftrace_ret_stack, it has a few fixed function
fields. The calltime one is all that is needed for the kretprobe
example code.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-07 10:20               ` Peter Zijlstra
@ 2021-07-07 10:45                 ` Masami Hiramatsu
  2021-07-07 13:29                   ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-07 10:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	linux-kernel, bpf, kuba, mingo, ast, Thomas Gleixner,
	Borislav Petkov, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko, wuqiang.matt

On Wed, 7 Jul 2021 12:20:57 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
> 
> > I actually don't want to keep this feature because no one use it.
> > (only systemtap needs it?)
> 
> Yeah, you mentioned systemtap, but since that's out-of-tree I don't
> care. Their problem.
> 
> > Anyway, if we keep the idea-level compatibility (not code level),
> > what we need is 'void *data' in the struct kretprobe_instance.
> > User who needs it can allocate their own instance data for their
> > kretprobes when initialising it and sets in their entry handler.
> > 
> > Then we can have a simple kretprobe_instance.
> 
> When would you do the alloc? When installing the retprobe, but that
> might be inside the allocator, which means you can't call the allocator
> etc.. :-)

Yes, so the user may need to allocate a pool right before register_kretprobe().
(whether per-kretprobe or per-task or global pool, that is user's choice.)

> 
> If we look at struct ftrace_ret_stack, it has a few fixed function
> fields. The calltime one is all that is needed for the kretprobe
> example code.

kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
stores callee function address in 'kretprobe::kp.addr'), a return
address and a frame pointer (*).

* note that this frame pointer might be used for fixing up the
stack trace, but the fixup method depends on the architecture.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-07 10:45                 ` Masami Hiramatsu
@ 2021-07-07 13:29                   ` Masami Hiramatsu
  2021-07-07 14:42                     ` Matt Wu
  0 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-07 13:29 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, Steven Rostedt, Josh Poimboeuf, Ingo Molnar,
	X86 ML, Daniel Xu, linux-kernel, bpf, kuba, mingo, ast,
	Thomas Gleixner, Borislav Petkov, kernel-team, yhs, linux-ia64,
	Abhishek Sagar, Andrii Nakryiko, wuqiang.matt

On Wed, 7 Jul 2021 19:45:30 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Wed, 7 Jul 2021 12:20:57 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
> > 
> > > I actually don't want to keep this feature because no one use it.
> > > (only systemtap needs it?)
> > 
> > Yeah, you mentioned systemtap, but since that's out-of-tree I don't
> > care. Their problem.

Yeah, maybe it is not hard to update.

> > 
> > > Anyway, if we keep the idea-level compatibility (not code level),
> > > what we need is 'void *data' in the struct kretprobe_instance.
> > > User who needs it can allocate their own instance data for their
> > > kretprobes when initialising it and sets in their entry handler.
> > > 
> > > Then we can have a simple kretprobe_instance.
> > 
> > When would you do the alloc? When installing the retprobe, but that
> > might be inside the allocator, which means you can't call the allocator
> > etc.. :-)
> 
> Yes, so the user may need to allocate a pool right before register_kretprobe().
> (whether per-kretprobe or per-task or global pool, that is user's choice.)
> 
> > 
> > If we look at struct ftrace_ret_stack, it has a few fixed function
> > fields. The calltime one is all that is needed for the kretprobe
> > example code.
> 
> kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
> stores callee function address in 'kretprobe::kp.addr'), a return
> address and a frame pointer (*).

Oops, I forgot to add "void *data" for storing user data.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-07 13:29                   ` Masami Hiramatsu
@ 2021-07-07 14:42                     ` Matt Wu
  2021-07-11 14:09                       ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Matt Wu @ 2021-07-07 14:42 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, Steven Rostedt, Josh Poimboeuf, Ingo Molnar,
	X86 ML, Daniel Xu, linux-kernel, bpf, kuba, mingo, ast,
	Thomas Gleixner, Borislav Petkov, kernel-team, yhs, linux-ia64,
	Abhishek Sagar, Andrii Nakryiko

On 2021/7/7 PM9:29, Masami Hiramatsu wrote:
> On Wed, 7 Jul 2021 19:45:30 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
>> On Wed, 7 Jul 2021 12:20:57 +0200
>> Peter Zijlstra <peterz@infradead.org> wrote:
>>
>>> On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
>>>
>>>> I actually don't want to keep this feature because no one use it.
>>>> (only systemtap needs it?)
>>>
>>> Yeah, you mentioned systemtap, but since that's out-of-tree I don't
>>> care. Their problem.
> 
> Yeah, maybe it is not hard to update.
> 
>>>
>>>> Anyway, if we keep the idea-level compatibility (not code level),
>>>> what we need is 'void *data' in the struct kretprobe_instance.
>>>> User who needs it can allocate their own instance data for their
>>>> kretprobes when initialising it and sets in their entry handler.
>>>>
>>>> Then we can have a simple kretprobe_instance.
>>>
>>> When would you do the alloc? When installing the retprobe, but that
>>> might be inside the allocator, which means you can't call the allocator
>>> etc.. :-)
>>
>> Yes, so the user may need to allocate a pool right before register_kretprobe().
>> (whether per-kretprobe or per-task or global pool, that is user's choice.)
>>
>>>
>>> If we look at struct ftrace_ret_stack, it has a few fixed function
>>> fields. The calltime one is all that is needed for the kretprobe
>>> example code.
>>
>> kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
>> stores callee function address in 'kretprobe::kp.addr'), a return
>> address and a frame pointer (*).
>  > Oops, I forgot to add "void *data" for storing user data.
> 

Should use "struct kretprobe_holder *rph", since "struct kretprobe" belongs
to 3rd-party module (which might be unloaded any time).

User's own pool might not work if the module can be unloaded. Better manage
the pool in kretprobe_holder, which needs no changes from user side.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
  2021-06-18  7:05 ` [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor() Masami Hiramatsu
  2021-07-05  7:48   ` Ingo Molnar
@ 2021-07-07 18:28   ` Andrii Nakryiko
  2021-07-08  4:08     ` Masami Hiramatsu
  1 sibling, 1 reply; 57+ messages in thread
From: Andrii Nakryiko @ 2021-07-07 18:28 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	open list, bpf, Jakub Kicinski, Ingo Molnar, Alexei Starovoitov,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Kernel Team,
	Yonghong Song, linux-ia64, Abhishek Sagar

On Fri, Jun 18, 2021 at 12:05 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Replace arch_deref_entry_point() with dereference_symbol_descriptor()
> because those are doing same thing.
>
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> Tested-by: Andrii Nakryik <andrii@kernel.org>

Hi Masami,

If you are going to post v9 anyway, can you please fix up my name, it
should be "Andrii Nakryiko", thanks!

> ---
>  Changes in v6:
>   - Use dereference_symbol_descriptor() so that it can handle address in
>     modules correctly.
> ---
>  arch/ia64/kernel/kprobes.c    |    5 -----
>  arch/powerpc/kernel/kprobes.c |   11 -----------
>  include/linux/kprobes.h       |    1 -
>  kernel/kprobes.c              |    7 +------
>  lib/error-inject.c            |    3 ++-
>  5 files changed, 3 insertions(+), 24 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
  2021-07-07 18:28   ` Andrii Nakryiko
@ 2021-07-08  4:08     ` Masami Hiramatsu
  0 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-08  4:08 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Steven Rostedt, Josh Poimboeuf, Ingo Molnar, X86 ML, Daniel Xu,
	open list, bpf, Jakub Kicinski, Ingo Molnar, Alexei Starovoitov,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Kernel Team,
	Yonghong Song, linux-ia64, Abhishek Sagar

On Wed, 7 Jul 2021 11:28:10 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Fri, Jun 18, 2021 at 12:05 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > Replace arch_deref_entry_point() with dereference_symbol_descriptor()
> > because those are doing same thing.
> >
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> > Tested-by: Andrii Nakryik <andrii@kernel.org>
> 
> Hi Masami,
> 
> If you are going to post v9 anyway, can you please fix up my name, it
> should be "Andrii Nakryiko", thanks!

Oops, sorry. OK, I'll fix it.

Thank you,

> 
> > ---
> >  Changes in v6:
> >   - Use dereference_symbol_descriptor() so that it can handle address in
> >     modules correctly.
> > ---
> >  arch/ia64/kernel/kprobes.c    |    5 -----
> >  arch/powerpc/kernel/kprobes.c |   11 -----------
> >  include/linux/kprobes.h       |    1 -
> >  kernel/kprobes.c              |    7 +------
> >  lib/error-inject.c            |    3 ++-
> >  5 files changed, 3 insertions(+), 24 deletions(-)
> >
> 
> [...]


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 10/13] x86/kprobes: Push a fake return address at kretprobe_trampoline
  2021-07-05  8:17   ` Ingo Molnar
@ 2021-07-09 14:55     ` Masami Hiramatsu
  0 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-09 14:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Mon, 5 Jul 2021 10:17:26 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > +	/* Replace fake return address with real one. */
> > +	*frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);
> > +	/*
> > +	 * Move flags to sp so that kretprobe_trapmoline can return
> > +	 * right after popf.
> 
> What is a trapmoline?

This means kretprobe_trampoline() code.

> 
> Also, in the x86 code we capitalize register and instruction names so that 
> they are more distinctive and easier to read in the flow of English text.

OK, let me update it.

Thank you,

> 
> Thanks,
> 
> 	Ingo


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code
  2021-07-05  8:02   ` Ingo Molnar
@ 2021-07-09 15:31     ` Masami Hiramatsu
  2021-07-10  1:41       ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-09 15:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Josh Poimboeuf, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Mon, 5 Jul 2021 10:02:47 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > From: Josh Poimboeuf <jpoimboe@redhat.com>
> > 
> > Add UNWIND_HINT_FUNC on kretporbe_trampoline code so that ORC
> > information is generated on the kretprobe_trampoline correctly.
> 
> What is a 'kretporbe'?

Oops, it's a typo.

> 
> > Note that when the CONFIG_FRAME_POINTER=y, since the
> > kretprobe_trampoline skips updating frame pointer, the stack frame
> > of the kretprobe_trampoline seems non-standard. So this marks it
> > is STACK_FRAME_NON_STANDARD() and undefine UNWIND_HINT_FUNC.
> 
> What does 'marks it is' mean?

Sorry, I meant, this marks the kretprobe_trampoline as non-standard
stack frame by STACK_FRAME_NON_STANDARD().

> 
> 'undefine' UNWIND_HINT_FUNC?
> 
> Doesn't the patch do the exact opposite:
> 
>   > +#define UNWIND_HINT_FUNC \
>   > +	UNWIND_HINT(ORC_REG_SP, 8, UNWIND_HINT_TYPE_FUNC, 0)
> 
> But it does undefine it in a specific spot:

Yes, if you think this is not correct way, what about the following?

#ifdef CONFIG_FRAME_POINTER
STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
#define KRETPROBE_UNWIND_HINT_FUNC
#else
#define KRETPROBE_UNWIND_HINT_FUNC	UNWIND_HINT_FUNC
#endif


> > Anyway, with the frame pointer, FP unwinder can unwind the stack
> > frame correctly without that hint.
> > 
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> > Tested-by: Andrii Nakryik <andrii@kernel.org>
> 
> I have to say these changelogs are very careless.

Sorry for inconvenience...

> 
> > +#else
> > +
> 
> In headers, in longer CPP blocks, please always mark the '#else' branch 
> with what it is the else branch of.

OK.

> 
> See the output of:
> 
>    kepler:~/tip> git grep '#else' arch/x86/include/asm/ | head

Thanks for the hint!

> 
> > +#ifdef CONFIG_FRAME_POINTER
> > +/*
> > + * kretprobe_trampoline skips updating frame pointer. The frame pointer
> > + * saved in trampoline_handler points to the real caller function's
> > + * frame pointer. Thus the kretprobe_trampoline doesn't seems to have a
> > + * standard stack frame with CONFIG_FRAME_POINTER=y.
> > + * Let's mark it non-standard function. Anyway, FP unwinder can correctly
> > + * unwind without the hint.
> 
> s/doesn't seems to have a standard stack frame
>  /doesn't have a standard stack frame
> 
> There's nothing 'seems' about the situation - it's a non-standard function 
> entry and stack frame situation, and the unwinder needs to know about it.

OK.

> 
> > +STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> > +#undef UNWIND_HINT_FUNC
> > +#define UNWIND_HINT_FUNC
> > +#endif
> >  /*
> >   * When a retprobed function returns, this code saves registers and
> >   * calls trampoline_handler() runs, which calls the kretprobe's handler.
> > @@ -1031,6 +1044,7 @@ asm(
> >  	/* We don't bother saving the ss register */
> >  #ifdef CONFIG_X86_64
> >  	"	pushq %rsp\n"
> > +	UNWIND_HINT_FUNC
> >  	"	pushfq\n"
> >  	SAVE_REGS_STRING
> >  	"	movq %rsp, %rdi\n"
> > @@ -1041,6 +1055,7 @@ asm(
> >  	"	popfq\n"
> >  #else
> >  	"	pushl %esp\n"
> > +	UNWIND_HINT_FUNC
> >  	"	pushfl\n"
> >  	SAVE_REGS_STRING
> >  	"	movl %esp, %eax\n"
> 
> Why not provide an appropriate annotation method in <asm/unwind_hints.h>, 
> so that other future code can use it too instead of reinventing the wheel?

Would you mean we should define the UNWIND_HINT_FUNC as a macro
which depends on CONFIG_FRAME_POINTER, in <asm/unwind_hints.h>?

Josh, what would you think?

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code
  2021-07-09 15:31     ` Masami Hiramatsu
@ 2021-07-10  1:41       ` Masami Hiramatsu
  2021-07-10 19:01         ` Josh Poimboeuf
  0 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-10  1:41 UTC (permalink / raw)
  To: Josh Poimboeuf, Ingo Molnar
  Cc: Steven Rostedt, X86 ML, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	kernel-team, yhs, linux-ia64, Abhishek Sagar, Andrii Nakryiko,
	Masami Hiramatsu

Hi Ingo and Josh,

On Sat, 10 Jul 2021 00:31:40 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > > +STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> > > +#undef UNWIND_HINT_FUNC
> > > +#define UNWIND_HINT_FUNC
> > > +#endif
> > >  /*
> > >   * When a retprobed function returns, this code saves registers and
> > >   * calls trampoline_handler() runs, which calls the kretprobe's handler.
> > > @@ -1031,6 +1044,7 @@ asm(
> > >  	/* We don't bother saving the ss register */
> > >  #ifdef CONFIG_X86_64
> > >  	"	pushq %rsp\n"
> > > +	UNWIND_HINT_FUNC
> > >  	"	pushfq\n"
> > >  	SAVE_REGS_STRING
> > >  	"	movq %rsp, %rdi\n"
> > > @@ -1041,6 +1055,7 @@ asm(
> > >  	"	popfq\n"
> > >  #else
> > >  	"	pushl %esp\n"
> > > +	UNWIND_HINT_FUNC
> > >  	"	pushfl\n"
> > >  	SAVE_REGS_STRING
> > >  	"	movl %esp, %eax\n"
> > 
> > Why not provide an appropriate annotation method in <asm/unwind_hints.h>, 
> > so that other future code can use it too instead of reinventing the wheel?

I think I got what you meant. Let me summarize the issue.

In case of CONFIG_FRAME_POINTER=n, it is OK just adding UNWIND_HINT_FUNC.

In case of CONFIG_FRAME_POINTER=y, without STACK_FRAME_NON_STANDARD(),
the objtool complains that a CALL instruction without the frame pointer.
---
  arch/x86/kernel/kprobes/core.o: warning: objtool: __kretprobe_trampoline()+0x25: call without frame pointer save/setup
---

If we just add STACK_FRAME_NON_STANDARD() with UNWIND_HINT_FUNC macro,
the objtool complains that non-standard function has unwind hint.
---
arch/x86/kernel/kprobes/core.o: warning: objtool: __kretprobe_trampoline()+0x1: BUG: why am I validating an ignored function?
---

Thus, add STACK_FRAME_NON_STANDARD() and undefine UNWIND_HINT_FUNC macro,
the objtool doesn't complain.

This means that the STACK_FRAME_NON_STANDARD() and UNWIND_HINT_FUNC macro
are mutually exclusive. However, those macros are used different way.
The STACK_FRAME_NON_STANDARD() will have the target symbol and the
UNWIND_HINT_FUNC needs to be embedded in the target code.
Thus we can not combine them in general.

If we can have something like UNWIND_HINT_FUNC_NO_FP, it may solve this
issue without ugly #ifdef and #undef.

Is that correct?

Maybe I can add UNWIND_HINT_TYPE_FUNC_NO_FP for UNWIND_HINT and make objtool
ignore the call without frame pointer. This makes an exception that the
kretprobe_trampoline will be noted in '.discard.unwind_hints' section
instead of '.discard.func_stack_frame_non_standard' section. 

Or another idea is to introduce ANNOTATE_NO_FP_FUNCTION_CALL with a new
'.discard.no_fp_function_calls' section.

What do you think these ideas?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code
  2021-07-10  1:41       ` Masami Hiramatsu
@ 2021-07-10 19:01         ` Josh Poimboeuf
  2021-07-10 19:24           ` [PATCH 1/2] objtool: Add frame-pointer-specific function ignore Josh Poimboeuf
  2021-07-10 19:25           ` [PATCH 2/2] objtool: Ignore unwind hints for ignored functions Josh Poimboeuf
  0 siblings, 2 replies; 57+ messages in thread
From: Josh Poimboeuf @ 2021-07-10 19:01 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Steven Rostedt, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Sat, Jul 10, 2021 at 10:41:04AM +0900, Masami Hiramatsu wrote:
> Hi Ingo and Josh,
> 
> On Sat, 10 Jul 2021 00:31:40 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > > > +STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> > > > +#undef UNWIND_HINT_FUNC
> > > > +#define UNWIND_HINT_FUNC
> > > > +#endif
> > > >  /*
> > > >   * When a retprobed function returns, this code saves registers and
> > > >   * calls trampoline_handler() runs, which calls the kretprobe's handler.
> > > > @@ -1031,6 +1044,7 @@ asm(
> > > >  	/* We don't bother saving the ss register */
> > > >  #ifdef CONFIG_X86_64
> > > >  	"	pushq %rsp\n"
> > > > +	UNWIND_HINT_FUNC
> > > >  	"	pushfq\n"
> > > >  	SAVE_REGS_STRING
> > > >  	"	movq %rsp, %rdi\n"
> > > > @@ -1041,6 +1055,7 @@ asm(
> > > >  	"	popfq\n"
> > > >  #else
> > > >  	"	pushl %esp\n"
> > > > +	UNWIND_HINT_FUNC
> > > >  	"	pushfl\n"
> > > >  	SAVE_REGS_STRING
> > > >  	"	movl %esp, %eax\n"
> > > 
> > > Why not provide an appropriate annotation method in <asm/unwind_hints.h>, 
> > > so that other future code can use it too instead of reinventing the wheel?
> 
> I think I got what you meant. Let me summarize the issue.
> 
> In case of CONFIG_FRAME_POINTER=n, it is OK just adding UNWIND_HINT_FUNC.
> 
> In case of CONFIG_FRAME_POINTER=y, without STACK_FRAME_NON_STANDARD(),
> the objtool complains that a CALL instruction without the frame pointer.
> ---
>   arch/x86/kernel/kprobes/core.o: warning: objtool: __kretprobe_trampoline()+0x25: call without frame pointer save/setup
> ---
> 
> If we just add STACK_FRAME_NON_STANDARD() with UNWIND_HINT_FUNC macro,
> the objtool complains that non-standard function has unwind hint.
> ---
> arch/x86/kernel/kprobes/core.o: warning: objtool: __kretprobe_trampoline()+0x1: BUG: why am I validating an ignored function?

I'm thinking this latter warning indicates an objtool bug (as the BUG
warning claims).  If a function is ignored with
STACK_FRAME_NON_STANDARD() then objtool should probably also ignore its
hints.  Then we should be able to get rid of the #undef/#ifdef
UNWIND_HINT_FUNC silliness.

The other warning is correct and STACK_FRAME_NON_STANDARD() still needs
to be behind '#ifdef CONFIG_FRAME_POINTER' since the function is missing
a frame pointer.  So maybe we can make a STACK_FRAME_NON_STANDARD_FP()
or similar.

I'll post a few patches.

-- 
Josh


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 1/2] objtool: Add frame-pointer-specific function ignore
  2021-07-10 19:01         ` Josh Poimboeuf
@ 2021-07-10 19:24           ` Josh Poimboeuf
  2021-07-11  1:16             ` Masami Hiramatsu
  2021-07-29  2:31             ` Masami Hiramatsu
  2021-07-10 19:25           ` [PATCH 2/2] objtool: Ignore unwind hints for ignored functions Josh Poimboeuf
  1 sibling, 2 replies; 57+ messages in thread
From: Josh Poimboeuf @ 2021-07-10 19:24 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Steven Rostedt, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

Add a CONFIG_FRAME_POINTER-specific version of
STACK_FRAME_NON_STANDARD() for the case where a function is
intentionally missing frame pointer setup, but otherwise needs
objtool/ORC coverage when frame pointers are disabled.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 include/linux/objtool.h       | 11 +++++++++++
 tools/include/linux/objtool.h | 11 +++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/objtool.h b/include/linux/objtool.h
index 7e72d975cb76..c9575ed91052 100644
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -66,6 +66,17 @@ struct unwind_hint {
 	static void __used __section(".discard.func_stack_frame_non_standard") \
 		*__func_stack_frame_non_standard_##func = func
 
+/*
+ * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
+ * for the case where a function is intentionally missing frame pointer setup,
+ * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
+ */
+#ifdef CONFIG_FRAME_POINTER
+#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
+#else
+#define STACK_FRAME_NON_STANDARD_FP(func)
+#endif
+
 #else /* __ASSEMBLY__ */
 
 /*
diff --git a/tools/include/linux/objtool.h b/tools/include/linux/objtool.h
index 7e72d975cb76..c9575ed91052 100644
--- a/tools/include/linux/objtool.h
+++ b/tools/include/linux/objtool.h
@@ -66,6 +66,17 @@ struct unwind_hint {
 	static void __used __section(".discard.func_stack_frame_non_standard") \
 		*__func_stack_frame_non_standard_##func = func
 
+/*
+ * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
+ * for the case where a function is intentionally missing frame pointer setup,
+ * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
+ */
+#ifdef CONFIG_FRAME_POINTER
+#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
+#else
+#define STACK_FRAME_NON_STANDARD_FP(func)
+#endif
+
 #else /* __ASSEMBLY__ */
 
 /*
-- 
2.31.1


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 2/2] objtool: Ignore unwind hints for ignored functions
  2021-07-10 19:01         ` Josh Poimboeuf
  2021-07-10 19:24           ` [PATCH 1/2] objtool: Add frame-pointer-specific function ignore Josh Poimboeuf
@ 2021-07-10 19:25           ` Josh Poimboeuf
  2021-07-11  2:07             ` Masami Hiramatsu
  1 sibling, 1 reply; 57+ messages in thread
From: Josh Poimboeuf @ 2021-07-10 19:25 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Steven Rostedt, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

If a function is ignored, also ignore its hints.  This is useful for the
case where the function ignore is conditional on frame pointers, e.g.
STACK_FRAME_NON_STANDARD_FP().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 tools/objtool/check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e5947fbb9e7a..67cbdcfcabae 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2909,7 +2909,7 @@ static int validate_unwind_hints(struct objtool_file *file, struct section *sec)
 	}
 
 	while (&insn->list != &file->insn_list && (!sec || insn->sec == sec)) {
-		if (insn->hint && !insn->visited) {
+		if (insn->hint && !insn->visited && !insn->ignore) {
 			ret = validate_branch(file, insn->func, insn, state);
 			if (ret && backtrace)
 				BT_FUNC("<=== (hint)", insn);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] objtool: Add frame-pointer-specific function ignore
  2021-07-10 19:24           ` [PATCH 1/2] objtool: Add frame-pointer-specific function ignore Josh Poimboeuf
@ 2021-07-11  1:16             ` Masami Hiramatsu
  2021-07-29  2:31             ` Masami Hiramatsu
  1 sibling, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-11  1:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Steven Rostedt, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Sat, 10 Jul 2021 12:24:33 -0700
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> Add a CONFIG_FRAME_POINTER-specific version of
> STACK_FRAME_NON_STANDARD() for the case where a function is
> intentionally missing frame pointer setup, but otherwise needs
> objtool/ORC coverage when frame pointers are disabled.

Looks good to me.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>

Thanks!

> 
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  include/linux/objtool.h       | 11 +++++++++++
>  tools/include/linux/objtool.h | 11 +++++++++++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/include/linux/objtool.h b/include/linux/objtool.h
> index 7e72d975cb76..c9575ed91052 100644
> --- a/include/linux/objtool.h
> +++ b/include/linux/objtool.h
> @@ -66,6 +66,17 @@ struct unwind_hint {
>  	static void __used __section(".discard.func_stack_frame_non_standard") \
>  		*__func_stack_frame_non_standard_##func = func
>  
> +/*
> + * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
> + * for the case where a function is intentionally missing frame pointer setup,
> + * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
> + */
> +#ifdef CONFIG_FRAME_POINTER
> +#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
> +#else
> +#define STACK_FRAME_NON_STANDARD_FP(func)
> +#endif
> +
>  #else /* __ASSEMBLY__ */
>  
>  /*
> diff --git a/tools/include/linux/objtool.h b/tools/include/linux/objtool.h
> index 7e72d975cb76..c9575ed91052 100644
> --- a/tools/include/linux/objtool.h
> +++ b/tools/include/linux/objtool.h
> @@ -66,6 +66,17 @@ struct unwind_hint {
>  	static void __used __section(".discard.func_stack_frame_non_standard") \
>  		*__func_stack_frame_non_standard_##func = func
>  
> +/*
> + * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
> + * for the case where a function is intentionally missing frame pointer setup,
> + * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
> + */
> +#ifdef CONFIG_FRAME_POINTER
> +#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
> +#else
> +#define STACK_FRAME_NON_STANDARD_FP(func)
> +#endif
> +
>  #else /* __ASSEMBLY__ */
>  
>  /*
> -- 
> 2.31.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] objtool: Ignore unwind hints for ignored functions
  2021-07-10 19:25           ` [PATCH 2/2] objtool: Ignore unwind hints for ignored functions Josh Poimboeuf
@ 2021-07-11  2:07             ` Masami Hiramatsu
  0 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-11  2:07 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Steven Rostedt, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

On Sat, 10 Jul 2021 12:25:14 -0700
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> If a function is ignored, also ignore its hints.  This is useful for the
> case where the function ignore is conditional on frame pointers, e.g.
> STACK_FRAME_NON_STANDARD_FP().

This also looks good to me, and test with my series works fine.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>

Thanks!

> 
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  tools/objtool/check.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index e5947fbb9e7a..67cbdcfcabae 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -2909,7 +2909,7 @@ static int validate_unwind_hints(struct objtool_file *file, struct section *sec)
>  	}
>  
>  	while (&insn->list != &file->insn_list && (!sec || insn->sec == sec)) {
> -		if (insn->hint && !insn->visited) {
> +		if (insn->hint && !insn->visited && !insn->ignore) {
>  			ret = validate_branch(file, insn->func, insn, state);
>  			if (ret && backtrace)
>  				BT_FUNC("<=== (hint)", insn);
> -- 
> 2.31.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-07 14:42                     ` Matt Wu
@ 2021-07-11 14:09                       ` Masami Hiramatsu
  2021-07-11 15:28                         ` Matt Wu
  0 siblings, 1 reply; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-11 14:09 UTC (permalink / raw)
  To: Matt Wu
  Cc: Peter Zijlstra, Steven Rostedt, Josh Poimboeuf, Ingo Molnar,
	X86 ML, Daniel Xu, linux-kernel, bpf, kuba, mingo, ast,
	Thomas Gleixner, Borislav Petkov, kernel-team, yhs, linux-ia64,
	Abhishek Sagar, Andrii Nakryiko

On Wed, 7 Jul 2021 22:42:47 +0800
Matt Wu <wuqiang.matt@bytedance.com> wrote:

> On 2021/7/7 PM9:29, Masami Hiramatsu wrote:
> > On Wed, 7 Jul 2021 19:45:30 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > 
> >> On Wed, 7 Jul 2021 12:20:57 +0200
> >> Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >>> On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
> >>>
> >>>> I actually don't want to keep this feature because no one use it.
> >>>> (only systemtap needs it?)
> >>>
> >>> Yeah, you mentioned systemtap, but since that's out-of-tree I don't
> >>> care. Their problem.
> > 
> > Yeah, maybe it is not hard to update.
> > 
> >>>
> >>>> Anyway, if we keep the idea-level compatibility (not code level),
> >>>> what we need is 'void *data' in the struct kretprobe_instance.
> >>>> User who needs it can allocate their own instance data for their
> >>>> kretprobes when initialising it and sets in their entry handler.
> >>>>
> >>>> Then we can have a simple kretprobe_instance.
> >>>
> >>> When would you do the alloc? When installing the retprobe, but that
> >>> might be inside the allocator, which means you can't call the allocator
> >>> etc.. :-)
> >>
> >> Yes, so the user may need to allocate a pool right before register_kretprobe().
> >> (whether per-kretprobe or per-task or global pool, that is user's choice.)
> >>
> >>>
> >>> If we look at struct ftrace_ret_stack, it has a few fixed function
> >>> fields. The calltime one is all that is needed for the kretprobe
> >>> example code.
> >>
> >> kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
> >> stores callee function address in 'kretprobe::kp.addr'), a return
> >> address and a frame pointer (*).
> >  > Oops, I forgot to add "void *data" for storing user data.
> > 
> 
> Should use "struct kretprobe_holder *rph", since "struct kretprobe" belongs
> to 3rd-party module (which might be unloaded any time).

Good catch. Yes, instead of 'struct kretprobe', we need to use the holder.

> User's own pool might not work if the module can be unloaded. Better manage
> the pool in kretprobe_holder, which needs no changes from user side.

No, since the 'data' will be only refered from user handler. If the kretprobe
is released, then the kretprobe_holder will clear the refernce to the 'struct
kretprobe'. Then, the user handler is never called. No one access the 'data'.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-11 14:09                       ` Masami Hiramatsu
@ 2021-07-11 15:28                         ` Matt Wu
  2021-07-12  4:57                           ` Masami Hiramatsu
  0 siblings, 1 reply; 57+ messages in thread
From: Matt Wu @ 2021-07-11 15:28 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, Steven Rostedt, Josh Poimboeuf, Ingo Molnar,
	X86 ML, Daniel Xu, linux-kernel, bpf, kuba, mingo, ast,
	Thomas Gleixner, Borislav Petkov, kernel-team, yhs, linux-ia64,
	Abhishek Sagar, Andrii Nakryiko

On 2021/7/11 PM10:09, Masami Hiramatsu wrote:
> On Wed, 7 Jul 2021 22:42:47 +0800
> Matt Wu <wuqiang.matt@bytedance.com> wrote:
> 
>> On 2021/7/7 PM9:29, Masami Hiramatsu wrote:
>>> On Wed, 7 Jul 2021 19:45:30 +0900
>>> Masami Hiramatsu <mhiramat@kernel.org> wrote:
>>>
>>>> On Wed, 7 Jul 2021 12:20:57 +0200
>>>> Peter Zijlstra <peterz@infradead.org> wrote:
>>>>
>>>>> On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
>>>>>
>>>>>> I actually don't want to keep this feature because no one use it.
>>>>>> (only systemtap needs it?)
>>>>>
>>>>> Yeah, you mentioned systemtap, but since that's out-of-tree I don't
>>>>> care. Their problem.
>>>
>>> Yeah, maybe it is not hard to update.
>>>
>>>>>
>>>>>> Anyway, if we keep the idea-level compatibility (not code level),
>>>>>> what we need is 'void *data' in the struct kretprobe_instance.
>>>>>> User who needs it can allocate their own instance data for their
>>>>>> kretprobes when initialising it and sets in their entry handler.
>>>>>>
>>>>>> Then we can have a simple kretprobe_instance.
>>>>>
>>>>> When would you do the alloc? When installing the retprobe, but that
>>>>> might be inside the allocator, which means you can't call the allocator
>>>>> etc.. :-)
>>>>
>>>> Yes, so the user may need to allocate a pool right before register_kretprobe().
>>>> (whether per-kretprobe or per-task or global pool, that is user's choice.)
>>>>
>>>>>
>>>>> If we look at struct ftrace_ret_stack, it has a few fixed function
>>>>> fields. The calltime one is all that is needed for the kretprobe
>>>>> example code.
>>>>
>>>> kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
>>>> stores callee function address in 'kretprobe::kp.addr'), a return
>>>> address and a frame pointer (*).
>>>   > Oops, I forgot to add "void *data" for storing user data.
>>>
>>
>> Should use "struct kretprobe_holder *rph", since "struct kretprobe" belongs
>> to 3rd-party module (which might be unloaded any time).
> 
> Good catch. Yes, instead of 'struct kretprobe', we need to use the holder.
> 
>> User's own pool might not work if the module can be unloaded. Better manage
>> the pool in kretprobe_holder, which needs no changes from user side.
> 
> No, since the 'data' will be only refered from user handler. If the kretprobe
> is released, then the kretprobe_holder will clear the refernce to the 'struct
> kretprobe'. Then, the user handler is never called. No one access the 'data'.

Indeed, there is no race of "data" accessing, since unregister_kretprobes()
is taking care of it.

This implementation just increases the complexity of caller to keep track
of all allocated instances and release them after unregistration.

But guys are likely to use kmalloc in pre-handler and kfree in post-handler,
which will lead to memory leaks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry
  2021-07-11 15:28                         ` Matt Wu
@ 2021-07-12  4:57                           ` Masami Hiramatsu
  0 siblings, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-12  4:57 UTC (permalink / raw)
  To: Matt Wu
  Cc: Peter Zijlstra, Steven Rostedt, Josh Poimboeuf, Ingo Molnar,
	X86 ML, Daniel Xu, linux-kernel, bpf, kuba, mingo, ast,
	Thomas Gleixner, Borislav Petkov, kernel-team, yhs, linux-ia64,
	Abhishek Sagar, Andrii Nakryiko

On Sun, 11 Jul 2021 23:28:49 +0800
Matt Wu <wuqiang.matt@bytedance.com> wrote:

> On 2021/7/11 PM10:09, Masami Hiramatsu wrote:
> > On Wed, 7 Jul 2021 22:42:47 +0800
> > Matt Wu <wuqiang.matt@bytedance.com> wrote:
> > 
> >> On 2021/7/7 PM9:29, Masami Hiramatsu wrote:
> >>> On Wed, 7 Jul 2021 19:45:30 +0900
> >>> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >>>
> >>>> On Wed, 7 Jul 2021 12:20:57 +0200
> >>>> Peter Zijlstra <peterz@infradead.org> wrote:
> >>>>
> >>>>> On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
> >>>>>
> >>>>>> I actually don't want to keep this feature because no one use it.
> >>>>>> (only systemtap needs it?)
> >>>>>
> >>>>> Yeah, you mentioned systemtap, but since that's out-of-tree I don't
> >>>>> care. Their problem.
> >>>
> >>> Yeah, maybe it is not hard to update.
> >>>
> >>>>>
> >>>>>> Anyway, if we keep the idea-level compatibility (not code level),
> >>>>>> what we need is 'void *data' in the struct kretprobe_instance.
> >>>>>> User who needs it can allocate their own instance data for their
> >>>>>> kretprobes when initialising it and sets in their entry handler.
> >>>>>>
> >>>>>> Then we can have a simple kretprobe_instance.
> >>>>>
> >>>>> When would you do the alloc? When installing the retprobe, but that
> >>>>> might be inside the allocator, which means you can't call the allocator
> >>>>> etc.. :-)
> >>>>
> >>>> Yes, so the user may need to allocate a pool right before register_kretprobe().
> >>>> (whether per-kretprobe or per-task or global pool, that is user's choice.)
> >>>>
> >>>>>
> >>>>> If we look at struct ftrace_ret_stack, it has a few fixed function
> >>>>> fields. The calltime one is all that is needed for the kretprobe
> >>>>> example code.
> >>>>
> >>>> kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
> >>>> stores callee function address in 'kretprobe::kp.addr'), a return
> >>>> address and a frame pointer (*).
> >>>   > Oops, I forgot to add "void *data" for storing user data.
> >>>
> >>
> >> Should use "struct kretprobe_holder *rph", since "struct kretprobe" belongs
> >> to 3rd-party module (which might be unloaded any time).
> > 
> > Good catch. Yes, instead of 'struct kretprobe', we need to use the holder.
> > 
> >> User's own pool might not work if the module can be unloaded. Better manage
> >> the pool in kretprobe_holder, which needs no changes from user side.
> > 
> > No, since the 'data' will be only refered from user handler. If the kretprobe
> > is released, then the kretprobe_holder will clear the refernce to the 'struct
> > kretprobe'. Then, the user handler is never called. No one access the 'data'.
> 
> Indeed, there is no race of "data" accessing, since unregister_kretprobes()
> is taking care of it.
> 
> This implementation just increases the complexity of caller to keep track
> of all allocated instances and release them after unregistration.

Yes, but user can manage it with an array of pointers (or directly allocate
an array of their desired data). Not hard to track it in that case.

> But guys are likely to use kmalloc in pre-handler and kfree in post-handler,
> which will lead to memory leaks.

I will note "do not allocate memory inside kprobe handler" on manual.
I think that's all what we need. We cannot stop someone shooting their feet
especially in the kernel...

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] objtool: Add frame-pointer-specific function ignore
  2021-07-10 19:24           ` [PATCH 1/2] objtool: Add frame-pointer-specific function ignore Josh Poimboeuf
  2021-07-11  1:16             ` Masami Hiramatsu
@ 2021-07-29  2:31             ` Masami Hiramatsu
  1 sibling, 0 replies; 57+ messages in thread
From: Masami Hiramatsu @ 2021-07-29  2:31 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Steven Rostedt, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, kernel-team, yhs, linux-ia64, Abhishek Sagar,
	Andrii Nakryiko

Hi Josh,

I found one mistake on this patch. You have to add STACK_FRAME_NON_STANDARD_FP() in
case of !CONFIG_STACK_VALIDATION.

diff --git a/include/linux/objtool.h b/include/linux/objtool.h
index c9575ed91052..aca52db2f3f3 100644
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -138,6 +138,7 @@ struct unwind_hint {
 #define UNWIND_HINT(sp_reg, sp_offset, type, end)      \
        "\n\t"
 #define STACK_FRAME_NON_STANDARD(func)
+#define STACK_FRAME_NON_STANDARD_FP(func)
 #else
 #define ANNOTATE_INTRA_FUNCTION_CALL
 .macro UNWIND_HINT sp_reg:req sp_offset=0 type:req end=0

Anyway, I will send my series including these patches and this fix as v10.

Thank you,


On Sat, 10 Jul 2021 12:24:33 -0700
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> Add a CONFIG_FRAME_POINTER-specific version of
> STACK_FRAME_NON_STANDARD() for the case where a function is
> intentionally missing frame pointer setup, but otherwise needs
> objtool/ORC coverage when frame pointers are disabled.
> 
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  include/linux/objtool.h       | 11 +++++++++++
>  tools/include/linux/objtool.h | 11 +++++++++++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/include/linux/objtool.h b/include/linux/objtool.h
> index 7e72d975cb76..c9575ed91052 100644
> --- a/include/linux/objtool.h
> +++ b/include/linux/objtool.h
> @@ -66,6 +66,17 @@ struct unwind_hint {
>  	static void __used __section(".discard.func_stack_frame_non_standard") \
>  		*__func_stack_frame_non_standard_##func = func
>  
> +/*
> + * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
> + * for the case where a function is intentionally missing frame pointer setup,
> + * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
> + */
> +#ifdef CONFIG_FRAME_POINTER
> +#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
> +#else
> +#define STACK_FRAME_NON_STANDARD_FP(func)
> +#endif
> +
>  #else /* __ASSEMBLY__ */
>  
>  /*
> diff --git a/tools/include/linux/objtool.h b/tools/include/linux/objtool.h
> index 7e72d975cb76..c9575ed91052 100644
> --- a/tools/include/linux/objtool.h
> +++ b/tools/include/linux/objtool.h
> @@ -66,6 +66,17 @@ struct unwind_hint {
>  	static void __used __section(".discard.func_stack_frame_non_standard") \
>  		*__func_stack_frame_non_standard_##func = func
>  
> +/*
> + * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
> + * for the case where a function is intentionally missing frame pointer setup,
> + * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
> + */
> +#ifdef CONFIG_FRAME_POINTER
> +#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
> +#else
> +#define STACK_FRAME_NON_STANDARD_FP(func)
> +#endif
> +
>  #else /* __ASSEMBLY__ */
>  
>  /*
> -- 
> 2.31.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2021-07-29  2:31 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-18  7:05 [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Masami Hiramatsu
2021-06-18  7:05 ` [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler Masami Hiramatsu
2021-07-05  7:46   ` Ingo Molnar
2021-07-05 10:05     ` Masami Hiramatsu
2021-06-18  7:05 ` [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor() Masami Hiramatsu
2021-07-05  7:48   ` Ingo Molnar
2021-07-05 12:03     ` Masami Hiramatsu
2021-07-07 18:28   ` Andrii Nakryiko
2021-07-08  4:08     ` Masami Hiramatsu
2021-06-18  7:05 ` [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler() Masami Hiramatsu
2021-07-05  7:03   ` Ingo Molnar
2021-07-05 10:03     ` Masami Hiramatsu
2021-07-05  7:49   ` Ingo Molnar
2021-06-18  7:05 ` [PATCH -tip v8 04/13] kprobes: Add kretprobe_find_ret_addr() for searching return address Masami Hiramatsu
2021-07-05  7:42   ` Ingo Molnar
2021-07-05 14:11     ` Masami Hiramatsu
2021-06-18  7:06 ` [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code Masami Hiramatsu
2021-07-05  8:02   ` Ingo Molnar
2021-07-09 15:31     ` Masami Hiramatsu
2021-07-10  1:41       ` Masami Hiramatsu
2021-07-10 19:01         ` Josh Poimboeuf
2021-07-10 19:24           ` [PATCH 1/2] objtool: Add frame-pointer-specific function ignore Josh Poimboeuf
2021-07-11  1:16             ` Masami Hiramatsu
2021-07-29  2:31             ` Masami Hiramatsu
2021-07-10 19:25           ` [PATCH 2/2] objtool: Ignore unwind hints for ignored functions Josh Poimboeuf
2021-07-11  2:07             ` Masami Hiramatsu
2021-06-18  7:06 ` [PATCH -tip v8 06/13] ARC: Add instruction_pointer_set() API Masami Hiramatsu
2021-06-18  7:06 ` [PATCH -tip v8 07/13] ia64: " Masami Hiramatsu
2021-06-18  7:06 ` [PATCH -tip v8 08/13] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline Masami Hiramatsu
2021-07-05  8:04   ` Ingo Molnar
2021-07-05 14:40     ` Masami Hiramatsu
2021-06-18  7:06 ` [PATCH -tip v8 09/13] kprobes: Enable stacktrace from pt_regs in kretprobe handler Masami Hiramatsu
2021-06-18 14:04   ` Josh Poimboeuf
2021-06-18  7:06 ` [PATCH -tip v8 10/13] x86/kprobes: Push a fake return address at kretprobe_trampoline Masami Hiramatsu
2021-07-05  8:17   ` Ingo Molnar
2021-07-09 14:55     ` Masami Hiramatsu
2021-06-18  7:07 ` [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry Masami Hiramatsu
2021-07-05 11:36   ` Peter Zijlstra
2021-07-05 15:42     ` Masami Hiramatsu
2021-07-06  7:55       ` Peter Zijlstra
2021-07-06 15:11         ` Steven Rostedt
2021-07-07  8:20           ` Peter Zijlstra
2021-07-07  8:36             ` Peter Zijlstra
2021-07-07 10:15             ` Masami Hiramatsu
2021-07-07 10:20               ` Peter Zijlstra
2021-07-07 10:45                 ` Masami Hiramatsu
2021-07-07 13:29                   ` Masami Hiramatsu
2021-07-07 14:42                     ` Matt Wu
2021-07-11 14:09                       ` Masami Hiramatsu
2021-07-11 15:28                         ` Matt Wu
2021-07-12  4:57                           ` Masami Hiramatsu
2021-06-18  7:07 ` [PATCH -tip v8 12/13] tracing: Show kretprobe unknown indicator only for kretprobe_trampoline Masami Hiramatsu
2021-06-18  7:07 ` [PATCH -tip v8 13/13] x86/kprobes: Fixup return address in generic trampoline handler Masami Hiramatsu
2021-07-05  8:34   ` Ingo Molnar
2021-07-06 12:57     ` Masami Hiramatsu
2021-06-18 17:44 ` [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86 Andrii Nakryiko
2021-06-28 13:50 ` Masami Hiramatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).