All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
@ 2021-03-05 15:38 Masami Hiramatsu
  2021-03-05 15:39 ` [PATCH -tip 1/5] ia64: kprobes: Fix to pass correct trampoline address to the handler Masami Hiramatsu
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-05 15:38 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs

Hello,

Here is a series of patches for kprobes and stacktracer to fix the kretprobe
entries in the kernel stack. This was reported by Daniel Xu. I thought that
was in the bpftrace, but it is actually more generic issue.
So I decided to fix the issue in arch independent part.

While fixing the issue, I found a bug in ia64 related to kretprobe, which is
fixed by [1/5]. [2/5] and [3/5] is a kind of cleanup before fixing the main
issue. [4/5] is the patch to fix the stacktrace, which involves kretprobe
internal change. And [5/5] removing the stacktrace kretprobe fixup code in
ftrace. 

Daniel, can you also check that this fixes your issue too? I hope it is.

Note that this doesn't fixup all cases. Unfortunately, stacktracing the
other tasks (non current task) on the arch which doesn't support ARCH_STACKWALK,
I can not fix it in the arch independent code. Maybe each arch dependent
stacktrace implementation must fixup by themselves.

Thank you,

---

Masami Hiramatsu (5):
      ia64: kprobes: Fix to pass correct trampoline address to the handler
      kprobes: treewide: Replace arch_deref_entry_point() with dereference_function_descriptor()
      kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
      kprobes: stacktrace: Recover the address changed by kretprobe
      tracing: Remove kretprobe unknown indicator from stacktrace


 arch/arc/kernel/kprobes.c          |    2 -
 arch/arm/probes/kprobes/core.c     |    3 -
 arch/arm64/kernel/probes/kprobes.c |    3 -
 arch/csky/kernel/probes/kprobes.c  |    2 -
 arch/ia64/kernel/kprobes.c         |   15 ++----
 arch/mips/kernel/kprobes.c         |    3 -
 arch/parisc/kernel/kprobes.c       |    4 +-
 arch/powerpc/kernel/kprobes.c      |   13 -----
 arch/riscv/kernel/probes/kprobes.c |    2 -
 arch/s390/kernel/kprobes.c         |    2 -
 arch/sh/kernel/kprobes.c           |    2 -
 arch/sparc/kernel/kprobes.c        |    2 -
 arch/x86/kernel/kprobes/core.c     |    2 -
 include/linux/kprobes.h            |   32 +++++++++++--
 kernel/kprobes.c                   |   89 ++++++++++++++++++++++--------------
 kernel/stacktrace.c                |   21 ++++++++
 kernel/trace/trace_output.c        |   27 ++---------
 lib/error-inject.c                 |    3 +
 18 files changed, 126 insertions(+), 101 deletions(-)

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH -tip 1/5] ia64: kprobes: Fix to pass correct trampoline address to the handler
  2021-03-05 15:38 [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Masami Hiramatsu
@ 2021-03-05 15:39 ` Masami Hiramatsu
  2021-03-05 15:39 ` [PATCH -tip 2/5] kprobes: treewide: Replace arch_deref_entry_point() with dereference_function_descriptor() Masami Hiramatsu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-05 15:39 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs

Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
missed to pass the wrong trampoline address (it passes the descriptor address
instead of function entry address).
This fixes it to pass correct trampoline address to __kretprobe_trampoline_handler().
This also changes to use correct symbol dereference function to get the
function address from the kretprobe_trampoline.

Fixes: e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/ia64/kernel/kprobes.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index fc1ff8a4d7de..006fbc1d7ae9 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -398,7 +398,8 @@ static void kretprobe_trampoline(void)
 
 int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	regs->cr_iip = __kretprobe_trampoline_handler(regs, kretprobe_trampoline, NULL);
+	regs->cr_iip = __kretprobe_trampoline_handler(regs,
+		dereference_function_descriptor(kretprobe_trampoline), NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler
@@ -414,7 +415,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 	ri->fp = NULL;
 
 	/* Replace the return addr with trampoline addr */
-	regs->b0 = ((struct fnptr *)kretprobe_trampoline)->ip;
+	regs->b0 = (unsigned long)dereference_function_descriptor(kretprobe_trampoline);
 }
 
 /* Check the instruction in the slot is break */
@@ -918,14 +919,14 @@ static struct kprobe trampoline_p = {
 int __init arch_init_kprobes(void)
 {
 	trampoline_p.addr =
-		(kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip;
+		dereference_function_description(kretprobe_trampoline);
 	return register_kprobe(&trampoline_p);
 }
 
 int __kprobes arch_trampoline_kprobe(struct kprobe *p)
 {
 	if (p->addr ==
-		(kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip)
+		dereference_function_descriptor(kretprobe_trampoline))
 		return 1;
 
 	return 0;


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -tip 2/5] kprobes: treewide: Replace arch_deref_entry_point() with dereference_function_descriptor()
  2021-03-05 15:38 [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Masami Hiramatsu
  2021-03-05 15:39 ` [PATCH -tip 1/5] ia64: kprobes: Fix to pass correct trampoline address to the handler Masami Hiramatsu
@ 2021-03-05 15:39 ` Masami Hiramatsu
  2021-03-05 15:39 ` [PATCH -tip 3/5] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler() Masami Hiramatsu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-05 15:39 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs

Replace arch_deref_entry_point() with dereference_function_descriptor()
because those are doing same thing.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/ia64/kernel/kprobes.c    |    5 -----
 arch/powerpc/kernel/kprobes.c |   11 -----------
 include/linux/kprobes.h       |    1 -
 kernel/kprobes.c              |    7 +------
 lib/error-inject.c            |    3 ++-
 5 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index 006fbc1d7ae9..15871eb170c0 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -907,11 +907,6 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
 	return ret;
 }
 
-unsigned long arch_deref_entry_point(void *entry)
-{
-	return ((struct fnptr *)entry)->ip;
-}
-
 static struct kprobe trampoline_p = {
 	.pre_handler = trampoline_probe_handler
 };
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 01ab2163659e..eb0460949e1b 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -539,17 +539,6 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 }
 NOKPROBE_SYMBOL(kprobe_fault_handler);
 
-unsigned long arch_deref_entry_point(void *entry)
-{
-#ifdef PPC64_ELF_ABI_v1
-	if (!kernel_text_address((unsigned long)entry))
-		return ppc_global_function_entry(entry);
-	else
-#endif
-		return (unsigned long)entry;
-}
-NOKPROBE_SYMBOL(arch_deref_entry_point);
-
 static struct kprobe trampoline_p = {
 	.addr = (kprobe_opcode_t *) &kretprobe_trampoline,
 	.pre_handler = trampoline_probe_handler
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 1883a4a9f16a..d65c041b5c22 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -390,7 +390,6 @@ int register_kprobe(struct kprobe *p);
 void unregister_kprobe(struct kprobe *p);
 int register_kprobes(struct kprobe **kps, int num);
 void unregister_kprobes(struct kprobe **kps, int num);
-unsigned long arch_deref_entry_point(void *);
 
 int register_kretprobe(struct kretprobe *rp);
 void unregister_kretprobe(struct kretprobe *rp);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 745f08fdd7a6..2913de07f4a3 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1856,11 +1856,6 @@ static struct notifier_block kprobe_exceptions_nb = {
 	.priority = 0x7fffffff /* we need to be notified first */
 };
 
-unsigned long __weak arch_deref_entry_point(void *entry)
-{
-	return (unsigned long)entry;
-}
-
 #ifdef CONFIG_KRETPROBES
 
 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
@@ -2324,7 +2319,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
 	int ret;
 
 	for (iter = start; iter < end; iter++) {
-		entry = arch_deref_entry_point((void *)*iter);
+		entry = (unsigned long)dereference_function_descriptor((void *)*iter);
 		ret = kprobe_add_ksym_blacklist(entry);
 		if (ret == -EINVAL)
 			continue;
diff --git a/lib/error-inject.c b/lib/error-inject.c
index c73651b15b76..f71875ac5f9f 100644
--- a/lib/error-inject.c
+++ b/lib/error-inject.c
@@ -8,6 +8,7 @@
 #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <asm/sections.h>
 
 /* Whitelist of symbols that can be overridden for error injection. */
 static LIST_HEAD(error_injection_list);
@@ -64,7 +65,7 @@ static void populate_error_injection_list(struct error_injection_entry *start,
 
 	mutex_lock(&ei_mutex);
 	for (iter = start; iter < end; iter++) {
-		entry = arch_deref_entry_point((void *)iter->addr);
+		entry = (unsigned long)dereference_function_descriptor((void *)iter->addr);
 
 		if (!kernel_text_address(entry) ||
 		    !kallsyms_lookup_size_offset(entry, &size, &offset)) {


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -tip 3/5] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
  2021-03-05 15:38 [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Masami Hiramatsu
  2021-03-05 15:39 ` [PATCH -tip 1/5] ia64: kprobes: Fix to pass correct trampoline address to the handler Masami Hiramatsu
  2021-03-05 15:39 ` [PATCH -tip 2/5] kprobes: treewide: Replace arch_deref_entry_point() with dereference_function_descriptor() Masami Hiramatsu
@ 2021-03-05 15:39 ` Masami Hiramatsu
  2021-03-10 14:21   ` Miroslav Benes
  2021-03-05 15:39 ` [PATCH -tip 4/5] kprobes: stacktrace: Recover the address changed by kretprobe Masami Hiramatsu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-05 15:39 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs

Remove trampoline_address from kretprobe_trampoline_handler().
Instead of passing the address, kretprobe_trampoline_handler()
can use new kretprobe_trampoline_addr().

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/arc/kernel/kprobes.c          |    2 +-
 arch/arm/probes/kprobes/core.c     |    3 +--
 arch/arm64/kernel/probes/kprobes.c |    3 +--
 arch/csky/kernel/probes/kprobes.c  |    2 +-
 arch/ia64/kernel/kprobes.c         |    5 ++---
 arch/mips/kernel/kprobes.c         |    3 +--
 arch/parisc/kernel/kprobes.c       |    4 ++--
 arch/powerpc/kernel/kprobes.c      |    2 +-
 arch/riscv/kernel/probes/kprobes.c |    2 +-
 arch/s390/kernel/kprobes.c         |    2 +-
 arch/sh/kernel/kprobes.c           |    2 +-
 arch/sparc/kernel/kprobes.c        |    2 +-
 arch/x86/kernel/kprobes/core.c     |    2 +-
 include/linux/kprobes.h            |   18 +++++++++++++-----
 kernel/kprobes.c                   |    3 +--
 15 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/arch/arc/kernel/kprobes.c b/arch/arc/kernel/kprobes.c
index cabef45f11df..3ae01bb5820c 100644
--- a/arch/arc/kernel/kprobes.c
+++ b/arch/arc/kernel/kprobes.c
@@ -397,7 +397,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 static int __kprobes trampoline_probe_handler(struct kprobe *p,
 					      struct pt_regs *regs)
 {
-	regs->ret = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	regs->ret = __kretprobe_trampoline_handler(regs, NULL);
 
 	/* By returning a non zero value, we are telling the kprobe handler
 	 * that we don't want the post_handler to run
diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index a9653117ca0d..1782b41df095 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -413,8 +413,7 @@ void __naked __kprobes kretprobe_trampoline(void)
 /* Called from kretprobe_trampoline */
 static __used __kprobes void *trampoline_handler(struct pt_regs *regs)
 {
-	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
-						    (void *)regs->ARM_fp);
+	return (void *)kretprobe_trampoline_handler(regs, (void *)regs->ARM_fp);
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 66aac2881ba8..fce681fdfce6 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -412,8 +412,7 @@ int __init arch_populate_kprobe_blacklist(void)
 
 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
 {
-	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
-					(void *)kernel_stack_pointer(regs));
+	return (void *)kretprobe_trampoline_handler(regs, (void *)kernel_stack_pointer(regs));
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/csky/kernel/probes/kprobes.c b/arch/csky/kernel/probes/kprobes.c
index 589f090f48b9..cc589bc11904 100644
--- a/arch/csky/kernel/probes/kprobes.c
+++ b/arch/csky/kernel/probes/kprobes.c
@@ -404,7 +404,7 @@ int __init arch_populate_kprobe_blacklist(void)
 
 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
 {
-	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	return (void *)kretprobe_trampoline_handler(regs, NULL);
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index 15871eb170c0..a008df8e7203 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -392,14 +392,13 @@ static void __kprobes set_current_kprobe(struct kprobe *p,
 	__this_cpu_write(current_kprobe, p);
 }
 
-static void kretprobe_trampoline(void)
+void kretprobe_trampoline(void)
 {
 }
 
 int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	regs->cr_iip = __kretprobe_trampoline_handler(regs,
-		dereference_function_descriptor(kretprobe_trampoline), NULL);
+	regs->cr_iip = __kretprobe_trampoline_handler(regs, NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler
diff --git a/arch/mips/kernel/kprobes.c b/arch/mips/kernel/kprobes.c
index 54dfba8fa77c..001a2f07ef44 100644
--- a/arch/mips/kernel/kprobes.c
+++ b/arch/mips/kernel/kprobes.c
@@ -489,8 +489,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 static int __kprobes trampoline_probe_handler(struct kprobe *p,
 						struct pt_regs *regs)
 {
-	instruction_pointer(regs) = __kretprobe_trampoline_handler(regs,
-						kretprobe_trampoline, NULL);
+	instruction_pointer(regs) = __kretprobe_trampoline_handler(regs, NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler
diff --git a/arch/parisc/kernel/kprobes.c b/arch/parisc/kernel/kprobes.c
index 6d21a515eea5..4a35ac6e2ca2 100644
--- a/arch/parisc/kernel/kprobes.c
+++ b/arch/parisc/kernel/kprobes.c
@@ -175,7 +175,7 @@ int __kprobes parisc_kprobe_ss_handler(struct pt_regs *regs)
 	return 1;
 }
 
-static inline void kretprobe_trampoline(void)
+void kretprobe_trampoline(void)
 {
 	asm volatile("nop");
 	asm volatile("nop");
@@ -193,7 +193,7 @@ static int __kprobes trampoline_probe_handler(struct kprobe *p,
 {
 	unsigned long orig_ret_address;
 
-	orig_ret_address = __kretprobe_trampoline_handler(regs, trampoline_p.addr, NULL);
+	orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
 	instruction_pointer_set(regs, orig_ret_address);
 
 	return 1;
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index eb0460949e1b..dfd532c43525 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -399,7 +399,7 @@ static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
 	unsigned long orig_ret_address;
 
-	orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
 	/*
 	 * We get here through one of two paths:
 	 * 1. by taking a trap -> kprobe_handler() -> here
diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index a2ec18662fee..619339f1d3ba 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -376,7 +376,7 @@ int __init arch_populate_kprobe_blacklist(void)
 
 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
 {
-	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	return (void *)kretprobe_trampoline_handler(regs, NULL);
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index aae24dc75df6..b149e9169709 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -351,7 +351,7 @@ static void __used kretprobe_trampoline_holder(void)
  */
 static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	regs->psw.addr = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	regs->psw.addr = __kretprobe_trampoline_handler(regs, NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler
diff --git a/arch/sh/kernel/kprobes.c b/arch/sh/kernel/kprobes.c
index 756100b01e84..48356e81836a 100644
--- a/arch/sh/kernel/kprobes.c
+++ b/arch/sh/kernel/kprobes.c
@@ -303,7 +303,7 @@ static void __used kretprobe_trampoline_holder(void)
  */
 int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	regs->pc = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	regs->pc = __kretprobe_trampoline_handler(regs, NULL);
 
 	return 1;
 }
diff --git a/arch/sparc/kernel/kprobes.c b/arch/sparc/kernel/kprobes.c
index 217c21a6986a..fa30f9dadff8 100644
--- a/arch/sparc/kernel/kprobes.c
+++ b/arch/sparc/kernel/kprobes.c
@@ -468,7 +468,7 @@ static int __kprobes trampoline_probe_handler(struct kprobe *p,
 {
 	unsigned long orig_ret_address = 0;
 
-	orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+	orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
 	regs->tpc = orig_ret_address;
 	regs->tnpc = orig_ret_address + 4;
 
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 1f58e89eeccd..3c00b773fe2e 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1062,7 +1062,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 	regs->ip = (unsigned long)&kretprobe_trampoline;
 	regs->orig_ax = ~0UL;
 
-	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, &regs->sp);
+	return (void *)kretprobe_trampoline_handler(regs, &regs->sp);
 }
 NOKPROBE_SYMBOL(trampoline_handler);
 
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index d65c041b5c22..9596b6b15bd0 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -205,15 +205,23 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				   struct pt_regs *regs);
 extern int arch_trampoline_kprobe(struct kprobe *p);
 
+void kretprobe_trampoline(void);
+/*
+ * Since some architecture uses structured function pointer,
+ * use arch_deref_entry_point() to get real function address.
+ */
+static nokprobe_inline void *kretprobe_trampoline_addr(void)
+{
+	return dereference_function_descriptor(kretprobe_trampoline);
+}
+
 /* If the trampoline handler called from a kprobe, use this version */
 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
-				void *trampoline_address,
-				void *frame_pointer);
+					     void *frame_pointer);
 
 static nokprobe_inline
 unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
-				void *trampoline_address,
-				void *frame_pointer)
+					   void *frame_pointer)
 {
 	unsigned long ret;
 	/*
@@ -222,7 +230,7 @@ unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
 	 * be running at this point.
 	 */
 	kprobe_busy_begin();
-	ret = __kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer);
+	ret = __kretprobe_trampoline_handler(regs, frame_pointer);
 	kprobe_busy_end();
 
 	return ret;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2913de07f4a3..75c0a58c19c2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1859,7 +1859,6 @@ static struct notifier_block kprobe_exceptions_nb = {
 #ifdef CONFIG_KRETPROBES
 
 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
-					     void *trampoline_address,
 					     void *frame_pointer)
 {
 	kprobe_opcode_t *correct_ret_addr = NULL;
@@ -1874,7 +1873,7 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 
 		BUG_ON(ri->fp != frame_pointer);
 
-		if (ri->ret_addr != trampoline_address) {
+		if (ri->ret_addr != kretprobe_trampoline_addr()) {
 			correct_ret_addr = ri->ret_addr;
 			/*
 			 * This is the real return address. Any other


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -tip 4/5] kprobes: stacktrace: Recover the address changed by kretprobe
  2021-03-05 15:38 [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Masami Hiramatsu
                   ` (2 preceding siblings ...)
  2021-03-05 15:39 ` [PATCH -tip 3/5] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler() Masami Hiramatsu
@ 2021-03-05 15:39 ` Masami Hiramatsu
  2021-03-05 15:39 ` [PATCH -tip 5/5] tracing: Remove kretprobe unknown indicator from stacktrace Masami Hiramatsu
  2021-03-05 19:16 ` [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Daniel Xu
  5 siblings, 0 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-05 15:39 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs

Recover the return address on the stack which changed by the
kretprobe. Note that this does not recover the address on the
!current stack trace if CONFIG_ARCH_STACKWALK=n because old
stack trace interface doesn't lock the stack in the generic
stack_trace_save*() functions.

So with this patch, ftrace correctly shows the stacktrace
as below;

 # echo r vfs_read > kprobe_events
 # echo stacktrace > events/kprobes/r_vfs_read_0/trigger
 # echo 1 > events/kprobes/r_vfs_read_0/enable
 # echo 1 > options/sym-offset
 # less trace
...

              sh-132     [007] ...1    22.524917: <stack trace>
 => kretprobe_dispatcher+0x7d/0xc0
 => __kretprobe_trampoline_handler+0xdb/0x1b0
 => trampoline_handler+0x48/0x60
 => kretprobe_trampoline+0x2a/0x50
 => ksys_read+0x70/0xf0
 => __x64_sys_read+0x1a/0x20
 => do_syscall_64+0x38/0x50
 => entry_SYSCALL_64_after_hwframe+0x44/0xae
 => 0

The trampoline_handler+0x48 is actual call site address,
not modified by kretprobe.

Reported-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 include/linux/kprobes.h |   13 ++++++++
 kernel/kprobes.c        |   79 ++++++++++++++++++++++++++++++++---------------
 kernel/stacktrace.c     |   21 ++++++++++++
 3 files changed, 87 insertions(+), 26 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9596b6b15bd0..d8dd9f026de9 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -215,6 +215,10 @@ static nokprobe_inline void *kretprobe_trampoline_addr(void)
 	return dereference_function_descriptor(kretprobe_trampoline);
 }
 
+unsigned long kretprobe_find_ret_addr(unsigned long addr,
+				      struct task_struct *tsk,
+				      struct llist_node **cur);
+
 /* If the trampoline handler called from a kprobe, use this version */
 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 					     void *frame_pointer);
@@ -514,6 +518,15 @@ static inline bool is_kprobe_optinsn_slot(unsigned long addr)
 }
 #endif
 
+#if !defined(CONFIG_KPROBES) || !defined(CONFIG_KRETPROBES)
+static inline unsigned long kretprobe_find_ret_addr(unsigned long addr,
+					      struct task_struct *tsk,
+					      struct llist_node **cur)
+{
+	return addr;
+}
+#endif
+
 /* Returns true if kprobes handled the fault */
 static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
 					      unsigned int trap)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 75c0a58c19c2..76b5e6b03bef 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1858,45 +1858,57 @@ static struct notifier_block kprobe_exceptions_nb = {
 
 #ifdef CONFIG_KRETPROBES
 
-unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
-					     void *frame_pointer)
+/* This assumes the tsk is current or the task which is not running. */
+unsigned long kretprobe_find_ret_addr(unsigned long addr,
+				      struct task_struct *tsk,
+				      struct llist_node **cur)
 {
-	kprobe_opcode_t *correct_ret_addr = NULL;
 	struct kretprobe_instance *ri = NULL;
-	struct llist_node *first, *node;
-	struct kretprobe *rp;
+	struct llist_node *node = *cur;
 
-	/* Find all nodes for this frame. */
-	first = node = current->kretprobe_instances.first;
-	while (node) {
-		ri = container_of(node, struct kretprobe_instance, llist);
+	if (addr != (unsigned long)kretprobe_trampoline_addr())
+		return addr;
 
-		BUG_ON(ri->fp != frame_pointer);
+	if (!node)
+		node = tsk->kretprobe_instances.first;
+	else
+		node = node->next;
 
+	while (node) {
+		ri = container_of(node, struct kretprobe_instance, llist);
 		if (ri->ret_addr != kretprobe_trampoline_addr()) {
-			correct_ret_addr = ri->ret_addr;
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			goto found;
+			*cur = node;
+			return (unsigned long)ri->ret_addr;
 		}
-
 		node = node->next;
 	}
-	pr_err("Oops! Kretprobe fails to find correct return address.\n");
-	BUG_ON(1);
+	return 0;
+}
+NOKPROBE_SYMBOL(kretprobe_find_ret_addr);
 
-found:
-	/* Unlink all nodes for this frame. */
-	current->kretprobe_instances.first = node->next;
-	node->next = NULL;
+unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
+					     void *frame_pointer)
+{
+	kprobe_opcode_t *correct_ret_addr = NULL;
+	struct kretprobe_instance *ri = NULL;
+	struct llist_node *first, *node = NULL;
+	struct kretprobe *rp;
+
+	/* Find correct address and all nodes for this frame. */
+	correct_ret_addr = (void *)kretprobe_find_ret_addr(
+			(unsigned long)kretprobe_trampoline_addr(),
+			current, &node);
+	if (!correct_ret_addr) {
+		pr_err("Oops! Kretprobe fails to find correct return address.\n");
+		BUG_ON(1);
+	}
 
-	/* Run them..  */
+	/* Run them. */
+	first = current->kretprobe_instances.first;
 	while (first) {
 		ri = container_of(first, struct kretprobe_instance, llist);
-		first = first->next;
+
+		BUG_ON(ri->fp != frame_pointer);
 
 		rp = get_kretprobe(ri);
 		if (rp && rp->handler) {
@@ -1907,6 +1919,21 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 			rp->handler(ri, regs);
 			__this_cpu_write(current_kprobe, prev);
 		}
+		if (first == node)
+			break;
+
+		first = first->next;
+	}
+
+	/* Unlink all nodes for this frame. */
+	first = current->kretprobe_instances.first;
+	current->kretprobe_instances.first = node->next;
+	node->next = NULL;
+
+	/* Recycle them.  */
+	while (first) {
+		ri = container_of(first, struct kretprobe_instance, llist);
+		first = first->next;
 
 		recycle_rp_inst(ri);
 	}
diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
index 9f8117c7cfdd..c224858366e7 100644
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -13,6 +13,7 @@
 #include <linux/export.h>
 #include <linux/kallsyms.h>
 #include <linux/stacktrace.h>
+#include <linux/kprobes.h>
 
 /**
  * stack_trace_print - Print the entries in the stack trace
@@ -69,6 +70,17 @@ int stack_trace_snprint(char *buf, size_t size, const unsigned long *entries,
 }
 EXPORT_SYMBOL_GPL(stack_trace_snprint);
 
+static void fixup_kretprobe_tramp_addr(unsigned long *store, unsigned int len,
+				       struct task_struct *tsk)
+{
+	struct llist_node *cur = NULL;
+
+	while (len--) {
+		*store = kretprobe_find_ret_addr(*store, tsk, &cur);
+		store++;
+	}
+}
+
 #ifdef CONFIG_ARCH_STACKWALK
 
 struct stacktrace_cookie {
@@ -119,6 +131,7 @@ unsigned int stack_trace_save(unsigned long *store, unsigned int size,
 	};
 
 	arch_stack_walk(consume_entry, &c, current, NULL);
+	fixup_kretprobe_tramp_addr(store, c.len, current);
 	return c.len;
 }
 EXPORT_SYMBOL_GPL(stack_trace_save);
@@ -147,6 +160,7 @@ unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
 		return 0;
 
 	arch_stack_walk(consume_entry, &c, tsk, NULL);
+	fixup_kretprobe_tramp_addr(store, c.len, tsk);
 	put_task_stack(tsk);
 	return c.len;
 }
@@ -171,6 +185,7 @@ unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
 	};
 
 	arch_stack_walk(consume_entry, &c, current, regs);
+	fixup_kretprobe_tramp_addr(store, c.len, current);
 	return c.len;
 }
 
@@ -205,6 +220,8 @@ int stack_trace_save_tsk_reliable(struct task_struct *tsk, unsigned long *store,
 		return 0;
 
 	ret = arch_stack_walk_reliable(consume_entry, &c, tsk);
+	if (!ret)
+		fixup_kretprobe_tramp_addr(store, c.len, tsk);
 	put_task_stack(tsk);
 	return ret ? ret : c.len;
 }
@@ -276,6 +293,8 @@ unsigned int stack_trace_save(unsigned long *store, unsigned int size,
 	};
 
 	save_stack_trace(&trace);
+	fixup_kretprobe_tramp_addr(store, trace.nr_entries, current);
+
 	return trace.nr_entries;
 }
 EXPORT_SYMBOL_GPL(stack_trace_save);
@@ -323,6 +342,8 @@ unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
 	};
 
 	save_stack_trace_regs(regs, &trace);
+	fixup_kretprobe_tramp_addr(store, trace.nr_entries, current);
+
 	return trace.nr_entries;
 }
 


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -tip 5/5] tracing: Remove kretprobe unknown indicator from stacktrace
  2021-03-05 15:38 [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2021-03-05 15:39 ` [PATCH -tip 4/5] kprobes: stacktrace: Recover the address changed by kretprobe Masami Hiramatsu
@ 2021-03-05 15:39 ` Masami Hiramatsu
  2021-03-05 15:44   ` Steven Rostedt
  2021-03-05 19:16 ` [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Daniel Xu
  5 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-05 15:39 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar
  Cc: X86 ML, Masami Hiramatsu, Daniel Xu, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs

Since the stacktrace API fixup the kretprobed address correctly,
there is no need to convert the "kretprobe_trampoline" to
 "[unknown/kretprobe'd]" anymore. Remove it.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/trace/trace_output.c |   27 ++++-----------------------
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 61255bad7e01..f5f8b081b668 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -346,37 +346,18 @@ int trace_output_call(struct trace_iterator *iter, char *name, char *fmt, ...)
 }
 EXPORT_SYMBOL_GPL(trace_output_call);
 
-#ifdef CONFIG_KRETPROBES
-static inline const char *kretprobed(const char *name)
-{
-	static const char tramp_name[] = "kretprobe_trampoline";
-	int size = sizeof(tramp_name);
-
-	if (strncmp(tramp_name, name, size) == 0)
-		return "[unknown/kretprobe'd]";
-	return name;
-}
-#else
-static inline const char *kretprobed(const char *name)
-{
-	return name;
-}
-#endif /* CONFIG_KRETPROBES */
-
 void
 trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset)
 {
 #ifdef CONFIG_KALLSYMS
-	char str[KSYM_SYMBOL_LEN];
-	const char *name;
+	char name[KSYM_SYMBOL_LEN];
 
 	if (offset)
-		sprint_symbol(str, address);
+		sprint_symbol(name, address);
 	else
-		kallsyms_lookup(address, NULL, NULL, NULL, str);
-	name = kretprobed(str);
+		kallsyms_lookup(address, NULL, NULL, NULL, name);
 
-	if (name && strlen(name)) {
+	if (strlen(name)) {
 		trace_seq_puts(s, name);
 		return;
 	}


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 5/5] tracing: Remove kretprobe unknown indicator from stacktrace
  2021-03-05 15:39 ` [PATCH -tip 5/5] tracing: Remove kretprobe unknown indicator from stacktrace Masami Hiramatsu
@ 2021-03-05 15:44   ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2021-03-05 15:44 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, X86 ML, Daniel Xu, linux-kernel, bpf, kuba, mingo,
	ast, tglx, kernel-team, yhs

On Sat,  6 Mar 2021 00:39:50 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Since the stacktrace API fixup the kretprobed address correctly,
> there is no need to convert the "kretprobe_trampoline" to
>  "[unknown/kretprobe'd]" anymore. Remove it.
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> ---
>  kernel/trace/trace_output.c |   27 ++++-----------------------
>  1 file changed, 4 insertions(+), 23 deletions(-)
> 

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 61255bad7e01..f5f8b081b668 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -346,37 +346,18 @@ int trace_output_call(struct trace_iterator *iter, char *name, char *fmt, ...)
>  }
>  EXPORT_SYMBOL_GPL(trace_output_call);
>  
> -#ifdef CONFIG_KRETPROBES
> -static inline const char *kretprobed(const char *name)
> -{
> -	static const char tramp_name[] = "kretprobe_trampoline";
> -	int size = sizeof(tramp_name);
> -
> -	if (strncmp(tramp_name, name, size) == 0)
> -		return "[unknown/kretprobe'd]";
> -	return name;
> -}
> -#else
> -static inline const char *kretprobed(const char *name)
> -{
> -	return name;
> -}
> -#endif /* CONFIG_KRETPROBES */
> -
>  void
>  trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset)
>  {
>  #ifdef CONFIG_KALLSYMS
> -	char str[KSYM_SYMBOL_LEN];
> -	const char *name;
> +	char name[KSYM_SYMBOL_LEN];
>  
>  	if (offset)
> -		sprint_symbol(str, address);
> +		sprint_symbol(name, address);
>  	else
> -		kallsyms_lookup(address, NULL, NULL, NULL, str);
> -	name = kretprobed(str);
> +		kallsyms_lookup(address, NULL, NULL, NULL, name);
>  
> -	if (name && strlen(name)) {
> +	if (strlen(name)) {
>  		trace_seq_puts(s, name);
>  		return;
>  	}


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-05 15:38 [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Masami Hiramatsu
                   ` (4 preceding siblings ...)
  2021-03-05 15:39 ` [PATCH -tip 5/5] tracing: Remove kretprobe unknown indicator from stacktrace Masami Hiramatsu
@ 2021-03-05 19:16 ` Daniel Xu
  2021-03-06  1:13   ` Masami Hiramatsu
  5 siblings, 1 reply; 28+ messages in thread
From: Daniel Xu @ 2021-03-05 19:16 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs

Hi Masami,

On Sat, Mar 06, 2021 at 12:38:57AM +0900, Masami Hiramatsu wrote:
> Hello,
> 
> Here is a series of patches for kprobes and stacktracer to fix the kretprobe
> entries in the kernel stack. This was reported by Daniel Xu. I thought that
> was in the bpftrace, but it is actually more generic issue.
> So I decided to fix the issue in arch independent part.
> 
> While fixing the issue, I found a bug in ia64 related to kretprobe, which is
> fixed by [1/5]. [2/5] and [3/5] is a kind of cleanup before fixing the main
> issue. [4/5] is the patch to fix the stacktrace, which involves kretprobe
> internal change. And [5/5] removing the stacktrace kretprobe fixup code in
> ftrace. 
> 
> Daniel, can you also check that this fixes your issue too? I hope it is.

Unfortunately, this patch series does not fix the issue I reported.

I think the reason your tests work is because you're using ftrace and
the ORC unwinder is aware of ftrace trampolines (see
arch/x86/kernel/unwind_orc.c:orc_ftrace_find).

bpftrace kprobes go through perf event subsystem (ie not ftrace) so
naturally orc_ftrace_find() does not find an associated trampoline. ORC
unwinding fails in this case because
arch/x86/kernel/kprobes/core.c:trampoline_handler sets

    regs->ip = (unsigned long)&kretprobe_trampoline;

and `kretprobe_trampoline` is marked

    STACK_FRAME_NON_STANDARD(kretprobe_trampoline);

so it doesn't have a valid ORC entry. Thus, ORC immediately bails when
trying to unwind past the first frame.

The only way I can think of to fix this issue is to make the ORC
unwinder aware of kretprobe (ie the patch I sent earlier). I'm hoping
you have another idea if my patch isn't acceptable.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-05 19:16 ` [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Daniel Xu
@ 2021-03-06  1:13   ` Masami Hiramatsu
  2021-03-07 21:23     ` Daniel Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-06  1:13 UTC (permalink / raw)
  To: Daniel Xu
  Cc: Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs, Josh Poimboeuf

On Fri, 5 Mar 2021 11:16:45 -0800
Daniel Xu <dxu@dxuuu.xyz> wrote:

> Hi Masami,
> 
> On Sat, Mar 06, 2021 at 12:38:57AM +0900, Masami Hiramatsu wrote:
> > Hello,
> > 
> > Here is a series of patches for kprobes and stacktracer to fix the kretprobe
> > entries in the kernel stack. This was reported by Daniel Xu. I thought that
> > was in the bpftrace, but it is actually more generic issue.
> > So I decided to fix the issue in arch independent part.
> > 
> > While fixing the issue, I found a bug in ia64 related to kretprobe, which is
> > fixed by [1/5]. [2/5] and [3/5] is a kind of cleanup before fixing the main
> > issue. [4/5] is the patch to fix the stacktrace, which involves kretprobe
> > internal change. And [5/5] removing the stacktrace kretprobe fixup code in
> > ftrace. 
> > 
> > Daniel, can you also check that this fixes your issue too? I hope it is.
> 
> Unfortunately, this patch series does not fix the issue I reported.

Ah, OK. This was my mistake I didn't choose ORC unwinder for test kernel.

> 
> I think the reason your tests work is because you're using ftrace and
> the ORC unwinder is aware of ftrace trampolines (see
> arch/x86/kernel/unwind_orc.c:orc_ftrace_find).

OK, so it has to be fixed in ORC unwinder.

> 
> bpftrace kprobes go through perf event subsystem (ie not ftrace) so
> naturally orc_ftrace_find() does not find an associated trampoline. ORC
> unwinding fails in this case because
> arch/x86/kernel/kprobes/core.c:trampoline_handler sets
> 
>     regs->ip = (unsigned long)&kretprobe_trampoline;
> 
> and `kretprobe_trampoline` is marked
> 
>     STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> 
> so it doesn't have a valid ORC entry. Thus, ORC immediately bails when
> trying to unwind past the first frame.

Hm, in ftrace case, it decode kretprobe's stackframe and stoped right
after kretprobe_trampoline callsite.

 => kretprobe_trace_func+0x21f/0x340
 => kretprobe_dispatcher+0x73/0xb0
 => __kretprobe_trampoline_handler+0xcd/0x1a0
 => trampoline_handler+0x3d/0x50
 => kretprobe_trampoline+0x25/0x50
 => 0

kretprobe replaces the real return address with kretprobe_trampoline
and kretprobe_trampoline *calls* trampoline_handler (this part depends
on architecture implementation).
Thus, if kretprobe_trampoline has no stack frame information, ORC may
fails at the first kretprobe_trampoline+0x25, that is different from
the kretprobe_trampoline+0, so the hack doesn't work.

Hmm, how the other inline-asm function makes its stackframe info?
If I get the kretprobe_trampoline+0, I can fix it up.

> The only way I can think of to fix this issue is to make the ORC
> unwinder aware of kretprobe (ie the patch I sent earlier). I'm hoping
> you have another idea if my patch isn't acceptable.

OK, anyway, your patch doesn't care the case that the multiple functions
are probed by kretprobes. In that case, you'll see several entries are
replaced by the kretprobe_trampoline. To find it correctly, you have
to pass a state-holder (@cur of the kretprobe_find_ret_addr())
to the fixup entries.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-06  1:13   ` Masami Hiramatsu
@ 2021-03-07 21:23     ` Daniel Xu
  2021-03-08  2:52       ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Xu @ 2021-03-07 21:23 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs, Josh Poimboeuf

On Sat, Mar 06, 2021 at 10:13:57AM +0900, Masami Hiramatsu wrote:
> On Fri, 5 Mar 2021 11:16:45 -0800
> Daniel Xu <dxu@dxuuu.xyz> wrote:
> 
> > Hi Masami,
> > 
> > On Sat, Mar 06, 2021 at 12:38:57AM +0900, Masami Hiramatsu wrote:
> > > Hello,
> > > 
> > > Here is a series of patches for kprobes and stacktracer to fix the kretprobe
> > > entries in the kernel stack. This was reported by Daniel Xu. I thought that
> > > was in the bpftrace, but it is actually more generic issue.
> > > So I decided to fix the issue in arch independent part.
> > > 
> > > While fixing the issue, I found a bug in ia64 related to kretprobe, which is
> > > fixed by [1/5]. [2/5] and [3/5] is a kind of cleanup before fixing the main
> > > issue. [4/5] is the patch to fix the stacktrace, which involves kretprobe
> > > internal change. And [5/5] removing the stacktrace kretprobe fixup code in
> > > ftrace. 
> > > 
> > > Daniel, can you also check that this fixes your issue too? I hope it is.
> > 
> > Unfortunately, this patch series does not fix the issue I reported.
> 
> Ah, OK. This was my mistake I didn't choose ORC unwinder for test kernel.
> 
> > 
> > I think the reason your tests work is because you're using ftrace and
> > the ORC unwinder is aware of ftrace trampolines (see
> > arch/x86/kernel/unwind_orc.c:orc_ftrace_find).
> 
> OK, so it has to be fixed in ORC unwinder.
> 
> > 
> > bpftrace kprobes go through perf event subsystem (ie not ftrace) so
> > naturally orc_ftrace_find() does not find an associated trampoline. ORC
> > unwinding fails in this case because
> > arch/x86/kernel/kprobes/core.c:trampoline_handler sets
> > 
> >     regs->ip = (unsigned long)&kretprobe_trampoline;
> > 
> > and `kretprobe_trampoline` is marked
> > 
> >     STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> > 
> > so it doesn't have a valid ORC entry. Thus, ORC immediately bails when
> > trying to unwind past the first frame.
> 
> Hm, in ftrace case, it decode kretprobe's stackframe and stoped right
> after kretprobe_trampoline callsite.
> 
>  => kretprobe_trace_func+0x21f/0x340
>  => kretprobe_dispatcher+0x73/0xb0
>  => __kretprobe_trampoline_handler+0xcd/0x1a0
>  => trampoline_handler+0x3d/0x50
>  => kretprobe_trampoline+0x25/0x50
>  => 0
> 
> kretprobe replaces the real return address with kretprobe_trampoline
> and kretprobe_trampoline *calls* trampoline_handler (this part depends
> on architecture implementation).
> Thus, if kretprobe_trampoline has no stack frame information, ORC may
> fails at the first kretprobe_trampoline+0x25, that is different from
> the kretprobe_trampoline+0, so the hack doesn't work.

I'm not sure I follow 100% what you're saying, but assuming you're
asking why bpftrace fails at `kretprobe_trampoline+0` instead of
`kretprobe_trampoline+0x25`:

`regs` is set to &kretprobe_trampoline:

    regs->ip = (unsigned long)&kretprobe_trampoline;

Then the kretprobe event is dispatched like this:

    kretprobe_trampoline_handler
      __kretprobe_trampoline_handler
        rp->handler // ie kernel/trace/trace_kprobe.c:kretprobe_dispatcher
          kretprobe_perf_func
            trace_bpf_call(.., regs)
              BPF_PROG_RUN_ARRAY_CHECK
                bpf_get_stackid(regs, .., ..) // in bpftrace prog 

And then `bpf_get_stackid` unwinds the stack via:

    bpf_get_stackid
      get_perf_callchain(regs, ...)
        perf_callchain_kernel(.., regs)
          perf_callchain_store(.., regs->ip) // !!! first unwound entry
          unwind_start
          unwind_next_frame

In summary: unwinding via BPF begins at regs->ip, which
`trampoline_handler` sets to `&kretprobe_trampoline+0x0`.

> Hmm, how the other inline-asm function makes its stackframe info?
> If I get the kretprobe_trampoline+0, I can fix it up.

So I think I misunderstood the mechanics before. I think `call
trampoline_handler` does set up a new frame. My current guess is that
ftrace doesn't thread through `regs` like the bpf code path does. I'm
not very familiar with ftrace internals so I haven't looked.

> 
> > The only way I can think of to fix this issue is to make the ORC
> > unwinder aware of kretprobe (ie the patch I sent earlier). I'm hoping
> > you have another idea if my patch isn't acceptable.
> 
> OK, anyway, your patch doesn't care the case that the multiple functions
> are probed by kretprobes. In that case, you'll see several entries are
> replaced by the kretprobe_trampoline. To find it correctly, you have
> to pass a state-holder (@cur of the kretprobe_find_ret_addr())
> to the fixup entries.

I'll see if I can figure something out tomorrow.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-07 21:23     ` Daniel Xu
@ 2021-03-08  2:52       ` Masami Hiramatsu
  2021-03-08 13:05         ` Masami Hiramatsu
                           ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-08  2:52 UTC (permalink / raw)
  To: Daniel Xu
  Cc: Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs, Josh Poimboeuf

On Sun, 7 Mar 2021 13:23:33 -0800
Daniel Xu <dxu@dxuuu.xyz> wrote:

> > kretprobe replaces the real return address with kretprobe_trampoline
> > and kretprobe_trampoline *calls* trampoline_handler (this part depends
> > on architecture implementation).
> > Thus, if kretprobe_trampoline has no stack frame information, ORC may
> > fails at the first kretprobe_trampoline+0x25, that is different from
> > the kretprobe_trampoline+0, so the hack doesn't work.
> 
> I'm not sure I follow 100% what you're saying, but assuming you're
> asking why bpftrace fails at `kretprobe_trampoline+0` instead of
> `kretprobe_trampoline+0x25`:
> 
> `regs` is set to &kretprobe_trampoline:
> 
>     regs->ip = (unsigned long)&kretprobe_trampoline;

Ah, OK. bpftrace does the adjustment.

> Then the kretprobe event is dispatched like this:
> 
>     kretprobe_trampoline_handler
>       __kretprobe_trampoline_handler
>         rp->handler // ie kernel/trace/trace_kprobe.c:kretprobe_dispatcher
>           kretprobe_perf_func
>             trace_bpf_call(.., regs)
>               BPF_PROG_RUN_ARRAY_CHECK
>                 bpf_get_stackid(regs, .., ..) // in bpftrace prog 
> 
> And then `bpf_get_stackid` unwinds the stack via:
> 
>     bpf_get_stackid
>       get_perf_callchain(regs, ...)
>         perf_callchain_kernel(.., regs)
>           perf_callchain_store(.., regs->ip) // !!! first unwound entry
>           unwind_start
>           unwind_next_frame
> 
> In summary: unwinding via BPF begins at regs->ip, which
> `trampoline_handler` sets to `&kretprobe_trampoline+0x0`.

OK, maybe you are using stack_trace_save_regs() with pt_regs instead of
stack_trace_save(). this means it started from regs at saved by
kretprobe always.

In the ftrace, we are using stack_trace_save() which is based on
the current stack, this means stack unwinder tracks back the stack of
kretprobe itself at first. So it saw the kretprobe_trampoline+0x25 
(return address of the trampoline_handler) and stop unwinding because
the call frame information (ORC information) and the return address
are not there.

This issue is not only the ftrace, but also backtrace in interrupt
handler and kretprobe handler.

> > Hmm, how the other inline-asm function makes its stackframe info?
> > If I get the kretprobe_trampoline+0, I can fix it up.
> 
> So I think I misunderstood the mechanics before. I think `call
> trampoline_handler` does set up a new frame. My current guess is that
> ftrace doesn't thread through `regs` like the bpf code path does. I'm
> not very familiar with ftrace internals so I haven't looked.

Yes, that's right. Since I made a kretprobe event on top of the ftrace
event framework, it doesn't pass the regs to the event trigger.


> > > The only way I can think of to fix this issue is to make the ORC
> > > unwinder aware of kretprobe (ie the patch I sent earlier). I'm hoping
> > > you have another idea if my patch isn't acceptable.
> > 
> > OK, anyway, your patch doesn't care the case that the multiple functions
> > are probed by kretprobes. In that case, you'll see several entries are
> > replaced by the kretprobe_trampoline. To find it correctly, you have
> > to pass a state-holder (@cur of the kretprobe_find_ret_addr())
> > to the fixup entries.
> 
> I'll see if I can figure something out tomorrow.

To help your understanding, let me explain.

If we have a code here

caller_func:
0x00 add sp, 0x20	/* 0x20 bytes stack frame allocated */
...
0x10 call target_func
0x15 ... /* return address */

On the stack in the entry of target_func, we have

[stack]
0x0e0 caller_func+0x15
... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
0x100 /* caller_func return address */

And when we put a kretprobe on the target_func, the stack will be

[stack]
0x0e0 kretprobe_trampoline
... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
0x100 /* caller_func return address */

* "caller_func+0x15" is saved in current->kretprobe_instances.first.

When returning from the target_func, call consumed the 0x0e0 and
jump to kretprobe_trampoline. Let's see the assembler code.

        ".text\n"
        ".global kretprobe_trampoline\n"
        ".type kretprobe_trampoline, @function\n"
        "kretprobe_trampoline:\n"
        /* We don't bother saving the ss register */
        "       pushq %rsp\n"
        "       pushfq\n"
        SAVE_REGS_STRING
        "       movq %rsp, %rdi\n"
        "       call trampoline_handler\n"
        /* Replace saved sp with true return address. */
        "       movq %rax, 19*8(%rsp)\n"
        RESTORE_REGS_STRING
        "       popfq\n"
        "       ret\n"

When the entry of trampoline_handler, stack is like this;

[stack]
0x040 kretprobe_trampoline+0x25
0x048 r15
...     /* pt_regs */
0x0d8 flags
0x0e0 rsp (=0x0e0)
... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
0x100 /* caller_func return address */

And after returned from trampoline_handler, "movq" changes the
stack like this.

[stack]
0x040 kretprobe_trampoline+0x25
0x048 r15
...     /* pt_regs */
0x0d8 flags
0x0e0 caller_func+0x15
... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
0x100 /* caller_func return address */


So at the kretprobe handler, we have 2 issues.
1) the return address (caller_func+0x15) is not on the stack.
   this can be solved by searching from current->kretprobe_instances.
2) the stack frame size of kretprobe_trampoline is unknown
   Since the stackframe is fixed, the fixed number (0x98) can be used.

However, those solutions are only for the kretprobe handler. The stacktrace
from interrupt handler hit in the kretprobe_trampoline still doesn't work.

So, here is my idea;

1) Change the trampline code to prepare stack frame at first and save
   registers on it, instead of "push". This will makes ORC easy to setup
   stackframe information for this code.
2) change the return addres fixup timing. Instead of using return value
   of trampoline handler, before removing the real return address from
   current->kretprobe_instances.
3) Then, if orc_find() finds the ip is in the kretprobe_trampoline, it
   checks the contents of the end of stackframe (at the place of regs->sp)
   is same as the address of it. If it is, it can find the correct address
   from current->kretprobe_instances. If not, there is a correct address.

I need to find how the ORC info is prepared for 1), maybe UNWIND_HINT macro
may help?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-08  2:52       ` Masami Hiramatsu
@ 2021-03-08 13:05         ` Masami Hiramatsu
  2021-03-09  1:19         ` Josh Poimboeuf
  2021-03-09 21:34         ` Daniel Xu
  2 siblings, 0 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-08 13:05 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs, Josh Poimboeuf

On Mon, 8 Mar 2021 11:52:10 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> So, here is my idea;
> 
> 1) Change the trampline code to prepare stack frame at first and save
>    registers on it, instead of "push". This will makes ORC easy to setup
>    stackframe information for this code.
> 2) change the return addres fixup timing. Instead of using return value
>    of trampoline handler, before removing the real return address from
>    current->kretprobe_instances.
> 3) Then, if orc_find() finds the ip is in the kretprobe_trampoline, it
>    checks the contents of the end of stackframe (at the place of regs->sp)
>    is same as the address of it. If it is, it can find the correct address
>    from current->kretprobe_instances. If not, there is a correct address.

Another trickly idea is put a call on top of kretprobe_trampoline like this.

        "__kretprobe_trampoline:\n"
        "       call kretprobe_trampoline\n"
        "kretprobe_trampoline:\n"
        "       pushq %rsp\n"
        "       pushfq\n"
        SAVE_REGS_STRING
        "       movq %rsp, %rdi\n"
        "       call trampoline_handler\n"
        /* Replace __kretprobe_trampoline with true return address. */
        "       movq %rax, 20*8(%rsp)\n"
        RESTORE_REGS_STRING
        "       popfq\n"
        "       popq %rsp\n"
        "       ret\n"

This will leave a marker (kretprobe_trampoline or __kretprobe_trampoline+5) on
the top of stack, and the stack frame seems like a normal function. If objtool
can make an ORC info by disassembling kretprobe_trampoline, I guess it is
easy to make a stack frame information.

But anyway, from the inside of target function, it still see "__kretprobe_trampoline"
on the stack instead of caller_func, so orc_kretprobe_find() is still needed.

I'm not familier with the UNWIND_HINT macro, so if it is easy to handle the original
case, I think my first idea will be better.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-08  2:52       ` Masami Hiramatsu
  2021-03-08 13:05         ` Masami Hiramatsu
@ 2021-03-09  1:19         ` Josh Poimboeuf
  2021-03-10  9:57           ` Masami Hiramatsu
  2021-03-09 21:34         ` Daniel Xu
  2 siblings, 1 reply; 28+ messages in thread
From: Josh Poimboeuf @ 2021-03-09  1:19 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

On Mon, Mar 08, 2021 at 11:52:10AM +0900, Masami Hiramatsu wrote:
> So at the kretprobe handler, we have 2 issues.
> 1) the return address (caller_func+0x15) is not on the stack.
>    this can be solved by searching from current->kretprobe_instances.
> 2) the stack frame size of kretprobe_trampoline is unknown
>    Since the stackframe is fixed, the fixed number (0x98) can be used.
> 
> However, those solutions are only for the kretprobe handler. The stacktrace
> from interrupt handler hit in the kretprobe_trampoline still doesn't work.
> 
> So, here is my idea;
> 
> 1) Change the trampline code to prepare stack frame at first and save
>    registers on it, instead of "push". This will makes ORC easy to setup
>    stackframe information for this code.
> 2) change the return addres fixup timing. Instead of using return value
>    of trampoline handler, before removing the real return address from
>    current->kretprobe_instances.
> 3) Then, if orc_find() finds the ip is in the kretprobe_trampoline, it
>    checks the contents of the end of stackframe (at the place of regs->sp)
>    is same as the address of it. If it is, it can find the correct address
>    from current->kretprobe_instances. If not, there is a correct address.
> 
> I need to find how the ORC info is prepared for 1), maybe UNWIND_HINT macro
> may help?

Hi Masami,

If I understand correctly, for #1 you need an unwind hint which treats
the instruction *after* the "pushq %rsp" as the beginning of the
function.

I'm thinking this should work:


diff --git a/arch/x86/include/asm/unwind_hints.h b/arch/x86/include/asm/unwind_hints.h
index 8e574c0afef8..8b33674288ea 100644
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -52,6 +52,11 @@
 	UNWIND_HINT sp_reg=ORC_REG_SP sp_offset=8 type=UNWIND_HINT_TYPE_FUNC
 .endm
 
+#else
+
+#define UNWIND_HINT_FUNC \
+	UNWIND_HINT(ORC_REG_SP, 8, UNWIND_HINT_TYPE_FUNC, 0)
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_UNWIND_HINTS_H */
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index df776cdca327..38ff1387f95d 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -767,6 +767,7 @@ asm(
 	/* We don't bother saving the ss register */
 #ifdef CONFIG_X86_64
 	"	pushq %rsp\n"
+	UNWIND_HINT_FUNC
 	"	pushfq\n"
 	SAVE_REGS_STRING
 	"	movq %rsp, %rdi\n"
@@ -790,7 +791,6 @@ asm(
 	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
 );
 NOKPROBE_SYMBOL(kretprobe_trampoline);
-STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
 
 
 /*


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-08  2:52       ` Masami Hiramatsu
  2021-03-08 13:05         ` Masami Hiramatsu
  2021-03-09  1:19         ` Josh Poimboeuf
@ 2021-03-09 21:34         ` Daniel Xu
  2021-03-10 10:50           ` Masami Hiramatsu
  2 siblings, 1 reply; 28+ messages in thread
From: Daniel Xu @ 2021-03-09 21:34 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs, Josh Poimboeuf

Hi Masami,

Just want to clarify a few points:

On Mon, Mar 08, 2021 at 11:52:10AM +0900, Masami Hiramatsu wrote:
> On Sun, 7 Mar 2021 13:23:33 -0800
> Daniel Xu <dxu@dxuuu.xyz> wrote:
> To help your understanding, let me explain.
> 
> If we have a code here
> 
> caller_func:
> 0x00 add sp, 0x20	/* 0x20 bytes stack frame allocated */
> ...
> 0x10 call target_func
> 0x15 ... /* return address */
> 
> On the stack in the entry of target_func, we have
> 
> [stack]
> 0x0e0 caller_func+0x15
> ... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
> 0x100 /* caller_func return address */
> 
> And when we put a kretprobe on the target_func, the stack will be
> 
> [stack]
> 0x0e0 kretprobe_trampoline
> ... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
> 0x100 /* caller_func return address */
> 
> * "caller_func+0x15" is saved in current->kretprobe_instances.first.
> 
> When returning from the target_func, call consumed the 0x0e0 and
> jump to kretprobe_trampoline. Let's see the assembler code.
> 
>         ".text\n"
>         ".global kretprobe_trampoline\n"
>         ".type kretprobe_trampoline, @function\n"
>         "kretprobe_trampoline:\n"
>         /* We don't bother saving the ss register */
>         "       pushq %rsp\n"
>         "       pushfq\n"
>         SAVE_REGS_STRING
>         "       movq %rsp, %rdi\n"
>         "       call trampoline_handler\n"
>         /* Replace saved sp with true return address. */
>         "       movq %rax, 19*8(%rsp)\n"
>         RESTORE_REGS_STRING
>         "       popfq\n"
>         "       ret\n"
> 
> When the entry of trampoline_handler, stack is like this;
> 
> [stack]
> 0x040 kretprobe_trampoline+0x25
> 0x048 r15
> ...     /* pt_regs */
> 0x0d8 flags
> 0x0e0 rsp (=0x0e0)
> ... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
> 0x100 /* caller_func return address */
> 
> And after returned from trampoline_handler, "movq" changes the
> stack like this.
> 
> [stack]
> 0x040 kretprobe_trampoline+0x25
> 0x048 r15
> ...     /* pt_regs */
> 0x0d8 flags
> 0x0e0 caller_func+0x15
> ... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
> 0x100 /* caller_func return address */

Thanks for the detailed explanation. I think I understand kretprobe
mechanics from a somewhat high level (kprobe saves real return address
on entry, overwrites return address to trampoline, then trampoline
runs handler and finally resets return address to real return address).

I don't usually write much assembly so the details escape me somewhat.

> So at the kretprobe handler, we have 2 issues.
> 1) the return address (caller_func+0x15) is not on the stack.
>    this can be solved by searching from current->kretprobe_instances.

Yes, agreed.

> 2) the stack frame size of kretprobe_trampoline is unknown
>    Since the stackframe is fixed, the fixed number (0x98) can be used.

I'm confused why this is relevant. Is it so ORC knows where to find
saved return address in the frame?

> However, those solutions are only for the kretprobe handler. The stacktrace
> from interrupt handler hit in the kretprobe_trampoline still doesn't work.
> 
> So, here is my idea;
> 
> 1) Change the trampline code to prepare stack frame at first and save
>    registers on it, instead of "push". This will makes ORC easy to setup
>    stackframe information for this code.

I'm confused on the details here. But this is what Josh solves in his
patch, right?

> 2) change the return addres fixup timing. Instead of using return value
>    of trampoline handler, before removing the real return address from
>    current->kretprobe_instances.

Is the idea to have `kretprobe_trampoline` place the real return address
on the stack (while telling ORC where to find it) _before_ running `call
trampoline_handler` ? So that an unwind from inside the user defined
kretprobe handler simply unwinds correctly?

And to be extra clear, this would only work for stack_trace_save() and
not stack_trace_save_regs()?

> 3) Then, if orc_find() finds the ip is in the kretprobe_trampoline, it
>    checks the contents of the end of stackframe (at the place of regs->sp)
>    is same as the address of it. If it is, it can find the correct address
>    from current->kretprobe_instances. If not, there is a correct address.

What do you mean by "it" w.r.t. "is the same address of it"? I'm
confused on this point.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-09  1:19         ` Josh Poimboeuf
@ 2021-03-10  9:57           ` Masami Hiramatsu
  2021-03-10 15:08             ` Josh Poimboeuf
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-10  9:57 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

Hi Josh,

On Mon, 8 Mar 2021 19:19:45 -0600
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Mon, Mar 08, 2021 at 11:52:10AM +0900, Masami Hiramatsu wrote:
> > So at the kretprobe handler, we have 2 issues.
> > 1) the return address (caller_func+0x15) is not on the stack.
> >    this can be solved by searching from current->kretprobe_instances.
> > 2) the stack frame size of kretprobe_trampoline is unknown
> >    Since the stackframe is fixed, the fixed number (0x98) can be used.
> > 
> > However, those solutions are only for the kretprobe handler. The stacktrace
> > from interrupt handler hit in the kretprobe_trampoline still doesn't work.
> > 
> > So, here is my idea;
> > 
> > 1) Change the trampline code to prepare stack frame at first and save
> >    registers on it, instead of "push". This will makes ORC easy to setup
> >    stackframe information for this code.
> > 2) change the return addres fixup timing. Instead of using return value
> >    of trampoline handler, before removing the real return address from
> >    current->kretprobe_instances.
> > 3) Then, if orc_find() finds the ip is in the kretprobe_trampoline, it
> >    checks the contents of the end of stackframe (at the place of regs->sp)
> >    is same as the address of it. If it is, it can find the correct address
> >    from current->kretprobe_instances. If not, there is a correct address.
> > 
> > I need to find how the ORC info is prepared for 1), maybe UNWIND_HINT macro
> > may help?
> 
> Hi Masami,
> 
> If I understand correctly, for #1 you need an unwind hint which treats
> the instruction *after* the "pushq %rsp" as the beginning of the
> function.

Thanks for the patch. In that case, should I still change the stack allocation?
Or can I continue to use a series of "push/pop" ?

> 
> I'm thinking this should work:

OK, Let me test it.

Thanks!

> 
> 
> diff --git a/arch/x86/include/asm/unwind_hints.h b/arch/x86/include/asm/unwind_hints.h
> index 8e574c0afef8..8b33674288ea 100644
> --- a/arch/x86/include/asm/unwind_hints.h
> +++ b/arch/x86/include/asm/unwind_hints.h
> @@ -52,6 +52,11 @@
>  	UNWIND_HINT sp_reg=ORC_REG_SP sp_offset=8 type=UNWIND_HINT_TYPE_FUNC
>  .endm
>  
> +#else
> +
> +#define UNWIND_HINT_FUNC \
> +	UNWIND_HINT(ORC_REG_SP, 8, UNWIND_HINT_TYPE_FUNC, 0)
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_X86_UNWIND_HINTS_H */
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index df776cdca327..38ff1387f95d 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -767,6 +767,7 @@ asm(
>  	/* We don't bother saving the ss register */
>  #ifdef CONFIG_X86_64
>  	"	pushq %rsp\n"
> +	UNWIND_HINT_FUNC
>  	"	pushfq\n"
>  	SAVE_REGS_STRING
>  	"	movq %rsp, %rdi\n"
> @@ -790,7 +791,6 @@ asm(
>  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
>  );
>  NOKPROBE_SYMBOL(kretprobe_trampoline);
> -STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
>  
>  
>  /*
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-09 21:34         ` Daniel Xu
@ 2021-03-10 10:50           ` Masami Hiramatsu
  0 siblings, 0 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-10 10:50 UTC (permalink / raw)
  To: Daniel Xu
  Cc: Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel, bpf, kuba,
	mingo, ast, tglx, kernel-team, yhs, Josh Poimboeuf

On Tue, 9 Mar 2021 13:34:42 -0800
Daniel Xu <dxu@dxuuu.xyz> wrote:

> Hi Masami,
> 
> Just want to clarify a few points:
> 
> On Mon, Mar 08, 2021 at 11:52:10AM +0900, Masami Hiramatsu wrote:
> > On Sun, 7 Mar 2021 13:23:33 -0800
> > Daniel Xu <dxu@dxuuu.xyz> wrote:
> > To help your understanding, let me explain.
> > 
> > If we have a code here
> > 
> > caller_func:
> > 0x00 add sp, 0x20	/* 0x20 bytes stack frame allocated */
> > ...
> > 0x10 call target_func
> > 0x15 ... /* return address */
> > 
> > On the stack in the entry of target_func, we have
> > 
> > [stack]
> > 0x0e0 caller_func+0x15
> > ... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
> > 0x100 /* caller_func return address */
> > 
> > And when we put a kretprobe on the target_func, the stack will be
> > 
> > [stack]
> > 0x0e0 kretprobe_trampoline
> > ... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
> > 0x100 /* caller_func return address */
> > 
> > * "caller_func+0x15" is saved in current->kretprobe_instances.first.
> > 
> > When returning from the target_func, call consumed the 0x0e0 and
> > jump to kretprobe_trampoline. Let's see the assembler code.
> > 
> >         ".text\n"
> >         ".global kretprobe_trampoline\n"
> >         ".type kretprobe_trampoline, @function\n"
> >         "kretprobe_trampoline:\n"
> >         /* We don't bother saving the ss register */
> >         "       pushq %rsp\n"
> >         "       pushfq\n"
> >         SAVE_REGS_STRING
> >         "       movq %rsp, %rdi\n"
> >         "       call trampoline_handler\n"
> >         /* Replace saved sp with true return address. */
> >         "       movq %rax, 19*8(%rsp)\n"
> >         RESTORE_REGS_STRING
> >         "       popfq\n"
> >         "       ret\n"
> > 
> > When the entry of trampoline_handler, stack is like this;
> > 
> > [stack]
> > 0x040 kretprobe_trampoline+0x25
> > 0x048 r15
> > ...     /* pt_regs */
> > 0x0d8 flags
> > 0x0e0 rsp (=0x0e0)
> > ... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
> > 0x100 /* caller_func return address */
> > 
> > And after returned from trampoline_handler, "movq" changes the
> > stack like this.
> > 
> > [stack]
> > 0x040 kretprobe_trampoline+0x25
> > 0x048 r15
> > ...     /* pt_regs */
> > 0x0d8 flags
> > 0x0e0 caller_func+0x15
> > ... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
> > 0x100 /* caller_func return address */
> 
> Thanks for the detailed explanation. I think I understand kretprobe
> mechanics from a somewhat high level (kprobe saves real return address
> on entry, overwrites return address to trampoline, then trampoline
> runs handler and finally resets return address to real return address).
> 
> I don't usually write much assembly so the details escape me somewhat.
> 
> > So at the kretprobe handler, we have 2 issues.
> > 1) the return address (caller_func+0x15) is not on the stack.
> >    this can be solved by searching from current->kretprobe_instances.
> 
> Yes, agreed.
> 
> > 2) the stack frame size of kretprobe_trampoline is unknown
> >    Since the stackframe is fixed, the fixed number (0x98) can be used.
> 
> I'm confused why this is relevant. Is it so ORC knows where to find
> saved return address in the frame?

No, because the kretprobe_trampoline is somewhat special. Usually, at the
function entry, there is a return address on the top of stack, but
kretprobe_trampoline doesn't have it.
So we have to put a hint at the function entry to mark there should be
a next return address. (and ORC unwinder must find it)

> > However, those solutions are only for the kretprobe handler. The stacktrace
> > from interrupt handler hit in the kretprobe_trampoline still doesn't work.
> > 
> > So, here is my idea;
> > 
> > 1) Change the trampline code to prepare stack frame at first and save
> >    registers on it, instead of "push". This will makes ORC easy to setup
> >    stackframe information for this code.
> 
> I'm confused on the details here. But this is what Josh solves in his
> patch, right?

I'm not so sure how objtool makes the ORC information. If it can trace the
push/pop correctly, yes, it is solved.

> > 2) change the return addres fixup timing. Instead of using return value
> >    of trampoline handler, before removing the real return address from
> >    current->kretprobe_instances.
> 
> Is the idea to have `kretprobe_trampoline` place the real return address
> on the stack (while telling ORC where to find it) _before_ running `call
> trampoline_handler` ? So that an unwind from inside the user defined
> kretprobe handler simply unwinds correctly?

No, unless calling the trampoline_handler, we can not get the real return
address. Thus, the __kretprobe_trampoline_handler() will call return address
fixup function right before unlink the current->kretprobe_instances.

> And to be extra clear, this would only work for stack_trace_save() and
> not stack_trace_save_regs()?

Yes, for the stack_trace_save_regs() and the stack-tracing inside the
kretprobe'd target function, we still need a hack as same as orc_ftrace_find().

> 
> > 3) Then, if orc_find() finds the ip is in the kretprobe_trampoline, it
> >    checks the contents of the end of stackframe (at the place of regs->sp)
> >    is same as the address of it. If it is, it can find the correct address
> >    from current->kretprobe_instances. If not, there is a correct address.
> 
> What do you mean by "it" w.r.t. "is the same address of it"? I'm
> confused on this point.

Oh I meant,

3) Then, if orc_find() finds the ip is in the kretprobe_trampoline, orc_find()
    checks the contents of the end of stackframe (at the place of regs->sp)
   is same as the address of the stackframe (Note that kretprobe_trampoline
   does "push %sp" at first). If so, orc_find() can find the correct address
   from current->kretprobe_instances. If not, there is a correct address.

I need to see the orc unwinder carefully, orc_find() only gets the ip but
to find stackframe, I think this should be fixed in the caller of orc_find().

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 3/5] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
  2021-03-05 15:39 ` [PATCH -tip 3/5] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler() Masami Hiramatsu
@ 2021-03-10 14:21   ` Miroslav Benes
  2021-03-10 15:42     ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Miroslav Benes @ 2021-03-10 14:21 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Ingo Molnar, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

Hi Masami,

> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -205,15 +205,23 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
>  				   struct pt_regs *regs);
>  extern int arch_trampoline_kprobe(struct kprobe *p);
>  
> +void kretprobe_trampoline(void);
> +/*
> + * Since some architecture uses structured function pointer,
> + * use arch_deref_entry_point() to get real function address.

s/arch_deref_entry_point/dereference_function_descriptor/ ?

> + */
> +static nokprobe_inline void *kretprobe_trampoline_addr(void)
> +{
> +	return dereference_function_descriptor(kretprobe_trampoline);
> +}
> +

Would it make sense to use this in s390 and powerpc reliable unwinders?

Both

arch/s390/kernel/stacktrace.c:arch_stack_walk_reliable()
arch/powerpc/kernel/stacktrace.c:__save_stack_trace_tsk_reliable()

have

	if (state.ip == (unsigned long)kretprobe_trampoline)
		return -EINVAL;

which you wanted to hide previously if I am not mistaken.

Miroslav

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-10  9:57           ` Masami Hiramatsu
@ 2021-03-10 15:08             ` Josh Poimboeuf
  2021-03-10 15:55               ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Josh Poimboeuf @ 2021-03-10 15:08 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

On Wed, Mar 10, 2021 at 06:57:34PM +0900, Masami Hiramatsu wrote:
> > If I understand correctly, for #1 you need an unwind hint which treats
> > the instruction *after* the "pushq %rsp" as the beginning of the
> > function.
> 
> Thanks for the patch. In that case, should I still change the stack allocation?
> Or can I continue to use a series of "push/pop" ?

You can continue to use push/pop.  Objtool is only getting confused by
the unbalanced stack of the function (more pushes than pops).  The
unwind hint should fix that.

-- 
Josh


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 3/5] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
  2021-03-10 14:21   ` Miroslav Benes
@ 2021-03-10 15:42     ` Masami Hiramatsu
  2021-03-11 13:37       ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-10 15:42 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Steven Rostedt, Ingo Molnar, X86 ML, Daniel Xu, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

On Wed, 10 Mar 2021 15:21:01 +0100 (CET)
Miroslav Benes <mbenes@suse.cz> wrote:

> Hi Masami,
> 
> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -205,15 +205,23 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
> >  				   struct pt_regs *regs);
> >  extern int arch_trampoline_kprobe(struct kprobe *p);
> >  
> > +void kretprobe_trampoline(void);
> > +/*
> > + * Since some architecture uses structured function pointer,
> > + * use arch_deref_entry_point() to get real function address.
> 
> s/arch_deref_entry_point/dereference_function_descriptor/ ?

Ah, I missed it. Thanks!

> 
> > + */
> > +static nokprobe_inline void *kretprobe_trampoline_addr(void)
> > +{
> > +	return dereference_function_descriptor(kretprobe_trampoline);
> > +}
> > +
> 
> Would it make sense to use this in s390 and powerpc reliable unwinders?
> 
> Both
> 
> arch/s390/kernel/stacktrace.c:arch_stack_walk_reliable()
> arch/powerpc/kernel/stacktrace.c:__save_stack_trace_tsk_reliable()
> 
> have
> 
> 	if (state.ip == (unsigned long)kretprobe_trampoline)
> 		return -EINVAL;
> 
> which you wanted to hide previously if I am not mistaken.

I think, if "ip" means "instruction pointer", it should point
the real instruction address, which is dereferenced from function
descriptor. So using kretprobe_trampoline_addr() is good.

But of course, it depends on the architecture. I'm not familiar
with the powerpc and s390, I need those maintainer's help...

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-10 15:08             ` Josh Poimboeuf
@ 2021-03-10 15:55               ` Masami Hiramatsu
  2021-03-10 18:31                 ` Josh Poimboeuf
  2021-03-10 22:46                 ` Daniel Xu
  0 siblings, 2 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-10 15:55 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

Hi Josh and Daniel,

On Wed, 10 Mar 2021 09:08:45 -0600
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Wed, Mar 10, 2021 at 06:57:34PM +0900, Masami Hiramatsu wrote:
> > > If I understand correctly, for #1 you need an unwind hint which treats
> > > the instruction *after* the "pushq %rsp" as the beginning of the
> > > function.
> > 
> > Thanks for the patch. In that case, should I still change the stack allocation?
> > Or can I continue to use a series of "push/pop" ?
> 
> You can continue to use push/pop.  Objtool is only getting confused by
> the unbalanced stack of the function (more pushes than pops).  The
> unwind hint should fix that.

With you patch, I made a fix for ORC unwinder. I confirmed it works with
2 multiple kretprobes on the call path like this ;

cd /sys/kernel/debug/tracing/
echo r vfs_read >> kprobe_events
echo r full_proxy_read >> kprobe_events
echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
echo 1 > events/kprobes/enable
echo 1 > options/sym-offset
cat /sys/kernel/debug/kprobes/list
echo 0 > events/kprobes/enable
cat trace

# tracer: nop
#
# entries-in-buffer/entries-written: 3/3   #P:8
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |
           <...>-136     [004] ...1   648.481281: r_full_proxy_read_0: (vfs_read+0xab/0x1a0 <- full_proxy_read)
           <...>-136     [004] ...1   648.481310: <stack trace>
 => kretprobe_trace_func+0x209/0x2f0
 => kretprobe_dispatcher+0x4a/0x70
 => __kretprobe_trampoline_handler+0xcd/0x170
 => trampoline_handler+0x3d/0x50
 => kretprobe_trampoline+0x25/0x50
 => vfs_read+0xab/0x1a0
 => ksys_read+0x5f/0xe0
 => do_syscall_64+0x33/0x40
 => entry_SYSCALL_64_after_hwframe+0x44/0xae
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0

I didn't tested it with bpftrace, but this also handles 
regs->ip == kretprobe_trampoline case. So it should work.

commit aa452d999b524b1851f69cc947be3e1a2f3ca1ec
Author: Masami Hiramatsu <mhiramat@kernel.org>
Date:   Sat Mar 6 08:34:51 2021 +0900

    x86/unwind/orc: Fixup kretprobe trampoline entry
    
    Since the kretprobe replaces the function return address with
    the kretprobe_trampoline on the stack, the ORC unwinder can not
    continue the stack unwinding at that point.
    
    To fix this issue, correct state->ip as like as function-graph
    tracer in the unwind_next_frame().
    
    Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 70fc159ebe69..ab5e45b848d5 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -4,6 +4,7 @@
 
 #include <linux/sched.h>
 #include <linux/ftrace.h>
+#include <linux/llist.h>
 #include <asm/ptrace.h>
 #include <asm/stacktrace.h>
 
@@ -20,6 +21,9 @@ struct unwind_state {
 	bool signal, full_regs;
 	unsigned long sp, bp, ip;
 	struct pt_regs *regs, *prev_regs;
+#if defined(CONFIG_KRETPROBES)
+	struct llist_node *kr_iter;
+#endif
 #elif defined(CONFIG_UNWINDER_FRAME_POINTER)
 	bool got_irq;
 	unsigned long *bp, *orig_sp, ip;
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 2a1d47f47eee..94869516cfc0 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -2,6 +2,7 @@
 #include <linux/objtool.h>
 #include <linux/module.h>
 #include <linux/sort.h>
+#include <linux/kprobes.h>
 #include <asm/ptrace.h>
 #include <asm/stacktrace.h>
 #include <asm/unwind.h>
@@ -414,6 +415,30 @@ static bool get_reg(struct unwind_state *state, unsigned int reg_off,
 	return false;
 }
 
+#ifdef CONFIG_KRETPROBES
+static unsigned long orc_kretprobe_correct_ip(struct unwind_state *state)
+{
+	return kretprobe_find_ret_addr(
+			(unsigned long)kretprobe_trampoline_addr(),
+			state->task, &state->kr_iter);
+}
+
+static bool is_kretprobe_trampoline_address(unsigned long ip)
+{
+	return ip == (unsigned long)kretprobe_trampoline_addr();
+}
+#else
+static unsigned long orc_kretprobe_correct_ip(struct unwind_state *state)
+{
+	return state->ip;
+}
+
+static bool is_kretprobe_trampoline_address(unsigned long ip)
+{
+	return false;
+}
+#endif
+
 bool unwind_next_frame(struct unwind_state *state)
 {
 	unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
@@ -536,6 +561,18 @@ bool unwind_next_frame(struct unwind_state *state)
 
 		state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
 						  state->ip, (void *)ip_p);
+		/*
+		 * There are special cases when the stack unwinder is called
+		 * from the kretprobe handler or the interrupt handler which
+		 * occurs in the kretprobe trampoline code. In those cases,
+		 * %sp is shown on the stack instead of the return address.
+		 * Or, when the unwinder find the return address is replaced
+		 * by kretprobe_trampoline.
+		 * In those cases, correct address can be found in kretprobe.
+		 */
+		if (state->ip == sp ||
+		    is_kretprobe_trampoline_address(state->ip))
+			state->ip = orc_kretprobe_correct_ip(state);
 
 		state->sp = sp;
 		state->regs = NULL;
@@ -649,6 +686,12 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 		state->full_regs = true;
 		state->signal = true;
 
+		/*
+		 * When the unwinder called with regs from kretprobe handler,
+		 * the regs->ip starts from kretprobe_trampoline address.
+		 */
+		if (is_kretprobe_trampoline_address(state->ip))
+			state->ip = orc_kretprobe_correct_ip(state);
 	} else if (task == current) {
 		asm volatile("lea (%%rip), %0\n\t"
 			     "mov %%rsp, %1\n\t"
-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-10 15:55               ` Masami Hiramatsu
@ 2021-03-10 18:31                 ` Josh Poimboeuf
  2021-03-11  0:20                   ` Masami Hiramatsu
  2021-03-10 22:46                 ` Daniel Xu
  1 sibling, 1 reply; 28+ messages in thread
From: Josh Poimboeuf @ 2021-03-10 18:31 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

On Thu, Mar 11, 2021 at 12:55:09AM +0900, Masami Hiramatsu wrote:
> +#ifdef CONFIG_KRETPROBES
> +static unsigned long orc_kretprobe_correct_ip(struct unwind_state *state)
> +{
> +	return kretprobe_find_ret_addr(
> +			(unsigned long)kretprobe_trampoline_addr(),
> +			state->task, &state->kr_iter);
> +}
> +
> +static bool is_kretprobe_trampoline_address(unsigned long ip)
> +{
> +	return ip == (unsigned long)kretprobe_trampoline_addr();
> +}
> +#else
> +static unsigned long orc_kretprobe_correct_ip(struct unwind_state *state)
> +{
> +	return state->ip;
> +}
> +
> +static bool is_kretprobe_trampoline_address(unsigned long ip)
> +{
> +	return false;
> +}
> +#endif
> +

Can this code go in a kprobes file?  I'd rather not clutter ORC with it,
and maybe it would be useful for other arches or unwinders.

>  bool unwind_next_frame(struct unwind_state *state)
>  {
>  	unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
> @@ -536,6 +561,18 @@ bool unwind_next_frame(struct unwind_state *state)
>  
>  		state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
>  						  state->ip, (void *)ip_p);
> +		/*
> +		 * There are special cases when the stack unwinder is called
> +		 * from the kretprobe handler or the interrupt handler which
> +		 * occurs in the kretprobe trampoline code. In those cases,
> +		 * %sp is shown on the stack instead of the return address.
> +		 * Or, when the unwinder find the return address is replaced
> +		 * by kretprobe_trampoline.
> +		 * In those cases, correct address can be found in kretprobe.
> +		 */
> +		if (state->ip == sp ||

Why is the 'state->ip == sp' needed?

> +		    is_kretprobe_trampoline_address(state->ip))
> +			state->ip = orc_kretprobe_correct_ip(state);

This is similar in concept to ftrace_graph_ret_addr(), right?  Would it
be possible to have a similar API?  Like

		state->ip = kretprobe_ret_addr(state->task, &state->kr_iter, state->ip);

and without the conditional.

>  
>  		state->sp = sp;
>  		state->regs = NULL;
> @@ -649,6 +686,12 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
>  		state->full_regs = true;
>  		state->signal = true;
>  
> +		/*
> +		 * When the unwinder called with regs from kretprobe handler,
> +		 * the regs->ip starts from kretprobe_trampoline address.
> +		 */
> +		if (is_kretprobe_trampoline_address(state->ip))
> +			state->ip = orc_kretprobe_correct_ip(state);

Shouldn't __kretprobe_trampoline_handler() just set regs->ip to
'correct_ret_addr' before passing the regs to the handler?  I'd think
that would be a less surprising value for regs->ip than
'&kretprobe_trampoline'.

And it would make the unwinder just work automatically when unwinding
from the handler using the regs.

It would also work when unwinding from the handler's stack, if we put an
UNWIND_HINT_REGS after saving the regs.

The only (rare) case it wouldn't work would be unwinding from an
interrupt before regs->ip gets set properly.  In which case we'd still
need the above call to orc_kretprobe_correct_ip() or so.

-- 
Josh


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-10 15:55               ` Masami Hiramatsu
  2021-03-10 18:31                 ` Josh Poimboeuf
@ 2021-03-10 22:46                 ` Daniel Xu
  1 sibling, 0 replies; 28+ messages in thread
From: Daniel Xu @ 2021-03-10 22:46 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Josh Poimboeuf, Steven Rostedt, Ingo Molnar, X86 ML,
	linux-kernel, bpf, kuba, mingo, ast, tglx, kernel-team, yhs

On Thu, Mar 11, 2021 at 12:55:09AM +0900, Masami Hiramatsu wrote:
> Hi Josh and Daniel,
<...>

> commit aa452d999b524b1851f69cc947be3e1a2f3ca1ec
> Author: Masami Hiramatsu <mhiramat@kernel.org>
> Date:   Sat Mar 6 08:34:51 2021 +0900
> 
>     x86/unwind/orc: Fixup kretprobe trampoline entry
>     
>     Since the kretprobe replaces the function return address with
>     the kretprobe_trampoline on the stack, the ORC unwinder can not
>     continue the stack unwinding at that point.
>     
>     To fix this issue, correct state->ip as like as function-graph
>     tracer in the unwind_next_frame().
>     
>     Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> 

I applied your original patchset + Josh's patch + this patch and can
confirm it works for bpftrace + kretprobes.

Tested-by: Daniel Xu <dxu@dxuuu.xyz>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-10 18:31                 ` Josh Poimboeuf
@ 2021-03-11  0:20                   ` Masami Hiramatsu
  2021-03-11  1:06                     ` Josh Poimboeuf
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-11  0:20 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

On Wed, 10 Mar 2021 12:31:13 -0600
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Thu, Mar 11, 2021 at 12:55:09AM +0900, Masami Hiramatsu wrote:
> > +#ifdef CONFIG_KRETPROBES
> > +static unsigned long orc_kretprobe_correct_ip(struct unwind_state *state)
> > +{
> > +	return kretprobe_find_ret_addr(
> > +			(unsigned long)kretprobe_trampoline_addr(),
> > +			state->task, &state->kr_iter);
> > +}
> > +
> > +static bool is_kretprobe_trampoline_address(unsigned long ip)
> > +{
> > +	return ip == (unsigned long)kretprobe_trampoline_addr();
> > +}
> > +#else
> > +static unsigned long orc_kretprobe_correct_ip(struct unwind_state *state)
> > +{
> > +	return state->ip;
> > +}
> > +
> > +static bool is_kretprobe_trampoline_address(unsigned long ip)
> > +{
> > +	return false;
> > +}
> > +#endif
> > +
> 
> Can this code go in a kprobes file?  I'd rather not clutter ORC with it,
> and maybe it would be useful for other arches or unwinders.

Yes, anyway dummy kretprobe_find_ret_addr() and kretprobe_trampoline_addr()
should be defined !CONFIG_KRETPROBES case.

> 
> >  bool unwind_next_frame(struct unwind_state *state)
> >  {
> >  	unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
> > @@ -536,6 +561,18 @@ bool unwind_next_frame(struct unwind_state *state)
> >  
> >  		state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> >  						  state->ip, (void *)ip_p);
> > +		/*
> > +		 * There are special cases when the stack unwinder is called
> > +		 * from the kretprobe handler or the interrupt handler which
> > +		 * occurs in the kretprobe trampoline code. In those cases,
> > +		 * %sp is shown on the stack instead of the return address.
> > +		 * Or, when the unwinder find the return address is replaced
> > +		 * by kretprobe_trampoline.
> > +		 * In those cases, correct address can be found in kretprobe.
> > +		 */
> > +		if (state->ip == sp ||
> 
> Why is the 'state->ip == sp' needed?

As I commented above, until kretprobe_trampoline writes back the real
address to the stack, sp value is there (which has been pushed by the
'pushq %rsp' at the entry of kretprobe_trampoline.)

        ".type kretprobe_trampoline, @function\n"
        "kretprobe_trampoline:\n"
        /* We don't bother saving the ss register */
        "       pushq %rsp\n"				// THIS
        "       pushfq\n"

Thus, from inside the kretprobe handler, like ftrace, you'll see
the sp value instead of the real return address.

> > +		    is_kretprobe_trampoline_address(state->ip))
> > +			state->ip = orc_kretprobe_correct_ip(state);
> 
> This is similar in concept to ftrace_graph_ret_addr(), right?  Would it
> be possible to have a similar API?  Like
> 
> 		state->ip = kretprobe_ret_addr(state->task, &state->kr_iter, state->ip);

OK, but,

> and without the conditional.

As I said, it is not possible because "state->ip == sp" check depends on
ORC unwinder.

> >  		state->sp = sp;
> >  		state->regs = NULL;
> > @@ -649,6 +686,12 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
> >  		state->full_regs = true;
> >  		state->signal = true;
> >  
> > +		/*
> > +		 * When the unwinder called with regs from kretprobe handler,
> > +		 * the regs->ip starts from kretprobe_trampoline address.
> > +		 */
> > +		if (is_kretprobe_trampoline_address(state->ip))
> > +			state->ip = orc_kretprobe_correct_ip(state);
> 
> Shouldn't __kretprobe_trampoline_handler() just set regs->ip to
> 'correct_ret_addr' before passing the regs to the handler?  I'd think
> that would be a less surprising value for regs->ip than
> '&kretprobe_trampoline'.

Hmm, actually current implementation on x86 mimics the behevior of
the int3 exception (which many architectures still do).

Previously the kretprobe_trampoline is a place holder like this.

        "kretprobe_trampoline:\n"
        "       nop\n"

And arch_init_kprobes() puts a kprobe (int3) there.
So in that case regs->ip should be kretprobe_trampoline.
User handler (usually architecutre independent) finds the
correct_ret_addr in kretprobe_instance.ret_addr field.

> And it would make the unwinder just work automatically when unwinding
> from the handler using the regs.
> 
> It would also work when unwinding from the handler's stack, if we put an
> UNWIND_HINT_REGS after saving the regs.

At that moment, the real return address is not identified. So we can not
put it.

> 
> The only (rare) case it wouldn't work would be unwinding from an
> interrupt before regs->ip gets set properly.  In which case we'd still
> need the above call to orc_kretprobe_correct_ip() or so.


Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-11  0:20                   ` Masami Hiramatsu
@ 2021-03-11  1:06                     ` Josh Poimboeuf
  2021-03-11  1:54                       ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Josh Poimboeuf @ 2021-03-11  1:06 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

On Thu, Mar 11, 2021 at 09:20:18AM +0900, Masami Hiramatsu wrote:
> > >  bool unwind_next_frame(struct unwind_state *state)
> > >  {
> > >  	unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
> > > @@ -536,6 +561,18 @@ bool unwind_next_frame(struct unwind_state *state)
> > >  
> > >  		state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> > >  						  state->ip, (void *)ip_p);
> > > +		/*
> > > +		 * There are special cases when the stack unwinder is called
> > > +		 * from the kretprobe handler or the interrupt handler which
> > > +		 * occurs in the kretprobe trampoline code. In those cases,
> > > +		 * %sp is shown on the stack instead of the return address.
> > > +		 * Or, when the unwinder find the return address is replaced
> > > +		 * by kretprobe_trampoline.
> > > +		 * In those cases, correct address can be found in kretprobe.
> > > +		 */
> > > +		if (state->ip == sp ||
> > 
> > Why is the 'state->ip == sp' needed?
> 
> As I commented above, until kretprobe_trampoline writes back the real
> address to the stack, sp value is there (which has been pushed by the
> 'pushq %rsp' at the entry of kretprobe_trampoline.)
> 
>         ".type kretprobe_trampoline, @function\n"
>         "kretprobe_trampoline:\n"
>         /* We don't bother saving the ss register */
>         "       pushq %rsp\n"				// THIS
>         "       pushfq\n"
> 
> Thus, from inside the kretprobe handler, like ftrace, you'll see
> the sp value instead of the real return address.

I see.  If you change is_kretprobe_trampoline_address() to include the
entire function, like:

static bool is_kretprobe_trampoline_address(unsigned long ip)
{
	return (void *)ip >= kretprobe_trampoline &&
	       (void *)ip < kretprobe_trampoline_end;
}

then the unwinder won't ever read the bogus %rsp value into state->ip,
and the 'state->ip == sp' check can be removed.

> > And it would make the unwinder just work automatically when unwinding
> > from the handler using the regs.
> > 
> > It would also work when unwinding from the handler's stack, if we put an
> > UNWIND_HINT_REGS after saving the regs.
> 
> At that moment, the real return address is not identified. So we can not
> put it.

True, at the time the regs are originally saved, the real return address
isn't available.  But by the time the user handler is called, the return
address *is* available.  So if the real return address were placed in
regs->ip before calling the handler, the unwinder could find it there,
when called from the handler.

Then we wouldn't need the call to orc_kretprobe_correct_ip() in
__unwind_start().

But maybe it's not possible due to the regs->ip expectations of legacy
handlers?

-- 
Josh


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-11  1:06                     ` Josh Poimboeuf
@ 2021-03-11  1:54                       ` Masami Hiramatsu
  2021-03-11 16:51                         ` Josh Poimboeuf
  0 siblings, 1 reply; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-11  1:54 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

On Wed, 10 Mar 2021 19:06:15 -0600
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Thu, Mar 11, 2021 at 09:20:18AM +0900, Masami Hiramatsu wrote:
> > > >  bool unwind_next_frame(struct unwind_state *state)
> > > >  {
> > > >  	unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
> > > > @@ -536,6 +561,18 @@ bool unwind_next_frame(struct unwind_state *state)
> > > >  
> > > >  		state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> > > >  						  state->ip, (void *)ip_p);
> > > > +		/*
> > > > +		 * There are special cases when the stack unwinder is called
> > > > +		 * from the kretprobe handler or the interrupt handler which
> > > > +		 * occurs in the kretprobe trampoline code. In those cases,
> > > > +		 * %sp is shown on the stack instead of the return address.
> > > > +		 * Or, when the unwinder find the return address is replaced
> > > > +		 * by kretprobe_trampoline.
> > > > +		 * In those cases, correct address can be found in kretprobe.
> > > > +		 */
> > > > +		if (state->ip == sp ||
> > > 
> > > Why is the 'state->ip == sp' needed?
> > 
> > As I commented above, until kretprobe_trampoline writes back the real
> > address to the stack, sp value is there (which has been pushed by the
> > 'pushq %rsp' at the entry of kretprobe_trampoline.)
> > 
> >         ".type kretprobe_trampoline, @function\n"
> >         "kretprobe_trampoline:\n"
> >         /* We don't bother saving the ss register */
> >         "       pushq %rsp\n"				// THIS
> >         "       pushfq\n"
> > 
> > Thus, from inside the kretprobe handler, like ftrace, you'll see
> > the sp value instead of the real return address.
> 
> I see.  If you change is_kretprobe_trampoline_address() to include the
> entire function, like:
> 
> static bool is_kretprobe_trampoline_address(unsigned long ip)
> {
> 	return (void *)ip >= kretprobe_trampoline &&
> 	       (void *)ip < kretprobe_trampoline_end;
> }
> 
> then the unwinder won't ever read the bogus %rsp value into state->ip,
> and the 'state->ip == sp' check can be removed.

Hmm, I couldn't get your point. Since sp is the address of stack,
it always out of text address.

> 
> > > And it would make the unwinder just work automatically when unwinding
> > > from the handler using the regs.
> > > 
> > > It would also work when unwinding from the handler's stack, if we put an
> > > UNWIND_HINT_REGS after saving the regs.
> > 
> > At that moment, the real return address is not identified. So we can not
> > put it.
> 
> True, at the time the regs are originally saved, the real return address
> isn't available.  But by the time the user handler is called, the return
> address *is* available.  So if the real return address were placed in
> regs->ip before calling the handler, the unwinder could find it there,
> when called from the handler.

OK, but this is not arch independent specification. I can make a hack
only for x86, but that is not clean implementation, hmm.

> 
> Then we wouldn't need the call to orc_kretprobe_correct_ip() in
> __unwind_start().

What about the ORC implementation in other architecture? Is that for x86 only?

> 
> But maybe it's not possible due to the regs->ip expectations of legacy
> handlers?

Usually, the legacy handlers will ignore it, the official way to access
the correct return address is kretprobe_instance.ret_addr. Because it is
arch independent.

Nowadays there are instruction_pointer() and instruction_pointer_set() APIs
in many (not all) architecutre, so I can try to replace to use it instead
of the kretprobe_instance.ret_addr.
(and it will break the out-of-tree codes)


Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 3/5] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
  2021-03-10 15:42     ` Masami Hiramatsu
@ 2021-03-11 13:37       ` Masami Hiramatsu
  0 siblings, 0 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-11 13:37 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Miroslav Benes, Steven Rostedt, Ingo Molnar, X86 ML, Daniel Xu,
	linux-kernel, bpf, kuba, mingo, ast, tglx, kernel-team, yhs

Hi Miroslav,

On Thu, 11 Mar 2021 00:42:25 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > > + */
> > > +static nokprobe_inline void *kretprobe_trampoline_addr(void)
> > > +{
> > > +	return dereference_function_descriptor(kretprobe_trampoline);
> > > +}
> > > +
> > 
> > Would it make sense to use this in s390 and powerpc reliable unwinders?
> > 
> > Both
> > 
> > arch/s390/kernel/stacktrace.c:arch_stack_walk_reliable()
> > arch/powerpc/kernel/stacktrace.c:__save_stack_trace_tsk_reliable()
> > 
> > have
> > 
> > 	if (state.ip == (unsigned long)kretprobe_trampoline)
> > 		return -EINVAL;
> > 
> > which you wanted to hide previously if I am not mistaken.
> 
> I think, if "ip" means "instruction pointer", it should point
> the real instruction address, which is dereferenced from function
> descriptor. So using kretprobe_trampoline_addr() is good.

Ah, sorry I misunderstood the question. 

Yes, the per-arch stacktrace implementation must be fixed afterwards.
It is reliable or not depends on the actual unwinder implementation,
for example, on x86, frame-pointer based unwinder can unwind kretprobe,
but ORC based one doesn't (and discussing with Josh how to solve it)

Anyway since it strongly depends on the architecture, I would like to
leave those for each architecture stacktrace maitainer in this series.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-11  1:54                       ` Masami Hiramatsu
@ 2021-03-11 16:51                         ` Josh Poimboeuf
  2021-03-12  3:22                           ` Masami Hiramatsu
  0 siblings, 1 reply; 28+ messages in thread
From: Josh Poimboeuf @ 2021-03-11 16:51 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

On Thu, Mar 11, 2021 at 10:54:38AM +0900, Masami Hiramatsu wrote:
> On Wed, 10 Mar 2021 19:06:15 -0600
> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > On Thu, Mar 11, 2021 at 09:20:18AM +0900, Masami Hiramatsu wrote:
> > > > >  bool unwind_next_frame(struct unwind_state *state)
> > > > >  {
> > > > >  	unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
> > > > > @@ -536,6 +561,18 @@ bool unwind_next_frame(struct unwind_state *state)
> > > > >  
> > > > >  		state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> > > > >  						  state->ip, (void *)ip_p);
> > > > > +		/*
> > > > > +		 * There are special cases when the stack unwinder is called
> > > > > +		 * from the kretprobe handler or the interrupt handler which
> > > > > +		 * occurs in the kretprobe trampoline code. In those cases,
> > > > > +		 * %sp is shown on the stack instead of the return address.
> > > > > +		 * Or, when the unwinder find the return address is replaced
> > > > > +		 * by kretprobe_trampoline.
> > > > > +		 * In those cases, correct address can be found in kretprobe.
> > > > > +		 */
> > > > > +		if (state->ip == sp ||
> > > > 
> > > > Why is the 'state->ip == sp' needed?
> > > 
> > > As I commented above, until kretprobe_trampoline writes back the real
> > > address to the stack, sp value is there (which has been pushed by the
> > > 'pushq %rsp' at the entry of kretprobe_trampoline.)
> > > 
> > >         ".type kretprobe_trampoline, @function\n"
> > >         "kretprobe_trampoline:\n"
> > >         /* We don't bother saving the ss register */
> > >         "       pushq %rsp\n"				// THIS
> > >         "       pushfq\n"
> > > 
> > > Thus, from inside the kretprobe handler, like ftrace, you'll see
> > > the sp value instead of the real return address.
> > 
> > I see.  If you change is_kretprobe_trampoline_address() to include the
> > entire function, like:
> > 
> > static bool is_kretprobe_trampoline_address(unsigned long ip)
> > {
> > 	return (void *)ip >= kretprobe_trampoline &&
> > 	       (void *)ip < kretprobe_trampoline_end;
> > }
> > 
> > then the unwinder won't ever read the bogus %rsp value into state->ip,
> > and the 'state->ip == sp' check can be removed.
> 
> Hmm, I couldn't get your point. Since sp is the address of stack,
> it always out of text address.

When unwinding from trampoline_handler(), state->ip will point to the
instruction after the call:

	call trampoline_handler
	movq %rax, 19*8(%rsp)   <-- state->ip points to this insn

But then, the above version of is_kretprobe_trampoline_address() is
true, so state->ip gets immediately replaced with the real return
address:

	if (is_kretprobe_trampoline_address(state->ip))
		state->ip = orc_kretprobe_correct_ip(state);

so the unwinder skips over the kretprobe_trampoline() frame and goes
straight to the frame of the real return address.  Thus it never reads
this bogus return value into state->ip:

	pushq %rsp

which is why the weird 'state->ip == sp' check is no longer needed.

The only "downside" is that the unwinder skips the
kretprobe_trampoline() frame.  (note that downside wouldn't exist in
the case of UNWIND_HINT_REGS + valid regs->ip).

> > > > And it would make the unwinder just work automatically when unwinding
> > > > from the handler using the regs.
> > > > 
> > > > It would also work when unwinding from the handler's stack, if we put an
> > > > UNWIND_HINT_REGS after saving the regs.
> > > 
> > > At that moment, the real return address is not identified. So we can not
> > > put it.
> > 
> > True, at the time the regs are originally saved, the real return address
> > isn't available.  But by the time the user handler is called, the return
> > address *is* available.  So if the real return address were placed in
> > regs->ip before calling the handler, the unwinder could find it there,
> > when called from the handler.
> 
> OK, but this is not arch independent specification. I can make a hack
> only for x86, but that is not clean implementation, hmm.
> 
> > 
> > Then we wouldn't need the call to orc_kretprobe_correct_ip() in
> > __unwind_start().
> 
> What about the ORC implementation in other architecture? Is that for
> x86 only?

ORC is x86 only.

> > But maybe it's not possible due to the regs->ip expectations of legacy
> > handlers?
> 
> Usually, the legacy handlers will ignore it, the official way to access
> the correct return address is kretprobe_instance.ret_addr. Because it is
> arch independent.
> 
> Nowadays there are instruction_pointer() and instruction_pointer_set() APIs
> in many (not all) architecutre, so I can try to replace to use it instead
> of the kretprobe_instance.ret_addr.
> (and it will break the out-of-tree codes)

That sounds better to me, though I don't have an understanding of what
it would break.

-- 
Josh


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
  2021-03-11 16:51                         ` Josh Poimboeuf
@ 2021-03-12  3:22                           ` Masami Hiramatsu
  0 siblings, 0 replies; 28+ messages in thread
From: Masami Hiramatsu @ 2021-03-12  3:22 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Daniel Xu, Steven Rostedt, Ingo Molnar, X86 ML, linux-kernel,
	bpf, kuba, mingo, ast, tglx, kernel-team, yhs

On Thu, 11 Mar 2021 10:51:10 -0600
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Thu, Mar 11, 2021 at 10:54:38AM +0900, Masami Hiramatsu wrote:
> > On Wed, 10 Mar 2021 19:06:15 -0600
> > Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > 
> > > On Thu, Mar 11, 2021 at 09:20:18AM +0900, Masami Hiramatsu wrote:
> > > > > >  bool unwind_next_frame(struct unwind_state *state)
> > > > > >  {
> > > > > >  	unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
> > > > > > @@ -536,6 +561,18 @@ bool unwind_next_frame(struct unwind_state *state)
> > > > > >  
> > > > > >  		state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> > > > > >  						  state->ip, (void *)ip_p);
> > > > > > +		/*
> > > > > > +		 * There are special cases when the stack unwinder is called
> > > > > > +		 * from the kretprobe handler or the interrupt handler which
> > > > > > +		 * occurs in the kretprobe trampoline code. In those cases,
> > > > > > +		 * %sp is shown on the stack instead of the return address.
> > > > > > +		 * Or, when the unwinder find the return address is replaced
> > > > > > +		 * by kretprobe_trampoline.
> > > > > > +		 * In those cases, correct address can be found in kretprobe.
> > > > > > +		 */
> > > > > > +		if (state->ip == sp ||
> > > > > 
> > > > > Why is the 'state->ip == sp' needed?
> > > > 
> > > > As I commented above, until kretprobe_trampoline writes back the real
> > > > address to the stack, sp value is there (which has been pushed by the
> > > > 'pushq %rsp' at the entry of kretprobe_trampoline.)
> > > > 
> > > >         ".type kretprobe_trampoline, @function\n"
> > > >         "kretprobe_trampoline:\n"
> > > >         /* We don't bother saving the ss register */
> > > >         "       pushq %rsp\n"				// THIS
> > > >         "       pushfq\n"
> > > > 
> > > > Thus, from inside the kretprobe handler, like ftrace, you'll see
> > > > the sp value instead of the real return address.
> > > 
> > > I see.  If you change is_kretprobe_trampoline_address() to include the
> > > entire function, like:
> > > 
> > > static bool is_kretprobe_trampoline_address(unsigned long ip)
> > > {
> > > 	return (void *)ip >= kretprobe_trampoline &&
> > > 	       (void *)ip < kretprobe_trampoline_end;
> > > }
> > > 
> > > then the unwinder won't ever read the bogus %rsp value into state->ip,
> > > and the 'state->ip == sp' check can be removed.
> > 
> > Hmm, I couldn't get your point. Since sp is the address of stack,
> > it always out of text address.
> 
> When unwinding from trampoline_handler(), state->ip will point to the
> instruction after the call:
> 
> 	call trampoline_handler
> 	movq %rax, 19*8(%rsp)   <-- state->ip points to this insn
> 
> But then, the above version of is_kretprobe_trampoline_address() is
> true, so state->ip gets immediately replaced with the real return
> address:
> 
> 	if (is_kretprobe_trampoline_address(state->ip))
> 		state->ip = orc_kretprobe_correct_ip(state);

Hmm, but that ip is for the "next" frame, isn't it?

[stack]
0x040 kretprobe_trampoline+0x25 -+
0x048 r15                        | 
...     /* pt_regs */            +- ORC sp_offset at the kretprobe_trampoline+0x24
0x0d8 flags                     -+
0x0e0 rsp (=0x0e0) /* will be replaced by the real return address */

Your idea seems replacing stack@0x040 with the real return address.
In that case, may orc_find() returns the wrong ORC info?


> so the unwinder skips over the kretprobe_trampoline() frame and goes
> straight to the frame of the real return address.  Thus it never reads
> this bogus return value into state->ip:
> 
> 	pushq %rsp
> 
> which is why the weird 'state->ip == sp' check is no longer needed.

But that is what the kretprobe_trampoline actually does. I would rather
like to see the real information from stacktrace.

> The only "downside" is that the unwinder skips the
> kretprobe_trampoline() frame.  (note that downside wouldn't exist in
> the case of UNWIND_HINT_REGS + valid regs->ip).

As I said, I would like to see the kretprobe_trampoline+0x25 on
my stacktrace, because kretprobe doesn't replace the kretprobe_trampoline
but the target_func on the stack.

> > > > > And it would make the unwinder just work automatically when unwinding
> > > > > from the handler using the regs.
> > > > > 
> > > > > It would also work when unwinding from the handler's stack, if we put an
> > > > > UNWIND_HINT_REGS after saving the regs.
> > > > 
> > > > At that moment, the real return address is not identified. So we can not
> > > > put it.
> > > 
> > > True, at the time the regs are originally saved, the real return address
> > > isn't available.  But by the time the user handler is called, the return
> > > address *is* available.  So if the real return address were placed in
> > > regs->ip before calling the handler, the unwinder could find it there,
> > > when called from the handler.
> > 
> > OK, but this is not arch independent specification. I can make a hack
> > only for x86, but that is not clean implementation, hmm.
> > 
> > > 
> > > Then we wouldn't need the call to orc_kretprobe_correct_ip() in
> > > __unwind_start().
> > 
> > What about the ORC implementation in other architecture? Is that for
> > x86 only?
> 
> ORC is x86 only.
> 
> > > But maybe it's not possible due to the regs->ip expectations of legacy
> > > handlers?
> > 
> > Usually, the legacy handlers will ignore it, the official way to access
> > the correct return address is kretprobe_instance.ret_addr. Because it is
> > arch independent.
> > 
> > Nowadays there are instruction_pointer() and instruction_pointer_set() APIs
> > in many (not all) architecutre, so I can try to replace to use it instead
> > of the kretprobe_instance.ret_addr.
> > (and it will break the out-of-tree codes)
> 
> That sounds better to me, though I don't have an understanding of what
> it would break.

OK, anyway try to do that and see what happen.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-03-12  3:23 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-05 15:38 [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Masami Hiramatsu
2021-03-05 15:39 ` [PATCH -tip 1/5] ia64: kprobes: Fix to pass correct trampoline address to the handler Masami Hiramatsu
2021-03-05 15:39 ` [PATCH -tip 2/5] kprobes: treewide: Replace arch_deref_entry_point() with dereference_function_descriptor() Masami Hiramatsu
2021-03-05 15:39 ` [PATCH -tip 3/5] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler() Masami Hiramatsu
2021-03-10 14:21   ` Miroslav Benes
2021-03-10 15:42     ` Masami Hiramatsu
2021-03-11 13:37       ` Masami Hiramatsu
2021-03-05 15:39 ` [PATCH -tip 4/5] kprobes: stacktrace: Recover the address changed by kretprobe Masami Hiramatsu
2021-03-05 15:39 ` [PATCH -tip 5/5] tracing: Remove kretprobe unknown indicator from stacktrace Masami Hiramatsu
2021-03-05 15:44   ` Steven Rostedt
2021-03-05 19:16 ` [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Daniel Xu
2021-03-06  1:13   ` Masami Hiramatsu
2021-03-07 21:23     ` Daniel Xu
2021-03-08  2:52       ` Masami Hiramatsu
2021-03-08 13:05         ` Masami Hiramatsu
2021-03-09  1:19         ` Josh Poimboeuf
2021-03-10  9:57           ` Masami Hiramatsu
2021-03-10 15:08             ` Josh Poimboeuf
2021-03-10 15:55               ` Masami Hiramatsu
2021-03-10 18:31                 ` Josh Poimboeuf
2021-03-11  0:20                   ` Masami Hiramatsu
2021-03-11  1:06                     ` Josh Poimboeuf
2021-03-11  1:54                       ` Masami Hiramatsu
2021-03-11 16:51                         ` Josh Poimboeuf
2021-03-12  3:22                           ` Masami Hiramatsu
2021-03-10 22:46                 ` Daniel Xu
2021-03-09 21:34         ` Daniel Xu
2021-03-10 10:50           ` Masami Hiramatsu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.