linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v4] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
@ 2019-04-22 16:26 Kairui Song
  2019-04-23 11:35 ` Peter Zijlstra
  2019-04-29  6:37 ` [tip:perf/core] perf/x86: Make perf callchains " tip-bot for Kairui Song
  0 siblings, 2 replies; 8+ messages in thread
From: Kairui Song @ 2019-04-22 16:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Josh Poimboeuf, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Alexei Starovoitov, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Dave Young, Kairui Song

Currently perf callchain doesn't work well with ORC unwinder
when sampling from trace point. We'll get useless in kernel callchain
like this:

perf  6429 [000]    22.498450:             kmem:mm_page_alloc: page=0x176a17 pfn=1534487 order=0 migratetype=0 gfp_flags=GFP_KERNEL
    ffffffffbe23e32e __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
	7efdf7f7d3e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
	5651468729c1 [unknown] (/usr/bin/perf)
	5651467ee82a main+0x69a (/usr/bin/perf)
	7efdf7eaf413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
    5541f689495641d7 [unknown] ([unknown])

The root cause is that, for trace point events, it doesn't provide a
real snapshot of the hardware registers. Instead perf tries to get
required caller's registers and compose a fake register snapshot
which suppose to contain enough information for start a unwinding.
However without CONFIG_FRAME_POINTER, if failed to get caller's BP as the
frame pointer, so current frame pointer is returned instead. We get
a invalid register combination which confuse the unwinder, and end the
stacktrace early.

So in such case just don't try dump BP, and let the unwinder start
directly when the register is not a real snapshot. And Use SP
as the skip mark, unwinder will skip all the frames until it meet
the frame of the trace point caller.

Tested with frame pointer unwinder and ORC unwinder, this make perf
callchain get the full kernel space stacktrace again like this:

perf  6503 [000]  1567.570191:             kmem:mm_page_alloc: page=0x16c904 pfn=1493252 order=0 migratetype=0 gfp_flags=GFP_KERNEL
    ffffffffb523e2ae __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52383bd __get_free_pages+0xd (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52fd28a __pollwait+0x8a (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb521426f perf_poll+0x2f (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52fe3e2 do_sys_poll+0x252 (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52ff027 __x64_sys_poll+0x37 (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb500418b do_syscall_64+0x5b (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb5a0008c entry_SYSCALL_64_after_hwframe+0x44 (/lib/modules/5.1.0-rc3+/build/vmlinux)
	7f71e92d03e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
	55a22960d9c1 [unknown] (/usr/bin/perf)
	55a22958982a main+0x69a (/usr/bin/perf)
	7f71e9202413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
    5541f689495641d7 [unknown] ([unknown])

Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
---

Update from V3:
  - Alway start the unwinding directly on fake registers, so we have
    a unified path for both with/without frame pointer and simplify
    the code, as posted by Josh Poimboeuf

Update from V2:
  - Instead of looking at if BP is 0, use X86_EFLAGS_FIXED flag bit as
    the indicator of where the pt_regs is valid for unwinding. As
    suggested by Peter Zijlstra
  - Update some comments accordingly.

Update from V1:
  Get rid of a lot of unneccessary code and just don't dump a inaccurate
  BP, and use SP as the marker for target frame.

 arch/x86/events/core.c            | 21 +++++++++++++++++----
 arch/x86/include/asm/perf_event.h |  7 +------
 arch/x86/include/asm/stacktrace.h | 13 -------------
 include/linux/perf_event.h        |  2 +-
 4 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 81911e11a15d..9856b5b91b9c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2348,6 +2348,15 @@ void arch_perf_update_userpage(struct perf_event *event,
 	cyc2ns_read_end();
 }
 
+/*
+ * Determine whether the regs were taken from an irq/exception handler rather
+ * than from perf_arch_fetch_caller_regs().
+ */
+static bool perf_hw_regs(struct pt_regs *regs)
+{
+	return regs->flags & X86_EFLAGS_FIXED;
+}
+
 void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
@@ -2359,11 +2368,15 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 		return;
 	}
 
-	if (perf_callchain_store(entry, regs->ip))
-		return;
+	if (perf_hw_regs(regs)) {
+		if (perf_callchain_store(entry, regs->ip))
+			return;
+		unwind_start(&state, current, regs, NULL);
+	} else {
+		unwind_start(&state, current, NULL, (void *)regs->sp);
+	}
 
-	for (unwind_start(&state, current, regs, NULL); !unwind_done(&state);
-	     unwind_next_frame(&state)) {
+	for (; !unwind_done(&state); unwind_next_frame(&state)) {
 		addr = unwind_get_return_address(&state);
 		if (!addr || perf_callchain_store(entry, addr))
 			return;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8bdf74902293..f4854cd0905b 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -260,14 +260,9 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
  */
 #define perf_arch_fetch_caller_regs(regs, __ip)		{	\
 	(regs)->ip = (__ip);					\
-	(regs)->bp = caller_frame_pointer();			\
+	(regs)->sp = (unsigned long)__builtin_frame_address(0);	\
 	(regs)->cs = __KERNEL_CS;				\
 	regs->flags = 0;					\
-	asm volatile(						\
-		_ASM_MOV "%%"_ASM_SP ", %0\n"			\
-		: "=m" ((regs)->sp)				\
-		:: "memory"					\
-	);							\
 }
 
 struct perf_guest_switch_msr {
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index f335aad404a4..beef7ad9e43a 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -98,19 +98,6 @@ struct stack_frame_ia32 {
     u32 return_address;
 };
 
-static inline unsigned long caller_frame_pointer(void)
-{
-	struct stack_frame *frame;
-
-	frame = __builtin_frame_address(0);
-
-#ifdef CONFIG_FRAME_POINTER
-	frame = frame->next_frame;
-#endif
-
-	return (unsigned long)frame;
-}
-
 void show_opcodes(struct pt_regs *regs, const char *loglvl);
 void show_ip(struct pt_regs *regs, const char *loglvl);
 #endif /* _ASM_X86_STACKTRACE_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e47ef764f613..ab135abe62e0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1059,7 +1059,7 @@ static inline void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned lo
  * the nth caller. We only need a few of the regs:
  * - ip for PERF_SAMPLE_IP
  * - cs for user_mode() tests
- * - bp for callchains
+ * - sp for callchains
  * - eflags, for future purposes, just in case
  */
 static inline void perf_fetch_caller_regs(struct pt_regs *regs)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v4] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-22 16:26 [RFC PATCH v4] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER Kairui Song
@ 2019-04-23 11:35 ` Peter Zijlstra
  2019-04-24 12:42   ` Kairui Song
  2019-04-29  6:37 ` [tip:perf/core] perf/x86: Make perf callchains " tip-bot for Kairui Song
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2019-04-23 11:35 UTC (permalink / raw)
  To: Kairui Song
  Cc: linux-kernel, Josh Poimboeuf, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Alexei Starovoitov, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Dave Young

On Tue, Apr 23, 2019 at 12:26:52AM +0800, Kairui Song wrote:
> Currently perf callchain doesn't work well with ORC unwinder
> when sampling from trace point. We'll get useless in kernel callchain
> like this:
> 
> perf  6429 [000]    22.498450:             kmem:mm_page_alloc: page=0x176a17 pfn=1534487 order=0 migratetype=0 gfp_flags=GFP_KERNEL
>     ffffffffbe23e32e __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
> 	7efdf7f7d3e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
> 	5651468729c1 [unknown] (/usr/bin/perf)
> 	5651467ee82a main+0x69a (/usr/bin/perf)
> 	7efdf7eaf413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
>     5541f689495641d7 [unknown] ([unknown])
> 
> The root cause is that, for trace point events, it doesn't provide a
> real snapshot of the hardware registers. Instead perf tries to get
> required caller's registers and compose a fake register snapshot
> which suppose to contain enough information for start a unwinding.
> However without CONFIG_FRAME_POINTER, if failed to get caller's BP as the
> frame pointer, so current frame pointer is returned instead. We get
> a invalid register combination which confuse the unwinder, and end the
> stacktrace early.
> 
> So in such case just don't try dump BP, and let the unwinder start
> directly when the register is not a real snapshot. And Use SP
> as the skip mark, unwinder will skip all the frames until it meet
> the frame of the trace point caller.
> 
> Tested with frame pointer unwinder and ORC unwinder, this make perf
> callchain get the full kernel space stacktrace again like this:
> 
> perf  6503 [000]  1567.570191:             kmem:mm_page_alloc: page=0x16c904 pfn=1493252 order=0 migratetype=0 gfp_flags=GFP_KERNEL
>     ffffffffb523e2ae __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb52383bd __get_free_pages+0xd (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb52fd28a __pollwait+0x8a (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb521426f perf_poll+0x2f (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb52fe3e2 do_sys_poll+0x252 (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb52ff027 __x64_sys_poll+0x37 (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb500418b do_syscall_64+0x5b (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb5a0008c entry_SYSCALL_64_after_hwframe+0x44 (/lib/modules/5.1.0-rc3+/build/vmlinux)
> 	7f71e92d03e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
> 	55a22960d9c1 [unknown] (/usr/bin/perf)
> 	55a22958982a main+0x69a (/usr/bin/perf)
> 	7f71e9202413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
>     5541f689495641d7 [unknown] ([unknown])
> 
> Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Kairui Song <kasong@redhat.com>

Thanks!

> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index e47ef764f613..ab135abe62e0 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1059,7 +1059,7 @@ static inline void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned lo
>   * the nth caller. We only need a few of the regs:
>   * - ip for PERF_SAMPLE_IP
>   * - cs for user_mode() tests
> - * - bp for callchains
> + * - sp for callchains
>   * - eflags, for future purposes, just in case
>   */
>  static inline void perf_fetch_caller_regs(struct pt_regs *regs)

I've extended that like so:

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1058,12 +1058,18 @@ static inline void perf_arch_fetch_calle
 #endif
 
 /*
- * Take a snapshot of the regs. Skip ip and frame pointer to
- * the nth caller. We only need a few of the regs:
+ * When generating a perf sample in-line, instead of from an interrupt /
+ * exception, we lack a pt_regs. This is typically used from software events
+ * like: SW_CONTEXT_SWITCHES, SW_MIGRATIONS and the tie-in with tracepoints.
+ *
+ * We typically don't need a full set, but (for x86) do require:
  * - ip for PERF_SAMPLE_IP
  * - cs for user_mode() tests
- * - sp for callchains
- * - eflags, for future purposes, just in case
+ * - sp for PERF_SAMPLE_CALLCHAIN
+ * - eflags for MISC bits and CALLCHAIN (see: perf_hw_regs())
+ *
+ * NOTE: assumes @regs is otherwise already 0 filled; this is important for
+ * things like PERF_SAMPLE_REGS_INTR.
  */
 static inline void perf_fetch_caller_regs(struct pt_regs *regs)
 {

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v4] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-23 11:35 ` Peter Zijlstra
@ 2019-04-24 12:42   ` Kairui Song
  2019-04-24 12:52     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Kairui Song @ 2019-04-24 12:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Kernel Mailing List, Josh Poimboeuf, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Alexei Starovoitov, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Dave Young

On Tue, Apr 23, 2019 at 7:35 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Apr 23, 2019 at 12:26:52AM +0800, Kairui Song wrote:
> > Currently perf callchain doesn't work well with ORC unwinder
> > when sampling from trace point. We'll get useless in kernel callchain
> > like this:
> >
> > perf  6429 [000]    22.498450:             kmem:mm_page_alloc: page=0x176a17 pfn=1534487 order=0 migratetype=0 gfp_flags=GFP_KERNEL
> >     ffffffffbe23e32e __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
> >       7efdf7f7d3e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
> >       5651468729c1 [unknown] (/usr/bin/perf)
> >       5651467ee82a main+0x69a (/usr/bin/perf)
> >       7efdf7eaf413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
> >     5541f689495641d7 [unknown] ([unknown])
> >
> > The root cause is that, for trace point events, it doesn't provide a
> > real snapshot of the hardware registers. Instead perf tries to get
> > required caller's registers and compose a fake register snapshot
> > which suppose to contain enough information for start a unwinding.
> > However without CONFIG_FRAME_POINTER, if failed to get caller's BP as the
> > frame pointer, so current frame pointer is returned instead. We get
> > a invalid register combination which confuse the unwinder, and end the
> > stacktrace early.
> >
> > So in such case just don't try dump BP, and let the unwinder start
> > directly when the register is not a real snapshot. And Use SP
> > as the skip mark, unwinder will skip all the frames until it meet
> > the frame of the trace point caller.
> >
> > Tested with frame pointer unwinder and ORC unwinder, this make perf
> > callchain get the full kernel space stacktrace again like this:
> >
> > perf  6503 [000]  1567.570191:             kmem:mm_page_alloc: page=0x16c904 pfn=1493252 order=0 migratetype=0 gfp_flags=GFP_KERNEL
> >     ffffffffb523e2ae __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
> >     ffffffffb52383bd __get_free_pages+0xd (/lib/modules/5.1.0-rc3+/build/vmlinux)
> >     ffffffffb52fd28a __pollwait+0x8a (/lib/modules/5.1.0-rc3+/build/vmlinux)
> >     ffffffffb521426f perf_poll+0x2f (/lib/modules/5.1.0-rc3+/build/vmlinux)
> >     ffffffffb52fe3e2 do_sys_poll+0x252 (/lib/modules/5.1.0-rc3+/build/vmlinux)
> >     ffffffffb52ff027 __x64_sys_poll+0x37 (/lib/modules/5.1.0-rc3+/build/vmlinux)
> >     ffffffffb500418b do_syscall_64+0x5b (/lib/modules/5.1.0-rc3+/build/vmlinux)
> >     ffffffffb5a0008c entry_SYSCALL_64_after_hwframe+0x44 (/lib/modules/5.1.0-rc3+/build/vmlinux)
> >       7f71e92d03e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
> >       55a22960d9c1 [unknown] (/usr/bin/perf)
> >       55a22958982a main+0x69a (/usr/bin/perf)
> >       7f71e9202413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
> >     5541f689495641d7 [unknown] ([unknown])
> >
> > Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > Signed-off-by: Kairui Song <kasong@redhat.com>
>
> Thanks!
>
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index e47ef764f613..ab135abe62e0 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -1059,7 +1059,7 @@ static inline void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned lo
> >   * the nth caller. We only need a few of the regs:
> >   * - ip for PERF_SAMPLE_IP
> >   * - cs for user_mode() tests
> > - * - bp for callchains
> > + * - sp for callchains
> >   * - eflags, for future purposes, just in case
> >   */
> >  static inline void perf_fetch_caller_regs(struct pt_regs *regs)
>
> I've extended that like so:
>
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1058,12 +1058,18 @@ static inline void perf_arch_fetch_calle
>  #endif
>
>  /*
> - * Take a snapshot of the regs. Skip ip and frame pointer to
> - * the nth caller. We only need a few of the regs:
> + * When generating a perf sample in-line, instead of from an interrupt /
> + * exception, we lack a pt_regs. This is typically used from software events
> + * like: SW_CONTEXT_SWITCHES, SW_MIGRATIONS and the tie-in with tracepoints.
> + *
> + * We typically don't need a full set, but (for x86) do require:
>   * - ip for PERF_SAMPLE_IP
>   * - cs for user_mode() tests
> - * - sp for callchains
> - * - eflags, for future purposes, just in case
> + * - sp for PERF_SAMPLE_CALLCHAIN
> + * - eflags for MISC bits and CALLCHAIN (see: perf_hw_regs())
> + *
> + * NOTE: assumes @regs is otherwise already 0 filled; this is important for
> + * things like PERF_SAMPLE_REGS_INTR.
>   */
>  static inline void perf_fetch_caller_regs(struct pt_regs *regs)
>  {

Sure, the updated comments looks much better. Will the maintainer
squash the comment update or should I send a V5?

--
Best Regards,
Kairui Song

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v4] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-24 12:42   ` Kairui Song
@ 2019-04-24 12:52     ` Peter Zijlstra
  2019-05-07 16:45       ` Alexei Starovoitov
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2019-04-24 12:52 UTC (permalink / raw)
  To: Kairui Song
  Cc: Linux Kernel Mailing List, Josh Poimboeuf, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Alexei Starovoitov, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Dave Young

On Wed, Apr 24, 2019 at 08:42:40AM -0400, Kairui Song wrote:

> Sure, the updated comments looks much better. Will the maintainer
> squash the comment update or should I send a V5?

I've squashed it, I've just not gotten around to stuffing it a git tree
yet. Should happen 'soon'.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:perf/core] perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER
  2019-04-22 16:26 [RFC PATCH v4] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER Kairui Song
  2019-04-23 11:35 ` Peter Zijlstra
@ 2019-04-29  6:37 ` tip-bot for Kairui Song
  1 sibling, 0 replies; 8+ messages in thread
From: tip-bot for Kairui Song @ 2019-04-29  6:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, peterz, tglx, acme, alexei.starovoitov, jolsa, hpa,
	kasong, dyoung, linux-kernel, jpoimboe, bp, mingo,
	alexander.shishkin, namhyung

Commit-ID:  d15d356887e770c5f2dcf963b52c7cb510c9e42d
Gitweb:     https://git.kernel.org/tip/d15d356887e770c5f2dcf963b52c7cb510c9e42d
Author:     Kairui Song <kasong@redhat.com>
AuthorDate: Tue, 23 Apr 2019 00:26:52 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 29 Apr 2019 08:25:05 +0200

perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER

Currently perf callchain doesn't work well with ORC unwinder
when sampling from trace point. We'll get useless in kernel callchain
like this:

perf  6429 [000]    22.498450:             kmem:mm_page_alloc: page=0x176a17 pfn=1534487 order=0 migratetype=0 gfp_flags=GFP_KERNEL
    ffffffffbe23e32e __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
	7efdf7f7d3e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
	5651468729c1 [unknown] (/usr/bin/perf)
	5651467ee82a main+0x69a (/usr/bin/perf)
	7efdf7eaf413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
    5541f689495641d7 [unknown] ([unknown])

The root cause is that, for trace point events, it doesn't provide a
real snapshot of the hardware registers. Instead perf tries to get
required caller's registers and compose a fake register snapshot
which suppose to contain enough information for start a unwinding.
However without CONFIG_FRAME_POINTER, if failed to get caller's BP as the
frame pointer, so current frame pointer is returned instead. We get
a invalid register combination which confuse the unwinder, and end the
stacktrace early.

So in such case just don't try dump BP, and let the unwinder start
directly when the register is not a real snapshot. Use SP
as the skip mark, unwinder will skip all the frames until it meet
the frame of the trace point caller.

Tested with frame pointer unwinder and ORC unwinder, this makes perf
callchain get the full kernel space stacktrace again like this:

perf  6503 [000]  1567.570191:             kmem:mm_page_alloc: page=0x16c904 pfn=1493252 order=0 migratetype=0 gfp_flags=GFP_KERNEL
    ffffffffb523e2ae __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52383bd __get_free_pages+0xd (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52fd28a __pollwait+0x8a (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb521426f perf_poll+0x2f (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52fe3e2 do_sys_poll+0x252 (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52ff027 __x64_sys_poll+0x37 (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb500418b do_syscall_64+0x5b (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb5a0008c entry_SYSCALL_64_after_hwframe+0x44 (/lib/modules/5.1.0-rc3+/build/vmlinux)
	7f71e92d03e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
	55a22960d9c1 [unknown] (/usr/bin/perf)
	55a22958982a main+0x69a (/usr/bin/perf)
	7f71e9202413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
    5541f689495641d7 [unknown] ([unknown])

Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190422162652.15483-1-kasong@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/core.c            | 21 +++++++++++++++++----
 arch/x86/include/asm/perf_event.h |  7 +------
 arch/x86/include/asm/stacktrace.h | 13 -------------
 include/linux/perf_event.h        | 14 ++++++++++----
 4 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index de1a924a4914..f315425d8468 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2382,6 +2382,15 @@ void arch_perf_update_userpage(struct perf_event *event,
 	cyc2ns_read_end();
 }
 
+/*
+ * Determine whether the regs were taken from an irq/exception handler rather
+ * than from perf_arch_fetch_caller_regs().
+ */
+static bool perf_hw_regs(struct pt_regs *regs)
+{
+	return regs->flags & X86_EFLAGS_FIXED;
+}
+
 void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
@@ -2393,11 +2402,15 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 		return;
 	}
 
-	if (perf_callchain_store(entry, regs->ip))
-		return;
+	if (perf_hw_regs(regs)) {
+		if (perf_callchain_store(entry, regs->ip))
+			return;
+		unwind_start(&state, current, regs, NULL);
+	} else {
+		unwind_start(&state, current, NULL, (void *)regs->sp);
+	}
 
-	for (unwind_start(&state, current, regs, NULL); !unwind_done(&state);
-	     unwind_next_frame(&state)) {
+	for (; !unwind_done(&state); unwind_next_frame(&state)) {
 		addr = unwind_get_return_address(&state);
 		if (!addr || perf_callchain_store(entry, addr))
 			return;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 04768a3a5454..1392d5e6e8d6 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -308,14 +308,9 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
  */
 #define perf_arch_fetch_caller_regs(regs, __ip)		{	\
 	(regs)->ip = (__ip);					\
-	(regs)->bp = caller_frame_pointer();			\
+	(regs)->sp = (unsigned long)__builtin_frame_address(0);	\
 	(regs)->cs = __KERNEL_CS;				\
 	regs->flags = 0;					\
-	asm volatile(						\
-		_ASM_MOV "%%"_ASM_SP ", %0\n"			\
-		: "=m" ((regs)->sp)				\
-		:: "memory"					\
-	);							\
 }
 
 struct perf_guest_switch_msr {
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index f335aad404a4..beef7ad9e43a 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -98,19 +98,6 @@ struct stack_frame_ia32 {
     u32 return_address;
 };
 
-static inline unsigned long caller_frame_pointer(void)
-{
-	struct stack_frame *frame;
-
-	frame = __builtin_frame_address(0);
-
-#ifdef CONFIG_FRAME_POINTER
-	frame = frame->next_frame;
-#endif
-
-	return (unsigned long)frame;
-}
-
 void show_opcodes(struct pt_regs *regs, const char *loglvl);
 void show_ip(struct pt_regs *regs, const char *loglvl);
 #endif /* _ASM_X86_STACKTRACE_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f3864e1c5569..cf023db0e8a2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1058,12 +1058,18 @@ static inline void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned lo
 #endif
 
 /*
- * Take a snapshot of the regs. Skip ip and frame pointer to
- * the nth caller. We only need a few of the regs:
+ * When generating a perf sample in-line, instead of from an interrupt /
+ * exception, we lack a pt_regs. This is typically used from software events
+ * like: SW_CONTEXT_SWITCHES, SW_MIGRATIONS and the tie-in with tracepoints.
+ *
+ * We typically don't need a full set, but (for x86) do require:
  * - ip for PERF_SAMPLE_IP
  * - cs for user_mode() tests
- * - bp for callchains
- * - eflags, for future purposes, just in case
+ * - sp for PERF_SAMPLE_CALLCHAIN
+ * - eflags for MISC bits and CALLCHAIN (see: perf_hw_regs())
+ *
+ * NOTE: assumes @regs is otherwise already 0 filled; this is important for
+ * things like PERF_SAMPLE_REGS_INTR.
  */
 static inline void perf_fetch_caller_regs(struct pt_regs *regs)
 {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v4] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-24 12:52     ` Peter Zijlstra
@ 2019-05-07 16:45       ` Alexei Starovoitov
  2019-05-07 16:57         ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2019-05-07 16:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kairui Song, Linux Kernel Mailing List, Josh Poimboeuf,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Thomas Gleixner, Borislav Petkov,
	Dave Young

On Wed, Apr 24, 2019 at 5:52 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Apr 24, 2019 at 08:42:40AM -0400, Kairui Song wrote:
>
> > Sure, the updated comments looks much better. Will the maintainer
> > squash the comment update or should I send a V5?
>
> I've squashed it, I've just not gotten around to stuffing it a git tree
> yet. Should happen 'soon'.

Was it applied and on the way to Linus yet ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v4] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-05-07 16:45       ` Alexei Starovoitov
@ 2019-05-07 16:57         ` Peter Zijlstra
  2019-05-07 18:39           ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2019-05-07 16:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kairui Song, Linux Kernel Mailing List, Josh Poimboeuf,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Thomas Gleixner, Borislav Petkov,
	Dave Young

On Tue, May 07, 2019 at 09:45:47AM -0700, Alexei Starovoitov wrote:
> On Wed, Apr 24, 2019 at 5:52 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Apr 24, 2019 at 08:42:40AM -0400, Kairui Song wrote:
> >
> > > Sure, the updated comments looks much better. Will the maintainer
> > > squash the comment update or should I send a V5?
> >
> > I've squashed it, I've just not gotten around to stuffing it a git tree
> > yet. Should happen 'soon'.
> 
> Was it applied and on the way to Linus yet ?

AFAICT it has already landed in Linus' tree.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v4] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-05-07 16:57         ` Peter Zijlstra
@ 2019-05-07 18:39           ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2019-05-07 18:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, Kairui Song, Linux Kernel Mailing List,
	Josh Poimboeuf, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Dave Young


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, May 07, 2019 at 09:45:47AM -0700, Alexei Starovoitov wrote:
> > On Wed, Apr 24, 2019 at 5:52 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Apr 24, 2019 at 08:42:40AM -0400, Kairui Song wrote:
> > >
> > > > Sure, the updated comments looks much better. Will the maintainer
> > > > squash the comment update or should I send a V5?
> > >
> > > I've squashed it, I've just not gotten around to stuffing it a git tree
> > > yet. Should happen 'soon'.
> > 
> > Was it applied and on the way to Linus yet ?
> 
> AFAICT it has already landed in Linus' tree.

Yeah, it's now the following commit upstream:

  d15d356887e7: perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-05-07 18:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-22 16:26 [RFC PATCH v4] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER Kairui Song
2019-04-23 11:35 ` Peter Zijlstra
2019-04-24 12:42   ` Kairui Song
2019-04-24 12:52     ` Peter Zijlstra
2019-05-07 16:45       ` Alexei Starovoitov
2019-05-07 16:57         ` Peter Zijlstra
2019-05-07 18:39           ` Ingo Molnar
2019-04-29  6:37 ` [tip:perf/core] perf/x86: Make perf callchains " tip-bot for Kairui Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).